[antlr-interest] Any plans of next ANTLR Release
Graham Wideman
gwlist at grahamwideman.com
Fri Apr 30 16:58:35 PDT 2010
OOooooo! That looks quite.... exciting!
Now I'm wondering if there could be a little more propagation of token names from the grammar to labels in the bytecode source? I.e., could A, B and I appear in there as labels on the lines, and as annotations or something on the destinations in split?
-- Graham
At 4/30/2010 04:41 PM, Terence Parr wrote:
>On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
>> This prompts me to wonder how debuggable these lexers will be? Currently a certain amount of troubleshooting of lexing/parsing can be done by inspecting the generated lexer source, single-stepping it and so on.
>>
>> If you move to encoding the lexer logic in bytecodes, does the generated lexer source become an inscrutable black box? Or is there still meaningful source code to examine, trace etc?
>
>Yup. The bytecode is actually easier to read than the java ;)
>
>lexer grammar L2;
>A : 'ab';
>B : 'a'..'z'+ ;
>I : '0'..'9'+ ;
>
>yields:
>
>0000: split 9, 16, 29 // says 3 paths are possible
>0009: match8 'a'
>0011: match8 'b'
>0013: accept 4
>0016: range8 'a', 'z'
>0019: split 16, 26
>0026: accept 5
>0029: range8 '0', '9'
>0032: split 29, 39 // go back or fall out of loop into accept state
>0039: accept 6
>
>Is that what you mean? It's 1-to-1 with the grammar, taken almost verbatim from Russ Cox's description of VM-based NFA simulation.
>
>ANTLR v4 uses 42 bytes to encode the entire L2 grammar. ANTLR v3 generates 246 lines of Java and 2709 bytes of Java .class file:
>
>/tmp $ wc -l L2.java
> 246 L2.java
>/tmp $ ls -l L2.class
>-rw-r--r-- 1 parrt wheel 2709 Apr 30 16:39 L2.class
>
>Ter
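
To make the quoted listing concrete, here is a minimal Python sketch of how a VM could run that bytecode by simulating all threads in lockstep, in the spirit of the VM-based NFA simulation mentioned above. It is not ANTLR's actual implementation. The instruction table is transcribed from the listing. The names PROGRAM, NEXT, TOKEN_NAMES, closure, next_token and tokenize are made up for illustration. Two things are assumptions: that the accept operands 4, 5 and 6 are the token types of A, B and I, and that ties between equally long matches go to the thread listed first in a split.

```python
# Bytecode from the quoted listing, keyed by byte address.
PROGRAM = {
    0:  ("split", 9, 16, 29),
    9:  ("match8", "a"),
    11: ("match8", "b"),
    13: ("accept", 4),
    16: ("range8", "a", "z"),
    19: ("split", 16, 26),
    26: ("accept", 5),
    29: ("range8", "0", "9"),
    32: ("split", 29, 39),
    39: ("accept", 6),
}

# Address of the instruction that follows each instruction.
_ADDRS = sorted(PROGRAM)
NEXT = dict(zip(_ADDRS, _ADDRS[1:]))

# ASSUMPTION: accept operands are token types for rules A, B, I in order.
TOKEN_NAMES = {4: "A", 5: "B", 6: "I"}


def closure(pc, out, seen):
    """Follow split instructions (no input consumed), keeping split order."""
    if pc in seen:
        return
    seen.add(pc)
    op = PROGRAM[pc]
    if op[0] == "split":
        for target in op[1:]:
            closure(target, out, seen)
    else:
        out.append(pc)


def next_token(text, start=0):
    """Return (token_type, end_offset) of the longest match, or None."""
    threads = []
    closure(0, threads, set())
    best = None
    pos = start
    while threads:
        # Record the first accepting thread at this position; a later,
        # longer match overwrites it. ASSUMPTION: split order breaks ties.
        for pc in threads:
            if PROGRAM[pc][0] == "accept":
                best = (PROGRAM[pc][1], pos)
                break
        if pos >= len(text):
            break
        ch = text[pos]
        advanced, seen = [], set()
        for pc in threads:
            op = PROGRAM[pc]
            if (op[0] == "match8" and ch == op[1]) or (
                op[0] == "range8" and op[1] <= ch <= op[2]
            ):
                closure(NEXT[pc], advanced, seen)
        threads = advanced
        pos += 1
    return best


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using next_token."""
    pos, tokens = 0, []
    while pos < len(text):
        hit = next_token(text, pos)
        if hit is None:
            raise ValueError(f"no token matches at offset {pos}")
        token_type, end = hit
        tokens.append((TOKEN_NAMES[token_type], text[pos:end]))
        pos = end
    return tokens
```

With this sketch, tokenize("ab42abc") gives [("A", "ab"), ("I", "42"), ("B", "abc")]: "ab" matches both A and B at the same length and goes to A because its thread comes first, while "abc" can only be B. The TOKEN_NAMES table is also roughly what the label propagation asked about above would need: a map from accept operands back to rule names.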