[antlr-interest] Any plans of next ANTLR Release

Graham Wideman gwlist at grahamwideman.com
Fri Apr 30 16:58:35 PDT 2010


OOooooo!  That looks quite.... exciting!

Now I'm wondering if there could be a little more propagation of token names from the grammar to labels in the bytecode source? I.e., could A, B and I appear in there, as labels on the lines, and as annotations or something similar on the destinations in split?

-- Graham


At 4/30/2010 04:41 PM, Terence Parr wrote:

>On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
>> This prompts me to wonder how debuggable these lexers will be?  Currently a certain amount of troubleshooting of lexing/parsing can be done by inspecting the generated lexer source, single-stepping it and so on.
>> 
>> If you move to encoding the lexer logic in bytecodes, does the generated lexer source become an inscrutable black box?  Or is there still meaningful source code to examine, trace etc?
>
>Yup. The bytecode is actually easier to read than the java ;)
>
>lexer grammar L2;
>A : 'ab';
>B : 'a'..'z'+ ;
>I : '0'..'9'+ ;
>
>yields:
>
>0000:   split         9, 16, 29   // says 3 paths are possible
>0009:   match8        'a'
>0011:   match8        'b'
>0013:   accept        4
>0016:   range8        'a', 'z'
>0019:   split         16, 26
>0026:   accept        5
>0029:   range8        '0', '9'
>0032:   split         29, 39 // go back or fall out of loop into accept state
>0039:   accept        6
>
>Is that what you mean?  It's 1-to-1 with the grammar, and taken almost verbatim from Russ Cox's description of VM-based NFA simulation.
>
>ANTLR v4 uses 42 bytes to encode the entire L2 grammar.   ANTLR v3 generates 246 lines of Java and 2709 bytes of Java .class file:
>
>/tmp $ wc -l L2.java
>     246 L2.java
>/tmp $ ls -l L2.class
>-rw-r--r--  1 parrt  wheel  2709 Apr 30 16:39 L2.class
>
>Ter
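
To make the listing above concrete, here is a minimal Python sketch of a thread-list simulation in the style of Russ Cox's VM approach, run over the same ten instructions. It is not ANTLR's implementation: the tuple layout, the names PROGRAM, NEXT, closure and next_token, and the tie-breaking rule (longest match wins, and on equal length the earlier split alternative wins) are all assumptions made for illustration. The accept values 4, 5 and 6 are copied from the listing, where they appear to correspond to A, B and I.

```python
# Instructions from the listing, keyed by byte address.
# The tuple encoding is hypothetical, not ANTLR's actual format.
PROGRAM = {
    0:  ("split", 9, 16, 29),
    9:  ("match", "a"),
    11: ("match", "b"),
    13: ("accept", 4),
    16: ("range", "a", "z"),
    19: ("split", 16, 26),
    26: ("accept", 5),
    29: ("range", "0", "9"),
    32: ("split", 29, 39),
    39: ("accept", 6),
}
ADDRS = sorted(PROGRAM)
# Fall-through address of each instruction (the next one in the listing).
NEXT = {a: b for a, b in zip(ADDRS, ADDRS[1:])}


def closure(pc, out, seen):
    """Expand split instructions, appending threads in priority order."""
    if pc in seen:
        return
    seen.add(pc)
    op = PROGRAM[pc]
    if op[0] == "split":
        for target in op[1:]:
            closure(target, out, seen)
    else:
        out.append(pc)


def next_token(text, start=0):
    """Return (accept_value, end_index) for the longest match, or None."""
    threads = []
    closure(0, threads, set())
    best = None
    pos = start
    while threads:
        # Assumed tie-break: the first accept in thread order wins.
        accepts = [PROGRAM[pc][1] for pc in threads
                   if PROGRAM[pc][0] == "accept"]
        if accepts:
            best = (accepts[0], pos)
        if pos >= len(text):
            break
        ch = text[pos]
        nxt, seen = [], set()
        for pc in threads:
            op = PROGRAM[pc]
            if (op[0] == "match" and ch == op[1]) or \
               (op[0] == "range" and op[1] <= ch <= op[2]):
                closure(NEXT[pc], nxt, seen)
        threads = nxt
        pos += 1
    return best
```

Under these assumptions, next_token("ab") follows both the 'ab' path and the 'a'..'z'+ loop in parallel and reports accept 4, while next_token("abc") can only finish on the loop path and reports accept 5.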


