[antlr-interest] Any plans of next ANTLR Release

Fri Apr 30 16:41:50 PDT 2010

On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
> This prompts me to wonder how debuggable these lexers will be?  Currently a certain amount of troubleshooting of lexing/parsing can be done by inspecting the generated lexer source, single-stepping it and so on.
> 
> If you move to encoding the lexer logic in bytecodes, does the generated lexer source become an inscrutable black box?  Or is there still meaningful source code to examine, trace etc?

Yup, there is still meaningful code to examine. The bytecode is actually easier to read than the Java ;)

lexer grammar L2;
A : 'ab';
B : 'a'..'z'+ ;
I : '0'..'9'+ ;

yields:

0000:	split         9, 16, 29   // says 3 paths are possible
0009:	match8        'a'
0011:	match8        'b'
0013:	accept        4
0016:	range8        'a', 'z'
0019:	split         16, 26
0026:	accept        5
0029:	range8        '0', '9'
0032:	split         29, 39 // go back or fall out of loop into accept state
0039:	accept        6

Is that what you mean?  It's 1-to-1 with the grammar, taken almost verbatim from Russ Cox's description of VM-based NFA simulation.
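
To make the "VM-based NFA simulation" idea concrete, here is a minimal Python sketch of a thread-list interpreter for the listing above. It is not ANTLR code. The dictionary layout, the helper names (PROGRAM, NEXT, closure, next_token), and the tie-breaking policy (longest match wins, ties go to the rule listed first) are assumptions made for illustration. Only the opcodes, operands, and addresses come from the listing.

```python
# Program transcribed from the listing; keys are the byte addresses shown there.
PROGRAM = {
    0:  ("split", 9, 16, 29),
    9:  ("match8", "a"),
    11: ("match8", "b"),
    13: ("accept", 4),
    16: ("range8", "a", "z"),
    19: ("split", 16, 26),
    26: ("accept", 5),
    29: ("range8", "0", "9"),
    32: ("split", 29, 39),
    39: ("accept", 6),
}

# Assumption: execution falls through to the next address in the listing.
ADDRS = sorted(PROGRAM)
NEXT = dict(zip(ADDRS, ADDRS[1:]))


def closure(addr, out, seen):
    """Follow split instructions and collect the non-split addresses reached,
    preserving the order in which split lists its targets."""
    if addr in seen:
        return
    seen.add(addr)
    instr = PROGRAM[addr]
    if instr[0] == "split":
        for target in instr[1:]:
            closure(target, out, seen)
    else:
        out.append(addr)


def next_token(text, start=0):
    """Return (token_type, end_index) for the longest match at start, or None."""
    threads = []
    closure(0, threads, set())
    best = None
    pos = start
    while threads:
        # Record an accept reached at this position; a later position
        # overwrites it, so the longest match wins.
        for addr in threads:
            instr = PROGRAM[addr]
            if instr[0] == "accept":
                best = (instr[1], pos)
                break  # earliest thread wins a tie (assumed policy)
        if pos >= len(text):
            break
        ch = text[pos]
        advanced, seen = [], set()
        for addr in threads:
            instr = PROGRAM[addr]
            if instr[0] == "match8" and ch == instr[1]:
                closure(NEXT[addr], advanced, seen)
            elif instr[0] == "range8" and instr[1] <= ch <= instr[2]:
                closure(NEXT[addr], advanced, seen)
        threads = advanced
        pos += 1
    return best


for sample in ("ab", "abc", "123"):
    print(sample, next_token(sample))
```

Under these assumptions "ab" comes back as type 4 (rule A beats rule B on a tie), while "abc" falls through to type 5 because only B's loop can consume the 'c'. Single-stepping the while loop and watching the threads list is the bytecode-level analogue of stepping through a generated lexer.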

ANTLR v4 uses 42 bytes to encode the entire L2 grammar.  ANTLR v3 generates 246 lines of Java and a 2709-byte Java .class file:

/tmp $ wc -l L2.java
     246 L2.java
/tmp $ ls -l L2.class
-rw-r--r--  1 parrt  wheel  2709 Apr 30 16:39 L2.class
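
The 42-byte figure can be cross-checked against the addresses in the bytecode listing. The short Python sketch below derives each instruction's size from the gap to the next address. The one assumption is that the final accept occupies 3 bytes, like the two accepts before it, since the listing shows no address after it.

```python
# Addresses and opcodes copied from the bytecode listing.
ADDRESSES = [0, 9, 11, 13, 16, 19, 26, 29, 32, 39]
OPCODES = ["split", "match8", "match8", "accept", "range8",
           "split", "accept", "range8", "split", "accept"]

# Size of every instruction except the last: distance to the next address.
sizes = [b - a for a, b in zip(ADDRESSES, ADDRESSES[1:])]

# Assumption: the last accept is as wide as the earlier accept instructions.
LAST_ACCEPT_SIZE = 3
total = ADDRESSES[-1] + LAST_ACCEPT_SIZE

for addr, op, size in zip(ADDRESSES, OPCODES, sizes + [LAST_ACCEPT_SIZE]):
    print(f"{addr:04d} {op:<7} {size} bytes")
print("total bytes:", total)
print("v3 .class size / v4 bytecode size:", 2709 / total)
```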

Ter