[antlr-interest] Any plans of next ANTLR Release
Terence Parr
parrt at cs.usfca.edu
Fri Apr 30 16:41:50 PDT 2010
On Apr 30, 2010, at 4:27 PM, Graham Wideman wrote:
> This prompts me to wonder how debuggable these lexers will be? Currently a certain amount of troubleshooting of lexing/parsing can be done by inspecting the generated lexer source, single-stepping it and so on.
>
> If you move to encoding the lexer logic in bytecodes, does the generated lexer source become an inscrutable black box? Or is there still meaningful source code to examine, trace etc?
Yup. The bytecode is actually easier to read than the java ;)
lexer grammar L2;
A : 'ab';
B : 'a'..'z'+ ;
I : '0'..'9'+ ;
yields:
0000: split 9, 16, 29 // says 3 paths are possible
0009: match8 'a'
0011: match8 'b'
0013: accept 4
0016: range8 'a', 'z'
0019: split 16, 26
0026: accept 5
0029: range8 '0', '9'
0032: split 29, 39 // go back or fall out of loop into accept state
0039: accept 6
is that what you mean? It's 1-to-1 with the grammar. taken almost verbatim from Russ Cox's description of VM-based NFA simulation.
ANTLR v4 uses 42 bytes to encode entire L2 grammar. ANTLR v3 generates 246 lines of Java and 2709 bytes of java .class file:
/tmp $ wc -l L2.java
246 L2.java
/tmp $ ls -l L2.class
-rw-r--r-- 1 parrt wheel 2709 Apr 30 16:39 L2.class
Ter
More information about the antlr-interest
mailing list