[antlr-interest] DFA's encoded directly in java bytecodes

Terence Parr parrt at cs.usfca.edu
Tue Oct 19 12:29:28 PDT 2004


Howdy,

For the Java target of ANTLR 3.0, I've been concerned about the hideous 
nature of the code generated for DFA machines.  Without a goto 
instruction in Java, you have to use a massive switch (slow as death) 
or use method calls (slow); plus you have to create like an object for 
each of thousands of states when the Lexer or Parser is loaded.  Ick.

Sriram Srinivasan at the ANTLR2004 workshop suggested generating byte 
code directly.  There is a goto bytecode of course. :)  The beauty of 
these suckers is that the CPU program counter will be used (once 
compiled to native code) to traverse the states of the DFA.  That will 
be insanely fast; about the same speed as the C DFAs Ric generates for 
3.0.

So, I will generate byte code assembly for the DFAs from the ANTLR code 
generator using StringTemplate so people can tweak the byte codes 
easily.  For example, you might want to inline some of the char fetch 
stuff.  That will haul ass!

The only problem was that I didn't want to use BCEL or Jasmin or any 
other package since it means ANTLR depends on those packages (i hate 
that).  So, I spent the weekend and yesterday building a .class file 
writer and then a byte code assembler.  Both are tiny as they are very 
specific to my needs (only handles like 50 bytecodes).  Nonetheless, 
900 total lines of code is nice. :)  An excellent price to pay to avoid 
other people's libraries.

Here is a sample (partial) state in my assembly code

     aload 0
     iconst 1
     invokeinterface IntegerStream.LA 2
     istore 1
     iload 1
     iconst 98
     if_icmplt x
     iload 1
     ireturn
x:
     iconst -1
     ireturn

I'm guessing that a minimal state will be about 20-ish bytes, allowing 
for a max of about 3000 states in any specific DFA.  Not a serious 
limitation probably.

I also hope to use this code in my CS652 grad prog lang class whenever 
I teach that again.  A nice real world problem.

Ter
PS	yes, the .class file format is pretty complicated, but not 
unreasonable given the problem
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 





More information about the antlr-interest mailing list