[antlr-interest] DFA's encoded directly in java bytecodes

Jim O'Connor Jim.OConnor at microfocus.com
Tue Oct 19 12:56:39 PDT 2004


http://www.kimbly.com/code/

Public domain class file reading and writing ;)
Might be more fluff than you need but...

> -----Original Message-----
> From: Terence Parr [mailto:parrt at cs.usfca.edu]
> Sent: Tuesday, October 19, 2004 2:29 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] DFA's encoded directly in java bytecodes
> 
> 
> Howdy,
> 
> For the Java target of ANTLR 3.0, I've been concerned about the hideous
> nature of the code generated for DFA machines.  Without a goto
> instruction in Java, you have to use a massive switch (slow as death)
> or use method calls (slow); plus you have to create like an object for
> each of thousands of states when the Lexer or Parser is loaded.  Ick.
> 
> Sriram Srinivasan at the ANTLR2004 workshop suggested generating byte
> code directly.  There is a goto bytecode of course. :)  The beauty of
> these suckers is that the CPU program counter will be used (once
> compiled to native code) to traverse the states of the DFA.  That will
> be insanely fast; about the same speed as the C DFAs Ric generates for
> 3.0.
> 
> So, I will generate byte code assembly for the DFAs from the ANTLR code
> generator using StringTemplate so people can tweak the byte codes
> easily.  For example, you might want to inline some of the char fetch
> stuff.  That will haul ass!
> 
> The only problem was that I didn't want to use BCEL or Jasmin or any
> other package since it means ANTLR depends on those packages (i hate
> that).  So, I spent the weekend and yesterday building a .class file
> writer and then a byte code assembler.  Both are tiny as they are very
> specific to my needs (only handles like 50 bytecodes).  Nonetheless,
> 900 total lines of code is nice. :)  An excellent price to pay to avoid
> other people's libraries.
> 
> Here is a sample (partial) state in my assembly code
> 
>      aload 0
>      iconst 1
>      invokeinterface IntegerStream.LA 2
>      istore 1
>      iload 1
>      iconst 98
>      if_icmplt x
>      iload 1
>      ireturn
> x:
>      iconst -1
>      ireturn
> 
> I'm guessing that a minimal state will be about 20-ish bytes, allowing
> for a max of about 3000 states in any specific DFA.  Not a serious
> limitation probably.
> 
> I also hope to use this code in my CS652 grad prog lang class whenever
> I teach that again.  A nice real world problem.
> 
> Ter
> PS	yes, the .class file format is pretty complicated, but not
> unreasonable given the problem
> --
> CS Professor & Grad Director, University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!
> 
> 
> 
> 
> 
> 
> Yahoo! Groups Links
> 
> 
> 
> 
> 
> 
> 
> 
> ________________________________________________________________________
> This e-mail has been scanned for viruses by MessageLabs. For further
> information visit http://www.messagelabs.com
> ________________________________________________________________________


 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 





More information about the antlr-interest mailing list