[antlr-interest] DFA's encoded directly in java bytecodes
Jim O'Connor
Jim.OConnor at microfocus.com
Tue Oct 19 12:56:39 PDT 2004
http://www.kimbly.com/code/
Public domain class file reading and writing ;)
Might be more fluff than you need but...
> -----Original Message-----
> From: Terence Parr [mailto:parrt at cs.usfca.edu]
> Sent: Tuesday, October 19, 2004 2:29 PM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] DFA's encoded directly in java bytecodes
>
>
> Howdy,
>
> For the Java target of ANTLR 3.0, I've been concerned about the hideous
> nature of the code generated for DFA machines. Without a goto
> instruction in Java, you have to use a massive switch (slow as death)
> or use method calls (slow); plus you have to create like an object for
> each of thousands of states when the Lexer or Parser is loaded. Ick.
>
> Sriram Srinivasan at the ANTLR2004 workshop suggested generating byte
> code directly. There is a goto bytecode of course. :) The beauty of
> these suckers is that the CPU program counter will be used (once
> compiled to native code) to traverse the states of the DFA. That will
> be insanely fast; about the same speed as the C DFAs Ric generates for
> 3.0.
>
> So, I will generate byte code assembly for the DFAs from the ANTLR code
> generator using StringTemplate so people can tweak the byte codes
> easily. For example, you might want to inline some of the char fetch
> stuff. That will haul ass!
>
> The only problem was that I didn't want to use BCEL or Jasmin or any
> other package since it means ANTLR depends on those packages (i hate
> that). So, I spent the weekend and yesterday building a .class file
> writer and then a byte code assembler. Both are tiny as they are very
> specific to my needs (only handles like 50 bytecodes). Nonetheless,
> 900 total lines of code is nice. :) An excellent price to pay to avoid
> other people's libraries.
>
> Here is a sample (partial) state in my assembly code
>
> aload 0
> iconst 1
> invokeinterface IntegerStream.LA 2
> istore 1
> iload 1
> iconst 98
> if_icmplt x
> iload 1
> ireturn
> x:
> iconst -1
> ireturn
>
> I'm guessing that a minimal state will be about 20-ish bytes, allowing
> for a max of about 3000 states in any specific DFA. Not a serious
> limitation probably.
>
> I also hope to use this code in my CS652 grad prog lang class whenever
> I teach that again. A nice real world problem.
>
> Ter
> PS yes, the .class file format is pretty complicated, but not
> unreasonable given the problem
> --
> CS Professor & Grad Director, University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!
>
>
>
>
>
>
> Yahoo! Groups Links
>
>
>
>
>
>
>
>
> ________________________________________________________________________
> This e-mail has been scanned for viruses by MessageLabs. For further
> information visit http://www.messagelabs.com
> ________________________________________________________________________
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
<*> To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list