[antlr-interest] Re: hooray! got ANTLR 3 to lex java

lgcraymer lgc at mail1.jpl.nasa.gov
Fri Dec 3 15:17:20 PST 2004



--- In antlr-interest at yahoogroups.com, Terence Parr <parrt at c...> wrote:
> Howdy,
> 
> A nice little milestone on the way to getting lexing/parsing working 
> properly for 3.0.  I just got the Java lexer from java.g to walk 
> through a 120k, 5000-line Java program.  That is to say, ANTLR 3 is able

Yay!!!
 
> to read the java.g lexical spec and generate a program to break up a 
> 5000-line Java program into tokens.  ANTLR generates both 2150 lines of Java 
> (for the main lexing functionality) and generates 33k of bytecodes 
> directly to implement some cyclic DFAs.
> 
> Note that even though I'm being hideously inefficient by backtracking 
> on every token and by using linear searches for DFA edges rather than 
> switch-statements, it lexes the 5000-line Java program in 0.9s on my 
> 1GHz mac laptop versus 1.7s for the 2.7.4 version of ANTLR on the same 
> input.  So, once I spend some time to optimize things, it should go 
> pretty doggone fast! :)

Well, it should achieve respectable speeds for java.  Chris Leung,
IIRC, reported a 5x speedup over ANTLR for llk, so you still have some
big wins ahead.
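
To make the "linear searches for DFA edges rather than switch-statements"
point concrete, here is a minimal sketch of the two lookup styles.  It is
not ANTLR 3's generated code; the class name, the tiny three-state DFA
fragment, and both method names are hypothetical and exist only for
illustration.

```java
public class DfaEdgeLookup {
    // Hypothetical DFA fragment (not taken from java.g):
    //   state 0 --'i'--> 1, state 0 --'f'--> 2
    //   state 1 --'f'--> 3, state 1 --'n'--> 4
    //   state 2 --'o'--> 5
    static final char[][] LABELS  = { {'i', 'f'}, {'f', 'n'}, {'o'} };
    static final int[][]  TARGETS = { { 1,   2 }, { 3,   4 }, { 5 } };

    // Linear search: scan the edge list of the current state.
    // Returns the target state, or -1 if no edge matches.
    static int linearNext(int state, char c) {
        if (state < 0 || state >= LABELS.length) return -1;
        char[] labels = LABELS[state];
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == c) return TARGETS[state][i];
        }
        return -1;
    }

    // Switch-based lookup: the same transitions, written so the compiler
    // can emit tableswitch/lookupswitch instructions instead of a loop.
    static int switchNext(int state, char c) {
        switch (state) {
            case 0:
                switch (c) { case 'i': return 1; case 'f': return 2; default: return -1; }
            case 1:
                switch (c) { case 'f': return 3; case 'n': return 4; default: return -1; }
            case 2:
                switch (c) { case 'o': return 5; default: return -1; }
            default:
                return -1;
        }
    }

    public static void main(String[] args) {
        int s = 0;
        for (char c : "if".toCharArray()) s = switchNext(s, c);
        System.out.println(s); // prints 3
    }
}
```

Both methods return the same answers; the difference is only in how the
edge is found, which matters once a state has many outgoing edges.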

The 33K of bytecode may be a concern for unusual languages with lots
of keywords--the classfile format only allows 64K of bytecode to be
covered by exception handlers, although it uses a 32-bit int for code
size--go figure.  Or maybe not:  a useful experiment would be to, say,
add 100 randomly chosen "keywords" in batches of 10 and see how the
bytecode size scales.  The fact that the DFA for Java lexing fits
within 33K, though, is very good news--C++ will be bigger, but not
tremendously so, and other conventional languages should fit.
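
A rough way to get a feel for the scaling before running the real
experiment is sketched below.  It does not measure bytecode and does not
involve ANTLR at all: it assumes that the number of nodes in a keyword
trie is a usable stand-in for DFA size, which is only a proxy.  The class
name, the random keyword generator (3 to 10 lowercase letters), and the
seed are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class KeywordScaling {
    // One trie node per distinct keyword prefix; a crude proxy for DFA states.
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
    }

    // Inserts a word and returns how many new nodes it required.
    static int insert(Node root, String word) {
        int created = 0;
        Node n = root;
        for (char c : word.toCharArray()) {
            Node child = n.next.get(c);
            if (child == null) {
                child = new Node();
                n.next.put(c, child);
                created++;
            }
            n = child;
        }
        return created;
    }

    // Hypothetical keyword generator: 3 to 10 random lowercase letters.
    static String randomKeyword(Random rng) {
        int len = 3 + rng.nextInt(8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) sb.append((char) ('a' + rng.nextInt(26)));
        return sb.toString();
    }

    // Adds distinct random keywords in batches and records the total
    // node count (including the root) after each batch.
    static List<Integer> stateCountsPerBatch(long seed, int batches, int batchSize) {
        Random rng = new Random(seed);
        Node root = new Node();
        Set<String> seen = new HashSet<>();
        int states = 1; // the root
        List<Integer> counts = new ArrayList<>();
        for (int b = 0; b < batches; b++) {
            int added = 0;
            while (added < batchSize) {
                String kw = randomKeyword(rng);
                if (seen.add(kw)) {
                    states += insert(root, kw);
                    added++;
                }
            }
            counts.add(states);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 batches of 10 keywords, as in the experiment described above.
        List<Integer> counts = stateCountsPerBatch(42L, 10, 10);
        for (int i = 0; i < counts.size(); i++) {
            System.out.println((i + 1) * 10 + " keywords -> " + counts.get(i) + " trie nodes");
        }
    }
}
```

Random strings share few prefixes, so this likely overstates growth
compared with real keyword sets; the actual experiment would still need
to regenerate the lexer and look at the emitted bytecode size.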

All in all, this is very good news!

--Loring

> 
> Anyway, if it can do the java lexer, it can do the parser as that is 
> much simpler in terms of lookahead DFAs (should be acyclic and LL(2)) 
> for the most part.  Oh, and the grammar doesn't need any of the 
> manually specified syntactic predicates. :)  The DFA thingie just 
> builds a little state machine to look ahead.
> 
> Whew! ...was driving me nuts tracking down lookahead errors in 
> thousands of lines of generated bytecode ;)
> 
> Ter
> --
> CS Professor & Grad Director, University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!





 