[antlr-interest] my brief thoughts on performance
Terence Parr
parrt at cs.usfca.edu
Thu Feb 24 16:44:24 PST 2005
Howdy,
Thanks very much to everyone for their thoughts regarding the
performance of the ANTLR-generated lexers.
First, ANTLR 2.x lexers unfortunately never had speed as a primary
concern. Still I apologize for how slow they have indeed become!
Yikes! I was taking a risk going to recursive-descent lexers; I like
them but my original implementation was slow.
Next, I agree that much can be done to increase speed by tailoring the
lexer methods like LA and so on to be suitable for your application.
Even in general I think much could be done. I looked at it last year
when Sriram S. told me that he got a huge improvement, but I didn't see
an obvious way to be general and do what he did. The conclusion I have
for 2.x is that you must rely on non-general user modifications to make
it go fast; i.e., you have to do the tweaks, but perhaps we can
summarize some useful tweaks. Perhaps a quick change to the code
generators will allow caching of LA(i).
For the future, speed is a concern. To that end, I have taken a number
of steps for v3.0:
o Token objects don't create strings nor will trees copy stuff out of
tokens.
o LA(i) is much simpler as everything is assumed to be buffered up
(unless you use a different char stream object)
o I try to use LA(i) exactly once per decision
o the DFA-based decision structure is doing one computation to pick an
alt instead of n computations for n alts.
o i'm generating bytecodes directly (for the java target) so that I can
access the goto instruction. This is required for the cyclic DFAs that
use arbitrary lookahead.
o as people have pointed out, the code generation structure of 3.0 is
really flexible. By saying target=JohnsTarget, it will load
JohnsTarget StringTemplate group instead of the default Java target or
whatever. This can be a tweaked Java or C or whatever template file.
o i have not done this yet, but I anticipate many fewer redundant
method calls in the lexer for simple tokens like semicolon.
Just to be clear, speed is a concern, but I also care a great deal
about having ANTLR do most things "out of the box" including things
like asking for the start/stop token matched by a rule etc... This
extra weight can be trimmed by people like Ruslan easily but I want it
to do the common things easily.
Best regards,
Terence
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
More information about the antlr-interest
mailing list