[antlr-interest] To Ter and others: Observations on performance
problems and how Antlr could be made faster
M C
mortench2004 at yahoo.dk
Sun Mar 27 13:47:03 PST 2005
I have been looking briefly at the source-code for
Antlr with an eye towards general implementation
performance issues (looking at Java coding issues only
- not algorithms). I have made some observations that
I hope the Antlr implementers will think about.
1) Some possible uses of Antlr are not compilers but
interactive applications. High performance
applications may need to parse distinct texts a LOT(!)
of times (for instance the query parser for a search
engine). For such applications concurrency and reuse
of objects can be very important. Currently, ANTLR
parsers are not thread-safe and object reuse is
hard to do (it is not explicitly supported). Generation of
reusable, thread-safe parsers/lexers would be a great
(optional) feature.
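To make point 1) concrete, here is a minimal sketch of the kind of reuse I have in mind. The names ReusableQueryParser, ParserPool and reset() are hypothetical and are not part of ANTLR; the "parse" is a toy whitespace tokenizer standing in for a generated parser. Each thread keeps one instance and re-points it at new input, so nothing is shared between threads and no parser is allocated per query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated parser; NOT an ANTLR class.
final class ReusableQueryParser {
    private CharSequence input = "";
    private final List<String> tokens = new ArrayList<String>();

    // Re-point this instance at new input instead of allocating a new parser.
    void reset(CharSequence newInput) {
        input = newInput;
        tokens.clear();
    }

    // Toy "parse": split the input on whitespace.
    List<String> parse() {
        int start = -1;
        for (int i = 0; i <= input.length(); i++) {
            boolean atEnd = (i == input.length());
            boolean ws = !atEnd && Character.isWhitespace(input.charAt(i));
            if (atEnd || ws) {
                if (start >= 0) {
                    tokens.add(input.subSequence(start, i).toString());
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        return new ArrayList<String>(tokens); // copy, so the caller's result survives reset()
    }
}

// One parser per thread: instances are never shared, so no locking is needed.
final class ParserPool {
    private static final ThreadLocal<ReusableQueryParser> PER_THREAD =
        new ThreadLocal<ReusableQueryParser>() {
            @Override
            protected ReusableQueryParser initialValue() {
                return new ReusableQueryParser();
            }
        };

    static List<String> parse(CharSequence text) {
        ReusableQueryParser parser = PER_THREAD.get();
        parser.reset(text);
        return parser.parse();
    }
}
```

A real generated parser would need an equivalent of reset() that clears all of its internal state, which is the part that is hard to do from the outside today.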
2) Some classes like antlr.Parser and
antlr.CharScanner use inefficient legacy Java classes
(like the tokenTypeToASTClassMaps Hashtable) instead of
HashMap. The synchronization in Hashtable is unnecessary for a
parser/scanner that is not thread-safe anyway.
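As a sketch of what point 2) amounts to, the change is mostly a matter of declaring the field as a Map and constructing a HashMap. The class AstClassRegistry and its methods are hypothetical names for illustration, not ANTLR code. One difference to watch for: Hashtable rejects null keys and values, while HashMap accepts them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry mapping token types to AST node classes; NOT an ANTLR class.
final class AstClassRegistry {
    // HashMap does no locking, which is fine as long as the owning
    // parser instance is only ever used by one thread at a time.
    private final Map<Integer, Class<?>> tokenTypeToClass =
        new HashMap<Integer, Class<?>>();

    void register(int tokenType, Class<?> nodeClass) {
        tokenTypeToClass.put(Integer.valueOf(tokenType), nodeClass);
    }

    // Returns null when no class has been registered for the token type.
    Class<?> lookup(int tokenType) {
        return tokenTypeToClass.get(Integer.valueOf(tokenType));
    }
}
```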
3) The current API is targeted for large files but
does not allow for efficient parsing of small strings
like x AND NOT (y AND z OR w) compared to a
hand-written parser. Antlr should generate a lexer
with a constructor that accepts a string (or even
better a CharSequence) so that the overhead of a
StringReader can be avoided for such cases.
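To illustrate point 3), here is a minimal sketch of a character source that reads directly from a CharSequence. CharSequenceInput, peek() and next() are hypothetical names and do not correspond to any existing ANTLR class; a generated lexer would need a constructor that accepts something like this. Compared to wrapping the text in a StringReader, there is no Reader object, no checked IOException, and no intermediate buffer.

```java
// Hypothetical character source over a CharSequence; NOT an ANTLR class.
final class CharSequenceInput {
    static final int EOF = -1;

    private final CharSequence text;
    private int pos;

    CharSequenceInput(CharSequence text) {
        this.text = text;
    }

    // Look at the current character without consuming it.
    int peek() {
        if (pos < text.length()) {
            return text.charAt(pos);
        }
        return EOF;
    }

    // Return the current character and advance; EOF once the input is exhausted.
    int next() {
        if (pos < text.length()) {
            return text.charAt(pos++);
        }
        return EOF;
    }
}
```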
4) Antlr should try to limit the number of Strings
that it generates or forces the antlr-user to generate
because of API limitations. In that regard, for JDK
1.4+, CharSequence can sometimes be used as a very fast
String replacement. (BTW: I did not find any uses of
StringBuffer, but if there are any, for JDK 1.5+ they should
be replaced with StringBuilder).
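A small sketch of the CharSequence idea: a token can be represented as a view over the input rather than a freshly allocated String. The class TokenViews and its methods are hypothetical names for illustration; CharBuffer.wrap (JDK 1.4+) and String.contentEquals(CharSequence) (JDK 1.5+) are standard JDK calls. The view does not copy the characters, and a String is only created if the caller asks for one with toString().

```java
import java.nio.CharBuffer;

// Hypothetical helper for token text as views over the input; NOT an ANTLR class.
final class TokenViews {
    // Returns a read-only view of input[start, end) without copying characters.
    static CharSequence slice(CharSequence input, int start, int end) {
        return CharBuffer.wrap(input, start, end);
    }

    // Compares a token view against a keyword without building a String.
    static boolean isKeyword(CharSequence token, String keyword) {
        return keyword.contentEquals(token);
    }
}
```

For example, TokenViews.slice("x AND NOT y", 2, 5) gives a three-character view of "AND" that can be tested with isKeyword before deciding whether a String is needed at all.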
Sincerely,
Morten Christensen