[antlr-interest] To Ter and others: Observations on performance problems and how Antlr could be made faster

Sun Mar 27 13:47:03 PST 2005

I have been looking briefly at the source code for
Antlr with an eye towards general implementation
performance issues (looking at Java coding issues only,
not algorithms). I have made some observations that
I hope the Antlr implementers will think about.

1) Some possible uses of Antlr are not compilers but
interactive applications. High performance
applications may need to parse distinct texts a LOT(!)
of times (for instance the query parser for a search
engine). For such applications concurrency and reuse
of objects can be very important. Currently, ANTLR
parsers are not thread-safe and object reuse is
hard to do (not explicitly supported). Generation of
reusable, thread-safe parsers/lexers would be a great
(optional) feature.
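
To illustrate the kind of reuse I mean, here is a rough sketch only. ReusableQueryParser, reset() and countWords() are hypothetical names that I made up for this mail and are not part of Antlr. The idea is one parser instance per thread that is re-bound to new input instead of being reallocated for every query:

```java
// Hypothetical sketch: none of these names exist in Antlr.
final class ReusableQueryParser {
    // One instance per thread, so no locking is needed while parsing.
    private static final ThreadLocal<ReusableQueryParser> PER_THREAD =
            ThreadLocal.withInitial(ReusableQueryParser::new);

    private CharSequence input = "";
    private int pos;

    static ReusableQueryParser forCurrentThread() {
        return PER_THREAD.get();
    }

    // Re-binds this instance to new input instead of allocating a new parser.
    ReusableQueryParser reset(CharSequence newInput) {
        input = newInput;
        pos = 0;
        return this;
    }

    // Toy stand-in for a real parse: counts whitespace-separated words.
    int countWords() {
        int words = 0;
        boolean inWord = false;
        for (; pos < input.length(); pos++) {
            boolean ws = Character.isWhitespace(input.charAt(pos));
            if (!ws && !inWord) {
                words++;
            }
            inWord = !ws;
        }
        return words;
    }
}
```

A caller would write something like ReusableQueryParser.forCurrentThread().reset(query).countWords() on each request.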

2) Some classes like antlr.Parser and
antlr.CharScanner use inefficient legacy Java classes
(like the Hashtable behind tokenTypeToASTClassMap)
instead of HashMap (unnecessary synchronization for a
parser/scanner that is not thread-safe anyway).
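
As a minimal sketch of the change I have in mind (TokenTypeTable and its methods are made-up names for illustration, not Antlr's actual code), the table could be an unsynchronized HashMap behind the Map interface:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a token-type-to-AST-class table; not Antlr's code.
final class TokenTypeTable {
    // HashMap is unsynchronized, so get/put do not acquire a monitor
    // the way the corresponding Hashtable methods do.
    private final Map<Integer, Class<?>> astClassByTokenType = new HashMap<>();

    void register(int tokenType, Class<?> astClass) {
        astClassByTokenType.put(tokenType, astClass);
    }

    // Returns null when no AST class is registered for the token type.
    Class<?> lookup(int tokenType) {
        return astClassByTokenType.get(tokenType);
    }
}
```

One behavioral difference to keep in mind: Hashtable rejects null keys and values, while HashMap accepts them.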

3) The current API is targeted for large files but
does not allow for efficient parsing of small strings
like “x AND NOT (y AND z OR w)” compared to a
hand-written parser. Antlr should generate a lexer
with a constructor that accepts a string (or even
better a CharSequence) so that the overhead of a
StringReader can be avoided for such cases.
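
A rough sketch of what a CharSequence-based input could look like follows. CharSequenceCursor, peek(), advance() and countTokens() are hypothetical names of mine, not an Antlr API, and the toy tokenizer only stands in for a generated lexer:

```java
// Hypothetical input cursor; not part of Antlr.
final class CharSequenceCursor {
    static final int EOF = -1;

    private final CharSequence text;
    private int pos;

    // Accepts String, StringBuilder, CharBuffer, etc. without a Reader in between.
    CharSequenceCursor(CharSequence text) {
        this.text = text;
    }

    // Returns the current character without consuming it, or EOF at end of input.
    int peek() {
        return pos < text.length() ? text.charAt(pos) : EOF;
    }

    void advance() {
        if (pos < text.length()) {
            pos++;
        }
    }

    private static boolean isWordChar(int c) {
        return c != EOF && !Character.isWhitespace(c) && c != '(' && c != ')';
    }

    // Toy lexer: each parenthesis is one token, every other run of
    // non-whitespace, non-parenthesis characters is one token.
    static int countTokens(CharSequence query) {
        CharSequenceCursor in = new CharSequenceCursor(query);
        int tokens = 0;
        while (in.peek() != EOF) {
            int c = in.peek();
            if (Character.isWhitespace(c)) {
                in.advance();
            } else if (c == '(' || c == ')') {
                tokens++;
                in.advance();
            } else {
                tokens++;
                while (isWordChar(in.peek())) {
                    in.advance();
                }
            }
        }
        return tokens;
    }
}
```

The point is that the lexer reads characters by index straight from the caller's CharSequence, so no StringReader or intermediate buffer is involved.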

4) Antlr should try to limit the number of Strings
that it generates or forces the Antlr user to generate
because of API limitations. In that regard, for JDK
1.4+, CharSequence can sometimes be used as a very fast
String replacement. (BTW: I did not find any uses of
StringBuffer, but for JDK 1.5+ any such use should be
replaced with StringBuilder).
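
To show what I mean by CharSequence as a String replacement, here is a sketch under my own assumptions. TextSlice is a hypothetical class, not something in Antlr. It is a view onto a region of the input, so token text can be inspected and compared without allocating a String until one is really needed:

```java
// Hypothetical zero-copy view onto part of a CharSequence; not part of Antlr.
final class TextSlice implements CharSequence {
    private final CharSequence source;
    private final int start; // inclusive
    private final int end;   // exclusive

    TextSlice(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException();
        }
        this.source = source;
        this.start = start;
        this.end = end;
    }

    public int length() {
        return end - start;
    }

    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException();
        }
        return source.charAt(start + index);
    }

    // Returns another view; no characters are copied.
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new TextSlice(source, start + from, start + to);
    }

    // Compares character by character without creating a String.
    boolean contentEquals(CharSequence other) {
        if (other.length() != length()) {
            return false;
        }
        for (int i = 0; i < length(); i++) {
            if (charAt(i) != other.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Copies only when a real String is requested; StringBuilder (JDK 1.5+)
    // avoids the synchronization of StringBuffer.
    @Override
    public String toString() {
        return new StringBuilder(length()).append(source, start, end).toString();
    }
}
```

For example, new TextSlice("x AND NOT y", 2, 5).contentEquals("AND") checks a keyword without building a String for the token text.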

Sincerely,
Morten Christensen