[antlr-interest] To Ter and others: Observations on performance problems and how Antlr could be made faster

John D. Mitchell johnm-antlr at non.net
Mon Mar 28 13:57:26 PST 2005


>>>>> "M" == M C <mortench2004 at yahoo.dk> writes:
[...]

> 1) Some possible uses of Antlr are not compilers but interactive
> applications. High performance applications may need to parse distinct
> texts a LOT(!) of times (for instance the query parser for a search
> engine). For such applications concurrency and reuse of objects can be
> very important. Currently, ANTLR parsers are not multi-thread safe and
> object reuse is hard to do (not explicitly supported). Generation of
> reusable MT-safe parsers/lexers would be a great (optional) feature.

The notion has been that you can create as many instances of your
lexer/parser/walker as you want to get the concurrency that you desire.
Each thread uses its own instances, so nothing is shared between threads.
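
As a rough sketch of that approach: the QueryParser class below is a
made-up stand-in for a generated parser (it is not ANTLR code), and
PerTaskParsers/parseAll are likewise hypothetical names. The only point is
that each task builds its own parser, so the parser's mutable state never
crosses threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a generated, non-thread-safe parser:
// it keeps mutable state (pos) while it runs.
class QueryParser {
    private final String input;
    private int pos = 0;

    QueryParser(String input) {
        this.input = input;
    }

    // Counts space-separated terms; stands in for a real parse.
    int countTerms() {
        int terms = 0;
        while (pos < input.length()) {
            while (pos < input.length() && input.charAt(pos) == ' ') pos++;
            if (pos < input.length()) terms++;
            while (pos < input.length() && input.charAt(pos) != ' ') pos++;
        }
        return terms;
    }
}

class PerTaskParsers {
    // Runs one parse per query on a thread pool. A fresh parser is created
    // inside each task, so no parser instance is shared between threads.
    static List<Integer> parseAll(List<String> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<Integer> task = () -> new QueryParser(q).countTerms();
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parse task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

This buys thread safety at the cost of one allocation per parse; it does
nothing for the object-reuse half of the question.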


> 2) Some classes like antlr.Parser and antlr.CharScanner use inefficient
> legacy Java classes (like tokenTypeToASTClassMap’s Hashtable) instead
> of HashMap (unnecessary synchronization for a parser/scanner that is not
> thread-safe anyway).

That will be easier to change in Antlr v3 both because of the rewrite as
well as the use of StringTemplates for the code generation.


> 3) The current API is targeted for large files but does not allow for
> efficient parsing of small strings like “x AND NOT (y AND z OR w)”
> compared to a hand-written parser. Antlr should generate a lexer with a
> constructor that accepts a string (or even better a CharSequence) so that
> the overhead of a StringReader can be avoided for such cases.

> 4) Antlr should try to limit the number of Strings that it generates or
> forces the antlr-user to generate because of API limitations. In that
> relation, for JDK 1.4+, CharSequence can sometimes be used as a very fast
> String replacement. (BTW: I did not find any uses of StringBuffer but for
> JDK 1.5+ it should be replaced with StringBuilder).

Antlr v3 does a lot to improve efficiency: it avoids generating excess
garbage, avoids copying data unnecessarily, and is precise about when it
touches e.g. LA (lookahead) to make decisions.
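
To make the CharSequence idea from the quoted point 3 concrete, here is a
minimal hand-rolled sketch. CharSequenceScanner and tokenize are
hypothetical names, not part of the ANTLR API, and this is not how Antlr v3
does it. The scanner reads characters straight from the CharSequence via
charAt, with no Reader in between.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled scanner; not an ANTLR class.
class CharSequenceScanner {
    private final CharSequence input;
    private int pos = 0;

    CharSequenceScanner(CharSequence input) {
        this.input = input;
    }

    // Returns the next token, or null at end of input. A token is "(", ")",
    // or a run of characters containing no space and no parenthesis.
    CharSequence next() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos == input.length()) return null;
        int start = pos;
        char c = input.charAt(pos);
        if (c == '(' || c == ')') {
            pos++;
        } else {
            while (pos < input.length()) {
                c = input.charAt(pos);
                if (c == ' ' || c == '(' || c == ')') break;
                pos++;
            }
        }
        // Whether subSequence copies characters depends on the
        // CharSequence implementation that was passed in.
        return input.subSequence(start, pos);
    }

    // Convenience helper: collects all tokens as Strings.
    static List<String> tokenize(CharSequence input) {
        CharSequenceScanner scanner = new CharSequenceScanner(input);
        List<String> tokens = new ArrayList<>();
        for (CharSequence t = scanner.next(); t != null; t = scanner.next()) {
            tokens.add(t.toString());
        }
        return tokens;
    }
}
```

Because the constructor takes a CharSequence, the same scanner accepts a
String, a StringBuilder, or a java.nio.CharBuffer without conversion.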

Hope this helps,
		John

