[antlr-interest] To Ter and others: Observations on performanceproblems and how Antlr could be made faster

Sun Mar 27 21:43:26 PST 2005

Hello Morten!

1. I more and more thinking about introduction taboo on ANTLR2-perfomance 
comments until ANTLR3 is not coming. Wanna perfomance - look at my 
high-perfomance C# parser (www.csharpparser.com), 50 Mb/s AST generation is 
real!

2. It is too complicated to optimize parser for fast processing small files. 
For example, my parser on small set of small files (totally less than 5 Kb) 
works 6-7 slower than on big set of files (about 10 Mb). For your string 
example, I believe parser will be about 100-1000 times slower.

3. Multi-threading is the next step for improving perfomance. But fast 
multi-treading algorithms is always are platform and language-dependent. And 
it is too difficult implement fast common multi-threading algorithms and 
data structures for all basic ANTLR-supported languages (Java, C++, C#). But 
for one concrete language (C++) and platform (Windows) develop is not so 
difficult. I think for multi-threading support it is enough implement only 
multi-threading allocator and hash-table. Also I think that parser needs 
only partially thread-safe algorithms, full thread-safe algorithms need only 
for source transformations. I'm interesting in cooperation to develop it.

4. Also see nice Herb Sutter's article 
http://www.gotw.ca/publications/concurrency-ddj.htm.

--
Alex Sedow
alex at csharpparser.com

>I have been looking briefly at the source-code for
> Antlr with an eye towards general implementation
> performance issues (looking at java coding issues only
> - not algorithms). I have made some observations that
> I hope the Antlr implementers will think about.
>
> 1) Some possible uses of Antlr are not compilers but
> interactive applications. High performance
> applications may need to parse distinct texts a LOT(!)
> of times (for instance the query parser for a search
> engine). For such applications concurrency and reuse
> of objects can be very important. Currently, ANTLR
> parsers are not multi-thread safe and object reuse is
> hard to do (not explicitly supported). Generation of
> reusable MT-safe parsers/lexers would be a great
> (optional) feature.
>
> 2) Some classes like antlr.Parser and
> antlr.CharScanner use inefficient legacy java classes
> like tokenTypeToASTClassMap's HashTable) instead of
> HashMap (unnecessary synchronization for a
> parser/scanner that is not thread-safe anyway).
>
> 3) The current API is targeted for large files but
> does not allow for efficient parsing of small strings
> like "x AND NOT (y AND z OR w)" compared to a
> hand-written parser. Antlr should generate a lexer
> with a constructor that accepts a string (or even
> better a CharSequence) so that the overhead of a
> StringReader can be avoided for such cases.
>
> 3) Antlr should try to limit the number of Strings
> that it generates or forces the antlr-user to generate
> because of API limitations. In that relation, for JDK
> 1.4+, CharSequence can sometimes used as a very fast
> String replacement. (BTW: I did not find any uses of
> StringBuffer but for JDK 1.5+ it should be replaced
> with StringBuilder).
>
> Sincerely,
> Morten Christensen
>