[antlr-interest] ANTLR performance

Jan Obdržálek obdrzalek at gmail.com
Fri Jan 16 06:08:02 PST 2009


Hello,

thank you very much for your help. Here are the answers.

> 1) Did you tune the token and tree node allocation pools to reflect
> roughly the number of tokens you expect (really just small vs. large)?
> The default is usually good for most things. This grammar is creating
> 909856 tree nodes according to your stats, which does not seem quite
> right but possibly is - if it is, then you need to make the allocation
> pools much bigger, in which case the runtime will make far fewer pool
> allocations. Have you by chance turned on output=AST but not shaped the
> tree in any way, resulting in a flat tree that must be rewritten all the
> time? The other possibility is the tree building mechanism - currently
> the way the code is generated is not optimal, even though the runtime
> mechanisms are - we do a lot of stream rewriting that isn't strictly
> necessary, and the gcc compiler won't be doing this.

The grammar generates a pretty standard AST (definitely not a flat
one). Turning off the tree-building mechanism made all the
difference - it saved about 80% of the running time. Being on par with
gcc here (although we still need to remember that gcc does a full
compilation) is quite a nice result for a generated parser.

<<<
~/tmp/test$ time ./parser
Reading a.c

real    0m0.481s
user    0m0.360s
sys     0m0.108s
>>>

The problem is I need the AST :(

> 2) You don't say whether you are doing anything in the grammar, such as
> accessing text and so on - you have to take out any action code to see what
> the actual baseline is for the parser. For instance, accessing token text is
> a convenience function, and you would not use it in a compiler that requires
> speed.

There is no action code except for handling typedefs. This is not a problem.

> 3) As per the examples, when you know that the input stream is 8-bit or
> 16-bit, you can set a macro in your grammar such that it traverses the input
> stream directly using pointers and not via function calls. Did you turn this
> on?

I tried

@lexer::header{
#define ANTLR3_INLINE_INPUT_ASCII
}

This actually makes the parser a tiny bit slower than without the #define.

> 4) I would need to see your grammar rather than a gprof (BTW - use
> kcachegrind for better views of this). Realistically, the gcc compiler has
> been worked on for years and is hand crafted, whereas if your grammar has
> predicates or other inefficient methodology, it isn't much of a comparison.
> It may not be so much that ANTLR generates slower code as that the grammar
> could be better, but there is no way to know without your grammar to look at
> (which I would like to see before looking further).

Thanks for suggesting kcachegrind. It is much better than gprof :) I
do not think valgrind even existed the last time I did any serious
programming in plain C ... My only complaint is the cycle detection:
having <cycle1> account for most of the time is not really what I
want. However, if I turn cycle detection off, kcachegrind grabs 100% of
one CPU core and keeps crunching ...

The profiling info in kcachegrind after turning off the tree-building
is about what I expected. To give you an idea:

20.6% (inclusive time) mTokens [called 255 744 times] - this is the lexer part.
12% (inclusive time) antlr3dfapredict - not too bad; I can definitely decrease this significantly.
12% (inclusive time) isTypeName [called 21 800 times] - this function checks whether a given identifier is a type name; it calls strcmp about 3 800 000 times (strcmp alone is 4.13%). That can also be improved - see the sketch below.
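
For isTypeName the obvious improvement is to replace the linear strcmp
scan with a hashed lookup. A minimal sketch of what I have in mind - the
table layout and addTypeName() (called from the typedef-handling action)
are made up for the example; only the isTypeName() name comes from my
actual support code:

/*
 * Sketch only: a fixed-size chained hash table for typedef names, so that
 * isTypeName() scans one bucket instead of doing ~3 800 000 strcmp calls
 * over the whole list. addTypeName() and the table are hypothetical names,
 * not what the grammar does today.
 */
#include <stdlib.h>
#include <string.h>

#define TYPE_BUCKETS 1024

typedef struct typeEntry {
    char             *name;
    struct typeEntry *next;
} typeEntry;

static typeEntry *typeTable[TYPE_BUCKETS];

static unsigned hashName(const char *s)
{
    unsigned h = 5381;
    while (*s != '\0') {
        h = h * 33 + (unsigned char)*s++;   /* djb2-style string hash */
    }
    return h % TYPE_BUCKETS;
}

/* Called from the typedef-handling action when a new type name is declared. */
void addTypeName(const char *name)
{
    unsigned   h = hashName(name);
    typeEntry *e = malloc(sizeof(*e));

    e->name      = strdup(name);
    e->next      = typeTable[h];
    typeTable[h] = e;
}

/* Drop-in replacement for the linear isTypeName(): strcmp only runs on
 * the few names that share a bucket. */
int isTypeName(const char *name)
{
    typeEntry *e;

    for (e = typeTable[hashName(name)]; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            return 1;
        }
    }
    return 0;
}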

> 5) Are you taking out the time spent reading the file? Are you reading the
> file into memory the same way that gcc does it?

No, I'm not. This is what I do:

       input   = antlr3AsciiFileStreamNew(fName);
       lxr     = GNUCaLexerNew(input);
       tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lxr));
       psr     = GNUCaParserNew(tstream);
       psr->translationUnit(psr);

I can take the file read out of the measurement, though.
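
Something like this untested sketch would do it - read the whole file
with plain stdio first, and only time the lexing/parsing. I am assuming
antlr3NewAsciiStringInPlaceStream() is the right 3.1 C-runtime call for
an in-memory buffer; readWholeFile() is just a helper made up for the
example:

/*
 * Sketch: keep the file read out of the measured time. Assumes
 * antlr3NewAsciiStringInPlaceStream() (3.1 C runtime) for in-memory input;
 * readWholeFile() is a made-up helper and error checking is omitted.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "GNUCaLexer.h"    /* generated lexer header  */
#include "GNUCaParser.h"   /* generated parser header */

static pANTLR3_UINT8 readWholeFile(const char *fName, ANTLR3_UINT32 *size)
{
    FILE         *f = fopen(fName, "rb");
    long          len;
    pANTLR3_UINT8 buf;

    fseek(f, 0, SEEK_END);
    len = ftell(f);
    rewind(f);

    buf = malloc((size_t)len + 1);
    fread(buf, 1, (size_t)len, f);
    buf[len] = '\0';
    fclose(f);

    *size = (ANTLR3_UINT32)len;
    return buf;
}

int main(void)
{
    ANTLR3_UINT32               size;
    pANTLR3_UINT8               data = readWholeFile("a.c", &size);
    pANTLR3_INPUT_STREAM        input;
    pGNUCaLexer                 lxr;
    pANTLR3_COMMON_TOKEN_STREAM tstream;
    pGNUCaParser                psr;
    clock_t                     start, stop;

    /* Only the part below is timed; the file read above is not. */
    start   = clock();
    input   = antlr3NewAsciiStringInPlaceStream(data, size, (pANTLR3_UINT8)"a.c");
    lxr     = GNUCaLexerNew(input);
    tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lxr));
    psr     = GNUCaParserNew(tstream);
    psr->translationUnit(psr);
    stop    = clock();

    printf("parse only: %.3f s\n", (double)(stop - start) / CLOCKS_PER_SEC);
    return 0;
}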

> 6) I presume that this is the latest 3.1.1 C runtime?

yep

> At the end of the day though it is difficult to compete against hand crafted
> C code that has had a lot of effort put into optimizing it :-).

Sure, I'm not expecting miracles :) That time without the
tree-building seems reasonable to me.

> Publish your grammars so I can take a look at this.

Sure, the grammar will be published soon under GPL. It just needs some clean-up.

To sum up - the main culprit is the tree-building mechanism. Other
than that, the performance of the generated parser is quite reasonable.
The problem is that the tree-building/rewriting capabilities are the
main reason why we use ANTLR in the first place. Second, I need (or
would at least prefer) a fast Java target. Java does not have to be
much slower than C. If Java turns out to be that much slower, we will
probably end up using the C target for the parser and then translating
the AST into a Java structure (a rough walk over the C tree is sketched
below) ... I will turn off the tree-building in Java and see what
happens.
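
For that C-to-Java hand-off, a plain recursive walk over the tree
returned by the start rule should be enough as a starting point. A rough
sketch - emitNode() is just a placeholder for whatever serialization or
JNI bridge we end up with, and it assumes the grammar is built with
output=AST so that translationUnit() returns a struct with a .tree
member:

/*
 * Sketch of walking the C-target AST so it can be rebuilt on the Java side
 * (via serialization, JNI, whatever). emitNode() is a placeholder; the
 * assumption is output=AST, so the start rule returns a struct whose .tree
 * member is the root pANTLR3_BASE_TREE.
 */
#include <stdio.h>
#include "GNUCaParser.h"

static void emitNode(pANTLR3_BASE_TREE node, int depth)
{
    /* Placeholder: print token type and text; a real version would build
     * the Java-side structure here instead. */
    printf("%*s%u %s\n", depth * 2, "",
           (unsigned)node->getType(node),
           (char *)node->getText(node)->chars);
}

static void walk(pANTLR3_BASE_TREE node, int depth)
{
    ANTLR3_UINT32 i;
    ANTLR3_UINT32 n = node->getChildCount(node);

    emitNode(node, depth);
    for (i = 0; i < n; i++) {
        walk((pANTLR3_BASE_TREE)node->getChild(node, i), depth + 1);
    }
}

/*
 * Usage, right after the parse (return-struct name assumed from the usual
 * C-target naming convention):
 *
 *     GNUCaParser_translationUnit_return r = psr->translationUnit(psr);
 *     walk(r.tree, 0);
 */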

Once again, thanks to everybody for their suggestions!

Regards,
Jan

--
Dr. Jan Obdržálek <obdrzalek at fi.muni.cz>
Faculty of Informatics, Masaryk University, Brno, Czech Republic

