[antlr-interest] ANTLR performance

Tue May 11 05:22:35 PDT 2010

On Tue, May 11, 2010 at 1:34 PM, Chrobot, Stefan
<Stefan.Chrobot at sabre.com> wrote:
>>>>> I'm using ANTLR with the C# target. The generated parser performs too
>>>>> slow for my needs. My grammar uses k = 6.
>>>>>
>>>>> Does it have a performance impact? What value should I target to get
>>>>> optimum performance - 1 or *? Would changing the grammar to 1/* give
>>>>> significant performance boost?
>>>>>
>>>> Could you try it yourself?
>>>> I mean test it. I would be interested in your results too..
>>>>
>>>
>>> It would probably take a good amount of time to change the grammar and
>>> the actions. I can't invest my time in that. Even more, since I found
>>> that the real performance bottleneck is in my case the use of rewrite
>>> rules, TokenRewriteStream and StringTemplate. I got about 100x
>>> performance boost after disabling the rewriting (leaving my actions in
>>> place). I guess I'll have to do the outputting myself. This will be a
>>> costly task (both implementation and performance-wise), but I suspect
>>> (and truly hope) to get something like 50x performance improvement from
>>> the original solution.
>>>
>> What kind of speed is slow for you? How big are the files that you
>> analyse?
>
> For my needs, 10 seconds is definitely too much for a 25KB input. I'm
> shooting for up to 0.5sec.
>
>

Hi,
I have never cared that much about parsing performance, so my comments
may be of no use to you; measure for yourself. Having said that, I have
generally observed that automatic tree construction is rather slow
(though it has been OK for my use cases). Where feasible, I prefer to
build my own trees, using the visitor pattern and/or tree structures
tailored to what I need.

Also, the default CharStream/TokenStream implementations may not be
what you want. See, for example, how they implement mark/release. I
once gained a large speedup by implementing my own line counting and
dropping the line-state bookkeeping from mark/release; I used a simple
table of line-ending positions with binary search. There are a lot of
things to tailor.

I also try to avoid mechanisms that buffer the whole input file at
once, but that may not be a big gain for you: if you assume that most
of your input is correct, it probably will not be; if you assume
otherwise, it may be. Let us know how it goes.
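
To make the line-counting idea concrete, here is a rough sketch in Java
of a table of line positions queried with binary search. It is only an
illustration: the class name LineIndex and its methods are made up for
this example and are not part of the ANTLR runtime, and wiring it into
a custom CharStream is left out. It stores the offset just after each
line ending (the start of the next line), which carries the same
information as the line-ending positions themselves.

```java
import java.util.Arrays;

/** Hypothetical helper (not an ANTLR class): maps char offsets to lines. */
final class LineIndex {
    // lineStarts[i] is the 0-based offset of the first char of line i + 1.
    private final int[] lineStarts;

    LineIndex(CharSequence text) {
        int[] starts = new int[16];
        int count = 0;
        starts[count++] = 0; // line 1 always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                if (count == starts.length) {
                    starts = Arrays.copyOf(starts, count * 2);
                }
                starts[count++] = i + 1; // next line begins after '\n'
            }
        }
        lineStarts = Arrays.copyOf(starts, count);
    }

    /** Returns the 1-based line number containing the 0-based offset. */
    int lineOf(int offset) {
        int pos = Arrays.binarySearch(lineStarts, offset);
        // Exact hit: offset is the first char of line pos + 1.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // insertion point equals the number of line starts <= offset.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    /** Returns the 0-based column of the offset within its line. */
    int columnOf(int offset) {
        return offset - lineStarts[lineOf(offset) - 1];
    }
}
```

With something like this, the stream only needs to track a single
integer position, so mark/release no longer has to save and restore
line and column state; line and column are computed on demand, for
instance only when an error is reported.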

-- 
Greetings
Marcin Rzeźnicki