[antlr-interest] Performance problem with $text in ANTLR 3.1
Richard Knox
rich at dreambox.com
Thu Aug 23 18:09:56 PDT 2012
I'm not sure backtracking is the issue here. This grammar implements a
C-like preprocessor by rewriting the token input stream. The rewrite grammar
option is true and we use a TokenRewriteStream as input. Using $text
results in a call stack like this:
BufferedTokenStream.toString(Token, Token)
TokenRewriteStream.toString(int, int)
TokenRewriteStream.toString(String, int, int)
TokenRewriteStream.reduceToSingleOperationPerIndex(List)
TokenRewriteStream.getKindOfOps(List, Class, int)
I haven't analyzed the code in detail, but reduceToSingleOperationPerIndex
seems to do a lot of work, although most of the time is spent in
getKindOfOps. I found two places where $text was referenced many times in
each input line. I found ways of getting the desired behavior without
referencing $text. We're still not as fast as we were with 3.0.1, but
performance is now usable and getKindOfOps is no longer a major hot spot.
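
For anyone hitting the same thing, here is a toy Java model of why repeated
$text lookups might add up in a rewrite grammar. It is only a sketch under
stated assumptions: ToyRewriteStream, ReplaceOp, rangeText, tokenText and
simulate are hypothetical names, not the ANTLR runtime, and the model simply
assumes that rendering a token range consults every pending rewrite
operation, while reading one token's text does not touch the operation list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the ANTLR runtime.
public class RewriteCostSketch {

    /** A pending replacement of one token's text, recorded lazily. */
    static final class ReplaceOp {
        final int index;
        final String text;

        ReplaceOp(int index, String text) {
            this.index = index;
            this.text = text;
        }
    }

    static final class ToyRewriteStream {
        private final List<String> tokens;
        private final List<ReplaceOp> ops = new ArrayList<>();
        long opsScanned = 0; // counts how many pending ops were inspected

        ToyRewriteStream(List<String> tokens) {
            this.tokens = tokens;
        }

        void replace(int index, String text) {
            ops.add(new ReplaceOp(index, text));
        }

        /** Stand-in for $WORD.text: read one token directly, ignoring pending ops. */
        String tokenText(int index) {
            return tokens.get(index);
        }

        /** Stand-in for $text: render a range, scanning the whole op list per token. */
        String rangeText(int start, int stop) {
            StringBuilder sb = new StringBuilder();
            for (int i = start; i <= stop; i++) {
                String text = tokens.get(i);
                for (ReplaceOp op : ops) {
                    opsScanned++;
                    if (op.index == i) {
                        text = op.text; // the latest op for this index wins
                    }
                }
                sb.append(text);
            }
            return sb.toString();
        }
    }

    /** Looks up and then rewrites each of n tokens; returns how many ops were scanned. */
    static long simulate(int n, boolean useRangeText) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            tokens.add("w" + i);
        }
        ToyRewriteStream stream = new ToyRewriteStream(tokens);
        for (int i = 0; i < n; i++) {
            String text = useRangeText ? stream.rangeText(i, i) : stream.tokenText(i);
            stream.replace(i, text.toUpperCase());
        }
        return stream.opsScanned;
    }

    public static void main(String[] args) {
        System.out.println("ops scanned via range text: " + simulate(1000, true));
        System.out.println("ops scanned via token text: " + simulate(1000, false));
    }
}
```

In this model the range-based lookup does work proportional to the number of
rewrites recorded so far, so the total grows quadratically with the number of
tokens, while the per-token lookup stays flat. Whether the real runtime
behaves this way should be confirmed against the TokenRewriteStream source
for the version in use.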
Thanks all for your suggestions.
-rich
On 8/23/12 3:06 PM, "Terence Parr" <parrt at cs.usfca.edu> wrote:
>hi. maybe your grammar backtracks a lot and so there's lots of template
>construction.
>Ter
>On Aug 23, 2012, at 10:58 AM, Richard Knox <rich at dreambox.com> wrote:
>
>> I've upgraded to ANTLR 3.4, and I'm still seeing the same problem. One
>> test with VisualVM showed us spending 126 sec. (78% of total time) in
>> TokenRewriteStream.getKindOfOps. The big time consumers came from usages
>> of $text. Some of this grammar is old and krufty. We may be using $text
>> gratuitously, but I didn't see these problems with ANTLR 3.0.1.
>>
>> I may be able to mitigate this problem by avoiding use of $text. For
>> example the following code accounted for 58 seconds of the time we spent
>> in getKindOfOps:
>>
>> regular_tokens
>> : WORD -> template(val={ cpp.Lookup($text) }) "<val>"
>> | INT_LITERAL
>> | STRING_LITERAL
>> | PROXY_START_LITERAL | PROXY_MIDDLE_LITERAL | PROXY_END_LITERAL
>> | punctuation
>> ;
>>
>> I made the following change to use $WORD.text instead of $text:
>>
>> regular_tokens
>> : WORD -> template(val={ cpp.Lookup($WORD.text) }) "<val>"
>> | INT_LITERAL
>> | STRING_LITERAL
>> | PROXY_START_LITERAL | PROXY_MIDDLE_LITERAL | PROXY_END_LITERAL
>> | punctuation
>> ;
>>
>> This dropped total time in getKindOfOps from 126 sec to 37 sec.
>>
>> Two questions:
>> 1) Is $text known to be badly performing? Should we avoid its use
>> wherever possible?
>> 2) What changed from ANTLR 3.0.1 to cause this dramatic slow down?
>>
>> Thanks.
>>
>> -rich
>>
>>
>>
>> On 8/21/12 6:55 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
>>
>>> Move to 3.4. String template is much faster. However I am not sure
>>> about your view of the performance traits; but go to 3.4 and then you
>>> will be in a space to start analysis properly. At 3.1 no one can really
>>> help.
>>>
>>> Jim
>>>
>>> On Aug 21, 2012, at 6:26 PM, Richard Knox <rich at dreambox.com> wrote:
>>>
>>>> I recently upgraded an ANTLR based application from ANTLR 3.0.1 to
>>>> ANTLR 3.1. Since doing this, our application has been running MUCH
>>>> slower. I did some profiling with VisualVM and found that we were
>>>> spending most of our time in TokenRewriteStream.getKindOfOps. We get
>>>> there when we reference $text from a grammar with options
>>>> output=template and rewrite=true. Are there known performance issues
>>>> with this scenario in ANTLR 3.1? Would I get better perf with a later
>>>> version? Any suggested workarounds? Thanks!
>>>>
>>>> -rich
>>>>
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>> Unsubscribe:
>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>