[antlr-interest] Performance problem with $text in ANTLR 3.1

Thu Aug 23 18:09:56 PDT 2012

I'm not sure backtracking is the issue here. This grammar implements a
C-like preprocessor by rewriting the input token stream. The grammar option
rewrite is true and we use a TokenRewriteStream as input. Using $text
results in a call stack like this:

  BufferedTokenStream.toString(Token, Token)
  TokenRewriteStream.toString(int, int)
  TokenRewriteStream.toString(String, int, int)

  TokenRewriteStream.reduceToSingleOperationPerIndex(List)
  TokenRewriteStream.getKindOfOps(List, Class, int)

I haven't analyzed the code in detail, but reduceToSingleOperationPerIndex
seems to do a lot of work, although most of the time is spent in
getKindOfOps. I found two places where $text was referenced many times in
each input line. I found ways of getting the desired behavior without
referencing $text. We're still not as fast as we were with 3.0.1, but
performance is now usable and getKindOfOps is no longer a major hot spot.
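
To illustrate why repeated $text references can get expensive on a rewrite
stream, here is a rough sketch in Java. It is a toy model, not ANTLR's
TokenRewriteStream: the class RewriteCostSketch, its ReplaceOp type, and the
methods tokenText and renderedText are all hypothetical names made up for
this example. The only assumption carried over from the profile above is
that rendering text through the rewrite stream consults the queued rewrite
operations on each call, while reading a single token's text does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; this is NOT ANTLR's TokenRewriteStream. */
class RewriteCostSketch {
    /** Hypothetical queued rewrite: replace the token at 'index' with 'text'. */
    static final class ReplaceOp {
        final int index;
        final String text;
        ReplaceOp(int index, String text) { this.index = index; this.text = text; }
    }

    private final List<String> tokens = new ArrayList<>();
    private final List<ReplaceOp> ops = new ArrayList<>();
    long opsScanned = 0; // counts how many queued ops were examined

    void addToken(String text) { tokens.add(text); }
    void replace(int index, String text) { ops.add(new ReplaceOp(index, text)); }

    /** Rough analogue of $WORD.text: the token's own text, no ops consulted. */
    String tokenText(int index) { return tokens.get(index); }

    /** Rough analogue of $text on a rewrite stream: each call re-scans all ops. */
    String renderedText(int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            String out = tokens.get(i);
            for (ReplaceOp op : ops) {
                opsScanned++;
                if (op.index == i) out = op.text; // the last matching op wins
            }
            sb.append(out);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RewriteCostSketch stream = new RewriteCostSketch();
        int n = 1000;
        for (int i = 0; i < n; i++) {
            stream.addToken("w" + i);
            stream.replace(i, "W" + i); // one queued rewrite per token
        }
        for (int i = 0; i < n; i++) {
            stream.renderedText(i, i); // like $text in a one-token rule
        }
        System.out.println("op scans via renderedText: " + stream.opsScanned);
        long before = stream.opsScanned;
        for (int i = 0; i < n; i++) {
            stream.tokenText(i); // like $WORD.text
        }
        System.out.println("extra op scans via tokenText: "
                + (stream.opsScanned - before));
    }
}
```

In this toy model the number of operations scanned grows with tokens times
queued rewrites, while the per-token lookup scans none. Note that the two
calls are not interchangeable in general: renderedText reflects queued
rewrites and tokenText returns the original token text, so swapping one for
the other is only safe where the rewritten text is not needed.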

Thanks all for your suggestions.

-rich

On 8/23/12 3:06 PM, "Terence Parr" <parrt at cs.usfca.edu> wrote:

>hi. maybe your grammar backtracks a lot and so there's lots of template
>construction.
>Ter
>On Aug 23, 2012, at 10:58 AM, Richard Knox <rich at dreambox.com> wrote:
>
>> I've upgraded to ANTLR 3.4, and I'm still seeing the same problem. One
>> test with VisualVM showed us spending 126 sec. (78% of total time) in
>> TokenRewriteStream.getKindOfOps. The big time consumers came from usages
>> of $text. Some of this grammar is old and crufty. We may be using $text
>> gratuitously, but I didn't see these problems with ANTLR 3.0.1.
>> 
>> I may be able to mitigate this problem by avoiding use of $text. For
>> example the following code accounted for 58 seconds of the time we spent
>> in getKindOfOps:
>> 
>> regular_tokens
>> 	:	WORD -> template(val={ cpp.Lookup($text) }) "<val>"
>> 	|	INT_LITERAL
>> 	|	STRING_LITERAL
>> 	|	PROXY_START_LITERAL | PROXY_MIDDLE_LITERAL | PROXY_END_LITERAL
>> 	|	punctuation
>> 	;
>> 
>> I made the following change to use $WORD.text instead of $text:
>> 
>> regular_tokens
>> 	:	WORD -> template(val={ cpp.Lookup($WORD.text) }) "<val>"
>> 	|	INT_LITERAL
>> 	|	STRING_LITERAL
>> 	|	PROXY_START_LITERAL | PROXY_MIDDLE_LITERAL | PROXY_END_LITERAL
>> 	|	punctuation
>> 	;
>> 
>> This dropped total time in getKindOfOps from 126 sec to 37 sec.
>> 
>> Two questions:
>> 1) Is $text known to perform badly? Should we avoid its use wherever
>> possible?
>> 2) What changed from ANTLR 3.0.1 to cause this dramatic slowdown?
>> 
>> Thanks.
>> 
>> -rich
>> 
>> 
>> 
>> On 8/21/12 6:55 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
>> 
>>> Move to 3.4. String template is much faster. However I am not sure
>>> about your view of the performance traits; but go to 3.4 and then you
>>> will be in a space to start analysis properly. At 3.1 no one can
>>> really help.
>>> 
>>> Jim
>>> 
>>> On Aug 21, 2012, at 6:26 PM, Richard Knox <rich at dreambox.com> wrote:
>>> 
>>>> I recently upgraded an ANTLR-based application from ANTLR 3.0.1 to
>>>> ANTLR 3.1. Since doing this, our application has been running MUCH
>>>> slower. I did some profiling with VisualVM and found that we were
>>>> spending most of our time in TokenRewriteStream.getKindOfOps. We get
>>>> there when we reference $text from a grammar with options
>>>> output=template and rewrite=true. Are there known performance issues
>>>> with this scenario in ANTLR 3.1? Would I get better perf with a later
>>>> version? Any suggested workarounds? Thanks!
>>>> 
>>>> -rich
>>>> 
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>> Unsubscribe: 
>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address