[antlr-interest] ANTLR performance

Chrobot, Stefan Stefan.Chrobot at sabre.com
Tue May 11 07:46:30 PDT 2010


Thanks for your response, Lorenzo!

This is exactly what's happening with my code.
I dropped the rewriting and built my own mechanism. The running time
dropped from about 10 s to about 0.1 s. My solution is below.


Stefan


1) I created a custom token class:
internal class CustomToken : CommonToken
{
    private string myText;

    public CustomToken(ICharStream input, int type, int channel, int start, int stop)
        : base(input, type, channel, start, stop)
    {
    }

    public void ParseAs(string text)
    {
        myText = text;
    }

    public override string Text
    {
        get
        {
            return myText ?? base.Text;
        }
        set
        {
            base.Text = value;
        }
    }
}

2) Made lexer emit CustomTokens:
public override IToken Emit()
{
    var token = new CustomToken(this.input, base.state.type,
        base.state.channel, base.state.tokenStartCharIndex, this.CharIndex - 1);
    token.Line = base.state.tokenStartLine;
    token.Text = base.state.text;
    token.CharPositionInLine = base.state.tokenStartCharPositionInLine;
    this.Emit(token);
    return token;
}

3) Added "rewrite" method to the parser:
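private void ParseAs(CustomToken start, string text)
{
    // The first token of the rule carries the replacement text.
    start.ParseAs(text);

    // Blank out the remaining tokens matched so far, up to the last consumed one.
    var stop = (CustomToken)input.LT(-1);
    for (int i = start.TokenIndex + 1; i <= stop.TokenIndex; ++i)
    {
        var token = (CustomToken)input.Get(i);
        token.ParseAs("");
    }
}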

4) Set grammar option:
TokenLabelType = CustomToken;


Usage:

assignment
	:	ID '=' INT	{ ParseAs($assignment.start, "<assignment>"); }
	;
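
The effect of the steps above can be tried without ANTLR. What follows is
only a minimal sketch: SketchToken, RewriteSketch and Render are hypothetical
names that are not part of ANTLR or of the code above, and it assumes the
final output is produced by concatenating the Text of every token in order.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for CustomToken; no ANTLR types are used here.
class SketchToken
{
    private readonly string original;
    private string replacement; // null means "not rewritten"

    public SketchToken(string text) { original = text; }

    public void ParseAs(string text) { replacement = text; }

    // Same fallback as CustomToken.Text: the override wins if one was set.
    public string Text { get { return replacement ?? original; } }
}

static class RewriteSketch
{
    // Same idea as the parser's ParseAs: the first token of the range carries
    // the replacement text and the remaining tokens are blanked out.
    public static void ParseAs(IList<SketchToken> tokens, int start, int stop, string text)
    {
        tokens[start].ParseAs(text);
        for (int i = start + 1; i <= stop; ++i)
        {
            tokens[i].ParseAs("");
        }
    }

    // Assumption: the output is rebuilt by concatenating token text in order.
    public static string Render(IEnumerable<SketchToken> tokens)
    {
        var sb = new StringBuilder();
        foreach (var token in tokens)
        {
            sb.Append(token.Text);
        }
        return sb.ToString();
    }

    static void Main()
    {
        var tokens = new List<SketchToken>
        {
            new SketchToken("x"), new SketchToken("="), new SketchToken("42"),
            new SketchToken(";")
        };
        ParseAs(tokens, 0, 2, "<assignment>");
        Console.WriteLine(Render(tokens)); // prints "<assignment>;"
    }
}
```

Each override is stored on the token itself, so rendering is a single pass
over the token list and no separate list of rewrite operations is kept.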



-----Original Message-----
From: Lorenzo de Lara [mailto:ldelara at affsys.com] 
Sent: Tuesday, May 11, 2010 4:35 PM
To: Chrobot, Stefan
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] ANTLR performance

I have noticed the same thing with rewrite=true and came upon this bug
report from 2008, which is currently still open:

http://www.antlr.org/jira/browse/ANTLR-371

The problem is that parsers with rewrite rules run in non-linear time on
any input with more than a few hundred rewrites. I've verified this in
both Java and C#. You can check it yourself by commenting out your
rewrite rules and running the parser: the runtime becomes much closer to
linear (5 minutes with rewrite rules on vs. 5 seconds with rewrite rules
off on a typical 1500-line input for us). The offending method is
GetKindOfOps in TokenRewriteStream, which takes up to 100% of the runtime
according to a Java profiling tool.

I've implemented the proposed fix (in Java) which does away with calling
GetKindOfOps completely and can confirm it does result in much more
reasonable, linear-like performance, without introducing any new
problems, as far as I can tell.

-Lorenzo

On 2010-05-11, at 5:17 , Chrobot, Stefan wrote:

Hi,

I'm using ANTLR with the C# target. The generated parser is too
slow for my needs. My grammar uses k = 6.

Does this have a performance impact? What value should I target to get
optimum performance: 1 or *? Would changing the grammar to 1 or * give
a significant performance boost?

Stefan

