[antlr-interest] Antlr won't backtrack?

Vitaliy Vitaliy at dbsophic.com
Sun Feb 8 02:48:53 PST 2009


Hi all,

I've encountered a very interesting case in which it seems to me that Antlr simply won't backtrack far enough.
The (very) simplified test case of this issue can be demonstrated by the following grammar:

grammar TestCase;

options
{
                language = CSharp;
                backtrack=true;
                memoize=true;
}

tokens
{
    KEYWORD = 'keyword';
}

@parser::members
{
        protected override void Mismatch(IIntStream input, int ttype, BitSet follow)
        {
            throw new MismatchedTokenException(ttype, input);
        }
}
@parser::rulecatch
{
                        catch (RecognitionException recognitionException)
                        {
                                                throw recognitionException;
                        }
}

COLON                   : ':' ;
Whitespace           : (' ' | '\t' | '\n' | '\r')  {$channel=HIDDEN;};
Identifier               : ('a'..'z')+ ;

root                        : statements EOF;
statements            : (statement)+;
statement             : label
                                  | keyword
                                  | variable
                                  ;

identifier               : Identifier;
label                       : identifier COLON;
keyword                : KEYWORD (identifier)?;
variable                 : identifier;

For the input:
keyword label:
Antlr will throw a MismatchedTokenException.
Moreover, if the variable alternative in the statement rule (which seems irrelevant) is removed from the rule, everything works perfect.

After some debugging I found out that:

*         In the first case with the variable alternative,
Antlr generates a synpred in the keyword rule, which tries to consume an identifier.
Upon success of the synpred, the keyword rule indeed consumes the input 'keyword label', including the 'label' identifier - which is too early,
and fails to find anything which would match the ':'.
At this point, instead of backtracking, Antlr throws the MismatchedTokenException exception.

*         In the second case, without the variable alternative,
Antlr does not generate a synpred.
Instead it uses a look ahead of 2, to see the ':' token,
matches just the 'keyword' input for the keyword rule, does not consume the 'label' input too early,
and finally successfully matches the 'label:' input for the label rule, and the end of the input for the root rule.

Also, please note that I'm using the backtrack=true; option which should, by my understanding, handle the backtracking correctly.
So, what am I missing??

Thanks so much for any clue,
Vitaliy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090208/39968c4b/attachment.html 


More information about the antlr-interest mailing list