[antlr-interest] Non-deterministic behaviour in matching lexer tokens

Anthony Bargnesi abargnesi at gmail.com
Fri May 27 15:30:04 PDT 2011


Thanks for the quick reply!

My second grammar was a mistake, sorry.  I realize that '!'+ does a good job
of disambiguating VALUE from IDENT.

But if I change that second grammar to:

call:
    'call' id=IDENT
    ;

action:
    'action' VALUE
    ;

IDENT:
    LETTER (LETTER | DIGIT | '_')*
    ;

VALUE:
    (LETTER | DIGIT)+
    ;

fragment LETTER:
    ('a'..'z' | 'A'..'Z')
    ;

fragment DIGIT:
    '0'..'9'
    ;

WS:
    (' ' | '\t' | '\n' | '\r'| '\f')+
    {$channel = HIDDEN;}
    ;

Then I parse "action myval" and receive this error:

line 1:7 mismatched input 'myval' expecting VALUE

Because the lexer cannot determine whether the token is IDENT or VALUE,
my action rule fails.

What are my options for disambiguation at this point?
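
One possible option, as a minimal and untested sketch: any IDENT made up only
of letters and digits is also a valid VALUE, so the parser could accept either
token type wherever a value is expected. The rule name `value` is a
hypothetical addition for illustration; the lexer rules stay exactly as above.

```antlr
// Hypothetical parser rule: accept either token type where a value is expected.
value:
    VALUE
    | IDENT
    ;

action:
    'action' value
    ;
```

Under that assumption, 'myval' would presumably still be lexed as IDENT, but
the action rule would accept it through `value`. The catch is that an IDENT
containing '_' would be accepted too, so that case would need a separate check
if it matters.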

-tony


On Fri, May 27, 2011 at 6:23 PM, Kirby Bohling <kirby.bohling at gmail.com> wrote:

> First grammar:
> > VALUE:
> >    (LETTER | DIGIT)+
> >    ;
>
> Second Grammar:
> > VALUE:
> >    (LETTER | DIGIT) '!'+
> >    ;
> > action MYVAL!   (MismatchedTokenException: line 3:7 mismatched input 'MYVAL'
>
> You've got the + in the wrong place.  I'm pretty sure you meant:
>
> VALUE:
>   (LETTER | DIGIT)+ '!'
> ;
>
> It is blowing up at the 'Y' because that rule allows exactly one letter or
> one digit, followed by at least one '!'.  You've given it 5 letters, then one '!'.
>
> While you can make this work, it would likely be simpler to make those two
> tokens easier to tell apart.  However, if you think this is the correct
> approach, read the FAQ about floats vs. ranges:
>
> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs
>
> That's got examples of all the power tools for manhandling ambiguous
> token types.
>
> Kirby
>

