[antlr-interest] Non-deterministic behaviour in matching lexer tokens

Kirby Bohling kirby.bohling at gmail.com
Fri May 27 15:23:49 PDT 2011


First grammar:
> VALUE:
>    (LETTER | DIGIT)+
>    ;

Second grammar:
> VALUE:
>    (LETTER | DIGIT) '!'+
>    ;
> action MYVAL!   (MismatchedTokenException: line 3:7 mismatched input 'MYVAL'

You've got the + in the wrong place.  I'm pretty sure you meant:

VALUE:
   (LETTER | DIGIT)+ '!'
;

It is blowing up at the 'Y' because the rule as written allows exactly
one letter or one digit, followed by at least one '!'.  You've given it
five letters and then one '!'.
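
As a rough illustration of why the placement of the + matters, the two
rules can be approximated with regular expressions. This is only an
analogy and not how ANTLR's lexer actually works; it also assumes LETTER
is [A-Za-z] and DIGIT is [0-9], which the original post does not show.

```python
import re

# Regex analogues of the two lexer rules (an approximation, not ANTLR itself).
# Assumption: LETTER is [A-Za-z] and DIGIT is [0-9].
AS_WRITTEN = re.compile(r"[A-Za-z0-9]!+")   # (LETTER | DIGIT) '!'+
AS_INTENDED = re.compile(r"[A-Za-z0-9]+!")  # (LETTER | DIGIT)+ '!'


def matches(pattern, text):
    """Return True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ("MYVAL!", "M!", "M!!!"):
        print(text, matches(AS_WRITTEN, text), matches(AS_INTENDED, text))
```

Under those assumptions, the rule as written accepts "M!" and "M!!!" but
not "MYVAL!", while the corrected rule accepts "MYVAL!" and "M!" but not
"M!!!".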

While you can make this work, it would likely be simpler to change the
two tokens so that they are easier to tell apart.  However, if you
think this is the correct approach, read the FAQ about floats vs.
ranges:
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs

That has examples of all of the power tools for manhandling
ambiguous token types.
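
One general way to handle two tokens that share a prefix is to match
the shared part first and only then decide the token type from what
follows. The sketch below shows that idea in plain Python rather than
in ANTLR; the token names VALUE and BANG_VALUE and the function
classify are hypothetical and not taken from the original grammar or
the FAQ.

```python
import string

# Assumption: LETTER and DIGIT cover ASCII letters and digits only.
ALNUM = set(string.ascii_letters + string.digits)


def classify(text, pos=0):
    """Return (token_type, lexeme, next_pos) for the token starting at pos.

    Hypothetical token types:
      VALUE       one or more letters/digits
      BANG_VALUE  one or more letters/digits followed by a single '!'
    """
    end = pos
    # Consume the prefix that both token types share.
    while end < len(text) and text[end] in ALNUM:
        end += 1
    if end == pos:
        raise ValueError("no letter or digit at position %d" % pos)
    # One character of lookahead decides which token this is.
    if end < len(text) and text[end] == "!":
        return ("BANG_VALUE", text[pos:end + 1], end + 1)
    return ("VALUE", text[pos:end], end)


if __name__ == "__main__":
    print(classify("MYVAL!"))
    print(classify("MYVAL rest"))
```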

Kirby

