[antlr-interest] Why does antlr not know alternative?

Gavin Lambert antlr at mirality.co.nz
Mon Jan 9 16:53:35 PST 2012


At 11:46 10/01/2012, James Ladd wrote:
 >^ 12.
 >
 >I can see that NUMBER has a component of it that can be a '.' hence
 >the grammar issue.
 >Is it the start of the decimal part of a number or the end of a
 >statement.

Yes, that's probably the problem.  While generating a NUMBER token 
it's consuming the dot (and then probably suffering a sync error).

 >NUMBER:        ((NUMBER_LEFT)? ('-')? DIGITS (NUMBER_RIGHT_P1)?
 >(NUMBER_RIGHT_P2)?);
[...]
 >fragment NUMBER_LEFT:        DIGITS 'r';
 >fragment NUMBER_RIGHT_P1:    '.' DIGITS;
 >fragment NUMBER_RIGHT_P2:    'e' ('-')? DIGITS;
 >fragment DIGIT:        '0'..'9';
 >fragment DIGITS:        DIGIT+;

Unfortunately v3 lexers are a little too optimistic when faced 
with subrules and */+ sequences: they tend to use only one 
character of lookahead when they should be using more.  In this 
case, the lexer decides whether to take the NUMBER_RIGHT_P1 branch 
based solely on whether the next character is a dot.  It doesn't 
look one character further ahead to check that a digit follows, so 
it just throws an error while trying to match the DIGITS subrule.
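
To make that decision concrete, here is a rough hand-written model 
in Python.  It is only a sketch of the behaviour described above, 
not the code ANTLR generates, and it assumes a cut-down rule 
(DIGITS ('.' DIGITS)?) with the radix, sign and exponent parts left 
out.  The function names are made up for the illustration.

```python
def match_digits(text, pos):
    """Consume one or more digits starting at pos; raise if there are none."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at offset {start}")
    return pos


def match_number_one_char_lookahead(text, pos=0):
    """Model of DIGITS ('.' DIGITS)? where the optional part is chosen
    by looking at a single character only."""
    pos = match_digits(text, pos)
    if pos < len(text) and text[pos] == '.':
        # Committed to the fraction branch on the strength of the dot alone;
        # if no digit follows, match_digits raises instead of backing out.
        pos = match_digits(text, pos + 1)
    return pos
```

With this model, "12.5" is consumed in full, while "12." raises 
because the dot has already been eaten by the time the missing 
digit is noticed.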

You should be able to force the issue by making the lookahead 
explicit with a syntactic predicate:

NUMBER: ((NUMBER_LEFT)? ('-')? DIGITS
         (('.' DIGIT) => NUMBER_RIGHT_P1)? (NUMBER_RIGHT_P2)?);
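
As a rough picture of what the predicate buys you, here is a small 
hand-written Python model of the gated decision.  Again this is an 
assumption-laden sketch rather than ANTLR's generated code: it 
covers only DIGITS (('.' DIGIT) => '.' DIGITS)? and the function 
names are invented for the example.

```python
def match_digits(text, pos):
    """Consume one or more digits starting at pos; raise if there are none."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at offset {start}")
    return pos


def match_number_with_predicate(text, pos=0):
    """Model of DIGITS (('.' DIGIT) => '.' DIGITS)?: the fraction branch
    is entered only when a dot AND a following digit are both present."""
    pos = match_digits(text, pos)
    # Two characters of lookahead, mirroring the ('.' DIGIT) => predicate.
    # Slicing past the end yields '', and ''.isdigit() is False.
    if text[pos:pos + 1] == '.' and text[pos + 1:pos + 2].isdigit():
        pos = match_digits(text, pos + 1)
    return pos
```

Here "12." stops after the two digits and leaves the dot in the 
input, so it is still available to be matched as the end of the 
statement.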


