[antlr-interest] Problem when parsing numerics

Wed Feb 18 08:34:54 PST 2009

Thomas Woelfle wrote:
> Hi Indhu,
>
> thanks for your reply. You are right. The lexer tries to find the 
> longest valid next token. But given my sample grammar and the sample 
> input '1.' the first valid token is '1' which is a NUMERIC and then the 
> next token is '.'. It is correct that the NUMERIC rule cannot match '1.' 
> since that is not a valid NUMERIC token. What it should match is '1' 
> which is a valid NUMERIC token.
>
> What I don't understand is why the lexer assumes that if there is a '.' 
> after some DIGITs it has to be a NUMERIC.
>
> foo     :     NUMERIC '.';
>
> NUMERIC :    '0'..'9'+ ('.' '0'..'9'+)?;
>
>
> The NUMERIC rule defines that after the initial DIGITs there may be a 
> '.' followed by at least one DIGIT. Therefore the lexer's prediction 
> that a NUMERIC is the next token if a '.' has been recognized after 
> some DIGITs isn't correct, is it?
>
> Any ideas?
>   
You have to tell it what to do to verify its selection. Seeing the '.' 
tells it to look for 0..9, and that fails. On top of that, you have 
auto-generated a lexer rule for '.' and made it all ambiguous ;-). Rule 
number one, if you are not yet very familiar with ANTLR, is NOT to put 
'literals' in your parser. It tempts you to think that the lexer is 
being driven by the parser, but the lexer runs all the way through the 
input first.

For your simple rule, you can have:

foo : NUMERIC DOT;

NUMERIC  : ('0'..'9')+ ( ('.' '0'..'9')=> '.' ('0'..'9')+ )? ;
DOT : '.' ;

But that precludes:

5.

from being a floating point number of course.
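
As a rough illustration of what the predicate buys you, the Python 
sketch below models the NUMERIC and DOT rules above, with the fraction 
treated as optional. It is an assumption-laden model rather than ANTLR's 
generated lexer: the tokenize function and the (type, text) tuples are 
invented for the example, and the predicate is modelled as a plain 
two-character peek before the '.' is consumed.

```python
def is_digit(ch):
    # Same character range as '0'..'9' in the grammar.
    return "0" <= ch <= "9"


def tokenize(text):
    """Split text into (type, text) pairs using NUMERIC and DOT only."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if is_digit(ch):
            start = pos
            while pos < len(text) and is_digit(text[pos]):
                pos += 1
            # Models the predicate ('.' '0'..'9')=> : take the fraction only
            # if a '.' is immediately followed by a digit.
            if pos + 1 < len(text) and text[pos] == "." and is_digit(text[pos + 1]):
                pos += 1  # consume '.'
                while pos < len(text) and is_digit(text[pos]):
                    pos += 1
            tokens.append(("NUMERIC", text[start:pos]))
        elif ch == ".":
            tokens.append(("DOT", "."))
            pos += 1
        else:
            raise ValueError("unexpected character %r at index %d" % (ch, pos))
    return tokens
```

With this model, tokenize("1.") gives [("NUMERIC", "1"), ("DOT", ".")], 
which is what foo expects, and tokenize("1.5") gives a single 
("NUMERIC", "1.5"). The same peek is why "5." always comes out as 
NUMERIC followed by DOT and never as one floating point token.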

Jim