[antlr-interest] Lexer fails

Peter Piper peterpiper797 at hotmail.com
Fri Jan 27 06:33:37 PST 2012


Thank you for the reply and the suggestions, especially your alternative pattern with
the $type lexer action; I hadn't thought of that!

Nonetheless, I'm a bit worried that the lexer is simply too buggy. How confident
are you that this is caused by insufficient lookahead? I don't see how the
patterns can be ambiguous on that input, even with LA(1): there is no 'e', so the
input cannot match the supplied definition of FLOAT.

At the very least, I would expect a "can't backtrack" error when it reaches the
character following the end of the number (';' in this case). Instead, the lexer
seems quite happy to declare a match where there isn't one!
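To make the point concrete, here is a minimal Python sketch (not ANTLR-generated code) of a lexer with unbounded lookahead: at each position it tries every rule against the rest of the input and keeps the longest match. The regular expressions are my own hand translation of the FLOAT and FIXED rules; the SEMI rule, the function name tokenize and the sample inputs are hypothetical additions so the example can run on its own.

```python
import re

# Hand translations of the two rules from this thread into Python regexes.
# SEMI is a hypothetical extra rule so that the sample input can be fully tokenized.
RULES = [
    ("FLOAT", re.compile(r"[0-9]+(?:\.[0-9]*)?[Ee]-?[0-9]+")),
    ("FIXED", re.compile(r"[0-9]+\.[0-9]*")),
    ("SEMI", re.compile(r";")),
]


def tokenize(text):
    """Longest-match lexer with unbounded lookahead.

    Every rule is tried against the remaining input, so a rule that needs an
    'e' simply fails to match when there is none. The longest match wins;
    ties go to the rule listed first.
    """
    pos = 0
    tokens = []
    while pos < len(text):
        best = None  # (match length, rule name)
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > best[0]):
                best = (len(m.group()), name)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        length, name = best
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return tokens


print(tokenize("3.14;"))      # [('FIXED', '3.14'), ('SEMI', ';')]
print(tokenize("6.02e-23;"))  # [('FLOAT', '6.02e-23'), ('SEMI', ';')]
```

The price of this strategy is scanning the same characters once per rule; the (FLOAT) => FLOAT suggestion in the quoted reply asks for that kind of speculative match for just one rule.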

Dan

> Date: Fri, 27 Jan 2012 20:25:29 +1300
> To: peterpiper797 at hotmail.com; antlr-interest at antlr.org
> From: antlr at mirality.co.nz
> Subject: Re: [antlr-interest] Lexer fails
> 
> At 14:27 27/01/2012, Peter Piper wrote:
>  >I'm sorry that I can only talk about the old stuff (v3) but can
>  >anyone see how the following lexer token definition:
>  >
>  >FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')? ('0'..'9')+ ;
> [...]
>  >
>  >There is no 'e' or 'E' in the input, so why does the ANTLR lexer
>  >think that this is a "better" token to output than the other one
>  >I want it to go for, namely:
>  >
>  >FIXED : ('0'..'9')+ '.' ('0'..'9')* ;
> 
> v3 lexers mostly just use single-char lookahead when around 
> looping constructs, which isn't sufficient to disambiguate these 
> cases.  You need to help it out a bit by providing explicit 
> lookahead hints.  (Reportedly v4 is infinitely better at this, but 
> I haven't tried it myself yet.)
> 
> fragment FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')? ('0'..'9')+;
> 
> FIXED : (FLOAT) => FLOAT { $type = FLOAT; }
>        | ('0'..'9')+ '.' ('0'..'9')*
>        ;
> 
> Or left-factor it for more efficiency (and throw an INTEGER in, 
> since I assume you have one of those too):
> 
> fragment FLOAT : ;
> fragment FIXED : ;
> 
> INTEGER : ('0'..'9')+
>          ( ('.' ('0'..'9')) => '.' ('0'..'9')* { $type = FIXED; }
>          ( ('E'|'e') '-'? ('0'..'9')+ { $type = FLOAT; } )? )?
>          ;
> 
> Or just call all of these things NUMBERs and sort it out in the 
> parser. :)
> 
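For comparison, a hand-written Python analogue of the left-factored INTEGER rule quoted above might look like the sketch below. It is an illustration only, not what ANTLR generates, and the function name scan_number is hypothetical. Like the quoted rule, it only commits to a fraction when '.' is followed by a digit (two characters of lookahead, mirroring the syntactic predicate), and it only tries an exponent after a fraction. As a result, inputs such as "123." and "1e5" come back as INTEGER here, even though the original FIXED and FLOAT rules would accept them.

```python
DIGITS = "0123456789"


def scan_number(text, pos):
    """Scan one number starting at text[pos]; return (token_type, lexeme, end).

    INTEGER by default, FIXED once '.' followed by a digit is seen,
    FLOAT if an exponent follows the fraction.
    """
    def digits(i):
        # Advance past a run of ASCII digits.
        while i < len(text) and text[i] in DIGITS:
            i += 1
        return i

    start = pos
    end = digits(pos)
    if end == start:
        raise ValueError(f"expected a digit at offset {pos}")
    token_type = "INTEGER"

    # Like the predicate ('.' ('0'..'9')) => : peek two characters before committing.
    if end + 1 < len(text) and text[end] == "." and text[end + 1] in DIGITS:
        end = digits(end + 1)
        token_type = "FIXED"
        # Optional exponent; once 'E'/'e' is consumed, the digits are mandatory.
        if end < len(text) and text[end] in "Ee":
            i = end + 1
            if i < len(text) and text[i] == "-":
                i += 1
            exp_end = digits(i)
            if exp_end == i:
                raise ValueError(f"expected exponent digits at offset {i}")
            end = exp_end
            token_type = "FLOAT"

    return token_type, text[start:end], end


print(scan_number("3.14;", 0))      # ('FIXED', '3.14', 4)
print(scan_number("6.02e-23;", 0))  # ('FLOAT', '6.02e-23', 8)
print(scan_number("1..5", 0))       # ('INTEGER', '1', 1)
```

The two-character peek is what keeps something like "1..5" from being swallowed as a FIXED; a scanner limited to one character of lookahead would have to commit as soon as it saw the '.'.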

