[antlr-interest] Lexer fails
Peter Piper
peterpiper797 at hotmail.com
Fri Jan 27 06:33:37 PST 2012
Thank you for the reply and the suggestions, especially your alternative pattern with
the $type lexer action; I hadn't thought of that!
Nonetheless, I'm a little worried that the lexer is simply too buggy. How confident
are you that this is caused by insufficient lookahead? I don't see how the
patterns can be ambiguous on that input, even with LA(1): there is no 'e', so the
input cannot match the supplied definition of FLOAT.
At the very least, I would expect a "can't backtrack" error when it gets to the
character following the end of the number (';' in this case). However, the lexer
seems quite happy to declare a match where there isn't one!
Dan
> Date: Fri, 27 Jan 2012 20:25:29 +1300
> To: peterpiper797 at hotmail.com; antlr-interest at antlr.org
> From: antlr at mirality.co.nz
> Subject: Re: [antlr-interest] Lexer fails
>
> At 14:27 27/01/2012, Peter Piper wrote:
> >I'm sorry that I can only talk about the old stuff (v3), but can
> >anyone see how the following lexer token definition:
> >
> >FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')? ('0'..'9')+ ;
> [...]
> >
> >There is no 'e' or 'E' in the input, so why does the ANTLR lexer
> >think that this is a "better" token to output than the other one
> >I want it to go for, namely:
> >
> >FIXED : ('0'..'9')+ '.' ('0'..'9')* ;
>
> v3 lexers mostly use only single-character lookahead around
> looping constructs, which isn't sufficient to disambiguate these
> cases. You need to help it out by providing explicit lookahead
> hints (syntactic predicates). (Reportedly v4 is far better at this,
> but I haven't tried it myself yet.)
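A rough way to see the point about single-character lookahead: the Python sketch below is not ANTLR code, and `FLOAT_PREFIX` and `chars_to_rule_out_float` are hypothetical names. It models only the FLOAT rule from the original post and reports how many characters a predictor would have to examine before it could rule FLOAT out. It says nothing about how ANTLR's generated DFA actually chooses a rule; it only shows that the required depth grows with the length of the number.

```python
import re

# Hypothetical regex for every *prefix* of a FLOAT token as defined in the post:
#   FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')? ('0'..'9')+ ;
# Alternatives: digits so far; digits plus a fraction; or already into the exponent.
FLOAT_PREFIX = re.compile(r"[0-9]*|[0-9]+(?:\.[0-9]*)?|[0-9]+(?:\.[0-9]*)?[Ee]-?[0-9]*")


def chars_to_rule_out_float(text: str) -> int:
    """Smallest k such that text[:k] can no longer be the start of a FLOAT."""
    for k in range(1, len(text) + 1):
        if not FLOAT_PREFIX.fullmatch(text[:k]):
            return k
    return len(text) + 1  # FLOAT was never ruled out within this input


if __name__ == "__main__":
    for sample in ["1.5;", "123.45;", "3.14159265;"]:
        # The answer grows with the length of the number, so no fixed k suffices.
        print(sample, chars_to_rule_out_float(sample))
```

For the three samples this reports 4, 7 and 11: FLOAT is only excluded at the `;`, one character past the whole number.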
>
> fragment FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')? ('0'..'9')+ ;
>
> FIXED : (FLOAT) => FLOAT { $type = FLOAT; }
>       | ('0'..'9')+ '.' ('0'..'9')*
>       ;
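The predicated rule behaves roughly like "try FLOAT speculatively, and fall back to FIXED only if that fails". A minimal Python sketch of that decision, using regular expressions as stand-ins for the two rules (`FLOAT_RE`, `FIXED_RE` and `next_number` are hypothetical names, and Python's backtracking regex engine plays the role of the speculative match):

```python
import re
from typing import Optional, Tuple

# Hypothetical regex stand-ins for the two lexer rules in the reply.
FLOAT_RE = re.compile(r"[0-9]+(?:\.[0-9]*)?[Ee]-?[0-9]+")
FIXED_RE = re.compile(r"[0-9]+\.[0-9]*")


def next_number(text: str, pos: int = 0) -> Optional[Tuple[str, str]]:
    """Return (token type, lexeme) for a number starting at pos, or None."""
    # Alternative 1, gated like (FLOAT) => FLOAT: speculatively match all of FLOAT.
    m = FLOAT_RE.match(text, pos)
    if m:
        return ("FLOAT", m.group())  # { $type = FLOAT; }
    # Alternative 2: only tried once the speculative FLOAT match has failed.
    m = FIXED_RE.match(text, pos)
    if m:
        return ("FIXED", m.group())
    return None  # neither alternative applies (e.g. a plain integer)


if __name__ == "__main__":
    print(next_number("123.45;"))  # ('FIXED', '123.45')
    print(next_number("1.5e-3;"))  # ('FLOAT', '1.5e-3')
```

A number that turns out not to be a FLOAT is scanned twice, once speculatively and once for real, which is presumably the cost the left-factored version avoids.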
>
> Or left-factor it for more efficiency (and throw an INTEGER in,
> since I assume you have one of those too):
>
> fragment FLOAT : ;
> fragment FIXED : ;
>
> INTEGER : ('0'..'9')+
>           ( ('.' ('0'..'9')) => '.' ('0'..'9')* { $type = FIXED; }
>             ( ('E'|'e') '-'? ('0'..'9')+ { $type = FLOAT; } )?
>           )?
>         ;
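The left-factored rule matches the digits once and then decides what the token is from what follows. A hand-written Python sketch of the same control flow (`scan_number`, `is_digit` and `peek` are hypothetical helpers; the `raise` stands in for the lexer error you would get on a malformed exponent):

```python
from typing import Tuple


def is_digit(ch: str) -> bool:
    """ASCII '0'..'9' only (str.isdigit() would also accept other Unicode digits)."""
    return len(ch) == 1 and "0" <= ch <= "9"


def scan_number(text: str, pos: int = 0) -> Tuple[str, str]:
    """Left-factored scan mirroring the INTEGER rule; returns (type, lexeme)."""

    def peek(i: int) -> str:
        return text[i] if i < len(text) else ""  # "" plays the role of EOF

    if not is_digit(peek(pos)):
        raise ValueError(f"expected a digit at offset {pos}")
    start, token_type = pos, "INTEGER"
    while is_digit(peek(pos)):  # ('0'..'9')+ : shared prefix, scanned once
        pos += 1
    # Predicate ('.' ('0'..'9')) => : check two characters before committing.
    if peek(pos) == "." and is_digit(peek(pos + 1)):
        pos += 1
        while is_digit(peek(pos)):
            pos += 1
        token_type = "FIXED"  # { $type = FIXED; }
        # Optional exponent; in the quoted rule it can only follow a fraction.
        if peek(pos) in ("E", "e"):
            pos += 1
            if peek(pos) == "-":
                pos += 1
            if not is_digit(peek(pos)):
                raise ValueError(f"malformed exponent at offset {pos}")
            while is_digit(peek(pos)):
                pos += 1
            token_type = "FLOAT"  # { $type = FLOAT; }
    return token_type, text[start:pos]


if __name__ == "__main__":
    for src in ["123;", "123.45;", "1.5e-3;"]:
        print(scan_number(src))
```

Following the quoted rule literally, `12.` and `12e5` both come back as INTEGER `12`: the fraction needs a digit after the '.', and the exponent can only follow a fraction. That differs from the original FIXED and FLOAT definitions, so the rule may need adjusting if those forms matter.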
>
> Or just call all of these things NUMBERs and sort it out in the
> parser. :)
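The last option moves the distinction out of the lexer: a single NUMBER rule accepts all three forms and the parser looks at the text to decide what kind of number it is. A minimal Python sketch of that split, with a hypothetical `NUMBER_RE` standing in for the lexer rule and a hypothetical `classify` for the parser-side check:

```python
import re
from typing import List, Tuple

# Hypothetical single NUMBER rule covering integers, fixed-point and floats.
NUMBER_RE = re.compile(r"[0-9]+(?:\.[0-9]*)?(?:[Ee]-?[0-9]+)?")


def classify(lexeme: str) -> str:
    """Parser-side decision: what kind of number is this NUMBER token?"""
    if "e" in lexeme or "E" in lexeme:
        return "FLOAT"
    if "." in lexeme:
        return "FIXED"
    return "INTEGER"


def numbers(text: str) -> List[Tuple[str, str]]:
    """Find every NUMBER in text, then classify each one separately.

    (Digits inside identifiers would also be picked up here; a real lexer
    would not do that.)
    """
    return [(classify(m.group()), m.group()) for m in NUMBER_RE.finditer(text)]


if __name__ == "__main__":
    print(numbers("a = 1.5e-3 + 42 * 3.0;"))
```

For `a = 1.5e-3 + 42 * 3.0;` this yields FLOAT `1.5e-3`, INTEGER `42` and FIXED `3.0`.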
>