[antlr-interest] Lexer fails

Jim Idle jimi at temporal-wave.com
Fri Jan 27 07:06:08 PST 2012


I bet that it is saying 'unexpected char - ignored' - are you using the
ANTLRWorks debugger?

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Peter Piper
> Sent: Friday, January 27, 2012 6:34 AM
> To: antlr at mirality.co.nz; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer fails
>
>
> Thank you for the reply and the suggestions, especially your
> alternative pattern with the $type lexer action - I hadn't thought of
> that!
>
> Nonetheless, I'm a bit worried that the Lexer is just a bit too buggy.
> How confident are you that this is caused by the lack of sufficient
> lookahead? I don't see that the patterns can be ambiguous on that
> input, even with LA(1). There is no 'e', so it cannot match the
> supplied definition of FLOAT.
>
> At the very least, I would expect a "can't backtrack" error when it
> gets to the character following the end of the number (';' in this
> case). However, the lexer seems quite happy to declare a match where
> there isn't one!
>
> Dan
>
> > Date: Fri, 27 Jan 2012 20:25:29 +1300
> > To: peterpiper797 at hotmail.com; antlr-interest at antlr.org
> > From: antlr at mirality.co.nz
> > Subject: Re: [antlr-interest] Lexer fails
> >
> > At 14:27 27/01/2012, Peter Piper wrote:
> >  >I'm sorry that I can only talk about the old stuff (v3) but can
> > >anyone see how the following lexer token definition:
> >  >
> >  >FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')?
> >  >('0'..'9')+ ;
> > [...]
> >  >
> >  >There is no 'e' or 'E' in the input, so why does the ANTLR lexer
> >  >think that this is a "better" token to output than the other one
> >  >I want it to go for, namely:
> >  >
> >  >FIXED : ('0'..'9')+ '.' ('0'..'9')* ;
> >
> > v3 lexers mostly use only single-character lookahead around looping
> > constructs, which isn't sufficient to disambiguate these cases.  You
> > need to help it out a bit by providing explicit lookahead hints.
> > (Reportedly v4 is infinitely better at this, but I haven't tried it
> > myself yet.)
> >
> > fragment FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')?
> > ('0'..'9')+;
> >
> > FIXED : (FLOAT) => FLOAT { $type = FLOAT; }
> >        | ('0'..'9')+ '.' ('0'..'9')*
> >        ;
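
The predicate in the rule above amounts to "try the whole FLOAT pattern
speculatively, and fall back to the FIXED alternative only if that fails".
A minimal Python sketch of that decision, written outside ANTLR with regular
expressions standing in for the two rules. The name classify_number is made
up for illustration, and this is not the code ANTLR generates:

```python
import re

# Regex stand-ins for the two lexer rules quoted above.
FLOAT_RE = re.compile(r"[0-9]+(\.[0-9]*)?[Ee]-?[0-9]+")
FIXED_RE = re.compile(r"[0-9]+\.[0-9]*")


def classify_number(text, pos=0):
    """Return (token_type, lexeme) for the number starting at pos, or None.

    Mirrors the shape of `(FLOAT) => FLOAT | fixed-alternative`: the whole
    FLOAT pattern is tried first, and FIXED is used only if FLOAT fails.
    """
    m = FLOAT_RE.match(text, pos)
    if m:
        return ("FLOAT", m.group(0))
    m = FIXED_RE.match(text, pos)
    if m:
        return ("FIXED", m.group(0))
    return None
```

With this ordering an input such as "1.5;" can only come out as FIXED,
because the FLOAT attempt fails as a whole when no 'e' or 'E' is present.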
> >
> > Or left-factor it for more efficiency (and throw an INTEGER in, since
> > I assume you have one of those too):
> >
> > fragment FLOAT : ;
> > fragment FIXED : ;
> >
> > INTEGER : ('0'..'9')+
> >          ( ('.' ('0'..'9')) => '.' ('0'..'9')* { $type = FIXED; }
> >          ( ('E'|'e') '-'? ('0'..'9')+ { $type = FLOAT; } )? )?
> >          ;
> >
> > Or just call all of these things NUMBERs and sort it out in the
> > parser. :)
> >
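
For comparison, here is a hand-written sketch of the left-factored idea in
Python: consume the digits once, then decide the token type from what
follows. It only loosely follows the INTEGER rule quoted above. As an
assumption on my part, it accepts an exponent directly after the integer
part (as the original FLOAT definition does) and checks that the exponent is
complete before committing to it. The names is_digit, scan_digits and
scan_number are hypothetical:

```python
def is_digit(ch):
    """True for a single ASCII digit character."""
    return "0" <= ch <= "9"


def scan_digits(text, i):
    """Return the index just past any run of ASCII digits starting at i."""
    while i < len(text) and is_digit(text[i]):
        i += 1
    return i


def scan_number(text, pos=0):
    """Scan one number starting at pos; return (token_type, lexeme) or None."""
    end = scan_digits(text, pos)
    if end == pos:
        return None  # no leading digit, so not a number
    kind = "INTEGER"

    # Fraction: taken only when '.' is followed by a digit, which needs
    # two characters of lookahead, like the predicate in the quoted rule.
    if end + 1 < len(text) and text[end] == "." and is_digit(text[end + 1]):
        end = scan_digits(text, end + 1)
        kind = "FIXED"

    # Exponent: taken only when a complete ('E'|'e') '-'? digit+ is present.
    if end < len(text) and text[end] in "Ee":
        j = end + 1
        if j < len(text) and text[j] == "-":
            j += 1
        exp_end = scan_digits(text, j)
        if exp_end > j:
            end = exp_end
            kind = "FLOAT"

    return (kind, text[pos:end])
```

The same function also covers the "call everything a NUMBER" option: the
caller can ignore the returned type and classify the lexeme later.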

