[antlr-interest] Fwd: Why is this ambiguous?
Jose Ventura
jose.ventura.roda at gmail.com
Fri Jan 5 06:58:00 PST 2007
2007/1/5, Martin C. Martin <martin at martincmartin.com>:
>
> Thanks Jose,
>
> Jose Ventura wrote:
> > Hi Martin,
> >
> > You can see with an example why it is ambiguous.
> >
> > With the stream "+1" the lexer can make:
> >
> > - IDENTIFIER(+) INT(1) <-- This tokenization is possible because the
> > '+' in INT is optional.
> > - INT(+1)
>
> Thanks, I mentioned this in my original email.
Oops! Sometimes I don't understand messages very well. I apologize for my
English.
> It's also true that the stream "254" is ambiguous:
> - INT(254)
> - INT(25) INT(4)
> - INT(2) INT(54)
> - INT(2) INT(5) INT(4)
>
> The reason this isn't considered ambiguous is because it matches the
> longest possible string.
> Is the "longest match" rule only used for choosing what to assign to a
> single token, and not to choose between tokens or something?
I think not; every rule is independent of the others.
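
To make the two examples concrete, here is a minimal Python sketch (not ANTLR code, and not how ANTLR works internally). It lists every way "+1" and "254" can be split into IDENTIFIER and INT tokens, and then shows what a pure longest-match tokenizer would pick. The function names `all_tokenizations` and `longest_match` are made up for this illustration, and the regular expressions are my transcription of the two rules. As far as I understand, ANTLR 2 instead chooses a rule from a fixed number of lookahead characters, which is why it reports the nondeterminism for '+'.

```python
import re

# The two lexer rules from the original grammar, transcribed as regexes.
RULES = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]

def all_tokenizations(text):
    """Return every way to split text into IDENTIFIER/INT tokens."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        for name, pattern in RULES:
            if pattern.fullmatch(piece):
                for rest in all_tokenizations(text[end:]):
                    results.append([(name, piece)] + rest)
    return results

def longest_match(text):
    """Tokenize greedily: at each position take the longest matching rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

if __name__ == "__main__":
    for stream in ("+1", "254"):
        print(stream, "has", len(all_tokenizations(stream)), "tokenizations")
        print("  longest match picks:", longest_match(stream))
```

With longest match applied across rules, "+1" would come out as a single INT and "+" alone as an IDENTIFIER, which is the behaviour Martin expected.
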
> > There are two solutions.
> >
> >
> > Maybe, you can try:
> >
> > INT_IDENTIFIER
> > : '+' {$setType(IDENTIFIER);} ( ('0'..'9')+ {$setType(INT);}
> > | ('a'..'z')*
> > )
> > ;
> >
> > INT: ('-')? ('0'..'9')+ ;
>
> Thanks, perhaps I'll give that a go.
>
> - Martin
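
The idea behind the merged rule can be sketched in plain Python as follows. This is only an illustration of the decision the rule makes after reading '+', assuming I have read the grammar correctly; `match_int_identifier` is a hypothetical name, not something ANTLR generates.

```python
def match_int_identifier(text, pos=0):
    """Mimic INT_IDENTIFIER: '+' then digits gives INT, otherwise IDENTIFIER."""
    if pos >= len(text) or text[pos] != "+":
        raise ValueError("expected '+'")
    token_type = "IDENTIFIER"  # default, like the first $setType in the rule
    end = pos + 1
    if end < len(text) and "0" <= text[end] <= "9":
        # First alternative: ('0'..'9')+ and switch the type to INT.
        while end < len(text) and "0" <= text[end] <= "9":
            end += 1
        token_type = "INT"
    else:
        # Second alternative: ('a'..'z')* which may match nothing.
        while end < len(text) and "a" <= text[end] <= "z":
            end += 1
    return token_type, text[pos:end]

if __name__ == "__main__":
    for stream in ("+", "+1", "+42 "):
        print(repr(stream), "->", match_int_identifier(stream))
```

Because only one rule starts with '+', the choice between INT and IDENTIFIER is made inside the rule by looking at the next character, so the two rules no longer compete on the same first character.
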
>
> >
> > I think this runs OK, but you should check it.
> >
> > Regards,
> > José Ventura
> >
> > ---------- Forwarded message ----------
> > From: Martin C. Martin <martin at martincmartin.com>
> > Date: 05-Jan-2007 2:24
> > Subject: [antlr-interest] Why is this ambiguous?
> > To: antlr-interest at antlr.org
> >
> > Hi,
> >
> > First of all, thanks for Antlr, it's a huge help!
> >
> > But I don't understand why the following dead-simple lexer is ambiguous:
> >
> > class MyLexer extends Lexer;
> >
> > options {
> > k=4;
> > }
> >
> > IDENTIFIER: "+" ;
> >
> > INT : ('+' | '-')? ( '0'..'9' )+ ;
> >
> > An INT must contain at least one digit, and an IDENTIFIER no digits. So
> > if I receive a + followed by any non-digit (including end of stream), it
> > must be an identifier. If I get a + followed by a digit, it must be an
> > INT. It can't be an IDENTIFIER followed by an INT, because when
> > deciding what token to use for the +, it must match the longest
> > sequence, and + followed by digits is longer than just + alone.
> >
> > Am I missing something? How do I make this non-ambiguous? For the
> > record, the error message is:
> >
> > $ java antlr.Tool MyLexer.g
> > ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
> > MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and
> > INT upon
> > MyLexer.g: k==1:'+'
> > MyLexer.g: k==2:<end-of-token>
> > MyLexer.g: k==3:<end-of-token>
> > MyLexer.g: k==4:<end-of-token>
> >
> > Best,
> > Martin
> >
> >
>