[antlr-interest] Fwd: Why is this ambiguous?

Fri Jan 5 06:58:00 PST 2007

2007/1/5, Martin C. Martin <martin at martincmartin.com>:
>
> Thanks Jose,
>
> Jose Ventura wrote:
> > Hi Martin,
> >
> > You can see with an example why it is ambiguous.
> >
> > With the stream "+1" the lexer can produce:
> >
> > - IDENTIFIER(+) INT(1) <-- This tokenization is possible because the '+' of
> > INT is optional.
> > - INT(+1)
>
> Thanks, I mentioned this in my original email.

Oops! Sometimes I don't understand messages very well. I apologize for my
English.

> It's also true that the stream "254" is ambiguous:
>
> - INT(254)
> - INT(25) INT(4)
> - INT(2) INT(54)
> - INT(2) INT(5) INT(4)
>
> The reason this isn't considered ambiguous is because it matches the
> longest possible string.
>
> Is the "longest match" rule only used for choosing what to assign to a
> single token, and not to choose between tokens or something?

I don't think so. As far as I know, each rule is independent of the others.
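
To make the intended behaviour concrete, here is a rough hand-written sketch in Python (my own illustration, not ANTLR-generated code). It assumes the choice between rules is made by peeking one character past the sign, and that the digit loop is greedy. The function name tokenize and the DIGITS constant are only for this example.

```python
DIGITS = "0123456789"

def tokenize(text):
    """Split text into (type, lexeme) pairs for the IDENTIFIER/INT example."""
    tokens = []
    i = 0
    while i < len(text):
        start = i
        # A sign belongs to an INT only if a digit follows it.
        if text[i] in "+-" and i + 1 < len(text) and text[i + 1] in DIGITS:
            i += 1
        if text[i] in DIGITS:
            # Greedy loop: consume every consecutive digit, so "254" is one INT.
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("INT", text[start:i]))
        elif text[i] == "+":
            # A '+' that is not followed by a digit is an IDENTIFIER.
            i += 1
            tokens.append(("IDENTIFIER", "+"))
        else:
            raise ValueError("unexpected character %r at offset %d" % (text[i], i))
    return tokens

print(tokenize("254"))
print(tokenize("+1"))
print(tokenize("++1"))
```

With this scheme "254" comes out as a single INT and "+1" as INT(+1), while a '+' with no digit after it is an IDENTIFIER.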

> > There are two solutions.
> >
> >
> > Maybe, you can try:
> >
> > INT_IDENTIFIER
> >     : '+' {$setType(IDENTIFIER);}
> >       ( ('0'..'9')+ {$setType(INT);}
> >       | ('a'..'z')*
> >       )
> >     ;
> >
> > INT: ('-')? ('0'..'9')+ ;
>
> Thanks, perhaps I'll give that a go.
>
> - Martin
>
> >
> > I think this runs OK, but you should check it.
> >
> > Regards,
> > José Ventura
> >
> > ---------- Forwarded message ----------
> > From: Martin C. Martin <martin at martincmartin.com>
> > Date: 05-Jan-2007 2:24
> > Subject: [antlr-interest] Why is this ambiguous?
> > To: antlr-interest at antlr.org
> >
> > Hi,
> >
> > First of all, thanks for Antlr, it's a huge help!
> >
> > But I don't understand why the following dead-simple lexer is ambiguous:
> >
> > class MyLexer extends Lexer;
> >
> > options {
> >    k=4;
> > }
> >
> > IDENTIFIER: "+" ;
> >
> > INT : ('+' | '-')? ( '0'..'9' )+ ;
> >
> > An INT must contain at least one digit, and an IDENTIFIER no digits.  So
> > if I receive a + followed by any non-digit (including end of stream), it
> > must be an identifier.  If I get a + followed by a digit, it must be an
> > INT.  It can't be an IDENTIFIER followed by an INT, because when
> > deciding what token to use for the +, it must match the longest
> > sequence, and + followed by digits is longer than just + alone.
> >
> > Am I missing something?  How do I make this non-ambiguous?  For the
> > record, the error message is:
> >
> > $ java antlr.Tool MyLexer.g
> > ANTLR Parser Generator   Version 2.7.5 (20050128)   1989-2005 jGuru.com
> > MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and INT upon
> > MyLexer.g:     k==1:'+'
> > MyLexer.g:     k==2:<end-of-token>
> > MyLexer.g:     k==3:<end-of-token>
> > MyLexer.g:     k==4:<end-of-token>
> >
> > Best,
> > Martin
> >
> >
>
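
For what it's worth, the warning above can be reproduced by hand. The sketch below is in Python and rests on an assumption: that ANTLR 2 compares lookahead one depth at a time (its documentation calls this linear approximate lookahead) instead of comparing full k-character sequences. The helper names and the 'd' stand-in for any digit are mine, not ANTLR's.

```python
EOT = "<end-of-token>"
K = 4  # matches k=4 in the grammar options

def pad(seq, k=K):
    """Truncate or pad a token's characters to exactly k lookahead symbols."""
    return tuple((list(seq) + [EOT] * k)[:k])

def identifier_prefixes(k=K):
    # IDENTIFIER: "+" ;
    return {pad("+", k)}

def int_prefixes(k=K):
    # INT : ('+' | '-')? ( '0'..'9' )+ ;   'd' stands for any digit
    prefixes = set()
    for sign in ("", "+", "-"):
        for digits in range(1, k + 1):
            prefixes.add(pad(sign + "d" * digits, k))
    return prefixes

def depth_sets(prefixes, k=K):
    """Collapse full sequences into one set of symbols per lookahead depth."""
    return [{p[i] for p in prefixes} for i in range(k)]

ident = identifier_prefixes()
integer = int_prefixes()

# Exact comparison: is any complete k-symbol sequence shared by both rules?
exact_overlap = ident & integer

# Per-depth comparison: do the rules share a symbol at every single depth?
approx_overlap = [a & b for a, b in zip(depth_sets(ident), depth_sets(integer))]

print("exact overlap:", exact_overlap)
for depth, symbols in enumerate(approx_overlap, start=1):
    print("k==%d: %s" % (depth, sorted(symbols)))
```

Under that assumption the per-depth overlap is '+' at k==1 and <end-of-token> at k==2 through k==4, which is what the warning lists, even though no complete 4-symbol sequence is shared by the two rules. The <end-of-token> entries on the INT side come from inputs such as a single digit, not from a '+' followed by nothing.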