[antlr-interest] Fwd: Why is this ambiguous?
Martin C. Martin
martin at martincmartin.com
Fri Jan 5 06:42:30 PST 2007
Thanks, Jose,
Jose Ventura wrote:
> Hi Martin,
>
> An example shows why it is ambiguous.
>
> With the stream "+1" the lexer can make:
>
> - IDENTIFIER(+) INT(1) <-- This is possible because the '+' in INT is
> optional.
> - INT(+1)
Thanks, I mentioned this in my original email. It's also true that the
stream "254" is ambiguous:
- INT(254)
- INT(25) INT(4)
- INT(2) INT(54)
- INT(2) INT(5) INT(4)
The reason this isn't considered ambiguous is that the lexer matches the
longest possible string.
Is the "longest match" rule used only to decide how many characters a
single token consumes, and not to choose between different token types?
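To make the distinction concrete, here is a minimal hand-written Java sketch of a lexer that applies longest match *across* token types (maximal munch). At each position it tries both the IDENTIFIER and the INT pattern and keeps the longer match. It is an illustration under my own assumptions, not what ANTLR 2 generates. As far as I know, an ANTLR 2 lexer first picks a rule from up to k characters of lookahead and only then runs that rule's loops greedily, so "longest match" there applies within the chosen rule. The class name, method names and the TYPE(text) output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical maximal-munch lexer for the two token types in the original
// grammar: IDENTIFIER = "+", INT = ('+' | '-')? ('0'..'9')+.
// Unlike an ANTLR 2 lexer, it tries every rule and keeps the longest match.
public class MaximalMunchLexer {

    // Length of the INT match starting at pos, or 0 if INT does not match.
    static int matchInt(String s, int pos) {
        int i = pos;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++; // optional sign
        }
        int digitsStart = i;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++; // greedy loop: take every digit available
        }
        return i > digitsStart ? i - pos : 0; // at least one digit required
    }

    // Length of the IDENTIFIER match ("+") starting at pos, or 0.
    static int matchIdentifier(String s, int pos) {
        return pos < s.length() && s.charAt(pos) == '+' ? 1 : 0;
    }

    // Splits the input into tokens written as TYPE(text).
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int intLen = matchInt(s, pos);
            int idLen = matchIdentifier(s, pos);
            if (intLen == 0 && idLen == 0) {
                throw new IllegalArgumentException("no token matches at " + pos);
            }
            // Longest match wins: when both apply, INT ("+" plus digits) is longer.
            boolean useInt = intLen > idLen;
            int len = useInt ? intLen : idLen;
            tokens.add((useInt ? "INT(" : "IDENTIFIER(") + s.substring(pos, pos + len) + ")");
            pos += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("254")); // [INT(254)]
        System.out.println(tokenize("+1"));  // [INT(+1)]
        System.out.println(tokenize("+"));   // [IDENTIFIER(+)]
    }
}
```

With this strategy, "254" lexes as a single INT and "+1" as INT(+1), while a lone "+" is an IDENTIFIER. That is the behaviour the original grammar was aiming for.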
> So there are two possible tokenizations.
>
> Maybe you can try:
>
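> // IDENTIFIER no longer has its own rule, so declare its token type
> // (in the grammar header, after the options section):
> tokens {
>     IDENTIFIER;
> }
>
> INT_IDENTIFIER
> : '+' {$setType(IDENTIFIER);}       // a lone '+' is an IDENTIFIER...
>   ( ('0'..'9')+ {$setType(INT);}    // ...unless digits follow
>   | ('a'..'z')*
>   )
> ;
>
> INT: ('-')? ('0'..'9')+ ;           // a leading '+' is handled above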
Thanks, perhaps I'll give that a go.
- Martin
>
> I think this works, but you should check it.
>
> Regards,
> José Ventura
>
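The INT_IDENTIFIER rule suggested above left-factors the shared '+'. It can be mirrored in plain Java to show why it is deterministic. Only the '+' token types start with '+', so the choice between IDENTIFIER and INT is deferred until the character after it is visible. This is a hypothetical hand-written equivalent for illustration only, not the code ANTLR generates. The class name and the TYPE(text) output format are my own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of the left-factored INT_IDENTIFIER rule plus
// the INT rule: after a shared '+', the next character decides the type.
public class LeftFactoredLexer {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // Splits the input into tokens written as TYPE(text).
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int start = pos;
            String type;
            char c = s.charAt(pos);
            if (c == '+') {
                // INT_IDENTIFIER: '+' {$setType(IDENTIFIER);} ...
                pos++;
                type = "IDENTIFIER";
                if (pos < s.length() && isDigit(s.charAt(pos))) {
                    // ('0'..'9')+ {$setType(INT);}
                    while (pos < s.length() && isDigit(s.charAt(pos))) pos++;
                    type = "INT";
                } else {
                    // ('a'..'z')*
                    while (pos < s.length() && isLower(s.charAt(pos))) pos++;
                }
            } else if (c == '-' || isDigit(c)) {
                // INT: ('-')? ('0'..'9')+
                if (c == '-') pos++;
                int digitsStart = pos;
                while (pos < s.length() && isDigit(s.charAt(pos))) pos++;
                if (pos == digitsStart) {
                    throw new IllegalArgumentException("expected digit at " + pos);
                }
                type = "INT";
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
            }
            tokens.add(type + "(" + s.substring(start, pos) + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("+12"));  // [INT(+12)]
        System.out.println(tokenize("+abc")); // [IDENTIFIER(+abc)]
        System.out.println(tokenize("+-3"));  // [IDENTIFIER(+), INT(-3)]
    }
}
```

Note that with this rule "+abc" becomes a single IDENTIFIER. The original grammar, where IDENTIFIER is just "+", would not produce that token.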
> ---------- Forwarded message ----------
> From: Martin C. Martin <martin at martincmartin.com>
> Date: 05-Jan-2007 2:24
> Subject: [antlr-interest] Why is this ambiguous?
> To: antlr-interest at antlr.org
>
> Hi,
>
> First of all, thanks for Antlr, it's a huge help!
>
> But I don't understand why the following dead-simple lexer is ambiguous:
>
> class MyLexer extends Lexer;
>
> options {
> k=4;
> }
>
> IDENTIFIER: "+" ;
>
> INT : ('+' | '-')? ( '0'..'9' )+ ;
>
> An INT must contain at least one digit, and an IDENTIFIER no digits. So
> if I receive a + followed by any non-digit (including end of stream), it
> must be an identifier. If I get a + followed by a digit, it must be an
> INT. It can't be an IDENTIFIER followed by an INT, because when
> deciding what token to use for the +, it must match the longest
> sequence, and + followed by digits is longer than just + alone.
>
> Am I missing something? How do I make this non-ambiguous? For the
> record, the error message is:
>
> $ java antlr.Tool MyLexer.g
> ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
> MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and
> INT upon
> MyLexer.g: k==1:'+'
> MyLexer.g: k==2:<end-of-token>
> MyLexer.g: k==3:<end-of-token>
> MyLexer.g: k==4:<end-of-token>
>
> Best,
> Martin
>
>
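Here is one plausible reading of the warning. It assumes that ANTLR 2.7 uses *linear approximate* lookahead, meaning it compares the set of possible characters at each depth separately rather than whole k-character sequences. Under that assumption, IDENTIFIER and INT overlap at every depth:

* '+' at depth 1.
* <end-of-token> at depths 2 to 4, because INT can also be a single digit followed by end-of-token.

So the tool reports a nondeterminism even though no complete 4-character sequence is shared. The hypothetical Java sketch below models this. "D" stands for any digit and "EOT" for end-of-token. All names are my own.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of per-rule lookahead for k = 4.
public class LookaheadSets {

    static final int K = 4;

    // Truncates or pads a symbol sequence with EOT to exactly K symbols.
    static List<String> window(List<String> seq) {
        List<String> w = new ArrayList<>(seq.subList(0, Math.min(K, seq.size())));
        while (w.size() < K) w.add("EOT");
        return w;
    }

    // All K-symbol lookahead windows of IDENTIFIER: "+".
    static Set<List<String>> identifier() {
        Set<List<String>> out = new LinkedHashSet<>();
        out.add(window(List.of("+")));
        return out;
    }

    // All K-symbol lookahead windows of INT: ('+' | '-')? ('0'..'9')+.
    static Set<List<String>> integer() {
        Set<List<String>> out = new LinkedHashSet<>();
        for (String sign : new String[] {"", "+", "-"}) {
            for (int digits = 1; digits <= K; digits++) {
                List<String> seq = new ArrayList<>();
                if (!sign.isEmpty()) seq.add(sign);
                for (int i = 0; i < digits; i++) seq.add("D");
                out.add(window(seq));
            }
        }
        return out;
    }

    // Linear approximation: the symbols that can appear at a single depth.
    static Set<String> atDepth(Set<List<String>> windows, int depth) {
        Set<String> out = new LinkedHashSet<>();
        for (List<String> w : windows) out.add(w.get(depth));
        return out;
    }

    // True if the per-depth symbol sets overlap at every depth 1..K.
    static boolean linearConflict(Set<List<String>> a, Set<List<String>> b) {
        for (int d = 0; d < K; d++) {
            Set<String> common = new LinkedHashSet<>(atDepth(a, d));
            common.retainAll(atDepth(b, d));
            if (common.isEmpty()) return false;
        }
        return true;
    }

    // True if some complete K-symbol window is shared (full LL(k) conflict).
    static boolean fullConflict(Set<List<String>> a, Set<List<String>> b) {
        for (List<String> w : a) if (b.contains(w)) return true;
        return false;
    }

    public static void main(String[] args) {
        Set<List<String>> id = identifier(), in = integer();
        for (int d = 0; d < K; d++) {
            System.out.println("k==" + (d + 1) + ": IDENTIFIER " + atDepth(id, d)
                    + " INT " + atDepth(in, d));
        }
        System.out.println("linear approximate conflict: " + linearConflict(id, in)); // true
        System.out.println("full LL(4) conflict: " + fullConflict(id, in));           // false
    }
}
```

Under this model, left-factoring the '+' (as in the INT_IDENTIFIER rule) makes the depth-1 sets of the two rules disjoint. That would explain why the suggested rule avoids the warning.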