[antlr-interest] Fwd: Why is this ambiguous?
Jose Ventura
jose.ventura.roda at gmail.com
Fri Jan 5 02:08:12 PST 2007
Hi Martin,
An example shows why it is ambiguous.
Given the input "+1", the lexer can produce either:
- IDENTIFIER(+) INT(1) <-- This is possible because the leading '+' of INT
is optional, so "1" on its own is a valid INT.
- INT(+1)
So there are two valid tokenizations.
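To make the two readings concrete, here is a small Python sketch (not ANTLR, and not how ANTLR decides internally) that enumerates every way of splitting an input into tokens under the two rules from the grammar. The regular expressions and the function name tokenizations are my own stand-ins for the lexer rules.

```python
import re

# Stand-ins for the two lexer rules in the grammar (regexes are an assumption).
RULES = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]


def tokenizations(text):
    """Return every way to split text into tokens that match RULES."""
    if text == "":
        return [[]]
    results = []
    for name, pattern in RULES:
        # Try every prefix length, not only the longest one.
        for end in range(1, len(text) + 1):
            if pattern.fullmatch(text, 0, end):
                for rest in tokenizations(text[end:]):
                    results.append([(name, text[:end])] + rest)
    return results


for split in tokenizations("+1"):
    print(split)
```

For "+1" this lists both IDENTIFIER(+) INT(1) and INT(+1). Nothing in the rules themselves prefers one over the other; a longest-match policy would have to be added on top.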
You could try something like this instead:
INT_IDENTIFIER
    : '+' {$setType(IDENTIFIER);}
      ( ('0'..'9')+ {$setType(INT);}
      | ('a'..'z')*
      )
    ;
INT : ('-')? ('0'..'9')+ ;
I think this will work, but you should check it.
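The idea behind the merged rule can be sketched in plain Python: consume the '+' first, assume IDENTIFIER, and switch the type to INT only if digits follow. This is a hand-written illustration of the control flow, not the code ANTLR would generate; the function names next_token and is_digit are made up for the example.

```python
def is_digit(ch):
    """True for the ASCII digits '0'..'9' only."""
    return "0" <= ch <= "9"


def next_token(text, pos=0):
    """Return (token_type, lexeme, new_pos) for the token starting at pos."""
    if pos >= len(text):
        raise ValueError("no input left")
    start = pos
    if text[pos] == "+":
        pos += 1
        token_type = "IDENTIFIER"  # default, like the first $setType
        if pos < len(text) and is_digit(text[pos]):
            while pos < len(text) and is_digit(text[pos]):
                pos += 1
            token_type = "INT"  # digits followed the '+', so retype as INT
        else:
            # Mirrors the ('a'..'z')* alternative, which may match nothing.
            while pos < len(text) and "a" <= text[pos] <= "z":
                pos += 1
        return token_type, text[start:pos], pos
    # Separate INT rule: optional '-' followed by one or more digits.
    if text[pos] == "-":
        pos += 1
    digits_start = pos
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    if pos == digits_start:
        raise ValueError("unexpected character at position %d" % start)
    return "INT", text[start:pos], pos


print(next_token("+1"))
print(next_token("+"))
```

Because the '+' is consumed before any decision is made, there is only one rule that can start with '+', and the choice between INT and IDENTIFIER depends on the next character alone.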
Regards,
José Ventura
---------- Forwarded message ----------
From: Martin C. Martin <martin at martincmartin.com>
Date: 05-Jan-2007 2:24
Subject: [antlr-interest] Why is this ambiguous?
To: antlr-interest at antlr.org
Hi,
First of all, thanks for Antlr, it's a huge help!
But I don't understand why the following dead-simple lexer is ambiguous:
class MyLexer extends Lexer;
options {
k=4;
}
IDENTIFIER: "+" ;
INT : ('+' | '-')? ( '0'..'9' )+ ;
An INT must contain at least one digit, and an IDENTIFIER no digits. So
if I receive a + followed by any non-digit (including end of stream), it
must be an identifier. If I get a + followed by a digit, it must be an
INT. It can't be an IDENTIFIER followed by an INT, because when
deciding what token to use for the +, it must match the longest
sequence, and + followed by digits is longer than just + alone.
Am I missing something? How do I make this non-ambiguous? For the
record, the error message is:
$ java antlr.Tool MyLexer.g
ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and INT upon
MyLexer.g: k==1:'+'
MyLexer.g: k==2:<end-of-token>
MyLexer.g: k==3:<end-of-token>
MyLexer.g: k==4:<end-of-token>
Best,
Martin