[antlr-interest] Why is this ambiguous?
Martin C. Martin
martin at martincmartin.com
Thu Jan 4 17:24:03 PST 2007
Hi,
First of all, thanks for ANTLR, it's a huge help!
But I don't understand why the following dead-simple lexer is ambiguous:
class MyLexer extends Lexer;
options {
k=4;
}
IDENTIFIER: "+" ;
INT : ('+' | '-')? ( '0'..'9' )+ ;
An INT must contain at least one digit, and an IDENTIFIER no digits. So
if I receive a + followed by any non-digit (including end of stream), it
must be an identifier. If I get a + followed by a digit, it must be an
INT. It can't be an IDENTIFIER followed by an INT, because when
deciding what token to use for the +, it must match the longest
sequence, and + followed by digits is longer than just + alone.
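To make that reasoning concrete, here is a small Python sketch of the longest-match behaviour I am expecting. It is only an illustration of my expectation, not what ANTLR generates or necessarily how it decides between rules; the regexes are rough stand-ins for the two rules above, and the name tokenize is made up for this example.

```python
import re

# Hypothetical stand-ins for the two lexer rules (not ANTLR-generated code).
RULES = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the rule that consumes the most characters at this position.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Under this rule a lone "+" can only come out as IDENTIFIER, and "+" followed by digits can only come out as a single INT, which is why I would not expect any ambiguity.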
Am I missing something? How do I make this non-ambiguous? For the
record, the error message is:
$ java antlr.Tool MyLexer.g
ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and INT upon
MyLexer.g: k==1:'+'
MyLexer.g: k==2:<end-of-token>
MyLexer.g: k==3:<end-of-token>
MyLexer.g: k==4:<end-of-token>
Best,
Martin