[antlr-interest] Why is this ambiguous?
Martin C. Martin
martin at martincmartin.com
Thu Jan 4 17:24:03 PST 2007
First of all, thanks for ANTLR, it's a huge help!
But I don't understand why the following dead-simple lexer is reported as ambiguous:
class MyLexer extends Lexer;
IDENTIFIER: "+" ;
INT : ('+' | '-')? ( '0'..'9' )+ ;
An INT must contain at least one digit, and an IDENTIFIER contains no
digits. So if I receive a + followed by any non-digit (including end of
stream), it must be an IDENTIFIER. If I get a + followed by a digit, it
must be an INT. It can't be an IDENTIFIER followed by an INT, because
when deciding which token to use for the +, the lexer must match the
longest sequence, and + followed by digits is longer than + alone.
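
To make the expected behaviour concrete, here is a minimal sketch of the longest-match rule described above, written in plain Python rather than ANTLR. It only illustrates the intuition and is not a claim about how ANTLR 2.7.5 actually chooses between lexer rules. The names TOKEN_PATTERNS and tokenize are hypothetical, and the regular expressions are an assumed translation of the two rules.

```python
import re

# Hypothetical token table mirroring the two lexer rules above
# (an assumed regex translation, not generated by ANTLR).
TOKEN_PATTERNS = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]


def tokenize(text):
    """Split text into (type, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            # Keep the candidate that consumes the most characters.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

Under this rule a lone + comes out as an IDENTIFIER, while + followed by digits comes out as a single INT, which is the behaviour the reasoning above expects.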
Am I missing something? How do I make this non-ambiguous? For the
record, the error message is:
$ java antlr.Tool MyLexer.g
ANTLR Parser Generator Version 2.7.5 (20050128) 1989-2005 jGuru.com
MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and