[antlr-interest] Why is this ambiguous?

Martin C. Martin martin at martincmartin.com
Thu Jan 4 17:24:03 PST 2007


Hi,

First of all, thanks for ANTLR; it's a huge help!

But I don't understand why the following dead-simple lexer is ambiguous:

class MyLexer extends Lexer;

options {
    k=4;
}

IDENTIFIER: "+" ;

INT : ('+' | '-')? ( '0'..'9' )+ ;

An INT must contain at least one digit, and an IDENTIFIER no digits.  So 
if I receive a + followed by any non-digit (including end of stream), it 
must be an identifier.  If I get a + followed by a digit, it must be an 
INT.  It can't be an IDENTIFIER followed by an INT, because when 
deciding what token to use for the +, it must match the longest 
sequence, and + followed by digits is longer than just + alone.
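
To make the behaviour I expect concrete, here is a minimal hand-written sketch in Java of the longest-match reasoning above. It is not what ANTLR generates and not a fix for the grammar; the class name LongestMatchSketch and the method tokenize are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class LongestMatchSketch {
    // Hypothetical helper (not part of ANTLR): tokenizes input by preferring
    // the longer INT match over the one-character IDENTIFIER "+".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;

            // Try INT first: optional sign, then one or more digits '0'..'9'.
            int j = i;
            if (c == '+' || c == '-') {
                j++;
            }
            int digitsStart = j;
            while (j < input.length()
                    && input.charAt(j) >= '0' && input.charAt(j) <= '9') {
                j++;
            }

            if (j > digitsStart) {
                // At least one digit was found, so the longer INT match wins.
                tokens.add("INT(" + input.substring(start, j) + ")");
                i = j;
            } else if (c == '+') {
                // A '+' not followed by a digit can only be an IDENTIFIER.
                tokens.add("IDENTIFIER(+)");
                i++;
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("+12")); // prints [INT(+12)]
        System.out.println(tokenize("+"));   // prints [IDENTIFIER(+)]
        System.out.println(tokenize("++7")); // prints [IDENTIFIER(+), INT(+7)]
    }
}
```

Deciding between the two tokens here only requires looking at the one character after the sign, which is why I would have expected k=4 to be more than enough.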

Am I missing something?  How do I make this non-ambiguous?  For the 
record, the error message is:

$ java antlr.Tool MyLexer.g
ANTLR Parser Generator   Version 2.7.5 (20050128)   1989-2005 jGuru.com
MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and INT upon
MyLexer.g:     k==1:'+'
MyLexer.g:     k==2:<end-of-token>
MyLexer.g:     k==3:<end-of-token>
MyLexer.g:     k==4:<end-of-token>

Best,
Martin


