[antlr-interest] Fwd: Why is this ambiguous?

Fri Jan 5 02:08:12 PST 2007

Hi Martin,

An example shows why it is ambiguous.

Given the input "+1", the lexer can produce either of:

- IDENTIFIER(+) INT(1) <-- possible because the leading '+' of INT is
  optional.
- INT(+1)

So there are two possible tokenizations.
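
The two candidate tokenizations of "+1" can be reproduced by brute force. This is only a sketch in Python, not ANTLR: the regular expressions are assumed equivalents of the two rules in the original grammar, and the function name tokenizations is made up for illustration. It enumerates every split the rules allow and says nothing about how ANTLR itself chooses between them.

```python
import re

# Assumed regex equivalents of the two lexer rules from the original grammar.
RULES = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]

def tokenizations(text, pos=0):
    """Return every way to split text[pos:] into tokens using RULES."""
    if pos == len(text):
        return [[]]  # exactly one way to tokenize nothing: no tokens
    results = []
    for name, pattern in RULES:
        # Try every prefix length this rule can match, not only the longest.
        for end in range(pos + 1, len(text) + 1):
            if pattern.fullmatch(text, pos, end):
                for rest in tokenizations(text, end):
                    results.append([(name, text[pos:end])] + rest)
    return results

for split in tokenizations("+1"):
    print(split)
# prints:
# [('IDENTIFIER', '+'), ('INT', '1')]
# [('INT', '+1')]
```

A lone "+" has only one split, IDENTIFIER(+), so the overlap between the rules only shows up once a digit follows the sign.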

You could try something like this:

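// Assumes IDENTIFIER is still declared as a token type elsewhere
// (for example in a tokens { ... } section), since this rule replaces it.
INT_IDENTIFIER
    : '+' { $setType(IDENTIFIER); }
      ( ('0'..'9')+ { $setType(INT); }
      | ('a'..'z')*
      )
    ;

INT : ('-')? ('0'..'9')+ ;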

I think this should work, but I have not tested it, so please check it.
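
To check the intended behaviour of the combined rule above, here is a rough hand-written equivalent sketched in Python. The function name next_token and the string token types are hypothetical, and this is not the code ANTLR would generate. It only shows the idea: consume the '+' first, then let the next character decide the token type.

```python
def is_digit(ch):
    """ASCII digit test, matching '0'..'9' in the grammar."""
    return "0" <= ch <= "9"

def next_token(text, pos=0):
    """Return (token_type, lexeme, new_pos) for the token starting at text[pos]."""
    start = pos
    if text[pos] == "+":
        pos += 1
        token_type = "IDENTIFIER"  # default type once '+' has been consumed
        if pos < len(text) and is_digit(text[pos]):
            # '+' followed by digits: retype the token as INT
            while pos < len(text) and is_digit(text[pos]):
                pos += 1
            token_type = "INT"
        else:
            # otherwise take zero or more lowercase letters, as in ('a'..'z')*
            while pos < len(text) and "a" <= text[pos] <= "z":
                pos += 1
        return token_type, text[start:pos], pos
    # Separate INT rule: optional '-' followed by at least one digit
    if text[pos] == "-":
        pos += 1
    digits_start = pos
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    if pos == digits_start:
        raise ValueError(f"unexpected input at position {start}")
    return "INT", text[start:pos], pos

print(next_token("+1"))   # ('INT', '+1', 2)
print(next_token("+"))    # ('IDENTIFIER', '+', 1)
print(next_token("-42"))  # ('INT', '-42', 3)
```

Because the decision is made after the '+' has been read, a single character of lookahead is enough to separate the two cases.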

Regards,
José Ventura

---------- Forwarded message ----------
From: Martin C. Martin <martin at martincmartin.com>
Date: 05-Jan-2007 2:24
Subject: [antlr-interest] Why is this ambiguous?
To: antlr-interest at antlr.org

Hi,

First of all, thanks for Antlr, it's a huge help!

But I don't understand why the following dead-simple lexer is ambiguous:

class MyLexer extends Lexer;

options {
   k=4;
}

IDENTIFIER: "+" ;

INT : ('+' | '-')? ( '0'..'9' )+ ;

An INT must contain at least one digit, and an IDENTIFIER no digits.  So
if I receive a + followed by any non-digit (including end of stream), it
must be an identifier.  If I get a + followed by a digit, it must be an
INT.  It can't be an IDENTIFIER followed by an INT, because when
deciding what token to use for the +, it must match the longest
sequence, and + followed by digits is longer than just + alone.
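
The longest-match behaviour described above can be sketched as follows. This is a Python illustration of a generic longest-match tokenizer, with assumed regex equivalents of the two rules and a made-up function name. It shows what the reasoning expects, not what the ANTLR-generated lexer necessarily does.

```python
import re

# Assumed regex equivalents of the two lexer rules in the grammar below.
RULES = [
    ("IDENTIFIER", re.compile(r"\+")),
    ("INT", re.compile(r"[+-]?[0-9]+")),
]

def longest_match_tokens(text):
    """Tokenize text, always taking the rule that consumes the most input."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end position) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens

print(longest_match_tokens("+1"))  # [('INT', '+1')]
print(longest_match_tokens("+"))   # [('IDENTIFIER', '+')]
```

Under this policy "+1" is never split into IDENTIFIER followed by INT, because INT matches two characters where IDENTIFIER matches one.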

Am I missing something?  How do I make this non-ambiguous?  For the
record, the error message is:

$ java antlr.Tool MyLexer.g
ANTLR Parser Generator   Version 2.7.5 (20050128)   1989-2005 jGuru.com
MyLexer.g: warning:lexical nondeterminism between rules IDENTIFIER and INT upon
MyLexer.g:     k==1:'+'
MyLexer.g:     k==2:<end-of-token>
MyLexer.g:     k==3:<end-of-token>
MyLexer.g:     k==4:<end-of-token>

Best,
Martin