[antlr-interest] Lexor Alternative lose (or Newbie question)

Elden Crom eldencrom at comcast.net
Mon Jul 4 09:46:55 PDT 2005


When I run ANTLR on the following lexer definition:

class ExprLexer extends Lexer;
options {
k=3;
charVocabulary='\u0000'..'\u007F'; // allow ascii
}
//STRING_LITERAL :'"' (ESC|~('"'|'\\'|'_'))* '"' ;
STRING_LITERAL :'"' (ESC|~('"'|'\\'))* '"' ;


NUMBER :(DECNUM|RADIXNUM);

protected ESC : '\\' NUMBER '~';
protected ALPHA : ('A'..'F');
protected NUM : ('0'..'9');
protected RADIXNUM: '0' (ALPHA|NUM) '_' (ALPHA|NUM)+ ('.' (ALPHA|NUM))? ('#' (ALPHA|NUM))?;
protected DECNUM: (NUM)+ ('.' (NUM))? ('#' (NUM))?;

ANTLR reports the following warning for this lexer:
ANTLR Parser Generator Version 2.7.5 (20050201) 1989-2005 jGuru.com
Generating ExprLexer.txt
s_t.g:10:20: warning:lexical nondeterminism between alts 1 and 2 of block upon
s_t.g:10:20: k==1:'0'
s_t.g:10:20: k==2:'0'..'9'
s_t.g:10:20: k==3:'_'
Generating ExprLexerTokenTypes.txt
Generating ExprLexerTokenTypes.txt

(Line 10 is the ‘NUMBER:’ rule.)
Changing STRING_LITERAL to the commented-out version removes the warning.
However, I believe that the original should not have produced the warning.

In looking at ‘ExprLexer.txt’ (the result of ‘antlr-2.7.5.exe 
-diagnostic s_t.g’), I see that under ‘*** Lexer Rule: mNUMBER’ k==3 
contains ‘_’ for both ‘Rule Reference: mDECNUM’ and ‘Rule Reference: 
mRADIXNUM’, causing the nondeterminism.

I believe that mDECNUM’s Alternate(1) should have been split into two 
separate lookaheads: one that takes ‘~’ at k==2 and stops there, and 
another that has “'#', '.', NUM” at k==2 and “ALPHA|NUM” at k==3.
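
To make the argument above concrete, here is a small Python sketch (not ANTLR code) that enumerates the 3-character lookahead sequences of DECNUM and RADIXNUM and compares them in two ways: sequence by sequence, and collapsed into one character set per depth, which is what the diagnostic output appears to show. It rests on assumptions: only the context where NUMBER is called from ESC inside a STRING_LITERAL is modelled, the top-level token context is ignored, and every function name is hypothetical.

```python
from itertools import product

K = 3                      # lookahead depth, matching k=3 in the grammar
NUM = "0123456789"
ALPHA = "ABCDEF"
HEX = ALPHA + NUM


def concat(left, right, k=K):
    """Concatenate two sets of strings, truncating each result to k characters."""
    return {(a + b)[:k] for a in left for b in right}


def optional(strings):
    """Model a ( ... )? subrule: the strings themselves, or nothing."""
    return set(strings) | {""}


def one_or_more(chars, k=K):
    """Prefixes of (chars)+; a string of length k stands for 'k or more'."""
    return {"".join(p) for n in range(1, k + 1) for p in product(chars, repeat=n)}


def follow_in_esc(exclude_underscore):
    """Assumed context: NUMBER inside ESC is followed by '~', then by any
    character that can continue the string (body char, '\\', or '"')."""
    body = {chr(c) for c in range(0x80)} - {'"', "\\"}
    if exclude_underscore:          # the commented-out STRING_LITERAL variant
        body.discard("_")
    return {"~" + c for c in body | {'"', "\\"}}


def decnum(follow):
    """3-character prefixes of DECNUM followed by the given context."""
    seqs = one_or_more(NUM)
    seqs = concat(seqs, optional({"." + d for d in NUM}))
    seqs = concat(seqs, optional({"#" + d for d in NUM}))
    return concat(seqs, follow)


def radixnum():
    """RADIXNUM always starts with '0', one of ALPHA|NUM, then '_'."""
    return {"0" + h + "_" for h in HEX}


def per_depth(seqs):
    """Collapse sequences into one character set per lookahead depth."""
    return [{s[i] for s in seqs} for i in range(K)]


def approximate_conflict(a, b):
    """Per-depth intersection; a conflict is reported if no depth is empty."""
    return [da & db for da, db in zip(per_depth(a), per_depth(b))]


def exact_conflict(a, b):
    """Sequences that both alternatives can really start with."""
    return a & b


original = approximate_conflict(decnum(follow_in_esc(False)), radixnum())
without_underscore = approximate_conflict(decnum(follow_in_esc(True)), radixnum())
truly_shared = exact_conflict(decnum(follow_in_esc(False)), radixnum())
```

Under these assumptions, `original` holds '0' at depth 1, '0'..'9' at depth 2 and '_' at depth 3, the same three sets as in the warning, while `truly_shared` is empty: DECNUM only reaches '_' at depth 3 by way of '~' at depth 2, which RADIXNUM never accepts. With '_' excluded from the string body, depth 3 of `without_underscore` is empty, which would be consistent with the warning disappearing.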

Am I making a newbie mistake, or is this an ANTLR bug?



