[antlr-interest] Lexor Alternative lose (or Newbie question)
Elden Crom
eldencrom at comcast.net
Mon Jul 4 09:46:55 PDT 2005
When I run the following lexer definition:
class ExprLexer extends Lexer;
options {
k=3;
charVocabulary='\u0000'..'\u007F'; // allow ascii
}
//STRING_LITERAL :'"' (ESC|~('"'|'\\'|'_'))* '"' ;
STRING_LITERAL :'"' (ESC|~('"'|'\\'))* '"' ;
NUMBER :(DECNUM|RADIXNUM);
protected ESC : '\\' NUMBER '~';
protected ALPHA : ('A'..'F');
protected NUM : ('0'..'9');
protected RADIXNUM: '0' (ALPHA|NUM) '_' (ALPHA|NUM)+ ('.' (ALPHA|NUM))? ('#' (ALPHA|NUM))?;
protected DECNUM: (NUM)+ ('.' (NUM))? ('#' (NUM))?;
ANTLR reports the following warning:
ANTLR Parser Generator Version 2.7.5 (20050201) 1989-2005 jGuru.com
Generating ExprLexer.txt
s_t.g:10:20: warning:lexical nondeterminism between alts 1 and 2 of block upon
s_t.g:10:20: k==1:'0'
s_t.g:10:20: k==2:'0'..'9'
s_t.g:10:20: k==3:'_'
Generating ExprLexerTokenTypes.txt
Generating ExprLexerTokenTypes.txt
(line 10 is the ‘NUMBER:’ rule)
Changing STRING_LITERAL to the commented-out version removes the warning.
However, I believe the original rule should not have produced the warning.
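To make the expectation concrete, here is a rough sketch in Python that translates the two protected rules into regular expressions and checks which alternative matches a few sample inputs. The regex translation and the helper name classify are my own approximation for illustration, not something ANTLR generates.

```python
import re

# Regex approximations of the two protected rules from the grammar above.
HEX = "[0-9A-F]"   # ALPHA | NUM
DEC = "[0-9]"      # NUM

RADIXNUM = re.compile(rf"0{HEX}_{HEX}+(\.{HEX})?(#{HEX})?")
DECNUM = re.compile(rf"{DEC}+(\.{DEC})?(#{DEC})?")


def classify(text):
    """Return the names of the alternatives that match the whole input."""
    names = []
    if RADIXNUM.fullmatch(text):
        names.append("RADIXNUM")
    if DECNUM.fullmatch(text):
        names.append("DECNUM")
    return names


# Hypothetical sample inputs: none of them is matched by both alternatives.
for sample in ["08_17", "0A_FF.1#2", "123.4#5", "00"]:
    print(sample, classify(sample))
```

In this approximation, an input that starts with '0', a digit, and '_' can only be a RADIXNUM, because DECNUM never contains '_'.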
In looking at ‘ExprLexer.txt’ (the result of ‘antlr-2.7.5.exe
-diagnostic s_t.g’), I see that under ‘*** Lexer Rule: mNUMBER’ k==3
contains ‘_’ for both ‘Rule Reference: mDECNUM’ and ‘Rule Reference:
mRADIXNUM’, causing the nondeterminism.
I believe that mDECNUM’s Alternate(1) should have been split into two
different lookahead paths: one that takes ‘~’ at k==2 and stops, and another
that has “'#', '.', NUM” at k==2 and “ALPHA|NUM” at k==3.
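The difference I have in mind can be sketched in Python. This is a simplified model under stated assumptions, not ANTLR's actual algorithm: I am assuming the analysis compares one character set per lookahead depth instead of whole 3-character sequences, and FOLLOW_ESC is a hypothetical, abbreviated stand-in for the characters that may follow an ESC inside STRING_LITERAL.

```python
from itertools import product

DIGITS = set("0123456789")
HEXDIGITS = DIGITS | set("ABCDEF")

# Hypothetical, abbreviated set of characters that may follow ESC inside
# STRING_LITERAL (a few representatives only, including '_').
FOLLOW_ESC = set('_a"\\')

# Simplified model: each alternative is a set of 3-character lookahead sequences.
radixnum = {("0", c, "_") for c in HEXDIGITS}

decnum = set()
decnum |= set(product(DIGITS, DIGITS, DIGITS | set(".#~")))  # NUM NUM ...
decnum |= set(product(DIGITS, ".#", DIGITS))                 # NUM '.' NUM, NUM '#' NUM
decnum |= set(product(DIGITS, "~", FOLLOW_ESC))              # NUM '~' <next string char>


def per_depth(sequences):
    """Collapse sequences into one character set per depth (k=1, 2, 3)."""
    return [{seq[i] for seq in sequences} for i in range(3)]


# Depth-by-depth comparison: the alternatives overlap at every depth.
depth_overlap = [a & b for a, b in zip(per_depth(radixnum), per_depth(decnum))]

# Whole-sequence comparison: a conflict needs an identical 3-character sequence.
sequence_overlap = radixnum & decnum

print(depth_overlap)
print(sequence_overlap)
```

In this model the per-depth overlap is '0' at k==1, the digits at k==2, and '_' at k==3, which mirrors the warning, while no single 3-character sequence is shared by both alternatives. The '_' at depth 3 only enters DECNUM through the path NUM '~' '_'.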
Am I making a newbie mistake, or is this an ANTLR error?