[antlr-interest] Bug in DFA matching?
C. Scott Ananian
cscott at cscott.net
Mon Feb 9 11:55:45 PST 2009
I have a grammar for a configuration file where indentation is
significant, as in Python. It contains the following lexer rules:
    WS
        :   {getCharPositionInLine()!=1}? // not start-of-line whitespace
            ( ' ' | TAB )
            { $channel=HIDDEN; }
        ;

    // whitespace at start of line used for INDENT processing
    INITIAL_WS
        :   {getCharPositionInLine()==1 && !afterIndent}? // at start of line.
            ( ' ' | TAB )*
            { this.afterIndent=true; }
        ;
Note the star in the INITIAL_WS rule, which means that *every* line
should emit an INITIAL_WS token, possibly matching nothing, before
matching anything else.
The generated DFA contains the following code:
    case 0 :
        int LA10_25 = input.LA(1);
        int index10_25 = input.index();
        input.rewind();
        s = -1;
        if ( ((getCharPositionInLine()!=1)) ) {s = 26;}
        else if ( ((getCharPositionInLine()==1 && !afterIndent)) ) {s = 6;}
        input.seek(index10_25);
        if ( s>=0 ) return s;
        break;
which seems to be "obviously wrong": getCharPositionInLine() is
evaluated in the rewound state, and then we advance the input and
return, which invokes the proper lexer rule and re-evaluates
getCharPositionInLine() in the *advanced* state, not where the DFA
evaluated it.
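To make the concern concrete, here is a minimal sketch in Java. It is a toy model, not ANTLR's code: MiniStream, PredicateDemo, atColumnOne and evaluateBoth are hypothetical names invented for illustration, and the column is assumed to be the count of characters consumed since the last newline. It only shows that a position-dependent predicate can give different answers at the rewound and the advanced stream positions, following the same index/rewind/seek pattern as the generated code above.

```java
// Hypothetical stand-in for a character stream; NOT ANTLR's CharStream.
final class MiniStream {
    private final String text;
    private int index = 0;   // next character to be consumed
    private int marked = 0;  // position remembered by mark()

    MiniStream(String text) { this.text = text; }

    int index() { return index; }
    void mark() { marked = index; }
    void rewind() { index = marked; }
    void seek(int i) { index = i; }
    void consume() { if (index < text.length()) index++; }

    // Assumed definition: characters consumed since the last newline.
    int charPositionInLine() {
        int lastNewline = text.lastIndexOf('\n', index - 1);
        return index - lastNewline - 1;
    }
}

final class PredicateDemo {
    // A position-dependent predicate, analogous to the ones in the grammar.
    static boolean atColumnOne(MiniStream in) {
        return in.charPositionInLine() == 1;
    }

    // Marks the stream at markAt, consumes `lookahead` characters, then
    // evaluates the predicate twice: once after rewinding (as the DFA
    // fragment does) and once at the advanced position (where a rule
    // invoked afterwards would see it).
    // Returns { resultWhenRewound, resultWhenAdvanced }.
    static boolean[] evaluateBoth(String text, int markAt, int lookahead) {
        MiniStream in = new MiniStream(text);
        in.seek(markAt);
        in.mark();
        for (int i = 0; i < lookahead; i++) {
            in.consume();
        }
        int advancedIndex = in.index();
        boolean advanced = atColumnOne(in);

        in.rewind();                 // same order as the generated code
        boolean rewound = atColumnOne(in);
        in.seek(advancedIndex);      // restore the advanced position

        return new boolean[] { rewound, advanced };
    }
}
```

For the input "a\n  b", marking at the start of the second line and looking ahead one character, the two evaluations disagree: the predicate is false at the rewound position (column 0) and true at the advanced one (column 1). Whether ANTLR's lexer actually re-evaluates the predicate at the advanced position is exactly the open question here; the sketch does not settle it.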
I don't quite understand the DFA well enough yet to attempt a proper
fix. Anyone want to lend a hand?
Thanks--
--scott
--
( http://cscott.net/ )