[antlr-interest] Lexer Problem - ANTLR bug or my stupidity?
Michael Gerz
michael.gerz at teststep.org
Mon Apr 16 11:42:44 PDT 2007
Terence, all,
could you please have a look at this simple lexer grammar? (It
is a stripped-down version of a much more reasonable grammar :-) )
+++++++++++++++++++++++++++++++++++++
grammar Foo;

fragment CHAR :
      NON_SPECIAL_CHAR
    | OVERRIDER OVERRIDER
    ;

fragment NON_SPECIAL_CHAR :
      'a'
    ;

CHAR_STRING :
      CHAR ( CHAR )*
    ;

OVERRIDER :
      '#'
    ;
++++++++++++++++++++++++++++++++++++++++
IMHO, for the input
    a##a#a
FooLexer should output three tokens:
    a##a
    #
    a
Unfortunately, it does not; it raises an error instead. If you look at
the generated lexer code, it becomes clear why:
public final void mCHAR_STRING() throws RecognitionException {
    try {
        int _type = CHAR_STRING;
        // ReplicationTransaction.g:45:2: ( CHAR ( CHAR )* )
        // ReplicationTransaction.g:45:2: CHAR ( CHAR )*
        {
            mCHAR();
            // ReplicationTransaction.g:45:7: ( CHAR )*
            loop2:
            do {
                int alt2=2;
                int LA2_0 = input.LA(1);
                if ( (LA2_0=='#'||LA2_0=='a') ) {
                    alt2=1;
                }
                switch (alt2) {
                    case 1 :
                        // ReplicationTransaction.g:45:9: CHAR
                        {
                            mCHAR();
                        }
                        break;
                    default :
                        break loop2;
                }
            } while (true);
        }
        this.type = _type;
    }
    finally {
    }
}
Once the lexer has entered mCHAR_STRING, it uses a lookahead of 1! In
other words, it does not check what follows the '#'. Since ANTLR claims
to use LL(*), this looks like a real bug to me.
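To make the difference concrete, here is a small hand-written sketch (not
ANTLR output) that mimics the ( CHAR )* loop with either one or two symbols
of lookahead. The class LookaheadSketch and all of its methods are made up
for illustration, and the token-level choice between CHAR_STRING and
OVERRIDER is simply assumed to look at two symbols so that only the loop
decision varies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from ANTLR.
class LookaheadSketch {

    // Decides whether a CHAR ('a' or "##") may start at index i.
    // With twoSymbolLookahead == false this mirrors the generated test
    // (LA2_0=='#'||LA2_0=='a'), which never looks past the first '#'.
    static boolean charStartsAt(String in, int i, boolean twoSymbolLookahead) {
        if (i >= in.length()) return false;
        char c = in.charAt(i);
        if (c == 'a') return true;
        if (c != '#') return false;
        if (!twoSymbolLookahead) return true;
        return i + 1 < in.length() && in.charAt(i + 1) == '#';
    }

    // Matches one CHAR at index i and returns the index after it.
    // Fails if a single '#' is not followed by a second '#'.
    static int matchChar(String in, int i) {
        if (in.charAt(i) == 'a') return i + 1;
        if (in.charAt(i) == '#' && i + 1 < in.length() && in.charAt(i + 1) == '#') {
            return i + 2;
        }
        throw new IllegalStateException("mismatched input at index " + (i + 1));
    }

    static List<String> tokenize(String in, boolean twoSymbolLookahead) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            int start = i;
            if (charStartsAt(in, i, true)) {
                // CHAR_STRING : CHAR ( CHAR )*
                i = matchChar(in, i);
                while (charStartsAt(in, i, twoSymbolLookahead)) {
                    i = matchChar(in, i);
                }
            } else if (in.charAt(i) == '#') {
                // OVERRIDER : '#'
                i++;
            } else {
                throw new IllegalStateException("no viable alternative at index " + i);
            }
            tokens.add(in.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a##a#a", true));
    }
}
```

With two symbols of lookahead the sketch yields the three tokens a##a, # and
a for the input a##a#a. With one symbol it enters CHAR at the single '#' and
fails there, which matches the behaviour described above.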
In any case, does anybody have a clever workaround?
A thousand thanks in advance!
Michael