[antlr-interest] Lexer lookahead overoptimizes
shmuel siegel
antlr at shmuelhome.mine.nu
Sat Apr 7 04:56:14 PDT 2007
Among other rules, I have these two lexer rules.
SHIN:
'\u00d7' '\u00a9' ('\u00d7' '\u0081')? ('\u00d7' '\u0082')?;
TUF:
'\u00d7' '\u00aa';
The code generated for "SHIN" correctly recognizes that the first optional
subrule needs two characters of lookahead before it can decide to match, but
the second optional subrule is entered as soon as its first character
matches. As a result, I get a recognition exception on the sequence
'\u00d7' '\u00a9' '\u00d7' '\u00aa', which should lex as SHIN followed by TUF.
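To make the difference concrete, here is a small standalone model of the two
decisions. It is a sketch, not ANTLR runtime code: the class ShinLookahead and
its methods are hypothetical, and LA(i) only imitates the 1-based lookahead
used in the generated code. The input is positioned just after '\u00d7'
'\u00a9', at the start of what should be a TUF token.

```java
// Hypothetical model of the lookahead decisions in mSHIN; not ANTLR runtime code.
public class ShinLookahead {
    private final String input;
    private final int pos;

    ShinLookahead(String input, int start) {
        this.input = input;
        this.pos = start;
    }

    // 1-based lookahead: returns the i-th character ahead, or -1 past the end.
    int LA(int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : -1;
    }

    // Decision as generated for ( '\u00d7' '\u0082' )?: tests one character only.
    boolean enterWithOneChar() {
        return LA(1) == '\u00D7';
    }

    // Two-character decision, like the one generated for ( '\u00d7' '\u0081' )?.
    boolean enterWithTwoChars() {
        return LA(1) == '\u00D7' && LA(2) == '\u0082';
    }

    public static void main(String[] args) {
        String text = "\u00D7\u00A9\u00D7\u00AA"; // SHIN prefix, then TUF
        // Index 2 is where the lexer stands after '\u00d7' '\u00a9'
        // once the first optional subrule has been skipped.
        ShinLookahead la = new ShinLookahead(text, 2);
        System.out.println("one-char decision enters subrule: " + la.enterWithOneChar());
        System.out.println("two-char decision enters subrule: " + la.enterWithTwoChars());
    }
}
```

Running it prints true for the one-character decision and false for the
two-character one: only the latter leaves '\u00d7' '\u00aa' for TUF.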
This will probably be clearer from the code generated for "SHIN". Note
that for the second subrule it checks only for '\u00d7' and then tries to
match '\u00d7' '\u0082'.
// $ANTLR start SHIN
public final void mSHIN() throws RecognitionException {
    try {
        int _type = SHIN;
        // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:2: ( '\\u00d7' '\\u00a9' ( '\\u00d7' '\\u0081' )? ( '\\u00d7' '\\u0082' )? )
        // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:2: '\\u00d7' '\\u00a9' ( '\\u00d7' '\\u0081' )? ( '\\u00d7' '\\u0082' )?
        {
        match('\u00D7');
        match('\u00A9');
        // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:20: ( '\\u00d7' '\\u0081' )?
        int alt9=2;
        int LA9_0 = input.LA(1);
        if ( (LA9_0=='\u00D7') ) {
            int LA9_1 = input.LA(2);
            if ( (LA9_1=='\u0081') ) {
                alt9=1;
            }
        }
        switch (alt9) {
            case 1 :
                // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:21: '\\u00d7' '\\u0081'
                {
                match('\u00D7');
                match('\u0081');
                }
                break;
        }
        // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:41: ( '\\u00d7' '\\u0082' )?
        int alt10=2;
        int LA10_0 = input.LA(1);
        if ( (LA10_0=='\u00D7') ) {
            alt10=1;
        }
        switch (alt10) {
            case 1 :
                // E:\\downloads\\Eclipse\\learning\\Tamei\\grammar\\Miqroh.g:154:42: '\\u00d7' '\\u0082'
                {
                match('\u00D7');
                match('\u0082');
                }
                break;
        }
        }
        this.type = _type;
    }
    finally {
    }
}
// $ANTLR end SHIN
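For comparison, here is a hand-written sketch of a lexer for just SHIN and
TUF in which the second optional subrule uses the same two-character test
that the generated code already uses for alt9. It is not ANTLR output: the
class ShinTufLexer, its methods, and the two-character dispatch in tokenize()
are hypothetical, chosen only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for SHIN and TUF only; not ANTLR output.
public class ShinTufLexer {
    private final String input;
    private int pos = 0;

    ShinTufLexer(String input) {
        this.input = input;
    }

    // 1-based lookahead: returns the i-th character ahead, or -1 past the end.
    private int LA(int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : -1;
    }

    // Consumes one expected character or fails, like a lexer match().
    private void match(char c) {
        if (LA(1) != c) {
            throw new IllegalStateException("mismatched character at index " + pos);
        }
        pos++;
    }

    // SHIN: '\u00d7' '\u00a9' ('\u00d7' '\u0081')? ('\u00d7' '\u0082')?
    private void mSHIN() {
        match('\u00D7');
        match('\u00A9');
        if (LA(1) == '\u00D7' && LA(2) == '\u0081') {
            match('\u00D7');
            match('\u0081');
        }
        // Enter the second subrule only if the whole pair is present.
        if (LA(1) == '\u00D7' && LA(2) == '\u0082') {
            match('\u00D7');
            match('\u0082');
        }
    }

    // TUF: '\u00d7' '\u00aa'
    private void mTUF() {
        match('\u00D7');
        match('\u00AA');
    }

    // Picks a rule from two characters of lookahead and collects token names.
    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (LA(1) != -1) {
            if (LA(1) == '\u00D7' && LA(2) == '\u00A9') {
                mSHIN();
                tokens.add("SHIN");
            } else if (LA(1) == '\u00D7' && LA(2) == '\u00AA') {
                mTUF();
                tokens.add("TUF");
            } else {
                throw new IllegalStateException("no token starts at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "\u00D7\u00A9\u00D7\u00AA";
        System.out.println(new ShinTufLexer(text).tokenize()); // prints [SHIN, TUF]
    }
}
```

Replacing the second condition in mSHIN() with the one-character test
LA(1) == '\u00D7' reproduces the reported failure: the lexer consumes the
second '\u00d7' and then match('\u0082') throws at index 3.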