[antlr-interest] Lexer code not generated as expected?
frogery at voila.fr
Tue Dec 15 07:10:33 PST 2009
Hello,
I have run into a strange problem with ANTLR and I wonder whether it is a bug.
Here is part of my grammar:
WS
: ' ' {$channel=HIDDEN;}
;
CUTLINE
: ('\n' ' '* '+') {$channel=HIDDEN;}
;
NEWLINE
: '\n'
;
And here is what ANTLR generates in the function mTokens:
static void
mTokens(pAntlrTestbenchLexer ctx)
{
    {
        // antlr/AntlrTestbench.g:1:8: ( T__10 | WS | CUTLINE | NEWLINE | ID | INT )
        ANTLR3_UINT32 alt4;
        alt4=6;
        switch ( LA(1) )
        {
        ...
        case '\n':
            {
                switch ( LA(2) )
                {
                case ' ':
                case '+':
                    {
                        alt4=3; //CUTLINE
                    }
                    break;
                default:
                    alt4=4; //NEWLINE
                }
            }
            break;
        ...
This is not what I want: when the lexer input is "\n ", I would expect it to recognize the tokens NEWLINE and WS, but with the code above it commits to CUTLINE and fails.
After matching '\n', the lexer should look ahead past any ' ' characters to the first non-space character. If that character is '+', the correct alternative is the CUTLINE rule; otherwise the correct alternative is the NEWLINE rule.
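To make the difference concrete, here is a small hand-written C sketch. It is not ANTLR output and does not use the ANTLR3 C runtime; the names predict_generated and predict_expected are made up for illustration, and the input is assumed to be a NUL-terminated string whose first character is '\n'. The first function mirrors the two-character decision quoted above, the second scans past the spaces before deciding.

```c
/* Alternative numbers as they appear in the generated mTokens. */
enum { ALT_CUTLINE = 3, ALT_NEWLINE = 4 };

/* Mirrors the generated decision: after '\n', only the next
 * character (LA(2)) is inspected. in[0] is assumed to be '\n'. */
int predict_generated(const char *in)
{
    switch (in[1]) {
    case ' ':
    case '+':
        return ALT_CUTLINE;
    default:
        return ALT_NEWLINE;
    }
}

/* The decision described in the text: skip every ' ' after the
 * '\n', then choose CUTLINE only if the next character is '+'. */
int predict_expected(const char *in)
{
    const char *p = in + 1;   /* step over the '\n' */

    while (*p == ' ')
        p++;
    return (*p == '+') ? ALT_CUTLINE : ALT_NEWLINE;
}
```

For the input "\n ", predict_generated returns 3 (CUTLINE) while predict_expected returns 4 (NEWLINE). Both return 3 for "\n  +" and 4 for a lone "\n".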
The workaround I have found is to change the grammar this way:
NEWLINE
: '\n' ' '*
;
With this change it works as I want, but it seems strange to have to resolve the ambiguity this way.
So is the C code generated by ANTLR correct, or is this a bug?
Thanks,
Yann