[antlr-interest] Lexer code not generated as expected?
Jim Idle
jimi at temporal-wave.com
Tue Dec 15 09:25:17 PST 2009
Your rules are ambiguous: ANTLR sees a '\n', and if the next character is a space or a '+' it commits to CUTLINE. The analysis only looks ahead far enough to start down a path (it is not a try-to-match-in-order system like flex). You have to be more specific in the lexer if you want that kind of behavior:
fragment NEWLINE : ;
CUTLINE
    : '\n'
      (
          (' '* '+')=> ' '* '+' { skip(); }
        | { $type = NEWLINE; }
      )
    ;
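To make the difference concrete, here is a small hand-written C sketch (not ANTLR output) that contrasts the fixed-lookahead decision shown in the generated mTokens() with the decision the syntactic predicate above is meant to produce. The enum and both function names are hypothetical and exist only for this illustration; each function assumes it is handed a NUL-terminated string that starts at the '\n'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not ANTLR's generated constants. */
enum token_kind { TOK_NEWLINE, TOK_CUTLINE };

/* Mimics the fixed-lookahead decision: after '\n', a following ' ' or '+'
 * commits to CUTLINE without checking what comes after the spaces. */
static enum token_kind predict_fixed_lookahead(const char *p)
{
    assert(p[0] == '\n');       /* p[1] plays the role of LA(2) */
    return (p[1] == ' ' || p[1] == '+') ? TOK_CUTLINE : TOK_NEWLINE;
}

/* Mimics the predicate (' '* '+')=> : scan past any run of spaces and
 * choose CUTLINE only if a '+' actually follows them. */
static enum token_kind predict_with_predicate(const char *p)
{
    size_t i = 1;               /* start just after the '\n' at p[0] */

    assert(p[0] == '\n');
    while (p[i] == ' ')
        i++;
    return (p[i] == '+') ? TOK_CUTLINE : TOK_NEWLINE;
}
```

For the input "\n " the first function picks CUTLINE (which then fails to match), while the second picks NEWLINE and leaves the space for WS. For "\n  +" both pick CUTLINE.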
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of frogery at voila.fr
> Sent: Tuesday, December 15, 2009 7:11 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Lexer code not generated as expected?
>
> Hello,
>
> I have found a strange problem using ANTLR and I wonder whether it is a
> bug or not.
> Here is part of my grammar:
>
> WS
> : ' ' {$channel=HIDDEN;}
> ;
>
> CUTLINE
> : ('\n' ' '* '+') {$channel=HIDDEN;}
> ;
>
> NEWLINE
> : '\n'
> ;
>
> and here is what antlr generates in the function mTokens:
>
> static void
> mTokens(pAntlrTestbenchLexer ctx)
> {
>     {
>         // antlr/AntlrTestbench.g:1:8: ( T__10 | WS | CUTLINE | NEWLINE | ID | INT )
>
>         ANTLR3_UINT32 alt4;
>
>         alt4=6;
>
>         switch ( LA(1) )
>         {
>         ...
>         case '\n':
>             {
>                 switch ( LA(2) )
>                 {
>                 case ' ':
>                 case '+':
>                     {
>                         alt4=3; // CUTLINE
>                     }
>                     break;
>
>                 default:
>                     alt4=4; // NEWLINE
>                 }
>             }
>             break;
>
>         ...
>
>
> It doesn't correspond to what I want because when the input of the
> lexer is "\n ", I would expect it to recognize the lexemes NEWLINE and
> WS, but with the code above it will try to recognize the lexeme CUTLINE
> and fail.
> Indeed, when a '\n' has been first recognized, the lexer should look
> ahead to find the first non ' ' character, and then if it is a '+'
> character, OK the correct alternative is the CUTLINE rule, if not then
> only in this case the correct alternative is the NEWLINE rule.
>
> The workaround I have found is to change the grammar this way:
>
> NEWLINE
> : '\n' ' '*
> ;
>
> Then it is working as I want, but I find it strange having to resolve
> the ambiguity this way.
> So is the C code generated by antlr correct or is it a bug?
>
> Thanks,
> Yann
>