[antlr-interest] Conceptual problem with look ahead

Davin McCall davmac at deakin.edu.au
Thu Sep 29 16:59:35 PDT 2005


Hi,

You have a semantic predicate - "{ LA(2) != '/' }?" - but ANTLR does not 
take it into account when checking for determinism. Note that your 
"~( '*' | '\r' | '\n' )" alternative allows slashes. So, there is 
non-determinism for the sequence "*/" - does it match "*" followed by 
"/" as part of the in-comment character sequence, or does it match "*/" 
(the comment end)?
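
The overlap can be made visible with a small Python sketch. This is only a
simplified model of a k=2 check, not ANTLR's actual analysis code, and all
names in it are made up for illustration. It assumes the
{ LA(2) != '/' }? check is ignored, so alternative 1 is treated as a plain
'*', and it compares the characters possible at each lookahead depth for
alternative 1 and for the exit branch.

```python
# Simplified model of the k=2 lookahead comparison for the loop in COMMENTML.
# Assumption: the semantic check on alternative 1 is ignored during analysis.

ALPHABET = {chr(c) for c in range(128)}  # ASCII is enough for this sketch

# First character that each loop alternative can start with.
ALT_FIRST = {
    1: {'*'},
    2: {'\r'},
    3: ALPHABET - {'*', '\r', '\n'},  # note: '/' is a member of this set
}
EXIT = ('*', '/')  # the "*/" that follows the loop


def follow_of_loop_body():
    """Characters that may come right after one loop iteration."""
    chars = set()
    for first in ALT_FIRST.values():
        chars |= first       # another iteration may start
    chars.add(EXIT[0])       # or the loop may exit into "*/"
    return chars


def conflict_alt1_vs_exit():
    """Intersect the depth-1 and depth-2 sets of alt 1 and the exit branch."""
    alt1 = ({'*'}, follow_of_loop_body())
    exit_branch = ({EXIT[0]}, {EXIT[1]})
    return alt1[0] & exit_branch[0], alt1[1] & exit_branch[1]


print(conflict_alt1_vs_exit())  # prints ({'*'}, {'/'})
```

Both depths overlap ('*' at depth 1, '/' at depth 2), which lines up with
the "k==1:'*' k==2:'/'" pair in the warning. The '/' at depth 2 for
alternative 1 comes from alternative 3 being allowed to start the next
iteration.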

I'd suggest changing it to:

COMMENTML : "/*"
            ( '*' ~( '/' )
              | ( '\r' ('\n')?)
              | ~( '*'| '\r' | '\n' )
            )*
            "*/"                      {$setType(Token.SKIP);}
            ;

(Haven't tested this, but I think it should work).
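
For trying the rule on sample input without generating a lexer, here is a
rough hand-written approximation in Python. It is not the code ANTLR would
generate; skip_comment is a hypothetical helper that assumes the loop picks
an alternative by looking at the current and the next character, and it
ignores the $setType action.

```python
def skip_comment(text, pos=0):
    """Return the index just past the comment at pos, or None if no match."""
    if not text.startswith("/*", pos):
        return None
    i = pos + 2
    n = len(text)
    while i < n:
        c = text[i]
        nxt = text[i + 1] if i + 1 < n else None
        if c == '*' and nxt is not None and nxt != '/':
            i += 2                          # alt 1: '*' ~( '/' )
        elif c == '\r':
            i += 2 if nxt == '\n' else 1    # alt 2: '\r' ('\n')?
        elif c not in ('*', '\n'):
            i += 1                          # alt 3: ~( '*' | '\r' | '\n' )
        else:
            break                           # no alternative applies: exit loop
    # After the loop the rule requires the closing "*/".
    return i + 2 if text.startswith("*/", i) else None


print(skip_comment("/* a * b */ x"))   # prints 11
print(skip_comment("/* a\r\nb */"))    # prints 10
print(skip_comment("/* a **/"))        # prints None
```

In this approximation a comment that ends in "**/" is not recognised,
because the first alternative consumes the two stars as a pair and the
slash is then taken by the third alternative. That case is worth checking
against the generated lexer.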

Davin


Michael Sielemann wrote:
> Hi everybody,
>
> I have mostly finished writing the lexer for my first grammar and came
> across something that points to a conceptual misunderstanding. I read
> the related FAQ entries but could not figure this out. I would be happy to
> receive any input you guys might have on this.
>
> In my lexer, I have the following rule for multi-line comments in C/C++
> fashion (it's basically the rule from the C grammar at
> http://www.antlr.org/grammar/cgram/grammars/StdCParser.g).
>
>
> COMMENTML : "/*"
>             ( { LA(2) != '/' }? '*'
>               | ( '\r' ('\n')?)
>               | ~( '*'| '\r' | '\n' )
>             )*
>             "*/"                      {$setType(Token.SKIP);}
>             ;
>
>
> If I set k=2 for this lexer, ANTLR tells me that the choices are ambiguous:
>
> ANTLR Parser Generator   Version 2.7.5 (20050201)   1989-2005 jGuru.com
> lexical nondeterminism upon
> k==1:'*'
> k==2:'/'
> between alt 1 and exit branch of block
>
> When I set k to three, everything is fine.
>
> My understanding is that the look-ahead length for lexers exactly
> corresponds to characters. Alternative one checks whether a star would be
> okay by looking at the following character - is it a slash or not. The exit
> branch is "*/" if I am not mistaken. As these two constructs only look at
> the next two characters, I expected everything to be fine. But obviously it
> isn't. Even with the "anything can follow" concept I don't really get this.
> The problem might be that I am not a CS guy but come from engineering, but
> anyway....
>
>
> Thank you very much for your help,
>
> Michael
>
>
> PS: One short second question concerning the lexers. If I want to refer to
> the single quote ' as a single character in ANTLR, is the escape 
> sequence '\'' okay? I think that it is not a classical C escape sequence but
> it seems to be what corresponds best. Thanks.
>
>
>   


-- 
Davin McCall, Research Programmer
Deakin University, Burwood, Australia.
Phone: 03 9251 7045 International: +61 3 9251 7045
Email: Davin.McCall at deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)