[antlr-interest] Re: Conceptual problem with look ahead --- Skip this email

Michael Sielemann michael.sielemann at asdl.gatech.edu
Thu Sep 29 12:59:17 PDT 2005



Hi list,
On Tuesday I tried to sign up for the ANTLR list and asked this question.
For some reason, however, I only got the confirmation email today. After I
replied, I was accepted and my old email was sent out.

Consequently, I ask you to ignore it --- I more or less figured out an answer
and finished the lexer and parser today (including the ambiguities). The ASTs
are next.

Thanks,

Michael





-----Original Message-----
From: Michael Sielemann [mailto:michael.sielemann at asdl.gatech.edu] 
Sent: Tuesday, September 27, 2005 7:26 PM
To: 'antlr-interest at antlr.org'
Subject: Conceptual problem with look ahead


Hi everybody,

I have just about finished writing the lexer for my first grammar and came
across something that points to a conceptual misunderstanding on my part. I
read the related FAQ entries but could not figure this out. I would be happy
to receive any input you might have on this.

In my lexer, I have the following rule for C/C++-style multi-line comments
(it's basically the rule from the C grammar at
http://www.antlr.org/grammar/cgram/grammars/StdCParser.g).


COMMENTML : "/*"
            ( { LA(2) != '/' }? '*'
              | ( '\r' ('\n')?)
              | ~( '*'| '\r' | '\n' )
            )*
            "*/"                      {$setType(Token.SKIP);}
            ;


If I set k=2 for this lexer, ANTLR tells me that the choices are ambiguous:

ANTLR Parser Generator   Version 2.7.5 (20050201)   1989-2005 jGuru.com
lexical nondeterminism upon
k==1:'*'
k==2:'/'
between alt 1 and exit branch of block

When I set k to three, everything is fine.

My understanding is that the look-ahead depth k of a lexer is counted in
characters. Alternative one checks whether a star may be consumed by looking
at the following character: is it a slash or not? The exit branch is "*/",
if I am not mistaken. As these two constructs only look at the next two
characters, I expected k=2 to be enough. But obviously it isn't. Even with
the "anything can follow" concept I don't really get this. The problem might
be that I am not a CS guy but come from engineering, but anyway....
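
To make that expectation concrete, here is a minimal sketch of the behaviour
I have in mind, written as a hand-rolled scanner in Python rather than in
ANTLR. It is only an illustration: skip_block_comment is a hypothetical
helper, la1 and la2 merely stand in for LA(1) and LA(2), and none of this is
the code ANTLR generates or the analysis it performs when it reports the
nondeterminism.

```python
def skip_block_comment(text, pos=0):
    """Return the index just past the "*/" that closes the comment at pos.

    Hypothetical helper: it mirrors the intent of the COMMENTML rule by
    hand and is not the code ANTLR generates.
    """
    if not text.startswith("/*", pos):
        raise ValueError("no comment opener at position %d" % pos)
    i = pos + 2
    while i < len(text):
        la1 = text[i]                                   # stands in for LA(1)
        la2 = text[i + 1] if i + 1 < len(text) else ""  # stands in for LA(2)
        if la1 == "*" and la2 == "/":
            return i + 2  # exit branch: consume "*/" and stop
        i += 1            # any other character, including a lone '*'
    raise ValueError("unterminated comment")
```

At run time this loop never needs more than two characters to choose between
consuming a '*' and leaving the loop, which is why the warning at k=2
surprised me.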


Thank you very much for your help,

Michael


PS: One short second question concerning lexers. If I want to refer to the
single quote ' as a single character in ANTLR, is the escape sequence '\''
okay? I think that it is not a classical C escape sequence but it seems to
be what corresponds best. Thanks.



