[antlr-interest] (Basic Questions) Non-greedy loops & non-determinism
    Royne Borrud 
    royne.borrud at gmail.com
       
    Sat Nov 26 07:16:30 PST 2005
    
    
  
I started out with ANTLR a couple of days ago, so I have a feeling these
are rather basic questions... :) In my grammar, which is supposed to
allow C/C++-style comments, I've got the following in the lexer:
*********************
LINECOMMENT:
            ( "//" ( options { greedy = false; }: . )* NEWLINE )
            { $setType(antlr::Token::SKIP); };
MULTICOMMENT options { ignore = NEWLINE; }:
            ( "/*" ( options { greedy = false; }: . )* "*/" )
            { $setType(antlr::Token::SKIP); };
NEWLINE:
            ( '\r' ( '\n' )? | '\n' ) { newline(); };
This gives the following warnings.
On the LINECOMMENT line:
warning:nongreedy block may exit incorrectly due to limitations of
linear approximate lookahead (first k-1 sets in lookahead not
singleton).
On the MULTICOMMENT line:
warning:lexical nondeterminism between alts 1 and 2 of block upon k==1:'\n','\r'
k==2:'\u0000'..'\u007f'
*********************
So, my questions:
Why the first warning? If the lexer sees a '\n' or '\r' while in the
LINECOMMENT loop, it should break out of the loop. I don't see how this
can ever 'exit incorrectly'. The only nondeterminism in the NEWLINE rule
is the optional second character after a '\r', so whether to enter the
NEWLINE rule should be clear at any point in the loop, shouldn't it?
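As a rough illustration of the loop behaviour described above, here is a
hand-written C++ sketch. It is not what ANTLR generates: the function name
skipLineComment is made up, and treating end of input as the end of the
comment is an assumption of the sketch (the grammar rule insists on a
NEWLINE). It only shows that one character of lookahead looks sufficient
to decide when to leave the loop:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical hand-written scanner, NOT the code ANTLR generates.
// Starting at the "//" at text[pos], returns the index just past the
// comment, including its terminating "\r\n", "\r", or "\n".
std::size_t skipLineComment(const std::string& text, std::size_t pos) {
    assert(text.compare(pos, 2, "//") == 0);  // precondition: at a "//"
    pos += 2;

    // Non-greedy wildcard loop: the next character alone decides the exit.
    while (pos < text.size() && text[pos] != '\r' && text[pos] != '\n') {
        ++pos;  // the "." alternative: consume any other character
    }

    // NEWLINE: '\r' ( '\n' )? | '\n'
    // Assumption: end of input also ends the comment.
    if (pos < text.size() && text[pos] == '\r') {
        ++pos;
        if (pos < text.size() && text[pos] == '\n') {
            ++pos;  // optional second character of "\r\n"
        }
    } else if (pos < text.size() && text[pos] == '\n') {
        ++pos;
    }
    return pos;
}
```

For example, skipLineComment("// hi\nx", 0) returns 6, the index of the x
on the following line.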
What does the second warning mean in plain English? It seems like the
lexer won't know what to do when it encounters a '\n' or '\r', but
won't ignore = NEWLINE ensure that those all get eaten? And by the way,
the reason for using ignore = NEWLINE is to make sure line numbers are
tracked correctly even inside multi-line comments. Does that work as
intended?
So, are these just overly cautious warnings that I could safely ignore
(or shut off), or am I doing something dangerous?
Also, I've looked at the generated lexer code, and in mMULTICOMMENT it
doesn't call NEWLINE between every match as I thought it would. It
only calls NEWLINE between matching "/*" and entering the loop. Maybe
I'm misunderstanding the way the ignore option works?
(http://www.antlr.org/doc/lexer.html#ignoringwhitespace)
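For comparison, the behaviour expected from MULTICOMMENT (a newline check
on every pass through the loop, so that line numbers stay correct) can be
sketched by hand in C++. This is not ANTLR's generated mMULTICOMMENT: the
names CommentSpan and skipBlockComment are hypothetical, and the
lineBreaks counter merely stands in for the calls to newline():

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: where the comment ends and how many line
// breaks it contained.
struct CommentSpan {
    std::size_t end;  // index just past "*/", or text.size() if unterminated
    int lineBreaks;   // stands in for the number of newline() calls
};

// Hypothetical hand-written scanner, NOT ANTLR's generated code.
// Starting at the "/*" at text[pos], scans to the closing "*/" and
// counts "\r\n", "\r", and "\n" as one line break each.
CommentSpan skipBlockComment(const std::string& text, std::size_t pos) {
    assert(text.compare(pos, 2, "/*") == 0);  // precondition: at a "/*"
    pos += 2;
    int lines = 0;

    while (pos < text.size()) {
        if (text.compare(pos, 2, "*/") == 0) {  // exit test comes first
            pos += 2;
            break;
        }
        if (text[pos] == '\r') {
            ++pos;
            if (pos < text.size() && text[pos] == '\n') {
                ++pos;  // "\r\n" counts as a single break
            }
            ++lines;
        } else if (text[pos] == '\n') {
            ++pos;
            ++lines;
        } else {
            ++pos;  // any other character
        }
    }
    return {pos, lines};
}
```

For example, skipBlockComment("/* a\r\nb\n*/x", 0) yields end == 10 and
lineBreaks == 2.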
Any and all help and/or answers appreciated.
    
    