[antlr-interest] Wrong code generated by the lexer

Wed Jun 25 21:52:39 PDT 2003

I parse a language that has "to-end-of-line" comments like this:

-- blah

and multiline comments like this:

(-- blah --).

So I wrote a lexer rule to collect all of these:

COMMENT
    :
    (
        "--"! (~('\n' | '\r'))*
    |   "(--" (options { greedy = false; } : ANY)* "--)"
    )+
    {
        s_lastComment = $getText;
        $setType(antlr::Token::SKIP);
    }
    ;

This rule catches a block with several single-line and multi-line 
comments. (ANY is a '.' that takes care of newline(). I also have 
separate lexer tokens for '-' and '('.)

So far, so good. However, when lexing this:

foo(-bar);

the lexer issued an error insisting that the "(-" sequence ought to 
start a comment. I looked through the generated code and the lexer 
decides that if (LA(1) == '(' && LA(2) == '-'), then a comment must 
follow.
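
To make the symptom concrete, here is a minimal C++ sketch of the decision as I read it. It is not the generated code itself: la(), predictsCommentK2() and predictsCommentK3() are names I made up for illustration, and la() only stands in for the lexer's LA(i).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the lexer's LA(i): the i-th character ahead
// (1-based) of position pos, or '\0' past the end of the input.
char la(const std::string& in, std::size_t pos, std::size_t i) {
    std::size_t idx = pos + i - 1;
    return idx < in.size() ? in[idx] : '\0';
}

// The decision as I read it in the generated code: only two characters
// are inspected, so "(-" alone is enough to commit to COMMENT.
bool predictsCommentK2(const std::string& in, std::size_t pos) {
    return la(in, pos, 1) == '(' && la(in, pos, 2) == '-';
}

// What I expected instead: all three characters of "(--" must be present
// before the lexer commits to COMMENT.
bool predictsCommentK3(const std::string& in, std::size_t pos) {
    return predictsCommentK2(in, pos) && la(in, pos, 3) == '-';
}
```

With the input foo(-bar); and pos at the '(' (index 3), the two-character check says "comment" while the three-character check does not, which matches the error I am seeing.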

What's going on? I increased the lookahead to no avail. Why does the 
lexer look only at LA(1) and LA(2)?

Thanks,

Andrei
