[antlr-interest] Wrong code generated by the lexer
uprightness_of_character
andrei at metalanguage.com
Wed Jun 25 21:52:39 PDT 2003
I parse a language that has "to-end-of-line" comments like this:
-- blah
and multiline comments like this:
(-- blah --).
So I wrote a lexer rule to collect all of these:
COMMENT
    :   (   "--"! (~('\n' | '\r'))*
        |   "(--" (options { greedy = false; } : ANY)* "--)"
        )+
        {
            s_lastComment = $getText;
            $setType(antlr::Token::SKIP);
        }
    ;
This rule matches a run of consecutive single-line and multi-line
comments. (ANY is a '.' rule that also takes care of calling newline().
I also have separate lexer tokens for '-' and '('.)
So far, so good. However, when lexing this:
foo(-bar);
the lexer issued an error insisting that the "(-" sequence ought to
start a comment. I looked through the generated code and the lexer
decides that if (LA(1) == '(' && LA(2) == '-'), then a comment must
follow.
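To make the symptom concrete, here is a minimal sketch of the decision described above. It is not the generated lexer code: both function names are hypothetical, and the three-character variant is only an assumption about what a check that tells the two inputs apart might look like.

```cpp
#include <string>

// Hypothetical stand-in for the generated test quoted above:
// only the first two characters are examined.
bool predictsCommentTwoChars(const std::string& in) {
    return in.size() >= 2 && in[0] == '(' && in[1] == '-';
}

// Hypothetical stricter check (an assumption, not ANTLR output):
// requires the full "(--" opener before committing to COMMENT.
bool predictsCommentThreeChars(const std::string& in) {
    return in.size() >= 3 && in[0] == '(' && in[1] == '-' && in[2] == '-';
}
```

For the remaining input "(-bar);" the two-character test returns true and the three-character test returns false; for "(-- blah --)" both return true.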
What's going on? I increased the lookahead to no avail. Why does the
lexer look only at LA(1) and LA(2)?
Thanks,
Andrei