[antlr-interest] problem with comments
Maciej Zawadziński
mzawadzinski at gmail.com
Sun Mar 19 07:32:35 PST 2006
Hi!
I've just discovered that in some cases ANTLR generates code that
doesn't process the grammar correctly.
...
class ZolwLexer extends Lexer;
options {
k = 2;
}
...
MULT : '*' | '/' | '%';
CMP : '<' | '>' | "<=" | ">=" | "!=" ;
...
COMMENT : ( "//" (~ '\n')* '\n'
| "#" (~ '\n')* '\n' )
{ $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
;
The problem occurs when the generated code has to distinguish between
division ('/') and the beginning of a comment ('//').
ANTLR generates the following code:
...
if ((LA(1) == 0x23 /* '#' */ || LA(1) == 0x2f /* '/' */ ) &&
    ((LA(2) >= 0x0 /* '\0' */ && LA(2) <= 0x7f))) {
    mCOMMENT(true);
    theRetToken=_returnToken;
}
else if ((LA(1) == 0x2b /* '+' */ || LA(1) == 0x2d /* '-' */ ) && (true)) {
    mPLUS(true);
    theRetToken=_returnToken;
}
else if ((LA(1) == 0x25 /* '%' */ || LA(1) == 0x2a /* '*' */ ||
          LA(1) == 0x2f /* '/' */ ) && (true)) {
    mMULT(true);
    theRetToken=_returnToken;
}
...
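As you can see, the first "if" branch is taken when we process input
like "10 / 2", although it shouldn't be. The LA(2) test accepts any
character from 0x00 to 0x7f, because the second-character sets of the
two alternatives ('/' from "//", and anything at all after "#") have
been merged into one. A lone '/' followed by a space is therefore sent
to mCOMMENT, and the MULT branch is never reached. Has anyone
experienced similar problems?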
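To make the failure concrete, here is a minimal C++ sketch that copies the lookahead tests from the generated code above into standalone constexpr functions. The names `looksLikeComment`, `looksLikeMult`, `decide` and the `Decision` enum are hypothetical and are not part of the ANTLR runtime; the parameters `la1` and `la2` stand in for `LA(1)` and `LA(2)`. The PLUS branch is left out because it cannot match '/'.

```cpp
// Hypothetical stand-ins for the decision inside the generated nextToken().
// la1 and la2 play the role of LA(1) and LA(2).

// Copied from the generated COMMENT test: '#' or '/' followed by ANY
// character in 0x00..0x7f, so the second character never disambiguates.
constexpr bool looksLikeComment(int la1, int la2) {
    return (la1 == 0x23 /* '#' */ || la1 == 0x2f /* '/' */) &&
           (la2 >= 0x0 && la2 <= 0x7f);
}

// Copied from the generated MULT test (only LA(1) is checked).
constexpr bool looksLikeMult(int la1) {
    return la1 == 0x25 /* '%' */ || la1 == 0x2a /* '*' */ ||
           la1 == 0x2f /* '/' */;
}

enum class Decision { Comment, Mult, Other };

// Mirrors the if / else-if order of the generated code:
// the COMMENT test is evaluated before the MULT test.
constexpr Decision decide(int la1, int la2) {
    return looksLikeComment(la1, la2) ? Decision::Comment
         : looksLikeMult(la1)         ? Decision::Mult
                                      : Decision::Other;
}
```

With this sketch, `decide('/', ' ')` (the '/' in "10 / 2") yields `Decision::Comment`, exactly like the real `//` comment `decide('/', '/')`, while `decide('*', '2')` still yields `Decision::Mult`.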
I worked around it by splitting the COMMENT rule in two:
...
COMMENT1 : "//" (~ '\n')* '\n'
{ $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
;
COMMENT2 : "#" (~ '\n')* '\n'
{ $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
;
...
Now ANTLR generates correct code:
...
if ((LA(1) == 0x2f /* '/' */ ) && (LA(2) == 0x2f /* '/' */ )) {
mCOMMENT1(true);
theRetToken=_returnToken;
}
else if ((LA(1) == 0x2b /* '+' */ || LA(1) == 0x2d /* '-' */ ) && (true)) {
mPLUS(true);
theRetToken=_returnToken;
}
else if ((LA(1) == 0x25 /* '%' */ || LA(1) == 0x2a /* '*' */ ||
          LA(1) == 0x2f /* '/' */ ) && (true)) {
mMULT(true);
theRetToken=_returnToken;
}
....
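For comparison, here is the same kind of hypothetical C++ sketch for the decision after the split. `decide` and `Decision` are illustrative names, not ANTLR API. The excerpt above elides the COMMENT2 branch, so testing '#' here is an assumption about where that check sits; since '#' does not start any other token in the quoted grammar, its position does not change the outcome.

```cpp
// Hypothetical restatement of the generated decision after splitting COMMENT.
// la1 and la2 stand in for LA(1) and LA(2).
enum class Decision { Comment1, Comment2, Mult, Other };

constexpr Decision decide(int la1, int la2) {
    // COMMENT1 now requires BOTH characters to be '/', as in the new code.
    return (la1 == '/' && la2 == '/') ? Decision::Comment1
         // Assumed position of the elided COMMENT2 test.
         : (la1 == '#')               ? Decision::Comment2
         // A lone '/' falls through to MULT, as it should for "10 / 2".
         : (la1 == '%' || la1 == '*' || la1 == '/') ? Decision::Mult
                                                    : Decision::Other;
}
```

Here `decide('/', ' ')` yields `Decision::Mult` and `decide('/', '/')` yields `Decision::Comment1`. Once each rule has only one alternative, the LA(2) test no longer merges the '#' comment's "anything" with the '/' comment's second '/'.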
Regards,
Maciek
--
Maciej Zawadziński