[antlr-interest] problem with comments

Sun Mar 19 07:32:35 PST 2006

Hi!

I've just discovered that, in some cases, ANTLR generates lexer code
that doesn't tokenize the input correctly.

...
class ZolwLexer extends Lexer;

options {
        k = 2;
}
...
MULT : '*' | '/' | '%';
CMP : '<' | '>' | "<=" | ">=" | "!=" ;
...
COMMENT : ( "//" (~ '\n')* '\n'
        | "#" (~ '\n')* '\n' )
        { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
        ;

The problem occurs when the generated code tries to distinguish between
division ('/') and the beginning of a comment ("//").

ANTLR generates the following code:

...
if ((LA(1) == 0x23 /* '#' */  || LA(1) == 0x2f /* '/' */ ) && ((LA(2)
>= 0x0 /* '\0' */  && LA(2) <= 0x7f))) {
      mCOMMENT(true);
      theRetToken=_returnToken;
}
else if ((LA(1) == 0x2b /* '+' */  || LA(1) == 0x2d /* '-' */ ) && (true)) {
     mPLUS(true);
     theRetToken=_returnToken;
}
else if ((LA(1) == 0x25 /* '%' */  || LA(1) == 0x2a /* '*' */  ||
LA(1) == 0x2f /* '/' */ ) && (true)) {
     mMULT(true);
     theRetToken=_returnToken;
}
...

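So you can see that the first "if" branch is taken when we process
something like "10 / 2", although it shouldn't be: LA(1) is '/' and the
LA(2) test accepts any character, so the division is treated as the
start of a comment. Has anyone experienced similar problems?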

I resolved it by splitting the COMMENT rule:
...
COMMENT1 : "//" (~ '\n')* '\n'
        { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
        ;

COMMENT2 : "#" (~ '\n')* '\n'
        { $setType(ANTLR_USE_NAMESPACE(antlr)Token::SKIP); }
        ;
...

Now ANTLR generates correct code:
...
if ((LA(1) == 0x2f /* '/' */ ) && (LA(2) == 0x2f /* '/' */ )) {
   mCOMMENT1(true);
   theRetToken=_returnToken;
 }
 else if ((LA(1) == 0x2b /* '+' */  || LA(1) == 0x2d /* '-' */ ) && (true)) {
   mPLUS(true);
   theRetToken=_returnToken;
 }
 else if ((LA(1) == 0x25 /* '%' */  || LA(1) == 0x2a /* '*' */  ||
LA(1) == 0x2f /* '/' */ ) && (true)) {
   mMULT(true);
   theRetToken=_returnToken;
 }
....
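
The difference between the two generated tests can be checked in
isolation. The sketch below copies both conditions into plain C++
functions (the function names are hypothetical, not ANTLR output) and
evaluates them for the input "/ 2", where LA(1) is '/' and LA(2) is a
space. Presumably the permissive LA(2) test comes from ANTLR 2's linear
approximate lookahead, which keeps one character set per lookahead depth
for the whole rule rather than one per alternative, so the "#"
alternative widens LA(2) for the "//" alternative as well. That
explanation is an assumption on my part.

```cpp
// Standalone sketch; the function names are hypothetical, not ANTLR output.

// Lookahead test generated for the combined COMMENT rule:
// LA(1) is '#' or '/', and LA(2) is any character in 0x00..0x7f.
constexpr bool combinedCommentTest(int la1, int la2) {
    return (la1 == 0x23 /* '#' */ || la1 == 0x2f /* '/' */)
        && (la2 >= 0x0 && la2 <= 0x7f);
}

// Lookahead test generated for COMMENT1 after splitting the rule:
// both LA(1) and LA(2) must be '/'.
constexpr bool comment1Test(int la1, int la2) {
    return la1 == 0x2f /* '/' */ && la2 == 0x2f /* '/' */;
}

// Input "/ 2": the combined test wrongly selects the comment rule,
// while the split test rejects it, so the lexer falls through to MULT.
static_assert(combinedCommentTest('/', ' '), "combined rule takes '/' as a comment");
static_assert(!comment1Test('/', ' '), "split rule leaves '/' for MULT");

// Input "//": both tests accept a real comment.
static_assert(combinedCommentTest('/', '/'), "combined rule accepts //");
static_assert(comment1Test('/', '/'), "split rule accepts //");
```

The checks run at compile time, so the file only needs to compile (for
example as C++11) to confirm the behaviour described above.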

Regards,
Maciek

--
Maciej Zawadziński