[antlr-interest] Ignoring comments in predicates problem

Paul J. Lucas pauljlucas at mac.com
Sun Jan 30 11:48:23 PST 2005


Given:

    protected Ignore
        :   (   WhiteSpaceChar
            |   "(:" ( options { greedy = false; } : . )* ":)"
            )+
        ;

    protected Keywords
        :   // ...
        |   (Identifier (Ignore)? '(' ~':')=> Identifier {
                $setType( FUNCTION_NAME );
            }
        ;

That is, "Ignore" is used in predicates to skip either whitespace or comments.
A comment in XQuery is written (: like this :)

I do get a "nongreedy block may exit incorrectly due to limitations of linear
approximate lookahead" warning for "Ignore".

If I have an "Identifier" optionally followed by "Ignore" followed by '(' but
not followed by a ':', then I have a function name.  I want to handle all the
cases of:

    foo( ...
    foo ( ...
    foo (: comment :) ( ...

That is, I want to allow zero or more whitespace characters or comments between
the Identifier and the '('.  The second case above doesn't work.

For the ANTLR-generated code for "Ignore" I get in part:

    switch ( LA(1)) {
    case '\t':  case '\n':  case '\r':  case ' ':
    {
        mWhiteSpaceChar(false);
        break;
    }
    case '(':
    {
        match("(:");

The execution enters the '(' case above, but then match() throws a
RecognitionException because it doesn't match "(:".  Back in the "Keywords"
ANTLR-generated code, it's:

    try {
        mIdentifier(false);
        if ((_tokenSet_6.member(LA(1))) && (_tokenSet_7.member(LA(2)))) {
            mIgnore(false);
        }
        else if ((LA(1)=='(') && (_tokenSet_8.member(LA(2)))) {
        }
        else {
            throw new NoViableAltForCharException((char)LA(1), getFilename(), getLine(), getColumn());
        }
        match('(');
        matchNot(':');
    }
    catch (RecognitionException pe) {
        synPredMatched255 = false;
    }

What I *want* to happen is for execution to pick up at the "else if" above, but
since mIgnore throws a RecognitionException, it jumps to the "catch" which is
*not* what I want.

It seems to me that the ANTLR-generated code for "Ignore" should *not* throw a
RecognitionException for my second case.  Why doesn't the generated code
explicitly check for ':' after '(' and if the character is *not* ':' simply
exit?
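
To make the desired behavior concrete, here is a minimal hand-written sketch in
plain Java. It is not ANTLR-generated code and does not use the ANTLR API; the
class IgnoreSketch and the methods skipIgnore and looksLikeFunctionCall are
hypothetical names introduced only for illustration. It assumes the input is
already in a String and, like the grammar above, it does not handle nested
comments. The point is that a '(' not followed by ':' simply ends the "Ignore"
run instead of raising an error.

```java
// Hypothetical illustration only; not what ANTLR generates.
final class IgnoreSketch {

    // Returns the index just past any run of whitespace and (: ... :) comments
    // that starts at pos. A '(' that is not followed by ':' ends the run
    // quietly rather than being treated as a failed match of "(:".
    static int skipIgnore(String s, int pos) {
        int i = pos;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;                                   // WhiteSpaceChar
            } else if (c == '(' && i + 1 < s.length() && s.charAt(i + 1) == ':') {
                int end = s.indexOf(":)", i + 2);      // first ":)" (non-greedy)
                if (end < 0) {
                    return i;                          // unterminated comment: stop before it
                }
                i = end + 2;
            } else {
                break;                                 // anything else, including a lone '('
            }
        }
        return i;
    }

    // Mirrors the predicate (Identifier (Ignore)? '(' ~':'), given the index
    // just past the identifier: optional Ignore, then '(' and a char that is not ':'.
    static boolean looksLikeFunctionCall(String s, int identEnd) {
        int i = skipIgnore(s, identEnd);
        return i + 1 < s.length()
            && s.charAt(i) == '('
            && s.charAt(i + 1) != ':';
    }

    public static void main(String[] args) {
        String[] inputs = { "foo( x", "foo ( x", "foo (: comment :) ( x", "foo bar" };
        for (String in : inputs) {
            System.out.println(in + " -> " + looksLikeFunctionCall(in, 3));
        }
    }
}
```

With this logic the three cases listed earlier are all accepted as function
names, while "foo bar" is rejected. Whether ANTLR can be made to generate an
equivalent check for "Ignore" is exactly what I am asking.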

How can I get what I want?

- Paul


