[antlr-interest] Ignoring comments in predicates problem
Terence Parr
parrt at cs.usfca.edu
Mon Jan 31 14:05:11 PST 2005
On Jan 30, 2005, at 11:48 AM, Paul J. Lucas wrote:
> Given:
>
> protected Ignore
> : ( WhiteSpaceChar
> | "(:" ( options { greedy = false; } : . )* ":)"
> )+
> ;
>
> protected Keywords
> : // ...
> | (Identifier (Ignore)? '(' ~':')=> Identifier {
> $setType( FUNCTION_NAME );
> }
> ;
>
> That is, "Ignore" is used in predicates to ignore either whitespace or
> comments -- a comment in XQuery is (: like this :)
>
> I do get a "nongreedy block may exit incorrectly due to limitations of
> linear approximate lookahead" warning for "Ignore".
Hi Paul, I believe in this case the warning is overly cautious. As long
as what can follow the nongreedy loop is exactly one fixed sequence of
chars (here ":)"), the loop will always exit correctly.
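To illustrate why a single fixed follow sequence is enough, here is a hand-written Java sketch of what the nongreedy loop amounts to. It is not ANTLR output; the class and method names are made up for illustration, and it ignores nested comments.

```java
public class NongreedyCommentSketch {
    // Returns the index just past the closing ":)" of a comment starting at pos,
    // or -1 if the text at pos is not a complete "(: ... :)" comment.
    static int matchComment(String s, int pos) {
        if (!s.startsWith("(:", pos)) {
            return -1;
        }
        int i = pos + 2;
        // Nongreedy ( . )* : before consuming each char, peek two chars ahead
        // and leave the loop as soon as they spell the one follow sequence ":)".
        while (i < s.length()) {
            if (s.startsWith(":)", i)) {
                return i + 2;
            }
            i++; // otherwise consume one arbitrary char and try again
        }
        return -1; // ran off the end without seeing ":)"
    }
}
```

Because the only way out of the loop is the exact two-char sequence ":)", a two-char peek decides the exit unambiguously.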
> If I have an "Identifier" optionally followed by "Ignore" followed by '('
> but not followed by a ':', then I have a function name. I want to handle
> all the cases of:
>
> foo( ...
> foo ( ...
> foo (: comment :) ( ...
>
> That is, allow zero or more whitespaces or comments in between the
> Identifier and the '('. The second case above doesn't work.
I assume that the "foo (" is indeed not followed by a ':'.
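For reference, the behavior the predicate is meant to have on those three inputs can be written out by hand. The Java below is only a sketch of the intended check, not what ANTLR generates; the class and method names are invented, the identifier test is a simplification (letters, digits, underscore), and nested comments are not handled.

```java
public class FunctionNamePredicateSketch {
    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // True if s starts with: Identifier (whitespace | comment)* '(' <any char except ':'>
    static boolean looksLikeFunctionCall(String s) {
        int i = 0;
        // Simplified identifier (an assumption, not the real Identifier rule).
        while (i < s.length()
                && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
            i++;
        }
        if (i == 0) {
            return false;
        }
        // Skip whitespace and comments; a comment is entered only when the
        // next TWO chars are "(:", so a bare '(' ends the loop instead of failing.
        while (i < s.length()) {
            if (isWhitespace(s.charAt(i))) {
                i++;
            } else if (s.startsWith("(:", i)) {
                int end = s.indexOf(":)", i + 2);
                if (end < 0) {
                    return false; // unterminated comment
                }
                i = end + 2;
            } else {
                break;
            }
        }
        // Finally require '(' followed by some char other than ':'.
        return i + 1 < s.length() && s.charAt(i) == '(' && s.charAt(i + 1) != ':';
    }
}
```

The point of the sketch is the two-char test before entering a comment: with only one char of lookahead, the bare '(' in "foo (" looks like the start of a comment.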
> For the ANTLR-generated code for "Ignore" I get in part:
>
> switch ( LA(1)) {
> case '\t': case '\n': case '\r': case ' ':
> {
> mWhiteSpaceChar(false);
> break;
> }
> case '(':
> {
> match("(:");
>
> The execution enters the '(' case above, but then match() throws a
> RecognitionException because it doesn't match "(:". Back in the "Keywords"
> ANTLR-generated code, it's:
>
> try {
> mIdentifier(false);
> if ((_tokenSet_6.member(LA(1))) &&
> (_tokenSet_7.member(LA(2)))) {
> mIgnore(false);
> }
> else if ((LA(1)=='(') && (_tokenSet_8.member(LA(2)))) {
> }
> else {
> throw new NoViableAltForCharException((char)LA(1),
> getFilename(), getLine(), getColumn());
> }
> match('(');
> matchNot(':');
> }
> catch (RecognitionException pe) {
> synPredMatched255 = false;
> }
>
> What I *want* to happen is for execution to pick up at the "else if" above,
> but since mIgnore throws a RecognitionException, it jumps to the "catch",
> which is *not* what I want.
>
> It seems to me that the ANTLR-generated code for "Ignore" should *not* throw
> a RecognitionException for my second case. Why doesn't the generated code
> explicitly check for ':' after '(' and if the character is *not* ':' simply
> exit?
Hmm...this is odd. You have k>=2, I see. It should not enter Ignore if
there is no "(:". Can you tell me what the _tokenSet_7 set looks like
from:
> if ((_tokenSet_6.member(LA(1))) &&
> (_tokenSet_7.member(LA(2)))) {
> mIgnore(false);
> }
> else if ((LA(1)=='(') && (_tokenSet_8.member(LA(2)))) {
> }
It should not enter this first "if"; it should fall through to the
"else if". If you set the codeGenBitsetTestThreshold option to a big
number, the generated code should list the individual chars it tests
for LA(2) instead of calling _tokenSet_7.member(). I think that is our
key.
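One thing to look for in that listing: linear approximate lookahead keeps one set per lookahead depth rather than a set of two-char sequences, so it can accept a pair that no real sequence produces. The Java sketch below shows the effect with hypothetical sets derived by hand from the Ignore rule. The actual contents of _tokenSet_6 and _tokenSet_7 are not shown in this thread, so treat the sets and all names here as assumptions.

```java
public class LinearApproxSketch {
    // Hypothetical depth-1 set: chars that can start Ignore.
    static final String DEPTH1 = " \t\n\r(";
    // Hypothetical depth-2 set: union of every char that can come second,
    // i.e. whitespace or '(' after a whitespace char, and ':' after '('.
    static final String DEPTH2 = " \t\n\r(:";

    // Linear approximate test: each depth is checked independently.
    static boolean approxTest(char la1, char la2) {
        return DEPTH1.indexOf(la1) >= 0 && DEPTH2.indexOf(la2) >= 0;
    }

    // Exact two-char test: whitespace, or specifically "(:".
    static boolean exactTest(char la1, char la2) {
        boolean ws = la1 == ' ' || la1 == '\t' || la1 == '\n' || la1 == '\r';
        return ws || (la1 == '(' && la2 == ':');
    }
}
```

With these assumed sets, the pair '(' then ' ' passes approxTest but fails exactTest, which is the kind of mismatch the expanded LA(2) listing would reveal if it is present.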
> How can I get what I want?
You could swat the fly with a hammer (read that "hack" it) by adding a
semantic predicate, so that Ignore is only attempted when a comment or
whitespace actually starts here:
| (Identifier ( {(LA(1)=='(' && LA(2)==':') || (LA(1) is whitespace)}?
Ignore )? '(' ~':')=> ...
Shouldn't be necessary though...let's explore the lookahead set.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com