[antlr-interest] need help with predicates

Thu Aug 9 09:45:43 PDT 2007

On 8/10/07, Andy Tripp <antlr at jazillian.com> wrote:
> The language I'm parsing, Visual Basic, lets an identifier have a '!'
> suffix:
>
> Identifier:
>     '['? LETTER (LETTER| DECIMAL_LITERAL)* ('%'|'#'|'$'|'&'|'!')? ']'?
>     ;
>
> But it also lets you use '!' as a "separator", the way C/C++/Java/etc.
> use '.'. In the midst of a hierarchy of rules dealing with expressions,
> I have this rule:
>
> dotOpExpression:
>     unaryOps (
>           DOT^ dotOperand?
>         | BANG^ anyName?
>         )*
>     ;
>
> Here, the unaryOps, dotOperand, and anyName rules all eventually refer
> to Identifier. The problem is that during dotOpExpression processing,
> unaryOps consumes the Identifier, including the '!'. So when trying to
> match "a!b", the parse fails: "a!" was taken as the Identifier, and
> the rest couldn't be matched.
>
> One solution is to take the '!' out of the Identifier rule (perhaps
> renaming it IdentifierNoBang) and then write alternative versions of
> other rules (unaryOpsNoBang, dotOperandNoBang, anyNameNoBang, etc.).
> But that would be a huge mess.
>
> It seems like a syntactic predicate with "backtrack=true" should work
> here, but I can't quite see how. I want to say, in dotOpExpression,
> "try to match this pattern, but if that doesn't work, try again, this
> time without allowing a '!' at the end of unaryOps". I can't see how to
> do that without all the rework needed to remove the '!' from Identifier.
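To make the failure concrete, here is a small Java sketch (not ANTLR-generated code) that tokenizes input with a regular-expression approximation of the quoted Identifier rule, taking the optional suffix greedily the way the generated lexer would. It assumes LETTER is [A-Za-z] and DECIMAL_LITERAL is a run of digits; the class and method names are hypothetical. On "a!b" it yields Identifier(a!) followed by Identifier(b), so no separate '!' token is ever left for the BANG alternative of dotOpExpression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: greedy tokenizing with an approximation of the Identifier rule. */
public class GreedyIdentifier {
    // Approximates '['? LETTER (LETTER|DECIMAL_LITERAL)* ('%'|'#'|'$'|'&'|'!')? ']'?
    // assuming LETTER = [A-Za-z] and DECIMAL_LITERAL = digits.
    static final Pattern IDENTIFIER =
        Pattern.compile("\\[?[A-Za-z][A-Za-z0-9]*[%#$&!]?\\]?");

    /** Emits Identifier(...) tokens where the pattern matches, else single-character tokens. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {
                // the optional suffix is greedy, so a following '!' is swallowed here
                tokens.add("Identifier(" + m.group() + ")");
                pos = m.end();
            } else {
                tokens.add(String.valueOf(input.charAt(pos)));
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a!b")); // [Identifier(a!), Identifier(b)]
        System.out.println(tokenize("a.b")); // [Identifier(a), ., Identifier(b)]
    }
}
```

Compare the '.' case: because '.' can never be part of an Identifier, it survives as its own token and the DOT alternative can match.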
Syntactic predicates only help ANTLR decide between alternatives, so
you still need to be able to write each alternative as ordinary rules.
That means you need some way to specify an identifier with or without
the bang. Apart from duplicating rules, the option is a gated semantic
predicate controlled by a field, a rule parameter, or a scope. With a
field I think you might run into nesting issues, though I'm not sure.
With parameters:
dotOpExpression
    :    (identifier[false] BANG identifier[true])=> identifier[false] BANG^ dotOpExpression
    |    unaryOps (DOT^ dotOperand?)
    ;

identifier[boolean allowBang]
    :    'a'..'z'+
        (    {allowBang}?=>BANG
        |    // Epsilon
        )
    ;

The downside is that every call to identifier has to pass allowBang,
and you will need to pass it down through the intermediate rules to
get it to identifier.
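For anyone who wants to see the mechanics outside ANTLR, here is a hand-written recursive-descent sketch in Java of the parameter approach. It works directly on characters (so '!' stands in for the BANG token and only lowercase letters are accepted, mirroring 'a'..'z'+). identifier(allowBang) plays the role of the gated predicate, and bangAhead() imitates the syntactic predicate by parsing speculatively and then rewinding. The class, the method names and the "(! left right)" tree notation are hypothetical, and the DOT alternative is omitted to keep it short.

```java
/** Hypothetical sketch of identifier[allowBang] plus a syntactic predicate. */
public class BangParser {
    private final String in;
    private int pos = 0;

    private BangParser(String in) { this.in = in; }

    // identifier[allowBang] : 'a'..'z'+ ( {allowBang}?=> BANG | ) ;
    private String identifier(boolean allowBang) {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= 'a' && in.charAt(pos) <= 'z') pos++;
        if (pos == start) throw new IllegalStateException("identifier expected at " + start);
        // gated predicate: the '!' suffix is only considered when allowBang is true
        if (allowBang && pos < in.length() && in.charAt(pos) == '!') pos++;
        return in.substring(start, pos);
    }

    // (identifier[false] BANG identifier[true])=> : try the pattern, then always rewind
    private boolean bangAhead() {
        int mark = pos;
        try {
            identifier(false);
            if (pos >= in.length() || in.charAt(pos) != '!') return false;
            pos++;
            identifier(true);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            pos = mark; // rewind, as backtracking does after a predicate
        }
    }

    // dotOpExpression : (pred)=> identifier[false] BANG^ dotOpExpression | identifier[true] ;
    private String dotOpExpression() {
        if (bangAhead()) {
            String left = identifier(false);
            pos++; // consume the separator '!'
            return "(! " + left + " " + dotOpExpression() + ")";
        }
        return identifier(true);
    }

    /** Parses the whole string or throws if input is left over. */
    public static String parse(String s) {
        BangParser p = new BangParser(s);
        String tree = p.dotOpExpression();
        if (p.pos != s.length()) throw new IllegalStateException("trailing input at " + p.pos);
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("a!b"));   // (! a b)
        System.out.println(parse("a!"));    // a!
        System.out.println(parse("a!b!c")); // (! a (! b c))
    }
}
```

With "a!" there is no identifier after the bang, so the predicate fails and the second alternative takes the bang as a suffix instead.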

You might be able to use scopes instead. With an ordinary rule-level
scope, though, you'd have to declare it in a rule that every call to
identifier goes through, or else the scope wouldn't exist when
identifier reads it. So I don't think a scope declared only in
dotOpExpression is suitable (I assume identifier is also reached
through paths other than dotOpExpression). Maybe you could add a
global scope:
scope identifierBang {
    boolean allow;
}
Then do:
start
scope identifierBang;
@init {
    identifierBang::allow = true;
}
    :    ...
    ;

dotOpExpression
scope identifierBang;
    :   { identifierBang::allow = false; }
        (unaryOps BANG anyName)=>unaryOps BANG^ dotOpExpression
    |   { identifierBang::allow = true; }
        unaryOps (DOT^ dotOperand?)
    ;

identifier
    :    'a'..'z'+
        (    { identifierBang::allow }?=>BANG
        |    // Epsilon
        )
    ;

So if identifier is reached through a call to dotOpExpression, it will
see that rule's copy of the scope; otherwise it will see the default
copy pushed by the start rule. Not especially clean, but it might work
in lieu of a nicer solution.
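The "nearest copy wins" behaviour can be modelled in plain Java as a stack of frames: each rule that lists identifierBang in its scope clause pushes a fresh frame on entry and pops it on exit, and identifierBang::allow reads the most recently pushed frame. This is a simplified, hypothetical model (class and method names invented, parsing itself skipped, and only the BANG alternative of dotOpExpression modelled), not the code ANTLR actually generates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a global dynamic scope such as identifierBang. */
public class ScopeDemo {
    static final class IdentifierBangScope { boolean allow; }

    // One frame per active rule that declared "scope identifierBang;"
    static final Deque<IdentifierBangScope> identifierBangStack = new ArrayDeque<>();

    // What identifier's gated predicate {identifierBang::allow}? would see
    static boolean identifierSeesAllow() {
        return identifierBangStack.peek().allow; // top = most recently pushed frame
    }

    // start: pushes the default copy with allow = true (its @init action)
    static boolean start(boolean viaDotOp) {
        identifierBangStack.push(new IdentifierBangScope());
        try {
            identifierBangStack.peek().allow = true;
            return viaDotOp ? dotOpExpression() : identifierSeesAllow();
        } finally {
            identifierBangStack.pop();
        }
    }

    // dotOpExpression: pushes its own copy and, in the BANG alternative, sets allow = false
    static boolean dotOpExpression() {
        identifierBangStack.push(new IdentifierBangScope());
        try {
            identifierBangStack.peek().allow = false;
            return identifierSeesAllow();
        } finally {
            identifierBangStack.pop();
        }
    }

    public static void main(String[] args) {
        System.out.println(start(false)); // true: only start's copy is on the stack
        System.out.println(start(true));  // false: dotOpExpression's copy shadows it
    }
}
```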

Tom.
>
> Any ideas?
> Thanks,
> Andy