[antlr-interest] Bug or misunderstanding?: missing attribute access on rule scope

Fri Oct 16 15:58:19 PDT 2009

Kaleb Pederson wrote:
> On Fri, Oct 16, 2009 at 2:15 PM, David-Sarah Hopwood
> <david-sarah at jacaranda.org> wrote:
>> Kaleb Pederson wrote:
>>> I'm getting an error that doesn't make any sense, either because I've
>>> missed something fundamental or I've stumbled across a bug. I'm doing
>>> some type checking within a tree parser.  I have a plusMinusExpression
>>> which can either be a negation or a subtraction expression.  In order
>>> to check which it is, I have `if ($rhs != null)` within my
>>> action.  That line, however, causes the following error:
>>>
>>> SemanticChecker.g:163:3: missing attribute access on rule scope: rhs
>>
>> You can use "if ($rhs.tree != null)".
> 
> You're right, thank you.  Here's what ANTLR generates:
> 
> ...
> if ((rhs!=null?((CommonTree)rhs.tree):null) != null)
> {
> 	typeChecker.assertIsNumericType((rhs!=null?rhs.type:null));
> 	typeChecker.assertEqualTypes((lhs!=null?lhs.type:null), (rhs!=null?rhs.type:null));
> }
> 
> So it places a guard around my check making sure that it only happens
> if rhs isn't null.  To pose my next question, isn't what I had
> perfectly legal? I.e., isn't it perfectly legal to reference $labelName
> without referencing an attribute, such as in my null check?

The syntax $labelName can be used in a parser or lexer grammar to refer
to a Token object, in cases where the reference is statically guaranteed
to correspond to a single token or fragment.

In a tree grammar, or in other cases in a parser grammar, the reference
might correspond to more than one token. So there are two possible design
choices for that situation: either make $labelName evaluate to something
other than a Token object reference, or disallow it. Making it evaluate to
something other than a Token would be inconsistent and possibly error-
prone, so disallowing it is reasonable.

In the case where all you're doing is testing the reference against null,
it may seem as though it wouldn't matter whether it corresponds to a
single token or more than one token. But ANTLR doesn't analyse
expressions in the target language, so it can't special-case this
situation.
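A toy Java sketch may make the last point concrete. `ToyActionTranslator` and its regex are made up for illustration; this is not ANTLR's actual action translator, and the rewrite is simplified (the real generated code quoted above also casts `tree` to `CommonTree`). The toy treats the action as plain text: each `$label.attr` becomes a null-guarded field read, and a bare `$label` is rejected, because nothing in the translator looks at the Java that follows it.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of action translation in a grammar tool. This is NOT ANTLR's real
// translator; it only shows that attribute references are rewritten as text,
// without parsing the surrounding Java expression.
public class ToyActionTranslator {

    // Matches "$label.attr" or a bare "$label".
    private static final Pattern REF = Pattern.compile(
            "\\$([A-Za-z_][A-Za-z0-9_]*)(?:\\.([A-Za-z_][A-Za-z0-9_]*))?");

    // Rewrites every $label.attr in an action into a null-guarded field read,
    // in the spirit of the generated "(rhs!=null?rhs.type:null)".
    static String translate(String action, Set<String> ruleLabels) {
        Matcher m = REF.matcher(action);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String label = m.group(1);
            String attr = m.group(2);
            if (!ruleLabels.contains(label)) {
                throw new IllegalArgumentException("unknown label: " + label);
            }
            if (attr == null) {
                // The translator sees only "$rhs" followed by other text; it does not
                // know (or check) that the following text happens to be "!= null".
                throw new IllegalArgumentException(
                        "missing attribute access on rule scope: " + label);
            }
            String guarded = "(" + label + "!=null?" + label + "." + attr + ":null)";
            m.appendReplacement(out, Matcher.quoteReplacement(guarded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> labels = Set.of("lhs", "rhs");
        // Accepted: explicit attribute access.
        System.out.println(translate("if ($rhs.tree != null) check($rhs.type);", labels));
        // Rejected: bare rule label, even though it is only compared with null.
        try {
            translate("if ($rhs != null) check();", labels);
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Supporting the `$rhs != null` case would require the translator to understand Java expressions, which is exactly what the toy (like ANTLR) avoids.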

This restriction tripped me up as well when I first came across it.
Arguably, it would have been better to *always* require an attribute
access -- say, "$labelName.token" in cases where you want a Token, and
"$labelName.isPresent" to test whether labelName matched any tokens
(or characters in the case of a lexer grammar).
That would have been easier to remember, and more consistent between
different kinds of grammar. Currently lexer grammars are not even self-
consistent: depending on context a bare $labelName reference evaluates
sometimes to an integer code point value, and sometimes to a Token
object reference.
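Purely as an illustration, the proposal can be sketched in Java. `Label`, `isPresent()` and `token()` are invented names mirroring the suggested `$labelName.isPresent` and `$labelName.token`; they are not anything ANTLR generates, and only the single-token case is modelled. Because every use goes through a named accessor, there is no bare form whose meaning has to vary with context.

```java
// Hypothetical sketch of the "always require an attribute access" proposal.
// Label, isPresent() and token() are invented names; ANTLR has no such API.
public class ExplicitLabelDemo {

    // Minimal stand-in for a token: just its text.
    static final class Token {
        final String text;

        Token(String text) {
            this.text = text;
        }
    }

    // Wraps whatever a label matched. A bare reference is never meaningful;
    // callers must say which view they want.
    static final class Label {
        private final Token token; // null when the labelled element did not match

        Label(Token token) {
            this.token = token;
        }

        // Proposed "$labelName.isPresent": did the label match anything?
        boolean isPresent() {
            return token != null;
        }

        // Proposed "$labelName.token": only valid when isPresent() is true.
        Token token() {
            if (token == null) {
                throw new IllegalStateException("label did not match any token");
            }
            return token;
        }
    }

    public static void main(String[] args) {
        Label matched = new Label(new Token("-"));
        Label absent = new Label(null);
        System.out.println(matched.isPresent() + " " + matched.token().text); // prints "true -"
        System.out.println(absent.isPresent());                               // prints "false"
    }
}
```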

[The other thing I think is suboptimal about ANTLR's behaviour in this
area (at least the Java target; I haven't checked other targets) is
that it generates a variable in the target language that has the same
name as the label. This means that leaving off the '$' will result in
code that may compile, and if it does compile, usually does something
unintended. It also causes errors if a label name matches a target
language keyword. Just mangling the name slightly would have prevented
these problems, although doing so now might be incompatible with any
grammars that rely on this property of the generated code.]
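A small Java sketch of the name-mangling idea follows. The `label_` prefix and the method names are hypothetical (ANTLR does not do this); the legality check uses the JDK's `javax.lang.model.SourceVersion.isName`, which rejects reserved words. A label such as `new` yields an illegal variable name when copied verbatim, but a legal one once prefixed. Assuming no other variable of that name is in scope, mangling has a second effect: an action that forgets the `$` and writes plain `rhs` would no longer resolve to the generated variable, so it would fail to compile instead of compiling and misbehaving.

```java
import javax.lang.model.SourceVersion;

// Hypothetical comparison of verbatim vs. mangled names for generated label variables.
public class LabelMangling {

    // Behaviour described above for the Java target: the label name is reused verbatim.
    static String currentName(String label) {
        return label;
    }

    // Hypothetical mangling scheme (not something ANTLR does): a fixed prefix.
    static String mangledName(String label) {
        return "label_" + label;
    }

    // True if the name is a legal Java identifier, i.e. not a keyword or literal.
    static boolean compiles(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        for (String label : new String[] {"rhs", "new", "default"}) {
            System.out.println(label
                    + ": verbatim legal=" + compiles(currentName(label))
                    + ", mangled legal=" + compiles(mangledName(label)));
        }
    }
}
```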

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com