[antlr-interest] Bug or misunderstanding?: missing attribute access on rule scope

Fri Oct 16 16:17:17 PDT 2009

On Fri, Oct 16, 2009 at 3:58 PM, David-Sarah Hopwood
<david-sarah at jacaranda.org> wrote:
> Kaleb Pederson wrote:
>> To pose my next question, isn't what I had
>> perfectly legal? I.e., isn't it perfectly legal to reference $labelName
>> without referencing an attribute, such as in my null check?
>
> The syntax $labelName can be used in a parser or lexer grammar to refer
> to a Token object, in cases where the reference is statically guaranteed
> to correspond to a single token or fragment.
>
> In a tree grammar, or in other cases in a parser grammar, the reference
> might correspond to more than one token. So there are two possible design
> choices for that situation: either make $labelName evaluate to something
> other than a Token object reference, or disallow it. Making it evaluate to
> something other than a Token would be inconsistent and possibly error-
> prone, so disallowing it is reasonable.

Ahh, that's the difference.  In this case I'm working with a tree
grammar, which doesn't behave the same way.

> In the case where all you're doing is testing the reference against null,
> it may seem as though it wouldn't matter whether it corresponds to a
> single token or more than one token. But ANTLR doesn't analyse
> expressions in the target language, so it can't special-case this
> situation.

True, and this even holds for comments in the target language, which
bit me once.

> This restriction tripped me up as well when I first came across it.
> Arguably, it would have been better to *always* require an attribute
> access -- say, "$labelName.token" in cases where you want a Token, and
> "$labelName.isPresent" to test whether labelName matched any tokens
> (or characters in the case of a lexer grammar).
> That would have been easier to remember, and more consistent between
> different kinds of grammar. Currently lexer grammars are not even self-
> consistent: depending on context a bare $labelName reference evaluates
> sometimes to an integer code point value, and sometimes to a Token
> object reference.

I'll have to remember that.
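
To make the "always require an attribute access" idea concrete, here is a
minimal sketch in plain Java. It is only an illustration of the proposal
quoted above, not anything ANTLR provides: the class LabelRef, its methods,
and the use of String as a stand-in for Token are all assumptions made up
for this example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only: none of these names exist in ANTLR.
public final class LabelRef {
    // Stand-in for the Token objects that a label matched (zero, one, or many).
    private final List<String> matched;

    private LabelRef(List<String> matched) {
        this.matched = matched;
    }

    public static LabelRef of(String... tokens) {
        return new LabelRef(List.of(tokens));
    }

    // Analogue of the proposed "$labelName.isPresent":
    // true if the label matched anything at all.
    public boolean isPresent() {
        return !matched.isEmpty();
    }

    // Analogue of the proposed "$labelName.token":
    // only meaningful when the label matched exactly one token.
    public Optional<String> token() {
        return matched.size() == 1 ? Optional.of(matched.get(0)) : Optional.empty();
    }
}
```

With this shape, a null check and a token access are two explicit, separate
operations, so the single-token versus multi-token distinction never has to
be inferred from a bare reference.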

> [The other thing I think is suboptimal about ANTLR's behaviour in this
> area (at least the Java target; I haven't checked other targets) is
> that it generates a variable in the target language that has the same
> name as the label. This means that leaving off the '$' will result in
> code that may compile, and if it does compile, usually does something
> unintended. It also causes errors if a label name matches a target
> language keyword. Just mangling the name slightly would have prevented
> these problems, although doing so now might be incompatible with any
> grammars that rely on this property of the generated code.]
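
The name-mangling suggestion can be sketched roughly as follows. This is a
hypothetical helper, not how ANTLR actually names its generated variables;
the class LabelMangler, the "label_" prefix, and the deliberately partial
keyword list are assumptions chosen for illustration.

```java
import java.util.Set;

// Hypothetical sketch: not how ANTLR names generated variables.
public final class LabelMangler {
    // Deliberately partial list of Java keywords, enough for illustration.
    private static final Set<String> SOME_JAVA_KEYWORDS =
            Set.of("class", "int", "new", "return", "this");

    private LabelMangler() {
    }

    // True if using the name unchanged as a Java variable would clash
    // with one of the keywords listed above.
    public static boolean collidesWithKeyword(String name) {
        return SOME_JAVA_KEYWORDS.contains(name);
    }

    // Prefix every label so the generated variable never shares the
    // label's own spelling and can never be a bare keyword.
    public static String mangle(String label) {
        return "label_" + label;
    }
}
```

Because the mangled name differs from the label, an action that
accidentally says labelName instead of $labelName would refer to an
undeclared variable and fail to compile, rather than silently doing
something unintended.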

Thanks for the great explanation and details.

--
Kaleb Pederson
Twitter - http://twitter.com/kalebpederson
Blog - http://kalebpederson.com