[antlr-interest] Text attribute for tree parser rule not working

Thu Jun 24 10:33:20 PDT 2010

I spent most of today debugging this and putting together a view for Eclipse
that displays a tree for each of the two ASTs I deal with (the ANTLR one, and
a second one that I create using a tree walker and feed into the Eclipse DLTK
platform).

So far it is clear that the problem is that some nodes in the AST have a
pseudo token which is not in the original token stream and has -1 as the
token index (and no positioning info). Those pseudo tokens are created for
some imaginary tokens, and only sometimes.

Per Andrew's suggestion I traced what happens inside addChild, and in fact
just before it. The corresponding code in the generated parser is:

    root_1 = (Object)adaptor.becomeRoot(
        (Object)adaptor.create(BYFIELD, "BYFIELD"), root_1);
    adaptor.addChild(root_1, stream_retval.nextTree());
    adaptor.addChild(root_1, stream_Identifier.nextNode());
    adaptor.addChild(root_0, root_1);

and the adaptor.create(BYFIELD, ...) call creates the new pseudo token, which
is not in the token stream.
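
A minimal sketch of one way such a node could still get usable boundaries:
when a node's own token has index -1, take the range from its children. This
is plain Java without the ANTLR runtime so it can be run standalone;
SketchNode, BoundarySketch and the token indexes are hypothetical stand-ins,
not the generated parser's types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST node; not part of the ANTLR runtime.
class SketchNode {
    final String text;
    int startIndex;   // index of the first covered token, -1 if unknown
    int stopIndex;    // index of the last covered token, -1 if unknown
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String text, int tokenIndex) {
        this.text = text;
        this.startIndex = tokenIndex;
        this.stopIndex = tokenIndex;
    }

    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    // Fill in unknown boundaries bottom-up from the children.
    void fixBoundaries() {
        for (SketchNode c : children) {
            c.fixBoundaries();
        }
        for (SketchNode c : children) {
            if (c.startIndex < 0) {
                continue; // this child has no position either
            }
            if (startIndex < 0 || c.startIndex < startIndex) {
                startIndex = c.startIndex;
            }
            if (c.stopIndex > stopIndex) {
                stopIndex = c.stopIndex;
            }
        }
    }
}

class BoundarySketch {
    // Tree for "object.field.anotherfield", assuming the five tokens
    // object . field . anotherfield have indexes 0..4.
    static SketchNode chain() {
        SketchNode inner = new SketchNode("BYFIELD", -1)
            .add(new SketchNode("object", 0))
            .add(new SketchNode("field", 2));
        return new SketchNode("BYFIELD", -1)
            .add(inner)
            .add(new SketchNode("anotherfield", 4));
    }

    public static void main(String[] args) {
        SketchNode root = chain();
        root.fixBoundaries();
        System.out.println(root.startIndex + ".." + root.stopIndex); // prints 0..4
    }
}
```

In the real tree the same pass would presumably live in a custom node class
or tree adaptor; that placement is an assumption, not something verified
against the runtime.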

From the JavaDoc of the CommonTreeAdaptor.createToken method, it appears
that something extra needs to be done for imaginary tokens.
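
A minimal sketch of the idea of deriving an imaginary token from a real input
token, so that the new token inherits a position. It is plain Java without
the ANTLR runtime; SketchToken, the type constants and the positions are
hypothetical and only mirror the fields a token usually carries.

```java
// Hypothetical token class; not the ANTLR Token interface.
class SketchToken {
    static final int DOT = 7;       // made-up token type numbers
    static final int BYFIELD = 42;

    final int type;
    final String text;
    final int line;
    final int column;
    final int tokenIndex;

    SketchToken(int type, String text, int line, int column, int tokenIndex) {
        this.type = type;
        this.text = text;
        this.line = line;
        this.column = column;
        this.tokenIndex = tokenIndex;
    }

    // Imaginary token with no source token: no position, index -1.
    static SketchToken imaginary(int type, String text) {
        return new SketchToken(type, text, 0, -1, -1);
    }

    // Imaginary token derived from a real one: new type and text,
    // position and index copied from the source token.
    static SketchToken derive(int type, String text, SketchToken from) {
        return new SketchToken(type, text, from.line, from.column, from.tokenIndex);
    }
}

class ImaginaryTokenSketch {
    public static void main(String[] args) {
        // Assumed position of the "." in "object.field": line 3, column 6, index 1.
        SketchToken dot = new SketchToken(SketchToken.DOT, ".", 3, 6, 1);
        SketchToken bare = SketchToken.imaginary(SketchToken.BYFIELD, "BYFIELD");
        SketchToken derived = SketchToken.derive(SketchToken.BYFIELD, "BYFIELD", dot);
        System.out.println(bare.tokenIndex);    // prints -1
        System.out.println(derived.tokenIndex); // prints 1
    }
}
```

Which real token BYFIELD should be derived from (the "." is used here) is a
design choice; the sketch only shows that the derived token keeps a usable
index and position.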

I am still looking into what the best approach is here, as I have not quite
figured out the whole picture yet.

-Jan

On Wed, Jun 23, 2010 at 7:45 PM, Andrew Bradnan <andrew.bradnan at gmail.com> wrote:

> Yeah, CC the list.  I keep thinking it's automatic.
>
> I just haven't crawled through the generated code enough to fully
> understand when an AST node has a token and when it doesn't.  You should
> probably just trace through the AddChild code.  It tries to keep the
> children in a list when it can, but changes to real children of a nil node
> at some magical point.
>
> Re object.field.anotherfield   For my FIELD rule I just updated an Id field
> on my custom AST node.  You could always update the start/end index
> yourself, or add some custom ones if those are private.
> I haven't seen a thing documentation wise, so I look forward to seeing what
> you find out.
> On Wed, Jun 23, 2010 at 9:49 AM, Jan F <netjan42 at gmail.com> wrote:
>
>> Hmm, that shows that I have not really gotten a good understanding of how
>> the rule/subrule attributes work.
>>
>> I have been fighting pretty hard with obtaining the position boundaries
>> for AST elements, and what I ended up with, which works in most cases, is
>> the trick with updating the positions in the @after section of each rule,
>> based on the $rule.start position and $rule.text length.
>>
>> In my code below, I actually do want the boundaries of the
>> memberExpression rule (which matches an "object.field" reference) to be
>> around the whole text, that is, the parenLeftHandSideExpression (matches
>> the "object" part) and the Identifier (matches the "field" part). So
>> passing it from subrules as a return value does not really work; the
>> BYFIELD is just an imaginary token.
>>
>> Actually a bit more context - the positions are correct if I parse text
>> with "object.field", but stop working if I have a chain like
>> "object.field.anotherfield" - so perhaps the problem could be somewhere
>> else?
>>
>> BTW. I just noticed that you sent this only to me directly, would you mind
>> if I cc the list on further replies?
>>
>> -Jan
>>
>>
>> On Wed, Jun 23, 2010 at 6:21 PM, Andrew Bradnan <andrew.bradnan at gmail.com
>> > wrote:
>>
>>> Only the ASTs that actually matched one token will have the token
>>> information filled out.  Subrules with multiple children are blank.  I
>>> haven't actually tested those conditions extensively but just go with the
>>> fixes below when the token information is missing.
>>>
>>> To get around this I've either passed the values back from the subrules
>>> in the grammar using returns or in the subrule I have updated a field on the
>>> AST for the root (like on AST node for BYFIELD).  To update the AST node,
>>> you need to have a custom AST class.  See setting options { ASTLabelType =
>>> MyASTNode; }
>>>
>>> Hopefully that will get you going again.
>>> Andrew
>>>
>>>   On Wed, Jun 23, 2010 at 7:53 AM, Jan F <netjan42 at gmail.com> wrote:
>>>
>>>> Hello fellow ANTLRs, I have a problem with obtaining text and positions
>>>> for one of my rules in a tree walker, and since I ran out of ideas on
>>>> what might be wrong I am here to ask :-)
>>>>
>>>> My rule looks like this:
>>>>
>>>> memberExpression returns [ Expression expression = null ]
>>>> @after {
>>>>     post ($expression, $memberExpression.start, $memberExpression.text);
>>>> }
>>>>    : ^( BYINDEX parenLeftHandSideExpression expressionSt ) {
>>>>       $expression = new NIndexRefExpression (0, 0,
>>>>           $parenLeftHandSideExpression.expression, $expressionSt.statement);
>>>>     }
>>>>    | ^( BYFIELD parenLeftHandSideExpression Identifier ) {
>>>>       $expression = new NFieldRefExpression (0, 0,
>>>>           $parenLeftHandSideExpression.expression, $Identifier.text);
>>>>     }
>>>>    ;
>>>>
>>>> and the problem is that $memberExpression.text returns an empty string,
>>>> caused by the fact that $memberExpression.start has the start/stop
>>>> indexes as -1.
>>>>
>>>> I have a second rule for something else, which looks very similar, and
>>>> that one (as well as all others) works perfectly fine, with $rule.text
>>>> containing the text corresponding to what the rule matched.
>>>>
>>>> Any ideas why this may be happening?
>>>>
>>>> -Jan