[antlr-interest] Text attribute for tree parser rule not working

Fri Jun 25 03:14:24 PDT 2010

I am a bit smarter, so posting an update.

The problem seems to be in the basic AST creation for rules that recurse,
such as mine:

leftHandSideExpression
    :  DOT Identifier        -> ^( BYFIELD $leftHandSideExpression Identifier )
    ;

In the generated parser, the method for this rule ends with a call to
"adaptor.setTokenBoundaries(retval.tree, retval.start, retval.stop);", which
sets the token start/stop indexes on the rule's result. That works perfectly
fine except when the rule recurses (for example when matching
"item.field.field2"). In that case the method is not actually called
recursively; it loops internally, so setTokenBoundaries is called only once,
for the outermost CommonTree node. The inner nodes rooted at the BYFIELD
token keep their -1 values for startIndex/stopIndex.

So now for the $1000 question - how should I fix this? :-) Any ideas?
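
One direction that might work, as a rough sketch only: after parsing, walk
the tree bottom-up and derive the missing boundaries from the children. The
code below uses a stand-in Node class, not the real CommonTree, and the names
Node, BoundaryFixDemo and fillUnknownBoundaries are made up for illustration.
If I remember correctly, the ANTLR 3 runtime's CommonTree has a
setUnknownTokenBoundaries() method that does something along these lines, but
that should be checked against the runtime version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a tree node; NOT the real org.antlr.runtime.tree.CommonTree.
class Node {
    final String label;
    int startIndex = -1;   // -1 means "unknown", as with imaginary tokens
    int stopIndex = -1;
    final List<Node> children = new ArrayList<>();

    // Node without a real token behind it (e.g. BYFIELD).
    Node(String label) { this.label = label; }

    // Node backed by a single real token at the given token index.
    Node(String label, int tokenIndex) {
        this.label = label;
        this.startIndex = tokenIndex;
        this.stopIndex = tokenIndex;
    }

    Node add(Node child) { children.add(child); return this; }
}

class BoundaryFixDemo {
    // Bottom-up pass: fix the children first, then fill in any boundary
    // of this node that is still unknown from the children's known ones.
    static void fillUnknownBoundaries(Node n) {
        for (Node c : n.children) fillUnknownBoundaries(c);
        if (n.startIndex >= 0 && n.stopIndex >= 0) return; // already set
        int start = -1, stop = -1;
        for (Node c : n.children) {
            if (c.startIndex >= 0 && (start < 0 || c.startIndex < start)) start = c.startIndex;
            if (c.stopIndex > stop) stop = c.stopIndex;
        }
        if (n.startIndex < 0) n.startIndex = start;
        if (n.stopIndex < 0) n.stopIndex = stop;
    }

    // Models "item.field.field2": tokens item(0) .(1) field(2) .(3) field2(4).
    // Only the outer BYFIELD has boundaries, as described above.
    static Node sampleTree() {
        Node inner = new Node("BYFIELD").add(new Node("item", 0)).add(new Node("field", 2));
        Node outer = new Node("BYFIELD").add(inner).add(new Node("field2", 4));
        outer.startIndex = 0;
        outer.stopIndex = 4;
        return outer;
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        fillUnknownBoundaries(tree);
        Node inner = tree.children.get(0);
        System.out.println("inner BYFIELD: " + inner.startIndex + ".." + inner.stopIndex);
    }
}
```

Doing this as a separate pass keeps the grammar untouched, at the cost of one
extra walk over the tree.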

BTW, if anybody is debugging bugs in the AST creation, I can highly
recommend taking the time to implement a debug tool that visualizes the
token list and the AST. Suddenly many issues become clear (and typically
there are several of them at the same time).

I have created an Eclipse view that is synced with the source editor and
displays the list of tokens and the two ASTs I have (the CommonTree-based one
and the DLTK ASTNode-based one). When I click in the view, the corresponding
code in the source is selected. It is a great tool for fixing problems with
positions, among others.
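
Even without an Eclipse view, a plain-text dump already helps. The sketch
below is not my actual tool; it uses a made-up DumpNode class (not CommonTree
or the DLTK ASTNode) and a hypothetical TreeDump.dump helper that prints each
node with its start/stop token indexes and flags nodes that have no position.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in node type for illustration; not an ANTLR or DLTK class.
class DumpNode {
    final String label;
    final int start;
    final int stop;
    final List<DumpNode> children = new ArrayList<>();

    DumpNode(String label, int start, int stop) {
        this.label = label;
        this.start = start;
        this.stop = stop;
    }

    DumpNode add(DumpNode child) { children.add(child); return this; }
}

class TreeDump {
    // Returns an indented listing, one node per line, with token indexes.
    static String dump(DumpNode root) {
        StringBuilder sb = new StringBuilder();
        dump(root, 0, sb);
        return sb.toString();
    }

    private static void dump(DumpNode n, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(n.label).append(" [").append(n.start).append("..").append(n.stop).append("]");
        // Flag nodes whose boundaries were never set.
        if (n.start < 0 || n.stop < 0) sb.append("  <-- no position");
        sb.append('\n');
        for (DumpNode c : n.children) dump(c, depth + 1, sb);
    }

    public static void main(String[] args) {
        // Models the tree for "item.field.field2" with an unpositioned inner node.
        DumpNode inner = new DumpNode("BYFIELD", -1, -1)
                .add(new DumpNode("item", 0, 0))
                .add(new DumpNode("field", 2, 2));
        DumpNode outer = new DumpNode("BYFIELD", 0, 4)
                .add(inner)
                .add(new DumpNode("field2", 4, 4));
        System.out.print(dump(outer));
    }
}
```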

I could make it available (sans the code that hooks it up to the parser, as
this is specific to each project) if there is interest.

-Jan

On Thu, Jun 24, 2010 at 7:33 PM, Jan F <netjan42 at gmail.com> wrote:

> I spent most of today debugging this and putting together a view for
> Eclipse to display a tree of the two ASTs that I deal with (the ANTLR one
> and then a second one that I create using tree walker that is fed into
> Eclipse DLTK platform).
>
> So far it is clear that the problem is that some nodes in the AST have a
> pseudo token that is not in the original token stream and has -1 as its
> token index (and no positioning info). Those pseudo tokens are created for
> some imaginary tokens, and only sometimes.
>
> Per Andrew's suggestion I traced what is happening inside the addChild, and
> actually before it. The corresponding code in the generated parser is:
>
>     root_1 = (Object)adaptor.becomeRoot((Object)adaptor.create(BYFIELD, "BYFIELD"), root_1);
>     adaptor.addChild(root_1, stream_retval.nextTree());
>     adaptor.addChild(root_1, stream_Identifier.nextNode());
>     adaptor.addChild(root_0, root_1);
>
> and the adaptor.create(BYFIELD, ...) call creates the new pseudo token, which
> is not in the token stream.
>
> The JavaDoc for the CommonTreeAdaptor.createToken method explains that
> something extra needs to be done for imaginary tokens.
>
> I am still looking into what the best approach is here, as I have not quite
> figured out the whole picture yet.
>
> -Jan
>
>
> On Wed, Jun 23, 2010 at 7:45 PM, Andrew Bradnan <andrew.bradnan at gmail.com> wrote:
>
>> Yeah, CC the list.  I keep thinking it's automatic.
>>
>> I just haven't crawled through the generated code enough to fully
>> understand when an AST node has a token and when it doesn't.  You should
>> probably just trace through the AddChild code.  It tries to keep the
>> children in a list when it can, but changes to real children of a nil node
>> at some magical point.
>>
>> Re object.field.anotherfield   For my FIELD rule I just updated an Id
>> field on my custom AST node.  You could always update the start/end index
>> yourself, or add some custom ones if those are private.
>> I haven't seen a thing documentation-wise, so I look forward to seeing
>> what you find out.
>>
>> On Wed, Jun 23, 2010 at 9:49 AM, Jan F <netjan42 at gmail.com> wrote:
>>
>>> Hmm, that shows that I have not really gotten a good understanding of how
>>> the rule/subrule attributes work.
>>>
>>> I have been fighting pretty hard with obtaining the position boundaries
>>> for AST elements, and what I ended up with, which works in most cases, is
>>> the trick with updating the positions in the @after section of each rule,
>>> based on the $rule.start position and $rule.text length.
>>>
>>> In my code below, I actually do want the boundaries of the
>>> memberExpression rule (which is like an "object.field" reference) to be
>>> around the whole text, that is, the parenLeftHandSideExpression (which
>>> matches the "object" part) and the Identifier (which matches the "field"
>>> part). So passing it from subrules as a return value does not really
>>> work - the BYFIELD is just an imaginary token.
>>>
>>> Actually, a bit more context: the positions are correct if I parse text
>>> like "object.field", but stop working if I have a chain like
>>> "object.field.anotherfield" - so perhaps the problem could be somewhere
>>> else?
>>>
>>> BTW. I just noticed that you sent this only to me directly, would you
>>> mind if I cc the list on further replies?
>>>
>>> -Jan
>>>
>>>
>>> On Wed, Jun 23, 2010 at 6:21 PM, Andrew Bradnan <
>>> andrew.bradnan at gmail.com> wrote:
>>>
>>>> Only the ASTs that actually matched one token will have the token
>>>> information filled out.  Subrules with multiple children are blank.  I
>>>> haven't actually tested those conditions extensively, but just go with
>>>> the fixes below when the token information is missing.
>>>>
>>>> To get around this I've either passed the values back from the subrules
>>>> in the grammar using returns, or in the subrule I have updated a field
>>>> on the AST for the root (like on the AST node for BYFIELD).  To update
>>>> the AST node, you need to have a custom AST class.  See the grammar
>>>> setting options { ASTLabelType = MyASTNode; }
>>>>
>>>> Hopefully that will get you going again.
>>>> Andrew
>>>>
>>>>   On Wed, Jun 23, 2010 at 7:53 AM, Jan F <netjan42 at gmail.com> wrote:
>>>>
>>>>> Hello fellow ANTLRs, I have a problem with obtaining text and
>>>>> positions for one of my rules in a tree walker, and since I ran out of
>>>>> ideas on what might be wrong I am here to ask :-)
>>>>>
>>>>> My rule looks like this:
>>>>>
>>>>> memberExpression returns [ Expression expression = null ]
>>>>> @after { post ($expression, $memberExpression.start, $memberExpression.text); }
>>>>>    : ^( BYINDEX parenLeftHandSideExpression expressionSt ) {
>>>>>       $expression = new NIndexRefExpression (0, 0, $parenLeftHandSideExpression.expression, $expressionSt.statement);
>>>>>     }
>>>>>    | ^( BYFIELD parenLeftHandSideExpression Identifier ) {
>>>>>       $expression = new NFieldRefExpression (0, 0, $parenLeftHandSideExpression.expression, $Identifier.text);
>>>>>     }
>>>>>    ;
>>>>>
>>>>> and the problem is that $memberExpression.text returns an empty string,
>>>>> caused by the fact that $memberExpression.start has start/stop indexes
>>>>> of -1.
>>>>>
>>>>> I have a second rule for something else, which looks very similar, and
>>>>> that one (as well as all the others) works perfectly fine, with
>>>>> $rule.text containing the text corresponding to what the rule matched.
>>>>>
>>>>> Any ideas why this may be happening?
>>>>>
>>>>> -Jan
>>>>>
>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>> Unsubscribe:
>>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> /Andrew
>>>>
>>>
>>>
>>
>>
>> --
>> /Andrew
>>
>
>