[antlr-interest] Text attribute for tree parser rule not working

Jan F netjan42 at gmail.com
Fri Jun 25 03:14:24 PDT 2010

I am a bit wiser now, so here is an update.

The problem seems to be in the basic AST creation for rules that recurse,
such as mine:

    :  DOT Identifier        -> ^( BYFIELD $leftHandSideExpression Identifier )

In the generated parser, the method for the rule calls
"adaptor.setTokenBoundaries(retval.tree, retval.start, retval.stop);" at its
end, which updates the token start/stop indexes for the rule's result. This
works perfectly fine, except when the rule repeats (such as when matching
"item.field.field2"). In that case the method is not actually called
recursively but loops internally, so setTokenBoundaries is only called once,
for the outer CommonTree node. The inner nodes representing the BYFIELD
token keep their -1 values for startIndex/stopIndex.

So now for the $1000 question - how should I fix this? :-) Any ideas?
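
One possible direction, as a rough sketch under stated assumptions rather than a confirmed fix: after parsing, walk the tree bottom-up and widen each node's start/stop range so that it covers all of its children. The Node class and the fix method below are hypothetical stand-ins written for illustration; they are not CommonTree or any other ANTLR class. With the real runtime, the same walk would presumably read and write the token start/stop indexes on the tree nodes instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the ANTLR runtime classes.
final class BoundaryFixup {

    static final class Node {
        final String text;
        int start = -1; // token start index, -1 means "unset"
        int stop = -1;  // token stop index, -1 means "unset"
        final List<Node> children = new ArrayList<>();

        // Imaginary node (such as BYFIELD): no token index of its own.
        Node(String text) {
            this.text = text;
        }

        // Node backed by a real token at the given index.
        Node(String text, int tokenIndex) {
            this.text = text;
            this.start = tokenIndex;
            this.stop = tokenIndex;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Post-order walk: fix each child first, then widen this node's range
    // to cover the child's range. Existing values are only ever widened,
    // never narrowed.
    static void fix(Node n) {
        for (Node c : n.children) {
            fix(c);
            if (c.start >= 0 && (n.start < 0 || c.start < n.start)) {
                n.start = c.start;
            }
            if (c.stop >= 0 && c.stop > n.stop) {
                n.stop = c.stop;
            }
        }
    }

    public static void main(String[] args) {
        // "item.field.field2" -> tokens: item(0) .(1) field(2) .(3) field2(4)
        Node inner = new Node("BYFIELD")
                .add(new Node("item", 0))
                .add(new Node("field", 2));
        Node outer = new Node("BYFIELD")
                .add(inner)
                .add(new Node("field2", 4));
        // Mimics the single setTokenBoundaries call on the outer node only.
        outer.start = 0;
        outer.stop = 4;

        fix(outer);
        System.out.println(inner.start + ".." + inner.stop); // prints 0..2
        System.out.println(outer.start + ".." + outer.stop); // prints 0..4
    }
}
```

Because the DOT token is not a child of the BYFIELD node, the range is derived from the first and last child, which still spans the dot in between.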

BTW, if anybody is debugging problems in the AST creation, I can highly
recommend taking the time to implement a debug tool that visualizes the
token list and the AST. Many issues suddenly become clear (and typically
there are several of them at the same time).

I have created an Eclipse view that is synced with the source editor and
displays the list of tokens and the two ASTs I have (the CommonTree-based one
and the DLTK ASTNode-based one). When I click in the view, the corresponding
code in the source is selected. It is a great tool for fixing problems with
positions, among other things.

I could make it available (sans the code that hooks it up to the parser, as
that part is specific to each project) if there is interest.
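
For anyone who wants something lighter than an Eclipse view, even a plain-text dump goes a long way. The following is a minimal sketch, again using a hypothetical stand-in Node class rather than the ANTLR tree classes: it prints each node with its start/stop indexes and flags the ones that are still unset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the ANTLR runtime classes.
final class TreeDump {

    static final class Node {
        final String text;
        final int start; // token start index, -1 means "unset"
        final int stop;  // token stop index, -1 means "unset"
        final List<Node> children = new ArrayList<>();

        Node(String text, int start, int stop) {
            this.text = text;
            this.start = start;
            this.stop = stop;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns an indented, one-node-per-line rendering of the tree.
    static String dump(Node root) {
        StringBuilder sb = new StringBuilder();
        dump(root, 0, sb);
        return sb.toString();
    }

    private static void dump(Node n, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(n.text).append(" [").append(n.start).append("..").append(n.stop).append("]");
        if (n.start < 0 || n.stop < 0) {
            sb.append("  <-- unset"); // the nodes worth looking at first
        }
        sb.append('\n');
        for (Node c : n.children) {
            dump(c, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        // The tree for "item.field.field2" as described above: only the
        // outer BYFIELD node has its boundaries set.
        Node inner = new Node("BYFIELD", -1, -1)
                .add(new Node("item", 0, 0))
                .add(new Node("field", 2, 2));
        Node outer = new Node("BYFIELD", 0, 4)
                .add(inner)
                .add(new Node("field2", 4, 4));
        System.out.print(dump(outer));
    }
}
```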


On Thu, Jun 24, 2010 at 7:33 PM, Jan F <netjan42 at gmail.com> wrote:

> I spent most of today debugging this and putting together a view for
> Eclipse to display a tree of the two ASTs that I deal with (the ANTLR one
> and then a second one that I create using tree walker that is fed into
> Eclipse DLTK platform).
> So far it is clear that the problem is that some nodes in the AST have a
> pseudo token which is not in the original token stream and has -1 as the
> token index (and no positioning info). Those pseudo tokens are created for
> some imaginary tokens, and only sometimes.
> Per Andrew's suggestion I traced what is happening inside the addChild, and
> actually before it. The corresponding code in the generated parser is:
>     root_1 = (Object)adaptor.becomeRoot((Object)adaptor.create(BYFIELD, "BYFIELD"), root_1);
>     adaptor.addChild(root_1, stream_retval.nextTree());
>     adaptor.addChild(root_1, stream_Identifier.nextNode());
>     adaptor.addChild(root_0, root_1);
> and the adaptor.create (BYFIELD, ...) creates the new pseudo token, that is
> not in the token stream.
> The JavaDoc for the CommonTreeAdaptor.createToken method explains that for
> imaginary tokens something extra needs to be done.
> I am still looking into what the best approach is here, as I have not quite
> figured out the whole picture yet.
> -Jan
> On Wed, Jun 23, 2010 at 7:45 PM, Andrew Bradnan <andrew.bradnan at gmail.com> wrote:
>> Yeah, CC the list.  I keep thinking it's automatic.
>> I just haven't crawled through the generated code enough to fully
>> understand when an AST node has a token and when it doesn't.  You should
>> probably just trace through the AddChild code.  It tries to keep the
>> children in a list when it can, but changes to real children of a nil node
>> at some magical point.
>> Re object.field.anotherfield   For my FIELD rule I just updated an Id
>> field on my custom AST node.  You could always update the start/end index
>> yourself, or add some custom ones if those are private.
>> I haven't seen a thing documentation wise, so I look forward to seeing
>> what you find out.
>> On Wed, Jun 23, 2010 at 9:49 AM, Jan F <netjan42 at gmail.com> wrote:
>>> Hmm, that shows that I have not really gotten a good understanding of how
>>> the rule/subrule attributes work.
>>> I have been fighting pretty hard with obtaining the position boundaries
>>> for AST elements, and what I ended up with, which works in most cases, is
>>> the trick with updating the positions in the @after section of each rule,
>>> based on the $rule.start position and $rule.text length.
>>> In my code below, I actually do want the boundaries of the
>>> memberExpression rule (which is like an "object.field" reference) to be
>>> around the whole text, that is, both the parenLeftHandSideExpression (which
>>> matches the "object" part) and the Identifier (which matches the "field"
>>> part). So passing it from subrules as a return value does not really work;
>>> the BYFIELD is just an imaginary token.
>>> Actually, a bit more context: the positions are correct if I parse text
>>> with "object.field", but stop working if I have a chain like
>>> "object.field.anotherfield", so perhaps the problem could be somewhere
>>> else?
>>> BTW. I just noticed that you sent this only to me directly, would you
>>> mind if I cc the list on further replies?
>>> -Jan
>>> On Wed, Jun 23, 2010 at 6:21 PM, Andrew Bradnan <
>>> andrew.bradnan at gmail.com> wrote:
>>>> Only the AST nodes that actually matched one token will have the token
>>>> information filled out.  Subrules with multiple children are blank.  I
>>>> haven't actually tested those conditions extensively but just go with the
>>>> fixes below when the token information is missing.
>>>> To get around this I've either passed the values back from the subrules
>>>> in the grammar using returns or in the subrule I have updated a field on the
>>>> AST for the root (like on AST node for BYFIELD).  To update the AST node,
>>>> you need to have a custom AST class.  See setting options { ASTLabelType =
>>>> MyASTNode; }
>>>> Hopefully that will get you going again.
>>>> Andrew
>>>>   On Wed, Jun 23, 2010 at 7:53 AM, Jan F <netjan42 at gmail.com> wrote:
>>>>> Hello fellow ANTLRs, I have a problem with obtaining text and positions
>>>>> for one of my rules in a tree walker, and since I ran out of ideas on what
>>>>> might be wrong I am here to ask :-)
>>>>> My rule looks like this:
>>>>> memberExpression returns [ Expression expression = null ]
>>>>> @after { post ($expression, $memberExpression.start, $memberExpression.text); }
>>>>>    : ^( BYINDEX parenLeftHandSideExpression expressionSt ) {
>>>>>       $expression = new NIndexRefExpression (0, 0, $parenLeftHandSideExpression.expression, $expressionSt.statement);
>>>>>     }
>>>>>    | ^( BYFIELD parenLeftHandSideExpression Identifier ) {
>>>>>       $expression = new NFieldRefExpression (0, 0, $parenLeftHandSideExpression.expression, $Identifier.text);
>>>>>     }
>>>>>    ;
>>>>> and the problem is that $memberExpression.text returns an empty string,
>>>>> caused by the fact that $memberExpression.start has the start/stop indexes
>>>>> as -1.
>>>>> I have a second rule for something else, which looks very similar, and that
>>>>> one (as well as all the others) works perfectly fine, with $rule.text
>>>>> containing the text corresponding to what the rule matched.
>>>>> Any ideas why this may be happening?
>>>>> -Jan
>>>> --
>>>> /Andrew
>> --
>> /Andrew
