[antlr-interest] Text attribute for tree parser rule not working

Thu Jun 24 11:24:40 PDT 2010

On Thu, Jun 24, 2010 at 2:09 PM, Richard Thrippleton <
richard.thrippleton at progress.com> wrote:

> Jan F wrote:
>
>> Hello fellow ANTLRs, I have a problem with obtaining text and positions
>> for one of my rules in a tree walker, and since I ran out of ideas on what
>> might be wrong I am here to ask :-)
>>
>> My rule looks like this:
>>
>> memberExpression returns [ Expression expression = null ]
>> @after { post ($expression, $memberExpression.start, $memberExpression.text); }
>>    : ^( BYINDEX parenLeftHandSideExpression expressionSt ) {
>>         $expression = new NIndexRefExpression (0, 0,
>>             $parenLeftHandSideExpression.expression, $expressionSt.statement);
>>      }
>>    | ^( BYFIELD parenLeftHandSideExpression Identifier ) {
>>         $expression = new NFieldRefExpression (0, 0,
>>             $parenLeftHandSideExpression.expression, $Identifier.text);
>>      }
>>    ;
>>
>> and the problem is that $memberExpression.text returns an empty string,
>> because $memberExpression.start has start/stop indexes of -1.
>>
> By my understanding the .text attribute on a tree-walker rule will only
> return you the text from the top node of the tree. Is your parser
> constructing the BYINDEX / BYFIELD tokens with any useful text in them? By
> default, imaginary nodes get constructed with empty text.
>
> Richard
>

I believe that $rule.text does actually give the entire text matched by
the rule. The generated parser has this code:

    post (retval.expression, ((CommonTree)retval.start),
          input.getTokenStream().toString(
              input.getTreeAdaptor().getTokenStartIndex(retval.start),
              input.getTreeAdaptor().getTokenStopIndex(retval.start)));

which should give the text between the first and last token of the rule.

The Definitive ANTLR reference book says:

"The text derived from the first node matched by this rule. Each tree node
knows the range of input tokens from which it was created. Parsers
automatically set this range to the first and last token matched by the rule that
created the tree. This attribute includes the text for all tokens including
those on hidden channels, which is what you want because usually that has
all the whitespace and comments. When referring to the current rule, this
attribute is available in any action including exception actions. Note that
text is not well defined for rules like this:
slist : stat+ ;
because stat is not a single node or rooted with a single node. $slist.text
gets only the first stat tree."

However, this seems to have problems with imaginary tokens...
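
To make the suspected failure mode concrete, here is a minimal Java sketch
of the lookup that the generated code performs. It is only a model: Node and
textOf are hypothetical stand-ins, not ANTLR's CommonTree or TokenStream, and
it assumes that a node whose token boundaries were never set keeps -1 for both
indexes and that a text lookup over a negative range yields nothing.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of "text between a node's start and stop token index".
// None of these types are ANTLR classes; they only illustrate the -1 case.
final class TextRangeSketch {

    /** Hypothetical tree node that remembers which token range produced it. */
    static final class Node {
        final String label;
        int startIndex = -1; // -1 means "no token range was ever recorded"
        int stopIndex = -1;

        Node(String label) {
            this.label = label;
        }
    }

    /** Concatenates tokens[start..stop]; an unknown or invalid range gives "". */
    static String textOf(List<String> tokens, int start, int stop) {
        if (start < 0 || stop < 0 || start > stop || stop >= tokens.size()) {
            return "";
        }
        return String.join("", tokens.subList(start, stop + 1));
    }

    /** Same lookup, driven by the indexes stored on the node. */
    static String textOf(List<String> tokens, Node node) {
        return textOf(tokens, node.startIndex, node.stopIndex);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "[", "i", "]");

        // A node whose boundaries were set to the first and last matched token.
        Node withRange = new Node("BYINDEX");
        withRange.startIndex = 0;
        withRange.stopIndex = 3;

        // An imaginary node that never received boundaries.
        Node imaginary = new Node("BYINDEX");

        System.out.println("with range: '" + textOf(tokens, withRange) + "'");
        System.out.println("imaginary:  '" + textOf(tokens, imaginary) + "'");
    }
}
```

If the real BYINDEX / BYFIELD nodes behave like the second node here, the
empty $memberExpression.text would follow directly from the -1 indexes, so
the thing to check is whether the parser ever sets boundaries on those nodes.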