[antlr-interest] ast rewrites in left-recursive rules

Wed Feb 23 18:05:47 PST 2011

On Wed, 2011-02-23 at 16:37 -0800, Terence Parr wrote:
> So I have it working with rewrite rules now:
> 
> e : e '.' ID 			-> ^('.' e ID)
>   | e '.' 'this' 		-> ^('.' e 'this')
>   | '-' e 			-> ^('-' e)
>   | e '*' b=e 			-> ^('*' e $b)
>   | e (op='+'|op='-') b=e	-> ^($op e $b)
>   | INT 			-> INT
>   | ID 				-> ID
>   ;
> 
> But take a look at the multiplication rule: it needs a label on the second e. plain e is ambiguous. I decided that plain e references the left recursive version; since it will disappear during the transformation, putting a label on that one won't work. we have to put a label on the second reference as you see above. this is not optimal. can anyone think of a better way to differentiate between the left and right e references in a single alternative? [Note that e refers to the entire tree created so far.]
> 

First, apologies. I have not been following your effort on this so far,
so my comments are probably just useless noise, sorry about that.

1) If the user must label the second recursive reference, will you be
able to provide a *really* good error message when he forgets?

question: do i have to tell ANTLR which rules are in play for this
and/or which operators are supposed to be roots?

question: is the token in the rule always the root? if so, what if the
rule has more than 1 token? 
(like conditionals e.g. e: e '?' e ':' e -> ^('?' e_1 e_2 e_3) which
works for first token as root but maybe there are other possibilities).

2) is it possible - as part of your transformation of the left recursion
into an equivalent right recursion - just spawn the label for the second
recursion (and third, fourth, ... if needed) as part of that
transformation process. so for example

e : ....
  | e '*' e
  | ...
  ;

is first pattern matched and preliminarily transformed into

e : ...
  | e first_unique_spawned_label='*' second_unique_spawned_label=e
  | ...
  ;

which you then subsequently transform using your current technique

e : ...
  | e fst_unique='*' snd_unique=e -> ^($fst_unique e $snd_unique)
  | ...
  ;

and 

...| e ('+'|'-) e

becomes

...| e (third_unique='+'|third_unique='-') fourth_unique=e
               -> ^($third_unique e $fourth_unique)

a bad part of this suggestion is that you will probably have to reserve
some label prefix (or maybe suffix) for use in unique label generation.
which users will need to be aware of...

again, sorry if the above is off-base.
   -jbb