[antlr-interest] ast rewrites in left-recursive rules

Sam Harwell sharwell at pixelminegames.com
Wed Feb 23 17:07:21 PST 2011


(Begin objective section)

Inside the following rewrite:

e : e '*' b=e -> ^(...);

e refers to the first e (the one right after the ':' in the rule)
$e refers to the enclosing rule (the tree created so far)
$b refers to the second e (the one labeled 'b=')

This is never ambiguous because the following is not valid in a grammar:

e : e '*' e=e;

In particular, no label in the grammar can match the name of any rule in the
grammar, so $rulename appearing in a rewrite, where rulename is some rule in
the grammar, can only *ever* refer to an enclosing rule.

(End objective section, begin subjective section) :)

I'm sure you've noticed I've been pushing AST operators recently. :) With
the extended syntax, I've found that the vast majority (over 90%) of rules I
write use AST operators instead of rewrite syntax. Due to this, I rarely
have to worry about labels. For the rule you've written, I'd just use '*'^
and skip the rewrite. Another little known fact is the following rule:

e : NUM (LETTER^ NUM)*;

With the input "1 A 2 B 3" produces the following tree, which is a
left-associative infix operator:

(B (A 1 2) 3)

For the expression syntax you're using, AST operators remain a great
candidate as long as you can also handle right-associativity for them.

Sam


-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr
Sent: Wednesday, February 23, 2011 6:37 PM
To: antlr-interest Interest
Subject: [antlr-interest] ast rewrites in left-recursive rules

So I have it working with rewrite rules now:

e : e '.' ID 			-> ^('.' e ID)
  | e '.' 'this' 		-> ^('.' e 'this')
  | '-' e 			-> ^('-' e)
  | e '*' b=e 			-> ^('*' e $b)
  | e (op='+'|op='-') b=e	-> ^($op e $b)
  | INT 			-> INT
  | ID 				-> ID
  ;

But take a look at the multiplication rule: it needs a label on the second
e. plain e is ambiguous. I decided that plain e references the left
recursive version; since it will disappear during the transformation,
putting a label on that one won't work. we have to put a label on the second
reference as you see above. this is not optimal. can anyone think of a
better way to differentiate between the left and right e references in a
single alternative? [Note that e refers to the entire tree created so far.]

Ter

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address



More information about the antlr-interest mailing list