[antlr-interest] ANTLR v3 tree construction :)

Terence Parr parrt at cs.usfca.edu
Wed Jun 22 18:56:17 PDT 2005


Howdy,

Well, I've had a productive week or two.  Got ANTLR building ASTs  
automatically and also via the rewrite rules, which I've described  
below.  This email is a dup of the blog entry I just added.

http://www.antlr.org/blog/antlr3/trees.tml

Ter
PS    If I've been ignoring your email, it's been because I have been  
singularly focused on this stuff!

------------------------

#### REWRITE RULES

The new rewrite stuff is a grammar to grammar mapping where you  
specify how to match something and then how to generate the resulting  
tree.  Note that you can generate a token stream by simply generating  
a flat list like:

a : A B -> B A ; // reverse the input

The goal is to avoid arbitrary actions and the key insight to get  
pure rewrites was that the right side of the -> should be a tree  
grammar complete with (...)+ loops etc...  Monty, Loring, and I  
discussed the rewrite rules extensively at the Oregon cabal 2 years  
ago but we couldn't figure out what to do with repeated elements like:

list : ID (',' ID)* ;

How can you generate just the list of IDs, for example.  Easy after  
you look at the right side like a grammar.  Just list the tree  
grammar fragment that would match what you would like to generate and  
ANTLR will figure out how to do the transformation.  The elements on  
the left are queued up and then used as input streams to the  
"unparser" grammar on the right.  Here is the rewrite to get the list  
of IDs:

list : ID (',' ID)* -> ID+ ;

Nice, eh?  What if I want an imaginary token node, VAR, above the list?

list : ID (',' ID)* -> ^(VAR ID+) ;

What if I want a VAR for *each* ID?  Move the VAR inside the plus loop:

list : ID (',' ID)* -> ^(VAR ID)+ ;

I think that is pretty slick.  Note that to get your tree parser  
grammar, ANTLR should be able to copy these rules for you into a tree  
grammar similar to what Loring does, although his 2.8e tree grammar  
creator is more sophisticated than what I have in mind.

I have not done the tree parsers yet, but I think it will be really  
easy.  The details may kill me though. ;)  I should push an  
ANTLR-3.0ea3 release shortly.

#### SOME NOTES

My brief comments in the README.txt file:

* Automatic tree construction operators are in: ! ^ ^^

* Tree construction rewrite rules are in
     -> {pred1}? rewrite1
     -> {pred2}? rewrite2
     ...
     -> rewriteN

   The rewrite rules may be elements like ID, expr, $label, {node expr}
   and trees ^( <root> <children> ).  You have have (...)?, (...)*,  
(...)+
   subrules as well.

   You may have rewrites in subrules not just at outer level of rule,  
but
   any -> rewrite forces auto AST construction off for that alternative
   of that rule.

   To avoid cycles, copy semantics are used:

   r : INT -> INT INT ;

   means make two new nodes from the same INT token.

   Repeated references to a rule element implies a copy for at least one
   tree:

   a : atom -> ^(atom atom) ; // NOT CYCLE! (dup atom tree)

* $ruleLabel.tree refers to tree created by matching the labeled  
element.

#### EXAMPLES

Here are some example rewrites.  I have about 40 functional tests for  
rewrites and about the same for automatic tree construction.

a : ID INT -> ; // don't generate anything

a : b b -> b+;  // generate a list of two b's

a : ID INT -> ^(ID INT) ;

a : ID INT -> {false}? ^(ID INT)
            -> {true}? ^(INT ID)
            -> ID
   ;

a : op INT -> ^(op INT);
op : '+'|'-' ;

a : "var" (ID ':' type ';')+ -> ^("var" ^(':' ID type)+) ;

a : ID (',' ID)*-> ^(VAR ID)+ ; // create imaginary VAR node above  
each ID

a : lc='{' ID+ '}' -> ^(BLOCK[$lc] ID+) ; // derive BLOCK imaginary  
token node from $lc

a : first=ID (',' others+=ID)* -> $first $others+ ; // use of += labels
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com



More information about the antlr-interest mailing list