[antlr-interest] ANTLR v3 tree construction :)
Terence Parr
parrt at cs.usfca.edu
Wed Jun 22 18:56:17 PDT 2005
Howdy,
Well, I've had a productive week or two. Got ANTLR building ASTs
automatically and also via the rewrite rules, which I've described
below. This email is a dup of the blog entry I just added.
http://www.antlr.org/blog/antlr3/trees.tml
Ter
PS If I've been ignoring your email, it's been because I have been
singularly focused on this stuff!
------------------------
#### REWRITE RULES
The new rewrite stuff is a grammar to grammar mapping where you
specify how to match something and then how to generate the resulting
tree. Note that you can generate a token stream by simply generating
a flat list like:
a : A B -> B A ; // reverse the input
The goal is to avoid arbitrary actions and the key insight to get
pure rewrites was that the right side of the -> should be a tree
grammar complete with (...)+ loops etc... Monty, Loring, and I
discussed the rewrite rules extensively at the Oregon cabal 2 years
ago but we couldn't figure out what to do with repeated elements like:
list : ID (',' ID)* ;
How can you generate just the list of IDs, for example. Easy after
you look at the right side like a grammar. Just list the tree
grammar fragment that would match what you would like to generate and
ANTLR will figure out how to do the transformation. The elements on
the left are queued up and then used as input streams to the
"unparser" grammar on the right. Here is the rewrite to get the list
of IDs:
list : ID (',' ID)* -> ID+ ;
Nice, eh? What if I want an imaginary token node, VAR, above the list?
list : ID (',' ID)* -> ^(VAR ID+) ;
What if I want a VAR for *each* ID? Move the VAR inside the plus loop:
list : ID (',' ID)* -> ^(VAR ID)+ ;
I think that is pretty slick. Note that to get your tree parser
grammar, ANTLR should be able to copy these rules for you into a tree
grammar similar to what Loring does, although his 2.8e tree grammar
creator is more sophisticated than what I have in mind.
I have not done the tree parsers yet, but I think it will be really
easy. The details may kill me though. ;) I should push an
ANTLR-3.0ea3 release shortly.
#### SOME NOTES
My brief comments in the README.txt file:
* Automatic tree construction operators are in: ! ^ ^^
* Tree construction rewrite rules are in
-> {pred1}? rewrite1
-> {pred2}? rewrite2
...
-> rewriteN
The rewrite rules may be elements like ID, expr, $label, {node expr}
and trees ^( <root> <children> ). You have have (...)?, (...)*,
(...)+
subrules as well.
You may have rewrites in subrules not just at outer level of rule,
but
any -> rewrite forces auto AST construction off for that alternative
of that rule.
To avoid cycles, copy semantics are used:
r : INT -> INT INT ;
means make two new nodes from the same INT token.
Repeated references to a rule element implies a copy for at least one
tree:
a : atom -> ^(atom atom) ; // NOT CYCLE! (dup atom tree)
* $ruleLabel.tree refers to tree created by matching the labeled
element.
#### EXAMPLES
Here are some example rewrites. I have about 40 functional tests for
rewrites and about the same for automatic tree construction.
a : ID INT -> ; // don't generate anything
a : b b -> b+; // generate a list of two b's
a : ID INT -> ^(ID INT) ;
a : ID INT -> {false}? ^(ID INT)
-> {true}? ^(INT ID)
-> ID
;
a : op INT -> ^(op INT);
op : '+'|'-' ;
a : "var" (ID ':' type ';')+ -> ^("var" ^(':' ID type)+) ;
a : ID (',' ID)*-> ^(VAR ID)+ ; // create imaginary VAR node above
each ID
a : lc='{' ID+ '}' -> ^(BLOCK[$lc] ID+) ; // derive BLOCK imaginary
token node from $lc
a : first=ID (',' others+=ID)* -> $first $others+ ; // use of += labels
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
More information about the antlr-interest
mailing list