[antlr-interest] ANTLR 3.0 tree construction proposal

Sun Jan 30 21:50:02 PST 2005

Well, I do have to respond.  For those interested in working with an
implementation of more complete functionality, I am hoping to release 2.8
this week.  Micheal Jor got the C# code generation working Saturday; I'm
folding in the 2.7.5 stuff now.  Open Channel has set up a website for the
release, but there are still some minor licensing issues to either fix or
live with.  This will be labeled a "beta" release; as soon as I get that
out, I can work with Marq and Wolfgang to get the python code generation
upgraded for tree rewriting.

2.8 tree construction has the following features:
1.)  Tree "actions":  ^{ <tree fragment spec> }
	Within a tree action, AST structures are specified in a notation
that mirrors tree walker syntax.  Tree actions support inline tree
construction; the contents of a tree action are added to the current tree
under construction.  [As a side note, the tree action syntax is most of the
complexity to which Ter refers to in his article--he prefers specifying the
complete output.  More on that below.]
2.)  Tree constructors:  ^( <root> <body> )
	Tree constructors can nest, and are recognized by #( <root> <body> )
constructs.
3.)  Node constructors:  ^[<node description>]
	Node constructors may appear either inside or outside a tree action
and are the mechanism for adding "imaginary" nodes.  Outside a tree action,
node constructors may be suffixed with ^ (^[...]^) to make it the root of
the current rule.
4.)  Alternatives and construction predicates:  ^{ {<test>}? A | B }
	Construction predicates are used to conditionally determine which
constructs to add to a tree.  They are the complement to semantic predicates
and are a way of capturing semantics in a tree syntax.  In the fragment, if
<test> is true, A is added to the tree under construction; otherwise, B is
added.
5.)  Restart tree:  { => <fragment> }
	Erase the partial tree already specified.  [I originally used
"RETURN" for lack of a better idea; I changed this to "=>" as part of the
discussions Ter and I had on his proposal.]
6.)  "Copy" and "Reuse":  AST construction control
	Copy duplicates a node and its children; Reuse (in 2.8) incorporates
an input tree fragment into an output tree.  For 3.0, it would make sense to
use "Copy" to replicate a payload (to allow later editing), and Reuse to
allow more than one reference to a single payload.

2.8 tree construction is slightly awkward because of a need to "!" nodes
that would be used in tree actions.  Otherwise, links are added in via
normal construction and you can get circular linkages.  This should be fixed
in 3.0; an efficient implementation is pretty straightforward following what
I have done in 2.8 with the addition of the payload and carrier AST
decomposition.

My objections to Ter's proposal are broader than he mentions.  I would add
4.)  His => syntax cannot be mixed with the current ^ and ! annotation.
Frankly, I consider having two distinct tree rewrite syntaxes to be a
kludge, at best.
5.)  Ambiguities result from using his => in the presence of alternatives
within a rule.  This is especially true if the alternatives contain similar
sets of tokens.  One possible solution is to use labels on nodes, but then
alternatives have to have distinct labels for nodes.  One of the features
that was very handy in PCCTS was that you could have a rule fragment like 
( a:A | a:B) and then reference a in an action (for 3.0, you could also
reference attributes of a).

The major functional difference between the two approaches is that Ter's
approach does not allow inline tree construction.  [He also has not yet
included construction predicates, although I think that I have him almost
convinced that they are essential.]  The "cost" of this restriction (what
Ter refers to as "complexity" is the use of ^{ ... } to enclose rewrites.  I
consider this crippling, but Ter's argument is that this is an aid to
understanding.  [I did point out that the last time I heard him use that
argument, predicate hoisting disappeared from 2.0.  He agreed. (:=< ]
However, I think that the "simplicity" may be illusory:  consider

Foo :
    A B
    (C | D | E) => C A B | D A B | E A B ;

Is (C D E) => intended as a syntactic predicate or not? Probably not, but I
do not think that it would be hard to construct an example that was truly
ambiguous to the human eye.  Is it a rewrite or a typo?

One other point on using ^^ for rules and ^ for subrules.  I don't like
changing to ^^ for rules (neither did Ter when we talked--the
information-theoretic argument is that ^ should be used in the more common
case) and have come around to liking his original proposal that ^ is
restricted to a subrule when the subrule is labeled--that fits in well with
an explicit rewrite syntax because the label could then be used in a rewrite
statement.

--Loring

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Terence Parr
> Sent: Sunday, January 30, 2005 3:09 PM
> To: ANTLR Interest
> Subject: [antlr-interest] ANTLR 3.0 tree construction proposal
> 
> Howdy,
> 
> After a lot of typing (and I mean a lot), you'll see my ANTLR 3.0 tree
> construction proposal.  After the proposal, you'll see my long stream
> of consciousness as I wander through the design process (you can
> probably ignore that part).
> 
> http://www.antlr.org/blog/antlr3/trees.tml
> 
> I include a summary of the current opposition to this proposal.  So far
> I have one smart guy for (Sriram Srinivasan has seen an overview of
> this) and one smart guy against (Loring Craymer). ;)
> 
> There are lots of details that will get shaken out when I implement
> whatever we decide, but the broad strokes are there.
> 
> Whew...enough typing for a while.
> 
> Ter