[antlr-interest] ANTLR 3.0 tree construction proposal

Mon Jan 31 16:48:55 PST 2005

On Jan 30, 2005, at 9:50 PM, Loring Craymer wrote:
> Well, I do have to respond.  For those interested in working with an
> implementation of more complete functionality, I am hoping to release 
> 2.8
> this week.  Micheal Jor got the C# code generation working Saturday; 
> I'm
> folding in the 2.7.5 stuff now.  Open Channel has set up a website for 
> the
> release, but there are still some minor licensing issues to either fix 
> or
> live with.  This will be labeled a "beta" release; as soon as I get 
> that
> out, I can work with Marq and Wolfgang to get the python code 
> generation
> upgraded for tree rewriting.

Yes, some interesting work in Loring's 2.8experimental.  Note that we 
are exploring ideas in it not dictating the future direction. ;)

> 2.8 tree construction has the following features:
> 1.)  Tree "actions":  ^{ <tree fragment spec> }
> 	Within a tree action, AST structures are specified in a notation
> that mirrors tree walker syntax.  Tree actions support inline tree
> construction; the contents of a tree action are added to the current 
> tree
> under construction.  [As a side note, the tree action syntax is most 
> of the
> complexity to which Ter refers to in his article--he prefers 
> specifying the
> complete output.  More on that below.]

Yup.  I like the "match this and generate that" style like an output 
tree template rather than having to figure out the emergent behavior.  
The operators ^ and ! are like that--you have to figure out what it 
will build, but they are so damn terse and useful that I will leave 
them in.

I like separating concerns.  If you've seen my StringTemplate stuff 
you'll note that I find generating code with a bunch of print 
statements entangled with logic is unsatisfying.  The same is true of 
grammars.  I don't like mixing the tree generation in the recognition 
part, hence, I've tried to move everything to the end of the 
production.  The ^ and ! are so terse and effective that I'm willing to 
live with the entanglement there.

> 2.)  Tree constructors:  ^( <root> <body> )
> 	Tree constructors can nest, and are recognized by #( <root> <body> )
> constructs.

this makes that tree a child of the currently constructed tree for the 
rule, right?

> 3.)  Node constructors:  ^[<node description>]

Does this add a child to the current tree?  I guess so.

> 	Node constructors may appear either inside or outside a tree action
> and are the mechanism for adding "imaginary" nodes.  Outside a tree 
> action,
> node constructors may be suffixed with ^ (^[...]^) to make it the root 
> of
> the current rule.
> 4.)  Alternatives and construction predicates:  ^{ {<test>}? A | B }
> 	Construction predicates are used to conditionally determine which
> constructs to add to a tree.  They are the complement to semantic 
> predicates
> and are a way of capturing semantics in a tree syntax.  In the 
> fragment, if
> <test> is true, A is added to the tree under construction; otherwise, 
> B is
> added.

Yup...we need that.

> 5.)  Restart tree:  { => <fragment> }
> 	Erase the partial tree already specified.  [I originally used
> "RETURN" for lack of a better idea; I changed this to "=>" as part of 
> the
> discussions Ter and I had on his proposal.]
> 6.)  "Copy" and "Reuse":  AST construction control
> 	Copy duplicates a node and its children; Reuse (in 2.8) incorporates
> an input tree fragment into an output tree.  For 3.0, it would make 
> sense to
> use "Copy" to replicate a payload (to allow later editing), and Reuse 
> to
> allow more than one reference to a single payload.
>
> 2.8 tree construction is slightly awkward because of a need to "!" 
> nodes
> that would be used in tree actions.  Otherwise, links are added in via
> normal construction and you can get circular linkages.  This should be 
> fixed
> in 3.0; an efficient implementation is pretty straightforward 
> following what
> I have done in 2.8 with the addition of the payload and carrier AST
> decomposition.
>
>
> My objections to Ter's proposal are broader than he mentions.  I would 
> add
> 4.)  His => syntax cannot be mixed with the current ^ and ! annotation.

On purpose as this is one of the problems currently I feel.  The 
interactions of the two constructions cause confusion.

> Frankly, I consider having two distinct tree rewrite syntaxes to be a
> kludge, at best.

Noted, but I disagree.

> 5.)  Ambiguities result from using his => in the presence of 
> alternatives
> within a rule.

How so?

>  This is especially true if the alternatives contain similar
> sets of tokens.  One possible solution is to use labels on nodes, but 
> then
> alternatives have to have distinct labels for nodes.  One of the 
> features
> that was very handy in PCCTS was that you could have a rule fragment 
> like
> ( a:A | a:B) and then reference a in an action (for 3.0, you could also
> reference attributes of a).

Yes, you will be able to reuse a label.  Naturally you can have a => 
per alternative; shouldn't be a problem.

> The major functional difference between the two approaches is that 
> Ter's
> approach does not allow inline tree construction.

On purpose.

>  [He also has not yet
> included construction predicates, although I think that I have him 
> almost
> convinced that they are essential.]

Actually you'll find that in my proposal.  I recall suggesting the pred 
syntax to you back at the oregon meeting. ;)

>   The "cost" of this restriction (what
> Ter refers to as "complexity" is the use of ^{ ... } to enclose 
> rewrites.  I
> consider this crippling, but Ter's argument is that this is an aid to
> understanding.

Until I see an example that I cannot handle that is real, you've not 
convinced me that we need the extra functionality.

>  [I did point out that the last time I heard him use that
> argument, predicate hoisting disappeared from 2.0.  He agreed. (:=< ]

I didn't have time to implement the predicates and used people's 
confusion as an excuse ;)  I look back and am amazed I got antlr out at 
all during my start up days...

> However, I think that the "simplicity" may be illusory:  consider
>
> Foo :
>     A B
>     (C | D | E) => C A B | D A B | E A B ;
>
> Is (C D E) => intended as a syntactic predicate or not? Probably not, 
> but I
> do not think that it would be hard to construct an example that was 
> truly
> ambiguous to the human eye.  Is it a rewrite or a typo?

I'm using -> now.  This was never intended as a predicate.  The stuff 
to the right is generated not recognized.  There is no such thing as 
pipe | on the right hand side.   That example doesn't make sense in my 
world.  You could do this though:

foo : A B (a=C | a=D | a=E) -> ^(@a A B) ;

which I think is pretty clear.  (note @a is an attribute reference and 
intentionally make them standout from rule element references).  You 
are labeling what is matched in the subrule.

We can explore labeling the entire subrule as well.  No reason not to 
allow

foo : A B a=(C|D|E) -> ^(@a A B) ;

I was holding off on syntactic predicates in the new version to see 
what LL(*) would do for us.  I truly believe in an evolutionary 
approach to building software.  Build only what you need, when you need 
it.

> One other point on using ^^ for rules and ^ for subrules.  I don't like
> changing to ^^ for rules (neither did Ter when we talked--the
> information-theoretic argument is that ^ should be used in the more 
> common
> case)

I don't like the backward incompatibility, but ^ seems more "local" 
than ^^ so I think we should use ^^ to mean "root of entire subtree for 
rule", though it make take a few more char in general.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com