[antlr-interest] Please help me with homogeneous versusheterogeneous trees!

Alexey Demakov demakov at ispras.ru
Thu Jul 28 23:08:43 PDT 2005


----- Original Message ----- 
From: <cflowers at mindspring.com>
To: <antlr-interest at antlr.org>
Sent: Thursday, July 28, 2005 9:46 PM
Subject: [antlr-interest] Please help me with homogeneous versusheterogeneous trees!


> Really good Antlr people,
>
> I am struggling with the design decision of whether to use a heterogeneous vs homogeneous AST. I would love some advice from
people who've been there and done that.
>
> First, let me make this clear ... I agree 100% with what Terrance says about not putting *actions* into strongly typed node
objects (see http://www.antlr.org/blog/antlr3/trees.tml and search for "Heterogeneous Nodes"). If you do that, and if you need more
than one pass, then you've got code for 5 different passes all sliced up and scattered around each of your node classes. Yuck. So I
am not asking about that aspect of the decision.

There are design patterns that allow to have actions separate from tree structure.

> But I am struggling with the concept of *naming the relationships* between nodes. After all, the AST is supposed to be the
"canonical representation" of what the source code says, right? It is meant to be simplified, robust against grammar changes, and
convenient to understand (right?). So, if I think about how I'd design a tree with all that in mind, I start to picture nodes with
named relationships between them. But that starts taking me in the direction of heterogeneous trees.

Exactly. AST description is a contract between all subsystems of tool. It allows to work with ANTLR only those developers who
implement parser. While other subsystems know only about AST structure. And AST can be created not only from input text, but from
other sources.

> Here's an example. Say I parse a SQL SELECT statement (which I happen to be doing). I use Antlr's great tree operators, and a tree
grammar to do one pass, and here's part of the tree that I wind up with:
>
>      SELECT
>        |
>        |
>      FIELD ("sls_dt")
>      |       |
>      |       |
>     ID("a")  ID("Sales_History")
>
> As you can see, a FIELD node has an alias (that's what the first child is) and a table name (that's what the second child is).
Now, here's the question. I can look at the tree, and using my own knowledge of SQL, I can see that the first child is the alias and
the second child is the table name. But the tree doesn't *say* that.
>
> But shouldn't a "canonical representation" of the meaning of the input code go ahead and spell such things out? In other words,
can we say that the next tree is objectively better for most purposes?
>
>      SELECT
>        |
>        |
>      FIELD ("sls_dt")
>      |       |
>      |       |
>    ALIAS   TABLENAME
>      |       |
>      |       |
>     ID("a")  ID("Sales_History")
>
> And, if that is an improvement, then the next question is, why don't I introduce a specific node class, called FieldNode, which
has a method called getAlias() and another method called getTableName()? Then, my node class takes care of naming the relationships
for me. Then, I don't need the 2 imaginary nodes.
>
> I've seen this approach discussed in detail in books by John Gough and Pat Terry, both of whom have taught me a lot through their
writings. But I haven't been able to find a detailed discussion of the *other* (all homogeneous) approach.
>
> How have you long-time antlr users come to approach this, and why? And any pointers to materials that discuss this topic would be
greatly appreciated.
>
> Thanks!
> Charlie Flowers
> Atlanta, GA

Several monthes ago in this list there was the discussion about hetero vs home trees between Terrence, me and others.
You can find it in the archive of the list.

My main objection against homo trees and tree grammars was that homo trees force developers to use tree grammars.
But ANTLR require to learn its language, and it is not easy.
Terrence asked that Visitor pattern usually used to separate actions from AST description
has some drawbacks and I agreed. Visitors don't use any tool support and this is no surprise they lose to ANTLR.
But hetero tree approach with good tool support can overcome ANTLR homo trees.
There is Terrence's article written after this discussion:
http://www.antlr.org/article/1100569809276/use.tree.grammars.tml

Some time ago I've developed TreeDL - it is language and tool for heterogenous AST description.
After discussing hetero vs homo tree approach and this article:
http://www.southern-storm.com.au/treecc_essay.html
I've desided to add in TreeDL operations on trees similar to described in this article
and implemented in treecc tool http://www.southern-storm.com.au/treecc.html

I plan to release TreeDL with full documentation in August.
But if you interested, you can download beta version of TreeDL 2 from
http://sf.net/projects/treedl
with Java5 parser and AST description as example.
Of course TreeDL tool is implemented using TreeDL description of AST. :)

TreeDL homepage http://treedl.sf.net now contains outdated documentation for TreeDL 1 - without operations.
TreeDL 2 language description docbook source can be found in CVS repository:
http://cvs.sourceforge.net/viewcvs.py/treedl/TreeDL/treedl/src/docbook/
Write me if you need html or pdf version.

Small example: your SQL SELECT statement structure in TreeDL:

module SQL;

node SELECT
{
    child FIELD field;
}

node FIELD
{
    child ID alias;
    child ID tableName;
}

child ID
{
    attribute string text;
}

Generated code (in Java):

public class SQL
{
    public static class SELECT
    {
        // accessors. exactly as you wish
        public FIELD getField() ...
        public void setField( FIELD field ) ...
        // visitor pattern support if you need
        public void accept( Visitor visitor ) ...
    }
    public static class FIELD
    {
        public ID getAlias() ...
        public void setAlias( ID alias ) ...
        public ID getTableName() ...
        public void setTableName( ID tableName ) ...
        public void accept( Visitor visitor ) ...
    }
    public static class ID
    {
        public String getText() ...
        public void setText() ...
        public void accept( Visitor visitor ) ...
    }
    // visitor pattern support if you need
    public interface SQL_Visitor extends Visitor
    {
        public void visitSELECT( SELECT node ) ...
        public void visitFIELD( FIELD node ) ...
        public void visitID( ID node ) ...
    }
}

Also can be generated:
1. Empty and copy visitors.
2. Depth-first tree walker with action calls before and after subtree walking.
3. Tree node factory.

And you can write rather simple plugin to generate what you want. :)

Regards,
Alexey

-----
Alexey Demakov
TreeDL: Tree Description Language: http://treedl.sourceforge.net
RedVerst Group: http://www.unitesk.com




More information about the antlr-interest mailing list