[antlr-interest] Re: Translators Should Use Tree Grammars

Sat Nov 20 10:57:29 PST 2004

--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> 
> Well, I hassled Ter about his under-analyzing what could and could not
> be done with visitors, so I feel perfectly free to complain that you
> have mis-analyzed what can and can't be done with trees.

I tried not to say that certain things can't be done with treewalkers.
I explained how I did them without a treewalker and said it
didn't look (to me) to be as easy with a treewalker.
I was hoping someone would take my examples, show how they
can be done with a treewalker, so that we can compare the two.

Take my whole JavaEmitter thing. If that's so easy to do
with a treewalker, why hasn't anyone done it, either in
the years before I wrote mine or since then? It only
took me a day or two, can't someone take a day and write a treewalker
to emit Java code, so we can compare the two?

> 
> First of all, your "tree library" description and examples are exactly
> the sort of thing that one does with a tree walker and cannot be done
> generically by hand (declarations have syntax and are identified by
> subtree structure--syntax and semantics vary across languages, so
> defining a canonical structure can be difficult).

Could you give an example of something that 
you think can't be done "by hand"?

> "getAllGlobalVars()" is easily done as a tree walker pass; I do that
> sort of thing as a matter of routine unless I have to worry about
> speed, and I'm hardly alone in that. 

It's easily done by hand, too, without even having to know
any grammar syntax.

> The renaming is then just a
> second pass.  Your "bottom up" algorithm is messier (findDeclaration()
> will be pretty ugly) and will be slower than the approach of
> constructing symbol tables during a tree walk and then checking
> VAR_REFS against the local and global symbol tables as they are
> encountered.

I agree that it will be slower in general, but suppose you
are applying hundreds of changes, any one of which can
affect the symbol table. You'd need to either keep re-generating
the symbol table (which would be slow), or write lots of
code that keeps the symbol table up to date with the numerous
AST changes.

The general principle is "don't duplicate data";
that the symbol table contains a subset of
data from the AST...it only exists for efficiency. Accessing
the AST directly rather than using a symbol table
would (IMO) require less code. Whether you need the
added speed of the symbol table depends on the application.

> 
> I don't understand your TreeStack.  It sort of sounds like an
> attribute stack, but not quite.

Just saying that you need to keep "state" information as you
walk the tree. For example, are we currently inside a function
definition or not? Is this the first reference to this variable?
etc.

> 
> As to the multiple passes:  multiple tree walker passes are the norm,
> not something that is difficult to do.  That is one of the real
> strengths of ANTLR syntax trees, and is a feature that separates ANTLR
> from translator generators that use parse trees.  The first
> applications of SORCERER involved a 13-pass FORTRAN translator that
> Ter did and a 7-pass Pascal-to-Ada translator by Gary Funck.

OK.

> 
> Your pretty-printer example is the sort of thing that can be done with
> either a visitor or tree walker without much trouble; in the latter
> case, I would usually either factor out the sets of alternative node
> types so that the actions are not much different than your case
> statement and are clustered in only a few rules or use a heteroAST
> with type-specific toString() variants.

So then what about the specific problems I pointed out
under "Comparing approaches by analyzing ease of change"?
For example, what about the spacing issue I mention, and 
how would you change the ordering of printing of AST children?
I'm not saying it can't be done with a treewalker, I'm just
saying I think it will require some real thought, whereas
with vanilla Java code, these things are trivial.

> 
> In your "Limitations" example, you manually match a subtree and decide
> how to print accordingly.  The equivalent tree snippet is something like
> 
> ad :
>     #( ARRAY_DECLARATOR
>         ( { out.print("["); } e:EXPR { out.print(e.toString() + "]"); }
>         | i:ID { out.print(#i.toString() + "[]"); }
>         )
>     )
>     ;

Gack! So for comparison, here's my equivalent non-treewalker code:
case ARRAY_DECLARATOR:
    if (child1.getType() == EXPR) {
        out.print("[");
        print(child1);
        out.print("]");
    }
    else {
        print(child1);
        out.print("[]");
    }
    break;

99.9% of Java developers (i.e. the non-ANTLR experts) will find
the second one easier to write, to read, and to maintain.
Try to see that treewalker code through the eyes of
someone who doesn't know ANTLR or is just an ANTLR newbie.

> 
> Note that you can put actions in reasonable places--they won't execute
> unless matching is successful.
> 
> In short, I think that you still haven't yet grokked the power of trees.

Yes, probably not. I'm sure they're quite powerful, but I'm more
concerned with ease-of-development for ANTLR newbies and
those who don't know ANTLR at all (and don't really care to learn it).

Andy

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/