[antlr-interest] Re: Translators Should Use Tree Grammars

Sat Nov 20 15:13:14 PST 2004

--- In antlr-interest at yahoogroups.com, "atripp54321" <atripp at c...> wrote:
> 
> --- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> > 
> > Well, I hassled Ter about his under-analyzing what could and could not
> > be done with visitors, so I feel perfectly free to complain that you
> > have mis-analyzed what can and can't be done with trees.
> 
> I tried not to say that certain things can't be done with treewalkers.
> I explained how I did them without a treewalker and said it
> didn't look (to me) to be as easy with a treewalker.
> I was hoping someone would take my examples, show how they
> can be done with a treewalker, so that we can compare the two.
> 
> Take my whole JavaEmitter thing. If that's so easy to do
> with a treewalker, why hasn't anyone done it, either in
> the years before I wrote mine or since then? It only
> took me a day or two, can't someone take a day and write a treewalker
> to emit Java code, so we can compare the two?

Google doesn't exactly find your JavaEmitter in widespread use, and
the documentation is too sparse to make it clear why you would want to
use it as a Java generator.  What does the generated Java code do?  As
to the AST pretty-printing:  it is easy enough to write a
pretty-printing visitor.  I don't use them, but that is mainly because
a generated tree grammar does the same thing and is a better basis for
iterating tree structure with.

> > 
> > First of all, your "tree library" description and examples are exactly
> > the sort of thing that one does with a tree walker and cannot be done
> > generically by hand (declarations have syntax and are identified by
> > subtree structure--syntax and semantics vary across languages, so
> > defining a canonical structure can be difficult).
> 
> Could you give an example of something that 
> you think can't be done "by hand"?

It is not that it cannot be done by hand--it cannot be done
generically, a key requirement for a library function.

> > "getAllGlobalVars()" is easily done as a tree walker pass; I do that
> > sort of thing as a matter of routine unless I have to worry about
> > speed, and I'm hardly alone in that. 
>  
> It's easily done by hand, too, without even having to know
> any grammar syntax.

Not true; you have to know the form and type content of the subtree
you are using to identify a variable.  That is syntax.

> > The renaming is then just a
> > second pass.  Your "bottom up" algorithm is messier (findDeclaration()
> > will be pretty ugly) and will be slower than the approach of
> > constructing symbol tables during a tree walk and then checking
> > VAR_REFS against the local and global symbol tables as they are
> > encountered.
> 
> I agree that it will be slower in general, but suppose you
> are applying hundreds of changes, any one of which can
> affect the symbol table. You'd need to either keep re-generating
> the symbol table (which would be slow), or write lots of
> code that keeps the symbol table up to date with the numerous
> AST changes.

The performance gap then widens considerably--symbol table
construction during a treewalk is a dynamic process, but all it really
amounts to is inserting data structures in a hash table (possibly with
additional support for removing local variables en masse) and either
replacing the data structure contents as needed or deleting and
re-inserting a specific element.  Symbol tables are a handy tool, and
are quite efficient when used correctly.

> The general principle is "don't duplicate data";
> that the symbol table contains a subset of
> data from the AST...it only exists for efficiency. Accessing
> the AST directly rather than using a symbol table
> would (IMO) require less code. Whether you need the
> added speed of the symbol table depends on the application.

This is sometimes true, but it is more often true that symbol tables
incorporate a semantic interpretation of parts of the AST.  Take a
global variable, for example:  to construct a symbol, you have to
decode type, name, and other properties from the AST.  The only
reduction in code from using the AST directly is entry into the symbol
table and some of the symbol construction (packaging, not analysis). 
The penalty for not doing that is to have the analysis code repeated
multiple times.

> > 
> > I don't understand your TreeStack.  It sort of sounds like an
> > attribute stack, but not quite.
> 
> Just saying that you need to keep "state" information as you
> walk the tree. For example, are we currently inside a function
> definition or not? Is this the first reference to this variable?
> etc.
> 
> > 
> > As to the multiple passes:  multiple tree walker passes are the norm,
> > not something that is difficult to do.  That is one of the real
> > strengths of ANTLR syntax trees, and is a feature that separates ANTLR
> > from translator generators that use parse trees.  The first
> > applications of SORCERER involved a 13-pass FORTRAN translator that
> > Ter did and a 7-pass Pascal-to-Ada translator by Gary Funck.
> 
> OK.
> 
> > 
> > Your pretty-printer example is the sort of thing that can be done with
> > either a visitor or tree walker without much trouble; in the latter
> > case, I would usually either factor out the sets of alternative node
> > types so that the actions are not much different than your case
> > statement and are clustered in only a few rules or use a heteroAST
> > with type-specific toString() variants.
> 
> So then what about the specific problems I pointed out
> under "Comparing approaches by analyzing ease of change"?
> For example, what about the spacing issue I mention, and 
> how would you change the ordering of printing of AST children?
> I'm not saying it can't be done with a treewalker, I'm just
> saying I think it will require some real thought, whereas
> with vanilla Java code, these things are trivial.

Tree grammars are easier to change:  the action additions are
equivalent to the code additions in your case statements.  The
specific example you give can be handled by building up strings for
printing (something that 3.0 will automate), by doing separate walks
over subtrees from within actions, or by further transformation.

Solving these types of problem is easy:  the issue with printing is
almost always a matter of fidding with the details until the output
"looks good" (esthetics rather than implementation).

> > 
> > In your "Limitations" example, you manually match a subtree and decide
> > how to print accordingly.  The equivalent tree snippet is
something like
> > 
> > ad :
> >     #( ARRAY_DECLARATOR
> >         ( { out.print("["); } e:EXPR { out.print(e.toString() +
"]"); }
> >         | i:ID { out.print(#i.toString() + "[]"); }
> >         )
> >     )
> >     ;
> 
> Gack! So for comparison, here's my equivalent non-treewalker code:
> case ARRAY_DECLARATOR:
>     if (child1.getType() == EXPR) {
>         out.print("[");
>         print(child1);
>         out.print("]");
>     }
>     else {
>         print(child1);
>         out.print("[]");
>     }
>     break;
> 
> 99.9% of Java developers (i.e. the non-ANTLR experts) will find
> the second one easier to write, to read, and to maintain.
> Try to see that treewalker code through the eyes of
> someone who doesn't know ANTLR or is just an ANTLR newbie.

Well, I could have formatted the tree snippet for readability instead
of compactness--this is a fairly long post.  And I probably should not
have included the toString()'s where you omitted them--that does
reduce clarity.  And were this not a toy example, EXPR would be a
subtree.  None of that is particularly relevant; it strays from your
original claim that this is difficult to do with tree grammars when in
fact it is easy.

> > 
> > Note that you can put actions in reasonable places--they won't execute
> > unless matching is successful.
> > 
> > In short, I think that you still haven't yet grokked the power of
trees.
> 
> Yes, probably not. I'm sure they're quite powerful, but I'm more
> concerned with ease-of-development for ANTLR newbies and
> those who don't know ANTLR at all (and don't really care to learn it).

Trees and tree grammars are exceedingly powerful and easy to use
(especially with automated tree generation, coming soon with 2.8). 
However, they do seem esoteric at first.  That is largely due to
habituation to procedural languages; once you shift to a syntax-driven
language paradigm (ANTLR and other BNF and EBNF language recognizers),
then trees are a natural next step.  Because of the difficulties in
making this paradigm shift, newbies usually don't use trees unless
they already have a language translation background.  Once you have
made that shift, though, trees and tree grammars are an essential part
of ANTLR usage.

--Loring

> Andy

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/