[antlr-interest] Re: Translators Should Use Tree Grammars

Sat Nov 20 19:14:58 PST 2004

--- In antlr-interest at yahoogroups.com, "atripp54321" <atripp at c...> wrote:

> Take my whole JavaEmitter thing. If that's so easy to do
> with a treewalker, why hasn't anyone done it, either in
> the years before I wrote mine or since then? It only
> took me a day or two, can't someone take a day and write a treewalker
> to emit Java code, so we can compare the two?

Google for "jparse". This project has an example of the hetero-node
approach to pretty printing using a treewalker.

> > First of all, your "tree library" description and examples are exactly
> > the sort of thing that one does with a tree walker and cannot be done
> > generically by hand (declarations have syntax and are identified by
> > subtree structure--syntax and semantics vary across languages, so
> > defining a canonical structure can be difficult).
> 
> Could you give an example of something that 
> you think can't be done "by hand"?

It isn't so much it can't be done for a particular case, more like it
can't be [easily] done as a generic set of library routines for all cases.

> > "getAllGlobalVars()" is easily done as a tree walker pass; I do that
> > sort of thing as a matter of routine unless I have to worry about
> > speed, and I'm hardly alone in that. 
>  
> It's easily done by hand, too, without even having to know
> any grammar syntax.

You have to know the tree structure. That is what the grammar encodes.

> > The renaming is then just a
> > second pass.  Your "bottom up" algorithm is messier (findDeclaration()
> > will be pretty ugly) and will be slower than the approach of
> > constructing symbol tables during a tree walk and then checking
> > VAR_REFS against the local and global symbol tables as they are
> > encountered.
> 
> I agree that it will be slower in general, but suppose you
> are applying hundreds of changes, any one of which can
> affect the symbol table. You'd need to either keep re-generating
> the symbol table (which would be slow), or write lots of
> code that keeps the symbol table up to date with the numerous
> AST changes.

Or, you can design the symbol table to be updatable. You change the
affected nodes and ask the symbol table to update the relevant entry.

> The general principle is "don't duplicate data";
> that the symbol table contains a subset of
> data from the AST...it only exists for efficiency. Accessing
> the AST directly rather than using a symbol table
> would (IMO) require less code. Whether you need the
> added speed of the symbol table depends on the application.

The symbol table stores AST node attributes in a format that is
especially fast to query. There isn't any duplication really.

> > I don't understand your TreeStack.  It sort of sounds like an
> > attribute stack, but not quite.
> 
> Just saying that you need to keep "state" information as you
> walk the tree. For example, are we currently inside a function
> definition or not? Is this the first reference to this variable?
> etc.

Use the [symbol] table Luke ;-)

> So then what about the specific problems I pointed out
> under "Comparing approaches by analyzing ease of change"?
> For example, what about the spacing issue I mention, and 
> how would you change the ordering of printing of AST children?
> I'm not saying it can't be done with a treewalker, I'm just
> saying I think it will require some real thought, whereas
> with vanilla Java code, these things are trivial.

See my earlier response to your paper. The code might look something like:

rootID
  :  #( ROOT_ID ......
         ( ....... package { output.write("\n\n\n\"); } ....... )*
     )
  ;

> > In your "Limitations" example, you manually match a subtree and decide
> > how to print accordingly.  The equivalent tree snippet is
something like
> > 
> > ad :
> >     #( ARRAY_DECLARATOR
> >         ( { out.print("["); } e:EXPR { out.print(e.toString() +
"]"); }
> >         | i:ID { out.print(#i.toString() + "[]"); }
> >         )
> >     )
> >     ;
> 
> Gack! So for comparison, here's my equivalent non-treewalker code:
> case ARRAY_DECLARATOR:
>     if (child1.getType() == EXPR) {
>         out.print("[");
>         print(child1);
>         out.print("]");
>     }
>     else {
>         print(child1);
>         out.print("[]");
>     }
>     break;
> 
> 99.9% of Java developers (i.e. the non-ANTLR experts) will find
> the second one easier to write, to read, and to maintain.

Andy this is a fallacious argument. You could say the same of lexer or
parser rules yet most Java developers would rather use a lexer/parser
generator than code them directly in Java.

Grammar tools are...well...tools just like Java. We need to learn how
to use them effectively. Lisp programmers would find the Java code
similarly alien but that doesn't distract from it's usefulness to you
or other java programmers.

In any case, as a Java & ANTLR user, I find the grammar snippet much
more useful and easier/cheaper to maintain. Both are equally readable
to me.

> Try to see that treewalker code through the eyes of
> someone who doesn't know ANTLR or is just an ANTLR newbie.

I'd say such a person simply needs more training about ANTLR. The
question is determining what resources would be most useful i.e.
finding an engineering solution to the production of ANTLR
documentation. Now if only there was a documentation generator grammar
and oracle  ;-)

> > Note that you can put actions in reasonable places--they won't execute
> > unless matching is successful.
> > 
> > In short, I think that you still haven't yet grokked the power of
trees.
> 
> Yes, probably not. I'm sure they're quite powerful, but I'm more
> concerned with ease-of-development for ANTLR newbies and
> those who don't know ANTLR at all (and don't really care to learn it).

It sounds like you are describing people who are happy to do invest
the resources needed to be able to use Java/C#/C++/Python etc
effectively but, are unwilling to do so for ANTLR. I'm not sure there
is any way to help such people short of urging them to learn how to
use ANTLR. Is there?

As for ease of development, just think of the benefits of specifying
lexer and parser using grammars. You get similar benefits for
post-parse analysis and transformation passes from using tree grammars.

Cheers,

Micheal
ANTLR/C#

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/