[antlr-interest] Re: Translators Should Use Tree Grammars

Sun Nov 21 10:16:20 PST 2004

--- In antlr-interest at yahoogroups.com, "micheal_jor" <open.zone at v...>
wrote:
> 
> --- In antlr-interest at yahoogroups.com, "atripp54321" <atripp at c...>
wrote:
> 
> > Take my whole JavaEmitter thing. If that's so easy to do
> > with a treewalker, why hasn't anyone done it, either in
> > the years before I wrote mine or since then? It only
> > took me a day or two, can't someone take a day and write a treewalker
> > to emit Java code, so we can compare the two?
> 
> Google for "jparse". This project has an example of the hetero-node
> approach to pretty printing using a treewalker.
> 
> > > First of all, your "tree library" description and examples are
exactly
> > > the sort of thing that one does with a tree walker and cannot be
done
> > > generically by hand (declarations have syntax and are identified by
> > > subtree structure--syntax and semantics vary across languages, so
> > > defining a canonical structure can be difficult).
> > 
> > Could you give an example of something that 
> > you think can't be done "by hand"?
> 
> It isn't so much it can't be done for a particular case, more like it
> can't be [easily] done as a generic set of library routines for all
cases.

I'll propose a specific tree-search-and-manipulation library
and show how it could be extended for typical AST usage.

> 
> 
> > > "getAllGlobalVars()" is easily done as a tree walker pass; I do that
> > > sort of thing as a matter of routine unless I have to worry about
> > > speed, and I'm hardly alone in that. 
> >  
> > It's easily done by hand, too, without even having to know
> > any grammar syntax.
> 
> You have to know the tree structure. That is what the grammar encodes.

Yes, you have to know the tree structure in any case, it's
the ANTLR grammar syntax that can be avoided.

> 
> > > The renaming is then just a
> > > second pass.  Your "bottom up" algorithm is messier
(findDeclaration()
> > > will be pretty ugly) and will be slower than the approach of
> > > constructing symbol tables during a tree walk and then checking
> > > VAR_REFS against the local and global symbol tables as they are
> > > encountered.
> > 
> > I agree that it will be slower in general, but suppose you
> > are applying hundreds of changes, any one of which can
> > affect the symbol table. You'd need to either keep re-generating
> > the symbol table (which would be slow), or write lots of
> > code that keeps the symbol table up to date with the numerous
> > AST changes.
> 
> Or, you can design the symbol table to be updatable. You change the
> affected nodes and ask the symbol table to update the relevant entry.

For any variable declaration, I can:
change the variable name
change the type
add and delete modifiers
move the declaration to another place in the tree, changing
  it from local to global or global to local
remove the declaration
move the declaration to a different AST

That's just for variable declarations off the top of my head.
The code would have to maintain both the AST and the symbol table.
That's a lot of duplicate work.

> 
> > The general principle is "don't duplicate data";
> > that the symbol table contains a subset of
> > data from the AST...it only exists for efficiency. Accessing
> > the AST directly rather than using a symbol table
> > would (IMO) require less code. Whether you need the
> > added speed of the symbol table depends on the application.
> 
> The symbol table stores AST node attributes in a format that is
> especially fast to query. There isn't any duplication really.

It *is* duplication of data...the symbol table can be regenerated from
the AST at any time.

> 
> > > I don't understand your TreeStack.  It sort of sounds like an
> > > attribute stack, but not quite.
> > 
> > Just saying that you need to keep "state" information as you
> > walk the tree. For example, are we currently inside a function
> > definition or not? Is this the first reference to this variable?
> > etc.
> 
> Use the [symbol] table Luke ;-)

Whether that information is combined with the symbol table or not,
it's still a mess.

> 
> > So then what about the specific problems I pointed out
> > under "Comparing approaches by analyzing ease of change"?
> > For example, what about the spacing issue I mention, and 
> > how would you change the ordering of printing of AST children?
> > I'm not saying it can't be done with a treewalker, I'm just
> > saying I think it will require some real thought, whereas
> > with vanilla Java code, these things are trivial.
> 
> See my earlier response to your paper. The code might look something
like:
> 
> rootID
>   :  #( ROOT_ID ......
>          ( ....... package { output.write("\n\n\n\"); } ....... )*
>      )
>   ;
> 
> > > In your "Limitations" example, you manually match a subtree and
decide
> > > how to print accordingly.  The equivalent tree snippet is
> something like
> > > 
> > > ad :
> > >     #( ARRAY_DECLARATOR
> > >         ( { out.print("["); } e:EXPR { out.print(e.toString() +
> "]"); }
> > >         | i:ID { out.print(#i.toString() + "[]"); }
> > >         )
> > >     )
> > >     ;
> > 
> > Gack! So for comparison, here's my equivalent non-treewalker code:
> > case ARRAY_DECLARATOR:
> >     if (child1.getType() == EXPR) {
> >         out.print("[");
> >         print(child1);
> >         out.print("]");
> >     }
> >     else {
> >         print(child1);
> >         out.print("[]");
> >     }
> >     break;
> > 
> > 99.9% of Java developers (i.e. the non-ANTLR experts) will find
> > the second one easier to write, to read, and to maintain.
> 
> Andy this is a fallacious argument. You could say the same of lexer or
> parser rules yet most Java developers would rather use a lexer/parser
> generator than code them directly in Java.

I can use ANTLR and the grammars that come with it without
understanding the grammar syntax at all. In that way, I've
got a lexer and parser without hand coding and without
knowing ANTLR syntax. I'd like to be able to work with
ASTs without knowing ANTLR syntax too.

(In fact I do know ANTLR syntax, but you get the point).

> 
> Grammar tools are...well...tools just like Java. We need to learn how
> to use them effectively. Lisp programmers would find the Java code
> similarly alien but that doesn't distract from it's usefulness to you
> or other java programmers.
> 
> In any case, as a Java & ANTLR user, I find the grammar snippet much
> more useful and easier/cheaper to maintain. Both are equally readable
> to me.

As a hard-core Java programmer, but just a casual ANTLR user,
I don't want to get mired in ANTLR syntax for what amounts
to basic tree data structure stuff.

> 
> > Try to see that treewalker code through the eyes of
> > someone who doesn't know ANTLR or is just an ANTLR newbie.
> 
> I'd say such a person simply needs more training about ANTLR. The
> question is determining what resources would be most useful i.e.
> finding an engineering solution to the production of ANTLR
> documentation. Now if only there was a documentation generator grammar
> and oracle  ;-)

More ANTLR training would help, but I still think ANTLR
tree grammars are the wrong tool for tree-data-structure
search and manipulation.

> 
> > > Note that you can put actions in reasonable places--they won't
execute
> > > unless matching is successful.
> > > 
> > > In short, I think that you still haven't yet grokked the power of
> trees.
> > 
> > Yes, probably not. I'm sure they're quite powerful, but I'm more
> > concerned with ease-of-development for ANTLR newbies and
> > those who don't know ANTLR at all (and don't really care to learn it).
> 
> It sounds like you are describing people who are happy to do invest
> the resources needed to be able to use Java/C#/C++/Python etc
> effectively but, are unwilling to do so for ANTLR. I'm not sure there
> is any way to help such people short of urging them to learn how to
> use ANTLR. Is there?

Yes! Write your apps using just the Java/C#/C++ language that
they already know.

> 
> 
> As for ease of development, just think of the benefits of specifying
> lexer and parser using grammars. You get similar benefits for
> post-parse analysis and transformation passes from using tree grammars.

See my earlier comments: a BNF-like grammar is the best way
to specify lexer and parser functionality. It's not the best
way to specify translator functionality, IMHO.

> 
> Cheers,
> 
> Micheal
> ANTLR/C#

Andy

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/