[antlr-interest] Re: Please help me with homogeneous versus heterogeneous trees!

Terence Parr parrt at cs.usfca.edu
Sun Jul 31 01:36:41 PDT 2005


On Jul 30, 2005, at 5:20 AM, cflowers at mindspring.com wrote:


> Alexey,
>
> Thanks a ton for your response. I read with great interest the  
> links you sent me. I can see how maintaining a compiler over a long  
> term would be much easier with TreeDL.
>
> It is also clear that you vote strongly in favor of heterogeneous  
> trees that name relationships. Can anyone who holds the opposite  
> view share some thoughts? (I will search the archives later too,  
> but right now the corporate firewall won't let me hit web pages  
> that aren't from port 80).
>

Hi.   This article pretty much summarizes my experience and  
intuition, which is not perfect but I think I've learned a trick or  
two over the past 20 years building these suckers:

http://www.antlr.org/article/1100569809276/use.tree.grammars.tml


> I am not building a compiler. Rather, I envision a number of C#  
> utilities, all of which might be based off of the same AST.
>

This will be a problem I'd say.  You'll need to build a new method in  
each tree node object for each new utility, possibly screwing up the  
other utilities.  Very poor separation in my opinion.  I like  
encapsulating all functionality for a tree walk in place rather than  
over many many files and entangled with the walks for other utilities  
or phases.  With a tree grammar, you'll never touch the tree  
descriptions  and can't possibly mess up the other utilities.


> So I am thinking of designing a general purpose AST that will  
> become the contract all these other tools adhere to. And it is  
> still a tough question ... homogeneous trees seem to not be  
> "precise" enough (or self-documenting enough") to truly express  
> what the input source says, because they don't name relationships.
>

Structure *is* the relationship.  And the grammatical structure  
exists whether you think of it that way or not.  Further, the  
structure *can* be named with a grammar rule, which is target  
language independent.  For example, 3+4 represented as

   +
  /  \
3  4

is clear to anybody regardless of the implementation language,  
right?  "+" over two operands is a structure.  Usually we keep the  
operation at the root so the generalized structure is op over  
operands.  Don't need no stinkin' OO for that. ;)


> So, what have you homogeneous folks learned through your experience  
> that mitigates this itch for explicitly named relationships? Or  
> compensates for the lack of named relationships?
>

The grammar

add : #(PLUS INT INT) ;

or

expr : #(PLUS expr expr)
      | INT
      ;

makes this relationship explicit and also gives it a name.

Remember any non random sequence of stuff conforms to a language.  I  
don't need to convince anybody on this list that grammars are useful  
for describing languages I hope. ;)

Hetero trees are fine for one phase translations.  I think co-opting  
the type system to enforce and describe structure is a nice trick but  
it doesn't scale well per my arguments in the article.  For example,  
the same tree structure can be walked by a tree parser in any target  
language (for which you have a target) whereas using a type system  
locks you into that language.  For example, C just doesn't have the  
type system to handle what an OO language can do.

For the next 5 years (after finishing v3), I have committed myself to  
educating programmers by writing articles, academic papers, and books  
on recognition and translation.  part of the problem with tree  
grammars has been the lousy ANTLR 2.x implementation and my lack of  
'marketing' and examples etc...

In summary, Alexey is clearly a very smart guy and has a nice tool.   
SableCC also uses this approach, written by another smart guy. :)   
Using the type system is nice for many applications.  They are  
obvious to the casual observer and work great for many simple  
translation, though you have to write the tree recognizer by hand  
(<shudder>).  For larger problems, I hope over the next n years to  
convince you that tree grammars are the way to recognize  
structure. :)  Give me a chance to finish v3 and then watch the  
torrent of doc/articles/papers.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com




More information about the antlr-interest mailing list