[antlr-interest] Re: Please help me with homogeneous versus
parrt at cs.usfca.edu
Sun Jul 31 01:36:41 PDT 2005
On Jul 30, 2005, at 5:20 AM, cflowers at mindspring.com wrote:
> Thanks a ton for your response. I read with great interest the
> links you sent me. I can see how maintaining a compiler over a long
> term would be much easier with TreeDL.
> It is also clear that you vote strongly in favor of heterogeneous
> trees that name relationships. Can anyone who holds the opposite
> view share some thoughts? (I will search the archives later too,
> but right now the corporate firewall won't let me hit web pages
> that aren't from port 80).
Hi. This article pretty much summarizes my experience and
intuition, which is not perfect but I think I've learned a trick or
two over the past 20 years building these suckers:
> I am not building a compiler. Rather, I envision a number of C#
> utilities, all of which might be based off of the same AST.
This will be a problem I'd say. You'll need to build a new method in
each tree node object for each new utility, possibly screwing up the
other utilities. Very poor separation in my opinion. I like
encapsulating all functionality for a tree walk in place rather than
over many many files and entangled with the walks for other utilities
or phases. With a tree grammar, you'll never touch the tree
descriptions and can't possibly mess up the other utilities.
> So I am thinking of designing a general purpose AST that will
> become the contract all these other tools adhere to. And it is
> still a tough question ... homogeneous trees seem to not be
> "precise" enough (or self-documenting enough") to truly express
> what the input source says, because they don't name relationships.
Structure *is* the relationship. And the grammatical structure
exists whether you think of it that way or not. Further, the
structure *can* be named with a grammar rule, which is target
language independent. For example, 3+4 represented as
is clear to anybody regardless of the implementation language,
right? "+" over two operands is a structure. Usually we keep the
operation at the root so the generalized structure is op over
operands. Don't need no stinkin' OO for that. ;)
> So, what have you homogeneous folks learned through your experience
> that mitigates this itch for explicitly named relationships? Or
> compensates for the lack of named relationships?
add : #(PLUS INT INT) ;
expr : #(PLUS expr expr)
makes this relationship explicit and also gives it a name.
Remember any non random sequence of stuff conforms to a language. I
don't need to convince anybody on this list that grammars are useful
for describing languages I hope. ;)
Hetero trees are fine for one phase translations. I think co-opting
the type system to enforce and describe structure is a nice trick but
it doesn't scale well per my arguments in the article. For example,
the same tree structure can be walked by a tree parser in any target
language (for which you have a target) whereas using a type system
locks you into that language. For example, C just doesn't have the
type system to handle what an OO language can do.
For the next 5 years (after finishing v3), I have committed myself to
educating programmers by writing articles, academic papers, and books
on recognition and translation. part of the problem with tree
grammars has been the lousy ANTLR 2.x implementation and my lack of
'marketing' and examples etc...
In summary, Alexey is clearly a very smart guy and has a nice tool.
SableCC also uses this approach, written by another smart guy. :)
Using the type system is nice for many applications. They are
obvious to the casual observer and work great for many simple
translation, though you have to write the tree recognizer by hand
(<shudder>). For larger problems, I hope over the next n years to
convince you that tree grammars are the way to recognize
structure. :) Give me a chance to finish v3 and then watch the
torrent of doc/articles/papers.
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
More information about the antlr-interest