[antlr-interest] CommonTree & Tree grammar versus DIY

Fri Aug 15 10:13:37 PDT 2008

Terence Parr wrote:
> 
> On Aug 15, 2008, at 8:12 AM, Andy Tripp wrote:
> 
>> Marc,
>> This "writing by hand" article is referring to walking a tree-like 
>> data structure (the AST).
>> Pretty much every programmer already knows how to do this by hand:
>>
>> void walk(Tree t) {
>> doSomething(t);
>> for (Tree child: t.children()) {
>>    walk(child);
>> }
>> }
> 
> But empirically, people don't know how to do language stuff so that must 
> not be enough as you clearly know from your very complicated and nice 
> stuff :)

Right. Whoever writes the doSomething() method shown above is going to
have to know what the AST looks like, regardless of whether the doSomething()
call is embedded in a treewalker.g file or plain old code.

> 
>> By contrast, there are only a handful of people who know how to do 
>> that same thing
>> with an ANTLR treewalker.
> 
> Coincidental with the people that have built language stuff ;)

There are an awful lot of parser/lexer tools out there besides ANTLR.
I would guess that of all the people doing AST manipulation, few of them
use ANTLR treewalkers to do it.

> 
>> The "manipulating an AST" work of building a translator is quite a 
>> different
>> job than the "build a lexer/parser" part of it.
> 
> Yep.  i'm mostly a fan of walking not rewriting ASts by the way...that 
> gets WAY too hard.

Maybe I'm misunderstanding you, but walking vs. rewriting is like
apples vs. oranges (or "the drive to and from work" vs. "going to
work all day"). Walking is what you do when there's (almost) nothing
to do.

Further, Andy has a nice approach of rewriting.  
> Expect ANTLRMorph, by Leon Su, this Fall!

To pick an AST manipulation rule at random...consider that when translating
C to Java, you can assign a zero value to any C struct. The C struct will become
a Java class, and the zero needs to change to "null". Where in the AST
do we need to check? Here are a few ways to "assign 0" in C:

* simple assignment:   a = NULL;    (where we have #define NULL 0 somewhere)
* function call arg:   f(0);        (where f's first arg is some struct type)
* ternary operator:    a = b ? 0 : 0; 
* function return:     a = f();     (where f() returns struct type)

So if we have a checkForZeroAssignment() method, calls to it will have to be inserted
in at least four distinct places in our Ctree.g file. And checkForZero() will
have to "know" where it's being called from anyway (for example, in the
simple assignment case, it may have to search deep into nested curly braces
for array assignment).

So which way is best to slice the problem? We can have a
ZeroAssignmentRule class (along with a couple hundred others to do other things)
that searches the AST for these four cases. Or
we can have a Ctree.g that, at the ASSIGN node calls checkForZeroAssignment()
and a bunch (20? 50?) of other checks to do other things.

I'll be pretty impressed if ANTLRMorph makes this kind of work any easier,
as none of the other AST-rewrite systems seem to.

...all just my opinion, of course ;)
Andy
> 
> Ter
>