[antlr-interest] request for comments on language implementation

Wed Feb 25 14:02:52 PST 2009

Hi Andreas,

Andreas Meyer wrote:
> Hi!
> 
> Your mail was addressed to Michael, but I hope it's ok to answer 
> nonetheless:
> 
> I would consider hand-written code to walk an AST harmful. Maybe there 
> are cases where it is useful, but walking a dynamically typed tree like 
> this looks like a maintenance nightmare to me. 

I've found it much easier to do "hand-written" AST transformations.
For example, to find cases like "x+0" and replace with just "x", you'd
have something like:

List<AST> pluses = root.findAncestorsOfType(MyTokenTypes.PLUS);
for (AST ast: plusses) {
        if (ast.getChildCount() == 2 
	    && ast.getSecondChild().getType() == MyTokenTypes.INTEGER
            && ast.getSecondChild().getText().equals("0")) {
		ast.replaceMyself(ast.getFirstChild());  // replace "x+0" with just "x"
	}
}

Isn't this simpler than the ANTLR treewalker equivalent?
Especially since whoever takes over this code when you leave will likely
already know Java but not know ANTLR?

It's maintainable because later, when you want to also replace "x*1" with just "x", and a few
other similar patterns, you can make this into a generic function:

void replaceIdentity(MyTokenType type, String value, AST root) {
	List<AST> asts = root.findAncestorsOfType(type);
	for (AST ast: plusses) {
        	if (ast.getChildCount() == 2 
		    && ast.getSecondChild().getType() == MyTokenTypes.INTEGER
        	    && ast.getSecondChild().getText().equals("0")) {
			ast.replaceMyself(ast.getFirstChild());  // replace "x OP VALUE" with just "x"
		}
	}
}

And now, you just have:
replaceIdentity(MyTokenType.PLUS, "0", root); // replace "x+0" with "x"
replaceIdentity(MyTokenType.MINUS, "0", root);// replace "x-0" with "x"
replaceIdentity(MyTokenType.MULTIPLY, "1", root);// replace "x*1" with "x"
replaceIdentity(MyTokenType.DIVIDE, "1", root);// replace "x/1" with "x"

I don't see how an ANTLR treewalker lets you make things modular (and thus maintainable)
like this.

> Also, you make yourself 
> highly dependent on the underlying tree-model, which changed a lot from 
> antlr2 to antlr3.

Assuming by "tree-model", you mean the shape of the AST you're creating, no,
that wouldn't (or shouldn't) change with the version of ANTLR that you're
using. You build the AST to be the shape that you want, regardless of ANTLR version.

I think "hand-written" code to walk an AST is less
dependent on AST structure than an ANTLR treewalker is.
For example, suppose I want to find all the ancestor nodes of type
"INT". The "hand-written" code is something like:

List<AST> nodes = someNode.findAncestorsOfType(MyLanguageTokenTypes.INT);

...while the ANTLR treewalker that does the same thing consists of the
*entire* tree shape, with extra code added. The ANTLR treewalker forces
you to specify much more info than you actually care about.

If by "tree-model" you mean the ANTLR syntax and semantics, then obviously
the "hand-written" treewalker is not dependent on it at all, while the
ANLTR treewalker is completely dependent on it. So a working "hand-written"
treewalker requires no changes at all when going from ANTLR v2 to v3, while
an ANTLR treewalker will obviously have to be rewritten.

> 
> I share some of your thoughts on tree-grammars: they are not solving 
> every problem. But still, they are useful for computing easy properties 
> like "contains comparison" or "contains assignment to variable of type 
> xxx". 

I'm convinced that "hand-written" treewalkers are easier for even these
trivial cases. "contains comparison" is simply:

boolean contains(MyTokenType type) {
	if (this.type == type) return true;
	for (AST child: getChildren()) {
		if (child.contains(type)) return true;
	}
	return false;
}

I wrote that off the top of my head as fast as I could type it, and I
think most Java programmers could. Can you say the same for writing
an ANTLR treewalker?

As for "contains assignment to variable of type xxx", we'll need a symbol
table and need to know about variable scoping and syntax
(e.g. what type do we have here:  a.b.f()[20].c = 1;)
So that's one of the complicated cases that I talked about.

> I guess what you are looking for is a tool to look for certain 
> patterns in a source language, and how to map this pattern to another 
> pattern of the same language? Possibly in concrete syntax, with 
> placeholders? Yesterday I found a library called MatchO 
> (http://wiki.di.uminho.pt/twiki/bin/view/Personal/Joost/MatchO) that 
> seems to do just that, but I still have to experiment with it. With some 
> modifications, it might even work for antlr3.

I'm always on the lookout for such tools, but here I'm just trying to
understand how a treewalker could be better than "hand-written" even for
the simpler cases. Michael's case seems like a good "middle of the road"
case: not trivial, but not a nightmare. He's just gone through building
a few treewalkers, and probably has a good feel for how difficult it was.

I especially wonder...suppose someone else takes over" his code and figures it out.
I suspect he enjoyed learning something new (ANTLR treewalkers), so it may not
have seemed difficult, in fact it was probably fun.  But 
the person who "takes over" will just want to minimize the maintenance effort.
I would guess that this "next person" would prefer vanilla tree-data-structure
stuff like "findAncestorsOfType()" rather than having to learn ANTLR.
But that's just a guess, I'd love to hear from someone who's actually
been through it.

Andy
> 
> Best,
> Andreas Meyer