[antlr-interest] request for comments on language implementation

Thu Feb 26 02:07:36 PST 2009

Andy Tripp schrieb:
> Hi Andreas,
>
> Andreas Meyer wrote:
>> Hi!
>>
>> Your mail was addressed to Michael, but I hope it's ok to answer 
>> nonetheless:
>>
>> I would consider hand-written code to walk an AST harmful. Maybe 
>> there are cases where it is useful, but walking a dynamically typed 
>> tree like this looks like a maintenance nightmare to me. 
>
> I've found it much easier to do "hand-written" AST transformations.
> For example, to find cases like "x+0" and replace with just "x", you'd
> have something like:
>
> List<AST> pluses = root.findAncestorsOfType(MyTokenTypes.PLUS);
> for (AST ast: plusses) {
>        if (ast.getChildCount() == 2         && 
> ast.getSecondChild().getType() == MyTokenTypes.INTEGER
>            && ast.getSecondChild().getText().equals("0")) {
>         ast.replaceMyself(ast.getFirstChild());  // replace "x+0" with 
> just "x"
>     }
> }

Well, written out as a "concrete syntax pattern", this looks somewhat 
like "pattern(x) = x + 0 ==> x". There exist lots of tools that do this, 
MatchO maybe can be extended to do it with similar syntax, although some 
feature may be missing from a fully fledged term rewriting engine (order 
of rule application, traversal etc).  (I must admit I have not used any 
of these as of now, so I'm only repeating what I saw elsewhere. I still 
have to figure out what is feasible to implement).

>
> It's maintainable because later, when you want to also replace "x*1" 
> with just "x", and a few
> other similar patterns, you can make this into a generic function:
>
> void replaceIdentity(MyTokenType type, String value, AST root) {
>     List<AST> asts = root.findAncestorsOfType(type);
>     for (AST ast: plusses) {
>            if (ast.getChildCount() == 2             && 
> ast.getSecondChild().getType() == MyTokenTypes.INTEGER
>                && ast.getSecondChild().getText().equals("0")) {
>             ast.replaceMyself(ast.getFirstChild());  // replace "x OP 
> VALUE" with just "x"
>         }
>     }
> }
>
> And now, you just have:
> replaceIdentity(MyTokenType.PLUS, "0", root); // replace "x+0" with "x"
> replaceIdentity(MyTokenType.MINUS, "0", root);// replace "x-0" with "x"
> replaceIdentity(MyTokenType.MULTIPLY, "1", root);// replace "x*1" with 
> "x"
> replaceIdentity(MyTokenType.DIVIDE, "1", root);// replace "x/1" with "x"

Once you have such a generic function, indeed it looks interesting. If 
your application is built on top of a few of those, maybe this would 
also appeal to me. For our purposes, we need to recognize lots of 
patterns, that's why I prefer pattern matching :-)
>
> I don't see how an ANTLR treewalker lets you make things modular (and 
> thus maintainable)
> like this.

Me too. As said above, for transformations I would prefer patterns and a 
pattern-matching-engine.
>
>
>> Also, you make yourself highly dependent on the underlying 
>> tree-model, which changed a lot from antlr2 to antlr3.
>
> Assuming by "tree-model", you mean the shape of the AST you're 
> creating, no,
> that wouldn't (or shouldn't) change with the version of ANTLR that you're
> using. You build the AST to be the shape that you want, regardless of 
> ANTLR version.

Have you used antlr3? In our application, the migration to antlr3 is a 
major effort. For example, an antlr2 AST node can be a singular node,or 
a node with a sibling, which has another sibling. So, it can be either a 
node or a list of nodes. If your code depends on this, you end up 
replacing many type definitions with List<AST>. Or you add the nodes you 
actually want to return to a dummy parent, return the first child of 
that parent, and let the caller find out if the returned node has a 
parent and if so, regard all its childs as the rest of the list. If you 
did not write your program in such a fashion, maybe the translation to 
antlr3 gets easier then. But still, there is a function called 
"setNextSibling" in antlr2. How do you emulate that in antlr3? You have 
to distinguish lots of special cases to find out if it means: addChild, 
deleteChild, deleteRangeOfChildren .. whatever.
>
> I think "hand-written" code to walk an AST is less
> dependent on AST structure than an ANTLR treewalker is.
> For example, suppose I want to find all the ancestor nodes of type
> "INT". The "hand-written" code is something like:
>
> List<AST> nodes = someNode.findAncestorsOfType(MyLanguageTokenTypes.INT);
>
> ...while the ANTLR treewalker that does the same thing consists of the
> *entire* tree shape, with extra code added. The ANTLR treewalker forces
> you to specify much more info than you actually care about.

You could have only a basic tree walker and inherit from that, 
overriding only the places you want. So, if you have a lot more easy 
tree walkers, the single tree-walker copy of the antlr grammar is still 
a burden, but it only exists once.
> If by "tree-model" you mean the ANTLR syntax and semantics, then 
> obviously
> the "hand-written" treewalker is not dependent on it at all, while the
> ANLTR treewalker is completely dependent on it.

Well, there might be lots of syntactic changes, but I consider these a 
lot easier to deal with than a structural change of the internal tree 
representation, as was the case between antlr2->antlr3. And if you dont 
like treewalker's at all, then maybe generic visitors/combinators could 
be of interest to you (see JJtraveller). Maybe it's possible to also 
write such a tool for antlr, so that, in effect, you get a 
base-tree-walker (without any actions) for free.

So, although my mail contains lots of speculation, I hope it provides 
some ideas that seem useful to you :-)
Andreas