[antlr-interest] request for comments on language implementation
Andreas Meyer
andreas.meyer at smartshift.de
Thu Feb 26 02:07:36 PST 2009
Andy Tripp schrieb:
> Hi Andreas,
>
> Andreas Meyer wrote:
>> Hi!
>>
>> Your mail was addressed to Michael, but I hope it's ok to answer
>> nonetheless:
>>
>> I would consider hand-written code to walk an AST harmful. Maybe
>> there are cases where it is useful, but walking a dynamically typed
>> tree like this looks like a maintenance nightmare to me.
>
> I've found it much easier to do "hand-written" AST transformations.
> For example, to find cases like "x+0" and replace with just "x", you'd
> have something like:
>
> List<AST> pluses = root.findAncestorsOfType(MyTokenTypes.PLUS);
> for (AST ast: plusses) {
> if (ast.getChildCount() == 2 &&
> ast.getSecondChild().getType() == MyTokenTypes.INTEGER
> && ast.getSecondChild().getText().equals("0")) {
> ast.replaceMyself(ast.getFirstChild()); // replace "x+0" with
> just "x"
> }
> }
Well, written out as a "concrete syntax pattern", this looks somewhat
like "pattern(x) = x + 0 ==> x". There exist lots of tools that do this,
MatchO maybe can be extended to do it with similar syntax, although some
feature may be missing from a fully fledged term rewriting engine (order
of rule application, traversal etc). (I must admit I have not used any
of these as of now, so I'm only repeating what I saw elsewhere. I still
have to figure out what is feasible to implement).
>
> It's maintainable because later, when you want to also replace "x*1"
> with just "x", and a few
> other similar patterns, you can make this into a generic function:
>
> void replaceIdentity(MyTokenType type, String value, AST root) {
> List<AST> asts = root.findAncestorsOfType(type);
> for (AST ast: plusses) {
> if (ast.getChildCount() == 2 &&
> ast.getSecondChild().getType() == MyTokenTypes.INTEGER
> && ast.getSecondChild().getText().equals("0")) {
> ast.replaceMyself(ast.getFirstChild()); // replace "x OP
> VALUE" with just "x"
> }
> }
> }
>
> And now, you just have:
> replaceIdentity(MyTokenType.PLUS, "0", root); // replace "x+0" with "x"
> replaceIdentity(MyTokenType.MINUS, "0", root);// replace "x-0" with "x"
> replaceIdentity(MyTokenType.MULTIPLY, "1", root);// replace "x*1" with
> "x"
> replaceIdentity(MyTokenType.DIVIDE, "1", root);// replace "x/1" with "x"
Once you have such a generic function, indeed it looks interesting. If
your application is built on top of a few of those, maybe this would
also appeal to me. For our purposes, we need to recognize lots of
patterns, that's why I prefer pattern matching :-)
>
> I don't see how an ANTLR treewalker lets you make things modular (and
> thus maintainable)
> like this.
Me too. As said above, for transformations I would prefer patterns and a
pattern-matching-engine.
>
>
>> Also, you make yourself highly dependent on the underlying
>> tree-model, which changed a lot from antlr2 to antlr3.
>
> Assuming by "tree-model", you mean the shape of the AST you're
> creating, no,
> that wouldn't (or shouldn't) change with the version of ANTLR that you're
> using. You build the AST to be the shape that you want, regardless of
> ANTLR version.
Have you used antlr3? In our application, the migration to antlr3 is a
major effort. For example, an antlr2 AST node can be a singular node,or
a node with a sibling, which has another sibling. So, it can be either a
node or a list of nodes. If your code depends on this, you end up
replacing many type definitions with List<AST>. Or you add the nodes you
actually want to return to a dummy parent, return the first child of
that parent, and let the caller find out if the returned node has a
parent and if so, regard all its childs as the rest of the list. If you
did not write your program in such a fashion, maybe the translation to
antlr3 gets easier then. But still, there is a function called
"setNextSibling" in antlr2. How do you emulate that in antlr3? You have
to distinguish lots of special cases to find out if it means: addChild,
deleteChild, deleteRangeOfChildren .. whatever.
>
> I think "hand-written" code to walk an AST is less
> dependent on AST structure than an ANTLR treewalker is.
> For example, suppose I want to find all the ancestor nodes of type
> "INT". The "hand-written" code is something like:
>
> List<AST> nodes = someNode.findAncestorsOfType(MyLanguageTokenTypes.INT);
>
> ...while the ANTLR treewalker that does the same thing consists of the
> *entire* tree shape, with extra code added. The ANTLR treewalker forces
> you to specify much more info than you actually care about.
You could have only a basic tree walker and inherit from that,
overriding only the places you want. So, if you have a lot more easy
tree walkers, the single tree-walker copy of the antlr grammar is still
a burden, but it only exists once.
> If by "tree-model" you mean the ANTLR syntax and semantics, then
> obviously
> the "hand-written" treewalker is not dependent on it at all, while the
> ANLTR treewalker is completely dependent on it.
Well, there might be lots of syntactic changes, but I consider these a
lot easier to deal with than a structural change of the internal tree
representation, as was the case between antlr2->antlr3. And if you dont
like treewalker's at all, then maybe generic visitors/combinators could
be of interest to you (see JJtraveller). Maybe it's possible to also
write such a tool for antlr, so that, in effect, you get a
base-tree-walker (without any actions) for free.
So, although my mail contains lots of speculation, I hope it provides
some ideas that seem useful to you :-)
Andreas
More information about the antlr-interest
mailing list