[antlr-interest] request for comments on language implementation

Thu Feb 26 07:13:18 PST 2009

Andreas Meyer wrote:
(snip)
> Well, written out as a "concrete syntax pattern", this looks somewhat 
> like "pattern(x) = x + 0 ==> x". There exist lots of tools that do this, 
> MatchO maybe can be extended to do it with similar syntax, although some 
> feature may be missing from a fully fledged term rewriting engine (order 
> of rule application, traversal etc).  (I must admit I have not used any 
> of these as of now, so I'm only repeating what I saw elsewhere. I still 
> have to figure out what is feasible to implement).

I had tried out several term-rewrite engines tools (stratego, txl, etc) a few years ago,
and I didn't find any of them to be better than just writing stuff like
this "by hand". IIRC, they all seemed fine at simple stuff like "x+0 -> x", but
once you start putting in real-world requirements ("for all values that might
evaluate to zero", "as long as '+' hasn't been operator-overloaded") they got
pretty messy.

> Once you have such a generic function, indeed it looks interesting. If 
> your application is built on top of a few of those, maybe this would 
> also appeal to me. For our purposes, we need to recognize lots of 
> patterns, that's why I prefer pattern matching :-)

Yes, of course you make the code modular and build up a library of useful
methods. I also have lots of patterns - hundreds - in my C-to-Java translator.
Some are simple enough that a simple pattern-match thing works:
printf(X) -> sprintf(stdout, X)
...but others require a lot of code.

>> I don't see how an ANTLR treewalker lets you make things modular (and 
>> thus maintainable)
>> like this.
> 
> Me too. As said above, for transformations I would prefer patterns and a 
> pattern-matching-engine.

As Terrence mentioned, he's working on something new. I'm also trying to
put together something new.

>>
>>> Also, you make yourself highly dependent on the underlying 
>>> tree-model, which changed a lot from antlr2 to antlr3.
>> Assuming by "tree-model", you mean the shape of the AST you're 
>> creating, no,
>> that wouldn't (or shouldn't) change with the version of ANTLR that you're
>> using. You build the AST to be the shape that you want, regardless of 
>> ANTLR version.
> 
> Have you used antlr3? In our application, the migration to antlr3 is a 
> major effort. For example, an antlr2 AST node can be a singular node,or 
> a node with a sibling, which has another sibling. So, it can be either a 
> node or a list of nodes. If your code depends on this, you end up 
> replacing many type definitions with List<AST>. Or you add the nodes you 
> actually want to return to a dummy parent, return the first child of 
> that parent, and let the caller find out if the returned node has a 
> parent and if so, regard all its childs as the rest of the list. If you 
> did not write your program in such a fashion, maybe the translation to 
> antlr3 gets easier then. 
> But still, there is a function called 
> "setNextSibling" in antlr2. How do you emulate that in antlr3? You have 
> to distinguish lots of special cases to find out if it means: addChild, 
> deleteChild, deleteRangeOfChildren .. whatever.

Yes, I've used antlr3, but I haven't actually converted an existing
ANTLR2-based app to ANTLR3. I tended to subclass most of the AST functionality
anyway (for example, hiding the nary-tree implemented as a binary tree
getFirstChild/getNextSibling thing). I see now how upgrading to
antlr3 could be a pain. I guess that's why I'm just sticking with antlr2
for now.

>> I think "hand-written" code to walk an AST is less
>> dependent on AST structure than an ANTLR treewalker is.
>> For example, suppose I want to find all the ancestor nodes of type
>> "INT". The "hand-written" code is something like:
>>
>> List<AST> nodes = someNode.findAncestorsOfType(MyLanguageTokenTypes.INT);
>>
>> ...while the ANTLR treewalker that does the same thing consists of the
>> *entire* tree shape, with extra code added. The ANTLR treewalker forces
>> you to specify much more info than you actually care about.
> 
> You could have only a basic tree walker and inherit from that, 
> overriding only the places you want. So, if you have a lot more easy 
> tree walkers, the single tree-walker copy of the antlr grammar is still 
> a burden, but it only exists once.

But each time you change the structure of the AST, whatever treewalker grammar
you have is no longer valid. Michael has a handful (5-10) of AST-changing
rules, so ends up with a handful of grammars. I have many hundreds of
AST-changing rules. It's completely impractical to keep track of what a
valid AST looks like at each step.

>> If by "tree-model" you mean the ANTLR syntax and semantics, then 
>> obviously
>> the "hand-written" treewalker is not dependent on it at all, while the
>> ANLTR treewalker is completely dependent on it.
> 
> Well, there might be lots of syntactic changes, but I consider these a 
> lot easier to deal with than a structural change of the internal tree 
> representation, as was the case between antlr2->antlr3. And if you dont 
> like treewalker's at all, then maybe generic visitors/combinators could 
> be of interest to you (see JJtraveller).
Do you have a link for JJtraveller? Google can't find it.

> Maybe it's possible to also 
> write such a tool for antlr, so that, in effect, you get a 
> base-tree-walker (without any actions) for free.
> 
> So, although my mail contains lots of speculation, I hope it provides 
> some ideas that seem useful to you :-)

Yes, thanks for the discussion!
Andy

> Andreas
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>