[antlr-interest] New article on StringTemplates and Treewalkers

Wed Jan 11 09:34:18 PST 2006

Andy Tripp wrote:
> In a fit of reverse-writer's-block last night, I wrote down
> some thoughts on AST treewalking and StringTemplate, titled
> "Why I don't Use StringTemplate for Language translation"
> 
> The article is here: http://www.jazillian.com/stringTemplate.html
> 

Hi Andy,

I have a few holes to poke in your article, which I mean in the nicest
possible way!

From your paper:  "But the main rationale for separating the "view" 
from the "controller" and "model" is so that we can have multiple 
"views", and that we can easily change the "view" without having to 
touch the "model" or the "controller". Certain applications may have 
multiple "views" (ANTLR, for example, which takes a single input in 
ANTLR-language, but generates Java code for Java programmers, C code for 
C programmers, etc). But for other applications, such as a 
"Any-dialect-of-C to Java" or "C or C++ to Java", the mapping is 
many-to-one, not one-to-many."

Isn't this a false dichotomy?  The same considerations apply to both 
situations.  If Antlr can do one-to-many (one source grammar to a variety of 
target languages), that is only because somebody took the trouble to 
write the target generation code.  It's not one-to-many, but many 
one-to-ones.  This is exactly what happens with a many-to-one mapping 
(variety of source languages to one target language): for each source 
language somebody has to take the trouble to write the transformation 
code, and you again end up with many one-to-ones.

So if it is a problem for Antlr, it is the same problem for Jazillian or 
any other code transformer, regardless of implementation technique.

Actually I think "MVC" is probably not the best idiom for discussing 
parsing and transformation, coming as it does from the world of 
graphical representation of data.  (Personally I don't find it useful to 
think of the result of a translation as a "view" of the source; e.g. 
calling the parser code generated by Antlr a "view" of the source 
grammar doesn't work for me.  Nobody considers the machine code emitted 
by a compiler to be a "view" of the source code.)

The real question is not the separation of M, V, and C, but the 
*genericity* (adaptability, flexibility, whatever) of the "service": 
given a parser generator, is its backend architecture general enough to 
make it easy to write specialized emitters?  Given a language 
transformer (e.g. Jazillian), is its frontend architecture general 
enough to make it easy to specialize it for a variety of input languages?
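To make the backend half of that question concrete, here is a minimal Java sketch, assuming a hypothetical translator whose core knows only a language-neutral "assignment" and hands all output to a pluggable emitter. The names `Emitter`, `JavaEmitter`, `HaskellEmitter` and `GenericBackend` are invented for illustration; they are not taken from Antlr or Jazillian.

```java
// Hypothetical: the one construct this toy translator core can describe.
interface Emitter {
    // Render "name is bound to value" in the target language.
    String assign(String name, String value);
}

// Hypothetical Java target.
class JavaEmitter implements Emitter {
    public String assign(String name, String value) {
        return name + " = " + value + ";";
    }
}

// Hypothetical Haskell target.
class HaskellEmitter implements Emitter {
    public String assign(String name, String value) {
        return "let " + name + " = " + value;
    }
}

class GenericBackend {
    // The core delegates every construct to whichever emitter it is given,
    // so one input yields one output per target: many one-to-ones.
    static String render(Emitter target, String name, String value) {
        return target.assign(name, value);
    }
}
```

In this sketch, adding a target means writing one more `Emitter` while the core stays untouched; the frontend question is the same shape turned around, with one parser per source language feeding a shared core.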

More specifically:  how hard would it be to write an ML or Haskell 
emitter for Antlr (something I'd like to see)?

How hard would it be to write an ML or Haskell frontend for Jazillian? 
(I mean relative to a C frontend, not relative to a backend for Antlr, 
which would no doubt be easier.)

(Note that GCC is a good example of genericity on both the front and back ends.)

A general observation:  you contrast the Antlr (AST) approach to 
"pattern-matching" in a few places (e.g. "is what you've got using 
StringTemplates and AST walking better than what you'd have with some 
(unspecified here) pattern-matching approach?").

But parsing *is* pattern matching, no?  So it isn't clear (to me) what 
exact contrast you're trying to establish.

One of the examples you give to illustrate the difficulty of AST-walking:

	2.  At any "printf function" node, loop through the format string and 
arguments, and do lots of processing to replace them with Java using the 
"+" operator.

My understanding is that you would just write a production for the 
grammar of the args of the printf function, which you could take 
directly from the C grammar, augmented by info from the printf 
definition in the library.  The "lots of processing" must occur 
regardless of implementation strategy, but in Antlr the grammar 
recognition part (looping through the format string and args) is clear 
and simple(?).
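As a rough illustration of the "lots of processing" that remains once the grammar has recognized the call, here is a hypothetical Java sketch that turns a printf format string plus its already-parsed argument expressions into a Java "+" expression. It assumes only %d, %s and %% appear; widths, precisions and other conversions are not handled, and `PrintfToJava.convert` is an invented name.

```java
import java.util.List;

// Hypothetical sketch: C printf format string + argument expressions
// -> Java string-concatenation expression (as source text).
// Handles only %d, %s and %%; anything else is rejected.
class PrintfToJava {
    static String convert(String format, List<String> args) {
        StringBuilder out = new StringBuilder();     // generated Java expression
        StringBuilder literal = new StringBuilder(); // pending literal text
        int argIndex = 0;
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '%' && i + 1 < format.length()) {
                char spec = format.charAt(++i);
                if (spec == '%') {
                    literal.append('%');
                } else if (spec == 'd' || spec == 's') {
                    flush(out, literal);
                    // Force String context so "%d%d" concatenates instead of adding.
                    if (out.length() == 0) out.append("\"\"");
                    append(out, args.get(argIndex++));
                } else {
                    throw new IllegalArgumentException("unsupported: %" + spec);
                }
            } else {
                literal.append(c);
            }
        }
        flush(out, literal);
        return out.length() == 0 ? "\"\"" : out.toString();
    }

    // Emits pending literal text as a quoted Java string, if there is any.
    private static void flush(StringBuilder out, StringBuilder literal) {
        if (literal.length() > 0) {
            append(out, "\"" + escape(literal.toString()) + "\"");
            literal.setLength(0);
        }
    }

    // Appends one operand, putting " + " between operands.
    private static void append(StringBuilder out, String operand) {
        if (out.length() > 0) out.append(" + ");
        out.append(operand);
    }

    // Escapes backslash, double quote and newline for a Java string literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

For example, `convert("x = %d\n", List.of("x"))` yields the Java source text `"x = " + x + "\n"`, which could then be wrapped in `System.out.print(...)`. None of this depends on whether the call was found by a tree grammar or by some other matcher.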

Correct me if I'm wrong, but I get the impression you're thinking about 
writing by hand a bunch of the AST parsing logic that Antlr generates 
automatically for tree grammars, much as you might need to 
proceed if you were using a less sophisticated parser generator 
(lex/yacc, etc.)  In that case, yes, it would definitely be a pain 
because you might need to do it all by hand.  But if I understand Antlr 
correctly, it saves you the trouble by supporting tree grammar.  So the 
interesting contrast is not necessarily between your approach and 
Antlr's, but between Antlr v. other parser generators.
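For comparison, here is roughly what the hand-written alternative looks like: a minimal Java sketch, assuming a hypothetical `Node` class, that checks whether a subtree has the shape CALL(ID "printf", args...). In an ANTLR 2 tree grammar the structural part would be written as a pattern along the lines of `#(CALL ID (expr)*)` (token and rule names hypothetical), with the matching code generated for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node: token type, token text, children.
class Node {
    final String type;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        for (Node k : kids) children.add(k);
    }
}

// Hand-written walker: each structural check a tree grammar would state
// declaratively has to be spelled out explicitly.
class HandWalker {
    // Returns the argument subtrees of a printf call, or null if the node
    // is not shaped like CALL(ID "printf", arg*).
    static List<Node> matchPrintf(Node n) {
        if (!n.type.equals("CALL")) return null;         // root must be a CALL
        if (n.children.isEmpty()) return null;           // needs a function name
        Node fn = n.children.get(0);
        if (!fn.type.equals("ID") || !fn.text.equals("printf")) return null;
        return n.children.subList(1, n.children.size()); // the rest are arguments
    }
}
```

Multiply this by every construct the translator cares about and the appeal of generating it from a tree grammar is clear.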

All for now.  I'm not sure I agree with your paper, but it has certainly 
provoked thought.

-gregg