[antlr-interest] New article on StringTemplates and Treewalkers

Wed Jan 11 13:52:17 PST 2006

Andy Tripp wrote:
> 

>>
>> Isn't this a false dichotomy?  The same considerations apply to both 
>> situations.  If antlr can do many-to-one (source grammar to a variety 
>> of target languages) 
> 
> 
> You mean "one-to-many", here, not "many-to-one", don't you? ANTLR itself 
> has just one input language, and "many" output languages (C++, Java, C#).

Oops.

> 
>> that is only because somebody took the trouble to write the target 
>> generation code.  It's not one-to-many, but many one-to-ones.  This is 
>> exactly what happens with a many-to-one mapping (variety of source 
>> languages to one target language): for each source language somebody 
>> has to take the trouble to write the transformation code, and you 
>> again end up with many one-to-ones.
> 
> 
> No, I don't think that ANTLR is many one-to-ones at all. There is only 
> one input language, there is a lot of code to derive the output, and then
> there are minor variations on the output to make it fit either C++, 
> Java, or C# syntax.

Ok, syntactically, maybe the backend code is mixed up.  But 
conceptually?  After all, what is the difference between many-one and 
many one-one, rilly?
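
One way to picture the two readings is a small Python sketch. It is an illustration only, not ANTLR's actual backend: `analyze`, `emit`, and the `TARGETS` table are hypothetical names, and Python's standard `string.Template` stands in for StringTemplate. The analysis runs once on the single input, and each target adds only a small template on top of it.

```python
from string import Template

# Hypothetical shared "model": derived once from the single input language.
def analyze(rule_name, alternatives):
    return {"name": rule_name, "count": len(alternatives)}

# Per-target variations kept as small templates.
# These are made-up stand-ins, not ANTLR's real code-generation templates.
TARGETS = {
    "java": Template("public void $name() { /* $count alternatives */ }"),
    "cpp": Template("void Parser::$name() { /* $count alternatives */ }"),
    "csharp": Template("internal void $name() { /* $count alternatives */ }"),
}

def emit(model, target):
    """Render the shared model with the template for one target."""
    return TARGETS[target].substitute(model)

model = analyze("expr", ["a", "b"])
java_out = emit(model, "java")
cpp_out = emit(model, "cpp")
```

If most of the work lives in `analyze`, the design looks like one-to-many. If each target needed its own `analyze`, it would look like many one-to-ones.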

> 
>>
>> So if it is a problem for Antlr, it is the same problem for Jazillian 
>> or any other code xformer, regardless of implementation technique.
> 
> 
> I do agree that (and I'm not sure if this is your point or not) ANTLR 
> and Jazillian seem like they should both be designed the same way.

Not at all, I'm only trying to abstract in order to find the nut of 
the problem.  After all, if you went to the trouble of trying antlr and 
finding it lacking, there's something there, there.
>>
>> Actually I think "MVC" is probably not the best idiom for discussing 
>> parsing and transformation, coming as it does from the world of 
>> graphical representation of data.  (Personally I don't find it useful 
>> to think of the result of a translation as a "view" of the source; 
>> e.g. calling the parser code generated by Antlr a "view" of the source 
>> grammar doesn't work for me.  
> 
> 
> Me neither, I hope I didn't say that.

Sorry, that actually belongs on a different note to Mr. Parr regarding 
his (excellent) paper on separating MVC.  I like the content, just 
interested in other (possibly "better") ways of expressing it.

> 
>> Nobody considers the machine code emitted by a compiler to be a "view" 
>> of the source code.)
> 
> 
> Ah, but they do. I do, and  that's exactly what Terence is saying in the 
> StringTemplate article...that the target Java, python, and bytecode
> are simply three slightly different "views" of the output. I agree with 
> that.
> 

Well, you're a special case so we get to remove you from the sample.  ;)

But the article was about a straightforward source to source 
transformation - not machine code generation (Java byte code is not 
machine code).  I wonder if you and/or Mr. Parr really think of compiled 
code - machine code - as a "view" of the source.  Ordinarily I mean - of 
course one can talk about it that way for special purposes.

>>
>> The real question is not separation of m v and c, but of the 
>> *genericity* (adaptability, flexibility, whatever) of the "service": 
>> given a parser generator, is its backend architecture general enough 
>> to make it easy to write specialized emitters?  Given a language 
>> transformer (e.g. Jazillian), is its frontend architecture general 
>> enough to make it easy to specialize it for a variety of input languages?
> 
> 
> In my case, I haven't cared too much (yet) that the frontend be able to 
> handle multiple input languages (or that the backend be able
> to output multiple languages for that matter). Just a single C-to-Java 
> translator is hard enough, and I've been happy to spend 3 years full time
> thinking about all the ways to do that really well, rather than 
> expanding my scope. Having said that, I'm now working on C++ to Java, 
> though :)
> 
>>
>> More specifically:  how hard would it be to write an ML or Haskell 
>> emitter for Antlr (something I'd like to see)?
> 
> 
> Good question, and my related question is "will StringTemplate make that 
> any easier?".

For the actual text generation, yes (I think); but that has nothing to 
do with target v. source driven transformation strategies.

>>
>> How hard would it be to write an ML or Haskell front-end for 
>> Jazillian?  (I mean relative to a C frontend, not relative to a 
>> backend to Antlr, which would no doubt be easier.)
> 
> 
> Answer: very hard: the translation rules are all C-specific. To put it 
> bluntly, the Jazillian "front-end" is not in any way separated from the 
> "engine"
> and "backend". I believe it's impossible to design such an 
> any-language-to-any-language translation engine, despite the fact that
> Semantic Designs claims to have such a product.
> 
I guess Lisp or some similar lambda calculus thingee would be best for 
the urlanguage.  Wouldn't that be a fun project?  No doubt somebody 
somewhere has tried.
>>
>> (Note GCC is a good example of genericity both on the front and back 
>> ends.)
> 
> 
> Right, I'm familiar with the gcc 4.0 architecture. IIRC it only supports 
> C/C++ with gcc-specific extensions and Java as input,
> and executable and Java bytecode as output. Good luck on getting it to 
> input or output ML, Haskell, or Lisp :)

I looked into that a bit once.  I don't remember the details, but there 
are languages for which GCC just ain't the right tool.

> 
>>
>> A general observation:  you contrast the Antlr (AST) approach to 
>> "pattern-matching" in a few places (e.g. "is what you've got using 
>> StringTemplates and AST walking better than what you'd have with some 
>> (unspecified here) pattern-matching approach?").
>>
>> But parsing *is* pattern matching, no?  So it isn't clear (to me) what 
>> exact contrast you're trying to establish.
> 
> 
> I'm not referring to ANTLR parsing here, but ANTLR treewalking. But yes, 
> we could consider treewalking to be "pattern-matching on
> two-dimensional trees", while I'm saying I prefer "pattern-matching on 
> one-dimensional token streams". Simply because it's trivial to
> form mental pictures of token streams. When we read "int[] i;", our 
> brain has already tokenized it into a sequence of 5 tokens:
> int [ ] i ;
> But given that same chunk of code, our brains do NOT easily form an AST 
> structure:
> VAR_DEC
>     TYPE "int"
>     ARRAY_DEC  "[]"
>     NAME "i"

Yep.  Although I daresay it depends on which language one is most 
comfortable with.  In lisp dialects it's pretty straightforward to think 
in terms of something more treelike.  Then again, given the mainstream 
resistance to all those parentheses...
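
To make the contrast concrete, here is a minimal Python sketch of the quoted example. It comes from neither poster: the tokenizer regex, the function names, and the nested-tuple tree shape are assumptions chosen for illustration. It splits "int[] i;" into the five-token stream, matches a flat pattern against that stream, and then builds the VAR_DEC tree shown in the quote.

```python
import re

# Hypothetical tokenizer: an identifier, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")

def tokenize(source):
    """Split source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)

def match_array_decl(tokens):
    """Match the flat pattern TYPE [ ] NAME ; and return (type, name), else None."""
    if len(tokens) == 5 and tokens[1:3] == ["[", "]"] and tokens[4] == ";":
        return tokens[0], tokens[3]
    return None

def to_tree(tokens):
    """Build the VAR_DEC tree from the quoted example as nested tuples."""
    matched = match_array_decl(tokens)
    if matched is None:
        return None
    type_name, var_name = matched
    return ("VAR_DEC", [("TYPE", type_name),
                        ("ARRAY_DEC", "[]"),
                        ("NAME", var_name)])

tokens = tokenize("int[] i;")
tree = to_tree(tokens)
print(tokens)
print(tree)
```

The flat rule in `match_array_decl` reads left to right just like the source text, while a rule written against `tree` has to know the node names and their nesting.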

> 
> Avoiding mental pictures of AST trees altogether is just a HUGE 
> productivity boost, at least for me.
> I'd say I'm at least twice as productive in writing rules (both simple 
> text-replacement ones and
> complex ones written in Java code), and probably more like 5-10x more 
> productive
> by largely ignoring AST structures.

That's interesting.  Can't argue with experience.  I suggest we cadge a 
few million bucks out of the DOD to do a study.

-gregg