[antlr-interest] philosophy about translation

Fri Oct 6 06:59:16 PDT 2006

Monty Zukowski wrote:

>> In COBOL we have statements like:
>> ADD A TO B    // B += A;
>> ADD A B TO C D   // C+= A + B;  D+= A + B;
>> ADD A TO B GIVING C    // C = A + B;
>>
>> If you bifurcate at the statement level, then you have lots of logic 
>> that
>> says "Here is the COBOL ADD statement, and now I'll generate the
>> equivalent Java
>> statement, and either replace the COBOL AST with the Java one, or just
>> somehow just attach the Java AST to the COBOL AST."
>>
>> That's fine, but it just means that (almost) all your logic is there, in
>> that processing.
>> The fact that it's stored in an AST at all is of little help to
>> you...you're not doing
>> many AST manipulations. So the AST just becomes a convenient data 
>> structure
>> for storing the state between phases, as opposed to a convenient data
>> structure
>> for actually performing language translation on.
>
>
> I guess I don't understand your distinction here because I don't know
> what your alternative is.  I found it very handy to do something like
> that ADD transformation into the target language because I could still
> ignore things like sub-expressions which were still COBOL.  Step by
> step I changed each particular source expression to a target
> expression.

Here's my alternative:
I have hundreds of "transformation rules" where each rule can either be 
a simple mapping like this:
ADD v1 TO v2. --> v2 += v1;
ADD v1 v2 TO v3 v4. --> v3 += v1 + v2; v4 += v1 + v2
ADD v1 TO v2 GIVING v3.  --> v3 = v1 + v2;

...or actual Java code that operates on sequences of tokens, not trees.
I find it easier to think of the source and target languages as 
sequences of tokens
("ADD" followed by a variable followed by "TO"...) rather than trees (an 
"ADD" node
with the first child being a variable and the second child being "TO"...)

>
>> And I don't think the AST is helping you at all (at least for COBOL 
>> to Java)
>> with that design, because COBOL and Java are at least a little 
>> similar at
>> and below the statement level (as the example above shows, I can 
>> typically
>> map a single COBOL statement to single Java statement). But
>> above that level, the COBOL AST looks almost nothing like the Java one.
>> Compare this COBOL grammar to a Java one:
>> http://www.cs.vu.nl/grammars/vs-cobol-ii
>
>
> Oh, right.  You just don't like ASTs.  However, it is still possible
> to represent two completely different languages in one tree, and have
> intermediate phases with a mixture of the two different trees walked
> by the same grammar.

Well, yes, it's possible to represent English and Java in one tree, but
the real question is "is that the best data structure for the task?"

>
>> >
>> > That both types of statements could co-exist in the same tree, and
>> > even have different types of sub-statements.  Similarly for
>> > expressions--an expression could use either language's operators, and
>> > I could have passes that just dealt with arithmetic or string handling
>> > or whatever, so that in one pass expressions are all arev the next
>> > would have vb arithmetic and arev everything else, etc.
>>
>> I did the same for C/C++ and Java: expressions are virtually identical
>> in the
>> two languages. But check out expressions in COBOL:
>> http://www.cs.vu.nl/grammars/vs-cobol-ii/#gdef:arithmetic-expression
>>
>> >
>> > I was raving about this like 7 years ago, it just totally rocks!
>> > Check the archives for my posts about multiple tree grammars, or ask
>> > questions if something isn't clear.
>> >
>> > By the last pass, I had a completely vb tree, and then I finally
>> > dumped it to text.
>>
>> I had looked very carefully at all your stuff when I started 4 years 
>> ago.
>> My feeling is that if you're going to do a "natural" translation - 
>> that is:
>>
>> String hello = "hello";
>> String world = "world";
>> printf("%s %s\n", hello, world);
>>
>> ...becomes...
>>
>> System.out.println("Hello World");
>>
>> then the "walking the AST" approach doesn't come close to working.
>> The two ASTs for those two code chunks
>> have almost nothing in common, and doing that translation
>> is 1% a "tree-manipulation" problem, and 99% a "code mapping" problem.
>>
>
> Well, that problem becomes a "static analysis" problem and a "constant
> expression substitution that is aware of printf args" problem as well.

Right. So I guess my view is just that language translation, at least 
when producing
realistic output, is 99% "static analysis" and 1% "tree transformation".

>
>
>> I think if tree-walking works for most of the translation work, you 
>> either
>> have two very similar languages, or your output code looks just like 
>> your
>> input code with different syntax. "I don't want 'JOBOL'", as one of my
>> customers said :)
>
>
> AREV & VB actually had quite different syntax.  If you have a decent
> tree structure, the difference of syntax of the languages is
> irrelevant.  AREV had some wacky expressions, but once the program was
> parsed the trees for statements and expressions were easy to
> understand and manipulate.

Well, I preferred sequences of tokens over ASTs even for C to Java and 
C++ to Java,
so my views are pretty extreme. On the other hand, of all the language 
translators out there,
I don't see any that (to my eye) produce realistic, "natural" code, 
other than my Jazillian.
It's one thing to produce working Java (source or byte code) code from 
C. It's quite another
to produce code that looks hand-written. The logic to combine those 3 
"printf" statements
into a single, realistic Java one is far harder than just changing to 
Java syntax. Somewhere
in writing that code I came to realize "Hey, the AST is just getting in 
the way here".

>
> I'm not debating you on whether your way is better or not.  I just
> disagree with your statements about where tree walking doesn't work.

That's fair. I don't mean to say tree walking won't work here. Even the 
poster who had
just 8 phases is probably better off with a tree. If he gets to the 
point where he's got 20
phases, and only 5 of the phases are helped by the fact that ASTs are 
being used, then
he's entering my world :)

I guess what I'm doing is more akin to natural language translation. I'm 
sure an English to
Spanish translator has relatively little logic that does tree 
transformation (e.g. putting the
adjectives *after* the nouns).

>
> Monty
>