[antlr-interest] Fwd: philosophy about translation

Monty Zukowski monty at codetransform.com
Fri Oct 6 10:36:10 PDT 2006


I forgot to properly copy antlr-interest on this, so for posterity
here is a message missing from this thread....


> In COBOL we have statements like:
> ADD A TO B              // B += A;
> ADD A B TO C D          // C += A + B; D += A + B;
> ADD A TO B GIVING C     // C = A + B;
>
> If you bifurcate at the statement level, then you have lots of logic that
> says "Here is the COBOL ADD statement, and now I'll generate the
> equivalent Java statement, and either replace the COBOL AST with the
> Java one, or somehow just attach the Java AST to the COBOL AST."
>
> That's fine, but it just means that (almost) all your logic is there, in
> that processing. The fact that it's stored in an AST at all is of little
> help to you...you're not doing many AST manipulations. So the AST just
> becomes a convenient data structure for storing the state between phases,
> as opposed to a convenient data structure for actually performing
> language translation on.

I guess I don't understand your distinction here because I don't know
what your alternative is.  I found it very handy to do something like
that ADD transformation into the target language because I could still
ignore things like sub-expressions which were still COBOL.  Step by
step I changed each particular source expression to a target
expression.
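
A minimal sketch of that kind of step, written in Python only for
brevity.  Everything in it is an assumption made for illustration: the
Node class, the node kinds (COBOL_ADD, JAVA_ASSIGN, ...) and the child
layout are hypothetical and are not taken from any real ANTLR grammar.
It rewrites only ADD statements and leaves every other statement and
every operand exactly as it found them.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # hypothetical kinds, e.g. "COBOL_ADD", "JAVA_ASSIGN"
    text: str = ""
    children: list = field(default_factory=list)

def add_stmt(operands, to, giving=()):
    """Build a COBOL_ADD node with the assumed layout [OPERANDS, TO, GIVING]."""
    leaves = lambda names: [Node("COBOL_EXPR", n) for n in names]
    return Node("COBOL_ADD", "ADD", [
        Node("OPERANDS", "", leaves(operands)),
        Node("TO", "", leaves(to)),
        Node("GIVING", "", leaves(giving)),
    ])

def sum_of(operands):
    """Fold operands into a left-nested target-language '+' tree."""
    total = operands[0]
    for operand in operands[1:]:
        total = Node("JAVA_BINOP", "+", [total, operand])
    return total

def rewrite_add(stmt):
    """Turn one COBOL_ADD node into a list of target assignment nodes."""
    operands, to, giving = stmt.children
    if giving.children:
        # ADD A TO B GIVING C  ->  C = A + B
        value, op, targets = sum_of(operands.children + to.children), "=", giving.children
    else:
        # ADD A B TO C D  ->  C += A + B; D += A + B
        value, op, targets = sum_of(operands.children), "+=", to.children
    # Each assignment gets its own copy of the sum so later passes can edit it freely.
    return [Node("JAVA_ASSIGN", op, [t, copy.deepcopy(value)]) for t in targets]

def rewrite_pass(block):
    """One pass over a statement list: only ADD is touched, the rest is kept."""
    rewritten = []
    for stmt in block.children:
        if stmt.kind == "COBOL_ADD":
            rewritten.extend(rewrite_add(stmt))
        else:
            rewritten.append(stmt)
    block.children = rewritten
    return block

def render(node):
    """Print target nodes as text; untranslated nodes print their source text."""
    if node.kind == "JAVA_ASSIGN":
        return f"{render(node.children[0])} {node.text} {render(node.children[1])};"
    if node.kind == "JAVA_BINOP":
        return f"{render(node.children[0])} {node.text} {render(node.children[1])}"
    return node.text
```

Running rewrite_pass over a block holding the three ADD forms quoted
above plus an untouched MOVE leaves a tree that is part target language
and part COBOL, which is the intermediate state I am describing.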

> And I don't think the AST is helping you at all (at least for COBOL to Java)
> with that design, because COBOL and Java are at least a little similar at
> and below the statement level (as the example above shows, I can typically
> map a single COBOL statement to single Java statement). But
> above that level, the COBOL AST looks almost nothing like the Java one.
> Compare this COBOL grammar to a Java one:
> http://www.cs.vu.nl/grammars/vs-cobol-ii

Oh, right.  You just don't like ASTs.  However, it is still possible
to represent two completely different languages in one tree, and to
have intermediate phases where a mixture of the two kinds of subtree
is walked by the same tree grammar.
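
To make the mixed-tree idea concrete, here is a rough sketch in Python.
The Expr class, the "src"/"dst" language tags and the operator tables
are all hypothetical names invented for this example; a real tree
grammar would express the same thing as rules rather than as
dictionaries.  Each pass translates one family of operators, and a
single walker can print the tree at any stage.

```python
from dataclasses import dataclass, field

@dataclass
class Expr:
    lang: str                 # "src" or "dst" for operators, "leaf" for operands
    op: str                   # operator name, or the operand text for a leaf
    args: list = field(default_factory=list)

# Hypothetical operator tables, one per pass.
ARITHMETIC = {"ADD": "+", "SUBTRACT": "-", "MULTIPLY": "*"}
STRINGS = {"CONCAT": "&"}

def translate(node, table):
    """One pass: rewrite only the source operators listed in `table`."""
    for arg in node.args:
        translate(arg, table)
    if node.lang == "src" and node.op in table:
        node.lang, node.op = "dst", table[node.op]
    return node

def languages(node):
    """Language tags still present on operator nodes (leaves are ignored)."""
    found = {node.lang} if node.args else set()
    for arg in node.args:
        found |= languages(arg)
    return found

def show(node):
    """A single walker that handles source and target nodes alike."""
    if not node.args:
        return node.op
    inner = ", ".join(show(arg) for arg in node.args)
    return f"{node.lang}:{node.op}({inner})"

def leaf(text):
    return Expr("leaf", text)

tree = Expr("src", "CONCAT", [
    leaf("name"),
    Expr("src", "ADD", [leaf("x"), Expr("src", "MULTIPLY", [leaf("y"), leaf("z")])]),
])

translate(tree, ARITHMETIC)
print(show(tree))       # src:CONCAT(name, dst:+(x, dst:*(y, z)))
translate(tree, STRINGS)
print(languages(tree))  # {'dst'}
```

After the arithmetic pass the tree holds both languages at once; only
after the last pass is it entirely in the target language and ready to
be dumped to text.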

> >
> > That both types of statements could co-exist in the same tree, and
> > even have different types of sub-statements.  Similarly for
> > expressions--an expression could use either language's operators, and
> > I could have passes that just dealt with arithmetic or string handling
> > or whatever, so that in one pass expressions are all AREV, the next
> > would have VB arithmetic and AREV everything else, etc.
>
> I did the same for C/C++ and Java: expressions are virtually identical
> in the two languages. But check out expressions in COBOL:
> http://www.cs.vu.nl/grammars/vs-cobol-ii/#gdef:arithmetic-expression
>
> >
> > I was raving about this like 7 years ago, it just totally rocks!
> > Check the archives for my posts about multiple tree grammars, or ask
> > questions if something isn't clear.
> >
> > By the last pass, I had a completely VB tree, and then I finally
> > dumped it to text.
>
> I had looked very carefully at all your stuff when I started 4 years ago.
> My feeling is that if you're going to do a "natural" translation - that is:
>
> String hello = "hello";
> String world = "world";
> printf("%s %s\n", hello, world);
>
> ...becomes...
>
> System.out.println("Hello World");
>
> then the "walking the AST" approach doesn't come close to working.
> The two ASTs for those two code chunks
> have almost nothing in common, and doing that translation
> is 1% a "tree-manipulation" problem, and 99% a "code mapping" problem.
>

Well, that problem becomes a "static analysis" problem and a "constant
expression substitution that is aware of printf args" problem as well.
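
As a toy illustration of what I mean, here is a Python sketch.  It
assumes a lot: straight-line code only, just two statement kinds
(string assignment and printf), formats that use nothing but %s and end
in a newline, and a made-up tuple encoding for statements.  None of the
names come from a real tool.  The first function is the printf-aware
constant substitution, the second is the small piece of analysis that
notices the variables are no longer read.

```python
import json

def fold_printf(statements):
    """Replace printf calls whose arguments are all known string constants.

    Statements are ("assign", name, value) or ("printf", fmt, [arg names]).
    """
    constants = {}
    output = []
    for stmt in statements:
        if stmt[0] == "assign":
            _, name, value = stmt
            constants[name] = value
            output.append(stmt)
        elif stmt[0] == "printf":
            _, fmt, args = stmt
            # Only handle the simple case: every % is a %s, one per argument.
            simple = fmt.endswith("\n") and fmt.count("%") == fmt.count("%s") == len(args)
            if simple and all(arg in constants for arg in args):
                text = fmt[:-1] % tuple(constants[arg] for arg in args)
                output.append(("println", text))
            else:
                output.append(stmt)
        else:
            raise ValueError(f"statement kind not covered by this sketch: {stmt[0]}")
    return output

def drop_dead_assignments(statements):
    """Remove assignments to variables that no remaining printf reads."""
    used = {arg for stmt in statements if stmt[0] == "printf" for arg in stmt[2]}
    return [s for s in statements if s[0] != "assign" or s[1] in used]

def to_java(stmt):
    """Emit Java text for a folded println (json.dumps used as a simple quoter)."""
    assert stmt[0] == "println"
    return f"System.out.println({json.dumps(stmt[1])});"

program = [
    ("assign", "hello", "hello"),
    ("assign", "world", "world"),
    ("printf", "%s %s\n", ["hello", "world"]),
]
folded = drop_dead_assignments(fold_printf(program))
print([to_java(s) for s in folded])  # ['System.out.println("hello world");']
```

A printf whose arguments are not all known constants is left alone, and
so are the assignments it still reads, so the pass can run safely over
code it only partly understands.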


> I think if tree-walking works for most of the translation work, you either
> have two very similar languages, or your output code looks just like your
> input code with different syntax. "I don't want 'JOBOL'", as one of my
> customers said :)

AREV & VB actually had quite different syntax.  If you have a decent
tree structure, the difference of syntax of the languages is
irrelevant.  AREV had some wacky expressions, but once the program was
parsed the trees for statements and expressions were easy to
understand and manipulate.

I'm not debating you on whether your way is better or not.  I just
disagree with your statements about where tree walking doesn't work.

Monty

