[antlr-interest] philosophy about translation

Andy Tripp antlr at jazillian.com
Thu Oct 5 13:29:24 PDT 2006


>
> In fact it is quite easy to have multiple languages in the same tree.
> I did this for the AREV->VB translator.  The key is to have AREV_PLUS
> and VB_PLUS for the '+' tree nodes.  Then you don't have to guess if
> you are processing arev addition or vb addition.
>
> I had one super-treegrammar with two complete tree grammars therein.  I
> found it convenient to bifurcate at the statement level.  For example:
>
> program: (statement)*;
> statement:arevStatement | vbStatement;
> ... 


In COBOL we have statements like:
ADD A TO B    // B += A;
ADD A B TO C D   // C += A + B;  D += A + B;
ADD A TO B GIVING C    // C = A + B;

If you bifurcate at the statement level, then you have lots of logic that
says "Here is the COBOL ADD statement, and now I'll generate the equivalent
Java statement, and either replace the COBOL AST with the Java one, or
somehow attach the Java AST to the COBOL AST."
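
As a rough illustration of that per-statement logic, here is a sketch in
Java that maps the three ADD forms above to Java statement text. The
AddStatement class and toJava method are hypothetical names, not part of
any real translator, and the sketch assumes the ADD statement has already
been parsed into its operands, targets and optional GIVING name.

```java
import java.util.ArrayList;
import java.util.List;

class CobolAddSketch {
    // Hypothetical parsed form of: ADD operands TO targets [GIVING giving]
    static class AddStatement {
        final List<String> operands;
        final List<String> targets;
        final String giving; // null when there is no GIVING clause

        AddStatement(List<String> operands, List<String> targets, String giving) {
            this.operands = operands;
            this.targets = targets;
            this.giving = giving;
        }
    }

    // Returns the Java statements equivalent to one COBOL ADD statement.
    static List<String> toJava(AddStatement add) {
        List<String> out = new ArrayList<>();
        if (add.giving != null) {
            // ADD A TO B GIVING C  ->  C = A + B;
            List<String> terms = new ArrayList<>(add.operands);
            terms.addAll(add.targets);
            out.add(add.giving + " = " + String.join(" + ", terms) + ";");
        } else {
            // ADD A B TO C D  ->  C += A + B;  D += A + B;
            String sum = String.join(" + ", add.operands);
            for (String target : add.targets) {
                out.add(target + " += " + sum + ";");
            }
        }
        return out;
    }
}
```

Note that nothing in toJava walks or rewrites a tree: the mapping is
ordinary code that happens to read from a parsed statement.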

That's fine, but it just means that (almost) all your logic is there, in
that processing. The fact that it's stored in an AST at all is of little
help to you...you're not doing many AST manipulations. So the AST just
becomes a convenient data structure for storing the state between phases,
as opposed to a convenient data structure for actually performing language
translation on.

And I don't think the AST is helping you at all (at least for COBOL to Java)
with that design, because COBOL and Java are at least a little similar at
and below the statement level (as the example above shows, I can typically
map a single COBOL statement to a single Java statement). But
above that level, the COBOL AST looks almost nothing like the Java one.
Compare this COBOL grammar to a Java one:
http://www.cs.vu.nl/grammars/vs-cobol-ii

>
> That both types of statements could co-exist in the same tree, and
> even have different types of sub-statements.  Similarly for
> expressions--an expression could use either language's operators, and
> I could have passes that just dealt with arithmetic or string handling
> or whatever, so that in one pass expressions are all arev the next
> would have vb arithmetic and arev everything else, etc.

I did the same for C/C++ and Java: expressions are virtually identical in
the two languages. But check out expressions in COBOL:
http://www.cs.vu.nl/grammars/vs-cobol-ii/#gdef:arithmetic-expression

>
> I was raving about this like 7 years ago, it just totally rocks!
> Check the archives for my posts about multiple tree grammars, or ask
> questions if something isn't clear.
>
> By the last pass, I had a completely vb tree, and then I finally
> dumped it to text.

I had looked very carefully at all your stuff when I started 4 years ago.
My feeling is that if you're going to do a "natural" translation - that is:

String hello = "hello";
String world = "world";
printf("%s %s\n", hello, world);

...becomes...

System.out.println("hello world");

then the "walking the AST" approach doesn't come close to working.
The two ASTs for those two code chunks have almost nothing in common,
and doing that translation is 1% a "tree-manipulation" problem and 99%
a "code mapping" problem.
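
To show what I mean by a "code mapping" rule, here is a minimal Java
sketch that folds a printf call into a single println when every argument
is a variable with a known constant string value. The foldPrintf name and
the constants map are hypothetical, and the sketch assumes some earlier
analysis has already worked out which variables are constant. It only
handles %s and a trailing newline, and it does not escape quotes or
backslashes in the folded text.

```java
import java.util.List;
import java.util.Map;

class PrintfFoldSketch {
    // Folds printf(format, args...) into one println statement.
    // Returns null when the call cannot be folded: no trailing newline,
    // a conversion other than %s, an argument count mismatch, or an
    // argument whose value is not a known constant.
    static String foldPrintf(String format, List<String> argNames,
                             Map<String, String> constants) {
        if (!format.endsWith("\n")) {
            return null;
        }
        String body = format.substring(0, format.length() - 1);
        StringBuilder text = new StringBuilder();
        int nextArg = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '%') {
                text.append(c);
                continue;
            }
            // Only the %s conversion is handled in this sketch.
            if (i + 1 >= body.length() || body.charAt(i + 1) != 's') {
                return null;
            }
            if (nextArg >= argNames.size()) {
                return null;
            }
            String value = constants.get(argNames.get(nextArg++));
            if (value == null) {
                return null;
            }
            text.append(value);
            i++; // skip the 's' of "%s"
        }
        if (nextArg != argNames.size()) {
            return null;
        }
        return "System.out.println(\"" + text + "\");";
    }
}
```

The interesting inputs here (which variables are constant, what printf's
format string means) come from knowledge about the code, not from the
shape of either tree.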

I think if tree-walking works for most of the translation work, you either
have two very similar languages, or your output code looks just like your
input code with different syntax. "I don't want 'JOBOL'", as one of my
customers said :)
Andy

>
> Monty
>


