[antlr-interest] philosophy about translation

Monty Zukowski monty at codetransform.com
Thu Oct 5 12:25:38 PDT 2006


On 10/5/06, Andy Tripp <antlr at jazillian.com> wrote:

> The problem with this approach is that it will be *very* difficult to
> work with only one tree structure. Say you're doing COBOL to
> Java (as I am). Sounds like you're saying that all your phases work on a
> "COBOL AST", and the last step takes
> the annotated "COBOL AST" and produces a "Java AST" (or just "Java
> text"). The problem is that a "COBOL AST" looks almost nothing
> like a "Java AST", and in the later phases, it will be nearly impossible
> to do Java-like processing. For example, my last
> phase adds needed "import" statements by looking through the Java code
> to see what's needed. To find out what
> Java library classes are referenced, you really do need a copy of the
> actual Java code to analyze, not
> an annotated "COBOL AST". Or how about adding try/catch blocks as
> needed. Here, you need to not only look
> for references to methods that throw non-Runtime exceptions, but need to
> see whether exceptions are already being
> caught.
>
> This "single tree structure" approach might work if, at the start of the
> last phase,
> you essentially have the entire Java program existing in bits and pieces
> as annotations on your COBOL tree. But I don't see how you could do
> that, because the tree structures are quite
> different. For example, in COBOL, you typically have a "variable
> representing a file" declared in one place, its mapping to
> a filename in another place, an "open" call in a third place. Those
> three should map to a single
> "FileReader reader = new FileReader("myfile");" Java statement.
>
> You might think "well, I can use multiple AST structures through
> inheritance or heterogeneous trees", but that
> just seems messy to me. I prefer an approach where you have, say, 100
> phases. Each phase translates a small piece
> (e.g. a single phase might handle the file-example above). So the code
> gradually transforms from COBOL to Java,
> one small step at a time.
>

In fact it is quite easy to have multiple languages in the same tree.
I did this for the AREV->VB translator.  The key is to give the '+'
tree nodes distinct token types, AREV_PLUS and VB_PLUS.  Then you never
have to guess whether you are processing AREV addition or VB addition.
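
A minimal sketch of that idea, in Python rather than ANTLR.  Only the
names AREV_PLUS and VB_PLUS come from the translator described here; the
Node class, the NUM type, and the language_of helper are hypothetical
and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# One '+' token type per language, so a pass never has to guess.
AREV_PLUS = "AREV_PLUS"
VB_PLUS = "VB_PLUS"
NUM = "NUM"  # hypothetical language-neutral literal type


@dataclass
class Node:
    """A homogeneous tree node: the token type carries the language."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def language_of(node: Node) -> str:
    """Classify a node by its type prefix alone (assumed naming scheme)."""
    if node.type.startswith("AREV_"):
        return "arev"
    if node.type.startswith("VB_"):
        return "vb"
    return "shared"


# A mixed tree: a VB addition whose left operand is still an AREV addition.
tree = Node(VB_PLUS, "+", [
    Node(AREV_PLUS, "+", [Node(NUM, "1"), Node(NUM, "2")]),
    Node(NUM, "3"),
])
```

Because the language lives in the token type, both kinds of '+' can sit
in the same tree without any extra bookkeeping.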

I had one super-treegrammar with two complete tree grammars therein.  I
found it convenient to bifurcate at the statement level.  For example:

    program   : (statement)* ;
    statement : arevStatement | vbStatement ;
...
That way both types of statements could coexist in the same tree, and
even have different types of sub-statements.  The same goes for
expressions: an expression could use either language's operators, so
I could write passes that dealt only with arithmetic, or string handling,
or whatever.  After one pass the expressions would all be AREV; after the
next they would have VB arithmetic and AREV everything else, and so on.
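
As a rough illustration of one such pass (a Python sketch, not the
actual tree grammar), the code below rewrites only the arithmetic node
types and leaves everything else in the tree untouched.  The Node class,
the run_pass and node_types helpers, and every token name other than
AREV_PLUS and VB_PLUS are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical mapping for an "arithmetic only" pass.
ARITHMETIC_PASS: Dict[str, str] = {
    "AREV_PLUS": "VB_PLUS",
    "AREV_MINUS": "VB_MINUS",
}


def run_pass(node: Node, mapping: Dict[str, str]) -> Node:
    """Return a new tree in which only the mapped node types are renamed."""
    new_type = mapping.get(node.type, node.type)
    return Node(new_type, node.text,
                [run_pass(child, mapping) for child in node.children])


def node_types(node: Node) -> Set[str]:
    """Collect every node type that occurs in the tree."""
    found = {node.type}
    for child in node.children:
        found |= node_types(child)
    return found


# An all-AREV expression: a string concatenation of (1 + 2) and a string.
before = Node("AREV_CONCAT", "", [
    Node("AREV_PLUS", "+", [Node("NUM", "1"), Node("NUM", "2")]),
    Node("AREV_STRING", "abc"),
])
after = run_pass(before, ARITHMETIC_PASS)
```

After this pass the tree has VB arithmetic and AREV everything else; a
later pass with a different mapping would take care of string handling.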

I was raving about this about 7 years ago; it just totally rocks!
Check the archives for my posts about multiple tree grammars, or ask
questions if something isn't clear.

By the last pass, I had a completely VB tree, and only then did I
dump it to text.
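
The final step might be sketched like this in Python: check that no
source-language nodes are left over, then print the tree.  This is only
an assumed shape for that step; the Node class, the helper names, and
the VB_CONCAT and VB_STRING token types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical binary operator node types and their VB spellings.
VB_OPERATORS = {"VB_PLUS": "+", "VB_CONCAT": "&"}


def leftover_arev(node: Node) -> List[str]:
    """List the types of any nodes that are still in the source language."""
    found = [node.type] if node.type.startswith("AREV_") else []
    for child in node.children:
        found += leftover_arev(child)
    return found


def emit(node: Node) -> str:
    """Print an expression tree, parenthesizing nested operators."""
    if node.type in VB_OPERATORS:
        parts = []
        for child in node.children:
            text = emit(child)
            if child.type in VB_OPERATORS:
                text = f"({text})"
            parts.append(text)
        return f" {VB_OPERATORS[node.type]} ".join(parts)
    return node.text


def dump(tree: Node) -> str:
    """Refuse to print until every node has been translated."""
    leftovers = leftover_arev(tree)
    if leftovers:
        raise ValueError(f"untranslated nodes remain: {leftovers}")
    return emit(tree)


vb_tree = Node("VB_CONCAT", "&", [
    Node("VB_PLUS", "+", [Node("NUM", "1"), Node("NUM", "2")]),
    Node("VB_STRING", '"abc"'),
])
mixed_tree = Node("VB_PLUS", "+", [Node("AREV_PLUS", "+", []), Node("NUM", "3")])
```

Calling dump(vb_tree) yields the VB text, while dump(mixed_tree) raises,
which makes a forgotten pass show up before any output is written.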

Monty

