[antlr-interest] philosophy about translation

Andy Tripp antlr at jazillian.com
Thu Oct 5 06:58:39 PDT 2006


>
> Here is my new philosophy about translation:
>
> Language L to L':  build a single tree structure and have multiple  
> tree phases that use the same grammar but different actions.  Or, as  
> we are discussing, you could have something that would skip certain  
> pieces that you don't care about.  All of the phases up until the last  
> one will simply collect information, possibly annotating the tree  
> nodes as well.  The last phase walks the tree grammar generating  
> string templates that get put together and yield eventually one big  
> string.  This is the approach I'm taking for Mantra.

>
> Ter

The problem with this approach is that it will be *very* difficult to
work with only one tree structure. Say you're doing COBOL to Java (as I
am). It sounds like you're saying that all your phases work on a "COBOL
AST", and the last step takes the annotated "COBOL AST" and produces a
"Java AST" (or just "Java text"). But a "COBOL AST" looks almost nothing
like a "Java AST", so in the later phases it will be nearly impossible
to do Java-like processing.

For example, my last phase adds the needed "import" statements by
looking through the Java code to see what's needed. To find out which
Java library classes are referenced, you really do need the actual Java
code to analyze, not an annotated "COBOL AST". Or how about adding
try/catch blocks as needed? Here, you need to look not only for calls to
methods that throw checked (non-Runtime) exceptions, but also at whether
those exceptions are already being caught.
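
To make the "import" point concrete, here is a minimal sketch of such a
phase, assuming the program already exists as Java text by the time the
phase runs. The class name ImportAdder, the method addImports, and the
small lookup table are all hypothetical, made up for illustration; a
real phase would need a much larger table and a proper Java parser
rather than a regular expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of an "add imports" phase that works on Java text.
class ImportAdder {
    // Assumed lookup table: simple class name -> fully qualified name.
    static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("FileReader", "java.io.FileReader");
        KNOWN.put("BufferedReader", "java.io.BufferedReader");
        KNOWN.put("ArrayList", "java.util.ArrayList");
    }

    // Prepend an import for every known class that the source mentions
    // as a whole word and does not already import.
    static String addImports(String javaSource) {
        StringBuilder imports = new StringBuilder();
        for (Map.Entry<String, String> e : KNOWN.entrySet()) {
            boolean used = Pattern.compile("\\b" + e.getKey() + "\\b")
                                  .matcher(javaSource).find();
            String line = "import " + e.getValue() + ";";
            if (used && !javaSource.contains(line)) {
                imports.append(line).append("\n");
            }
        }
        return imports + javaSource;
    }
}
```

The input here is Java source, not an annotated COBOL tree, which is the
point being argued: the question "which library classes does this code
reference?" is only easy to ask once the Java code exists.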

This "single tree structure" approach might work if, at the start of the
last phase, you essentially have the entire Java program existing in
bits and pieces as annotations on your COBOL tree. But I don't see how
you could do that, because the tree structures are quite different. For
example, in COBOL you typically have a "variable representing a file"
declared in one place, its mapping to a filename in another place, and
an "open" call in a third place. Those three should map to a single
"FileReader reader = new FileReader("myfile");" Java statement.

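The three-places-to-one-statement mapping can be sketched roughly as
follows. This is only an illustration under strong assumptions: the
COBOL is reduced to three simplified clauses (SELECT ... ASSIGN TO, FD,
OPEN INPUT), they are matched with regular expressions instead of a
parser, and the class FilePhase, the method translateOpen, and the
choice of a lowercased COBOL name as the Java variable name are all
hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: gather three scattered COBOL facts about a file
// and emit one Java declaration for it.
class FilePhase {
    static final Pattern SELECT =
        Pattern.compile("\\bSELECT\\s+(\\S+)\\s+ASSIGN\\s+TO\\s+\"([^\"]+)\"");
    static final Pattern FD = Pattern.compile("\\bFD\\s+([^\\s.]+)");
    static final Pattern OPEN = Pattern.compile("\\bOPEN\\s+INPUT\\s+([^\\s.]+)");

    // Returns the Java statement, or null if any of the three pieces is missing.
    static String translateOpen(String cobol) {
        Map<String, String> fileNames = new HashMap<>(); // file variable -> filename
        Set<String> declared = new HashSet<>();          // file variables with an FD

        Matcher m = SELECT.matcher(cobol);
        while (m.find()) fileNames.put(m.group(1), m.group(2));

        m = FD.matcher(cobol);
        while (m.find()) declared.add(m.group(1));

        m = OPEN.matcher(cobol);
        if (m.find()) {
            String name = m.group(1);
            if (declared.contains(name) && fileNames.containsKey(name)) {
                return "FileReader " + name.toLowerCase(Locale.ROOT)
                     + " = new FileReader(\"" + fileNames.get(name) + "\");";
            }
        }
        return null;
    }
}
```

Note that the result is built from facts collected across the whole
program, not from any single COBOL node, which is why it is awkward to
hang it on the COBOL tree as an annotation.
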
You might think "well, I can use multiple AST structures through
inheritance or heterogeneous trees", but that just seems messy to me. I
prefer an approach where you have, say, 100 phases. Each phase
translates one small piece (e.g. a single phase might handle the file
example above). So the code gradually transforms from COBOL to Java,
one small step at a time.
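
The many-small-phases idea can be sketched as a simple pipeline in which
each phase takes the program in its current, partly translated form and
returns a slightly more Java-like version. The sketch below uses plain
strings as the intermediate form only to stay short; the post does not
say what representation the phases operate on, and the class
PhasePipeline and the two toy phases are hypothetical. It assumes Java 9
or later for List.of.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a translator built from many small phases.
class PhasePipeline {
    // Run each phase in order; each one receives the previous phase's output.
    static String run(String code, List<UnaryOperator<String>> phases) {
        for (UnaryOperator<String> phase : phases) {
            code = phase.apply(code);
        }
        return code;
    }

    // Two toy phases, purely illustrative; a real translator would have
    // far more, each handling one small construct.
    static final List<UnaryOperator<String>> PHASES = List.of(
        s -> s.replace("MOVE 1 TO X.", "X = 1;"),
        s -> s.replace("DISPLAY X.", "System.out.println(X);")
    );
}
```

After the first phase the program is a mix of COBOL and Java; after the
last it is all Java. Later phases, such as adding imports, can therefore
assume they are looking at Java.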

Andy




