[antlr-interest] philosophy about translation
parrt at cs.usfca.edu
Thu Oct 5 15:20:21 PDT 2006
On Oct 5, 2006, at 6:58 AM, Andy Tripp wrote:
> The problem with this approach is that it will be *very* difficult to
> work with only one tree structure. Say you're doing COBOL to Java (as
> I am). Sounds like you're saying that all your phases work on a "COBOL
> AST", and the last step takes the annotated "COBOL AST" and produces a
> "Java AST" (or just "Java
Java text via templates, though building a Java AST at the end would
work too. Mainly I'm trying to avoid "union grammars" that must contain
both languages, where the tree structure in phase n differs slightly
from the tree structure in phase n+1.
> The problem is that a "COBOL AST" looks almost nothing like a "Java
> AST", and in the later phases, it will be nearly impossible to do
> Java-like processing.
You can process the Java AST to find the needed imports, etc.
> For example, my last phase adds needed "import" statements by looking
> through the Java code to see what's needed. To find out what Java
> library classes are referenced, you really do need a copy of the
> actual Java code to analyze, not an annotated "COBOL AST".
You are correct. That's a good reason to build Java ASTs rather than
text at the end if you need to do this.
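As a rough illustration of why a Java AST helps here, the following is a minimal sketch of an import-finding pass. The node types (`JNode`, `TypeRef`, `Block`) and the `KNOWN` class-to-package table are hypothetical stand-ins, not ANTLR classes; a real translator would consult a symbol table or the class path instead of a hard-coded map.

```java
import java.util.*;

/** Sketch: a final pass that walks a (hypothetical) Java AST and collects the imports it needs. */
public class ImportCollector {
    // Hypothetical minimal Java AST: a node is either a type reference or a container of children.
    sealed interface JNode permits TypeRef, Block {}
    record TypeRef(String simpleName) implements JNode {}
    record Block(String kind, List<JNode> children) implements JNode {}

    // Assumed lookup table from simple class name to package (illustration only).
    static final Map<String, String> KNOWN = Map.of(
        "FileReader", "java.io",
        "BufferedReader", "java.io",
        "IOException", "java.io",
        "BigDecimal", "java.math",
        "String", "java.lang");

    /** Recursively walks the tree, recording an import for every known type outside java.lang. */
    static void collect(JNode node, SortedSet<String> imports) {
        if (node instanceof TypeRef t) {
            String pkg = KNOWN.get(t.simpleName());
            if (pkg != null && !pkg.equals("java.lang")) { // java.lang is imported implicitly
                imports.add(pkg + "." + t.simpleName());
            }
        } else if (node instanceof Block b) {
            for (JNode child : b.children()) collect(child, imports);
        }
    }

    /** Returns the sorted, de-duplicated set of fully qualified imports for a tree. */
    static SortedSet<String> importsFor(JNode root) {
        SortedSet<String> imports = new TreeSet<>();
        collect(root, imports);
        return imports;
    }

    public static void main(String[] args) {
        JNode method = new Block("method", List.of(
            new TypeRef("String"),
            new Block("stmt", List.of(new TypeRef("FileReader"), new TypeRef("FileReader"))),
            new TypeRef("BigDecimal")));
        importsFor(method).forEach(i -> System.out.println("import " + i + ";"));
    }
}
```

The pass only works because the references are explicit `TypeRef` nodes in a Java-shaped tree; recovering the same facts from generated text or from annotations on the COBOL tree would be much harder.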
> This "single tree structure" approach might work if, at the start of
> the last phase, you essentially have the entire Java program existing
> in bits and pieces as annotations on your COBOL tree. But I don't see
> how you could do that, because the tree structures are quite
> different. For example, in COBOL, you typically have a "variable
> representing a file" declared in one place, its mapping to a filename
> in another place, and an "open" call in a third place. Those three
> should map to a single
> "FileReader reader = new FileReader("myfile");" Java statement.
An interesting and difficult problem; thanks for bringing this up.
I'd have to think more. Clearly some kind of non-text data structure
is needed for this. I guess you'd build the Java template or AST and
then add the bits as you find them while traversing the COBOL tree.
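One way to picture "add the bits as you find them" is a partially filled Java declaration keyed by the COBOL file name. This is a minimal sketch under assumptions: the callbacks `onAssign`, `onFileDeclared`, and `onOpen` are hypothetical hooks a COBOL tree walker might call (roughly for a `SELECT ... ASSIGN TO` entry, an `FD` entry, and an `OPEN` statement), and the Java variable name is assumed to be chosen elsewhere.

```java
import java.util.*;

/** Sketch: fill in one Java statement from facts found at three separate places in a COBOL tree. */
public class FileStatementBuilder {
    /** Partially known Java declaration; fields are filled in as the COBOL walk finds them. */
    static final class PendingFileDecl {
        String javaVar;   // from the file declaration (e.g. an FD entry)
        String fileName;  // from the file-name mapping (e.g. SELECT ... ASSIGN TO)
        boolean opened;   // from the OPEN statement

        boolean complete() { return javaVar != null && fileName != null && opened; }

        String render() {
            return "FileReader " + javaVar + " = new FileReader(\"" + fileName + "\");";
        }
    }

    // Keyed by the COBOL file name so the three facts meet in one place.
    private final Map<String, PendingFileDecl> pending = new LinkedHashMap<>();

    private PendingFileDecl get(String cobolFile) {
        return pending.computeIfAbsent(cobolFile, k -> new PendingFileDecl());
    }

    // Hypothetical callbacks invoked by a COBOL tree walker; the order of calls does not matter.
    void onAssign(String cobolFile, String fileName)      { get(cobolFile).fileName = fileName; }
    void onFileDeclared(String cobolFile, String javaVar) { get(cobolFile).javaVar = javaVar; }
    void onOpen(String cobolFile)                         { get(cobolFile).opened = true; }

    /** Emits a statement only for files whose three pieces have all been seen. */
    List<String> emit() {
        List<String> out = new ArrayList<>();
        for (PendingFileDecl d : pending.values()) {
            if (d.complete()) out.add(d.render());
        }
        return out;
    }

    public static void main(String[] args) {
        FileStatementBuilder b = new FileStatementBuilder();
        b.onAssign("MYFILE", "myfile");       // ENVIRONMENT DIVISION, FILE-CONTROL
        b.onFileDeclared("MYFILE", "reader"); // DATA DIVISION, FILE SECTION
        b.onOpen("MYFILE");                   // PROCEDURE DIVISION
        b.emit().forEach(System.out::println);
    }
}
```

The COBOL tree is never reshaped; the pending declaration is a separate, Java-shaped structure that is rendered (or turned into a Java AST node) only once it is complete.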
My main point is that it's OK to have multiple tree structures, L and
L', but I've found that a union of the two, or slowly morphing one into
the other, is a total pain.
> You might think "well, I can use multiple AST structures through
> inheritance or heterogeneous trees", but that just seems messy to me.
> I prefer an approach where you have, say, 100 phases. Each phase
> translates a small piece (e.g. a single phase might handle the file
> example above). So the code gradually transforms from COBOL to Java,
> one small step at a time.
Yep, I just prefer collecting the info and sticking it somewhere that
doesn't force me to have different tree structures. A change in one
phase causes so many ripple-effect changes in later phases that they
can't practically be propagated by hand. If the grammar is the same
throughout, then you can auto-ripple changes to the structure.
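A minimal sketch of "collecting the info and sticking it somewhere" is a side table keyed by node identity, so every phase sees the same COBOL tree shape. The `Node` record, the two phases, and the PICTURE-to-Java-type rule (`9...` means `int`, anything else means `String`) are hypothetical and deliberately simplified; a real mapping would also consider digit counts, decimals, and signs.

```java
import java.util.*;
import java.util.function.Consumer;

/** Sketch: phases record what they learn in a side table instead of reshaping the COBOL tree. */
public class SideTableAnnotations {
    // Hypothetical COBOL AST node; its structure is identical in every phase.
    record Node(String kind, String name, String pic, List<Node> kids) {}

    // IdentityHashMap: records have value-based equals, so a HashMap would merge two
    // structurally identical nodes; identity keeps one annotation map per node.
    private final Map<Node, Map<String, Object>> notes = new IdentityHashMap<>();

    void put(Node n, String key, Object value) {
        notes.computeIfAbsent(n, k -> new HashMap<>()).put(key, value);
    }

    Object get(Node n, String key) {
        Map<String, Object> m = notes.get(n);
        return m == null ? null : m.get(key);
    }

    /** Preorder walk shared by all phases. */
    static void walk(Node n, Consumer<Node> visit) {
        visit.accept(n);
        for (Node k : n.kids()) walk(k, visit);
    }

    /** Phase A: annotate data items with a Java type (simplified PICTURE mapping). */
    void inferTypes(Node root) {
        walk(root, n -> {
            if (n.kind().equals("item")) put(n, "javaType", n.pic().startsWith("9") ? "int" : "String");
        });
    }

    /** Phase B: read phase A's annotations to produce Java field declarations. */
    List<String> declarations(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, n -> {
            Object type = get(n, "javaType");
            if (type != null) out.add(type + " " + n.name().toLowerCase(Locale.ROOT).replace('-', '_') + ";");
        });
        return out;
    }

    public static void main(String[] args) {
        Node ws = new Node("section", "WORKING-STORAGE", "", List.of(
            new Node("item", "WS-COUNT", "9(4)", List.of()),
            new Node("item", "WS-NAME", "X(20)", List.of())));
        SideTableAnnotations t = new SideTableAnnotations();
        t.inferTypes(ws);
        t.declarations(ws).forEach(System.out::println);
    }
}
```

Because phase A only adds entries to the side table, phase B (or a tree grammar for it) needs no changes when phase A starts recording more information.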
What if we have a COBOL AST to read from and a Java AST to write to and
update? Then we walk the Java AST at the end to find where try/catch
blocks and imports are needed.
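The try/catch half of that final walk might look something like the sketch below. It assumes a toy Java AST (`Stmt`, `Simple`, `TryCatch` are hypothetical names) in which each statement records the classes it instantiates, plus an assumed table of classes whose constructors throw a checked `IOException` or subclass. To avoid shrinking the scope of any declared variable, it wraps the whole body in one `try` rather than individual statements.

```java
import java.util.*;

/** Sketch: translate into a separate Java AST, then run a final pass over that Java AST. */
public class JavaTreeFinalPass {
    // Hypothetical Java AST: a plain statement, or a try/catch around a list of statements.
    sealed interface Stmt permits Simple, TryCatch {}
    record Simple(String code, Set<String> constructs) implements Stmt {}
    record TryCatch(List<Stmt> body, String caught) implements Stmt {}

    // Assumed table: classes whose constructors throw IOException or a subclass of it.
    static final Set<String> THROWS_IO = Set.of("FileReader", "FileWriter", "FileInputStream");

    static boolean throwsIo(Stmt s) {
        return s instanceof Simple simple && simple.constructs().stream().anyMatch(THROWS_IO::contains);
    }

    /** Wraps the whole body in one try/catch if any top-level statement may throw IOException. */
    static List<Stmt> addTryCatch(List<Stmt> body) {
        return body.stream().anyMatch(JavaTreeFinalPass::throwsIo)
            ? List.<Stmt>of(new TryCatch(body, "IOException"))
            : body;
    }

    /** Renders the Java AST back to source text, one statement per line. */
    static void render(Stmt s, String indent, StringBuilder sb) {
        if (s instanceof Simple simple) {
            sb.append(indent).append(simple.code()).append('\n');
        } else if (s instanceof TryCatch t) {
            sb.append(indent).append("try {\n");
            for (Stmt inner : t.body()) render(inner, indent + "    ", sb);
            sb.append(indent).append("} catch (").append(t.caught()).append(" e) {\n");
            // Generated code would also need imports for IOException and UncheckedIOException.
            sb.append(indent).append("    throw new UncheckedIOException(e);\n");
            sb.append(indent).append("}\n");
        }
    }

    public static void main(String[] args) {
        List<Stmt> body = List.of(
            new Simple("FileReader reader = new FileReader(\"myfile\");", Set.of("FileReader")),
            new Simple("int count = 0;", Set.of()));
        StringBuilder sb = new StringBuilder();
        for (Stmt s : addTryCatch(body)) render(s, "", sb);
        System.out.print(sb);
    }
}
```

Since this pass runs on the Java tree rather than the COBOL tree, it never needs to know which COBOL construct produced the `FileReader`; an import pass over the same tree could then pick up the `java.io` classes the generated `catch` introduces.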
Thanks for your excellent problem statement!