[antlr-interest] philosophy about translation

Terence Parr parrt at cs.usfca.edu
Thu Oct 5 15:20:21 PDT 2006


On Oct 5, 2006, at 6:58 AM, Andy Tripp wrote:
> The problem with this approach is that it will be *very* difficult to
> work with only one tree structure. Say you're doing COBOL to
> Java (as I am). Sounds like you're saying that all your phases work
> on a "COBOL AST", and the last step takes the annotated "COBOL AST"
> and produces a "Java AST" (or just "Java text").

Java text via templates, though building a Java AST at the end would
work too. Mainly I'm trying to avoid "union grammars" that must
describe both languages at once, where the tree structure produced by
phase n is slightly different from that of phase n+1.

> The problem is that a "COBOL AST" looks almost nothing like a "Java
> AST", and in the later phases, it will be nearly impossible to do
> Java-like processing.

You can process the Java AST to find needed imports, etc.

> For example, my last phase adds needed "import" statements by looking
> through the Java code to see what's needed. To find out what Java
> library classes are referenced, you really do need a copy of the
> actual Java code to analyze, not an annotated "COBOL AST".

You are correct. That's a good reason to build Java ASTs, not text, at
the end if you need to do this.
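To illustrate the final import pass described above, here is a minimal sketch, in Python for brevity, of walking a toy Java AST to collect referenced library classes and emit import statements. The `JNode` class, its `kind`/`text` fields and the `KNOWN_PACKAGES` table are hypothetical stand-ins, not ANTLR's tree API; a real pass would resolve names against the actual class library.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical minimal Java AST node: a kind tag, optional text, children.
@dataclass
class JNode:
    kind: str
    text: str = ""
    children: List["JNode"] = field(default_factory=list)

# Hypothetical lookup table from simple class names to their packages.
KNOWN_PACKAGES = {
    "FileReader": "java.io",
    "BufferedReader": "java.io",
    "BigDecimal": "java.math",
    "String": "java.lang",  # java.lang is imported implicitly
}

def referenced_types(node: JNode) -> Iterator[str]:
    """Yield the name of every TYPE node in the tree (pre-order walk)."""
    if node.kind == "TYPE":
        yield node.text
    for child in node.children:
        yield from referenced_types(child)

def needed_imports(root: JNode) -> List[str]:
    """Return sorted import statements for known classes outside java.lang."""
    imports = set()
    for name in referenced_types(root):
        pkg = KNOWN_PACKAGES.get(name)
        if pkg is not None and pkg != "java.lang":
            imports.add(f"import {pkg}.{name};")
    return sorted(imports)

# Usage: FileReader reader = new FileReader("myfile");
decl = JNode("VARDECL", children=[
    JNode("TYPE", "FileReader"),
    JNode("ID", "reader"),
    JNode("NEW", children=[JNode("TYPE", "FileReader"),
                           JNode("STRING", '"myfile"')]),
])
print(needed_imports(decl))  # ['import java.io.FileReader;']
```

Because the pass only reads Java-side nodes, it works the same no matter which source language produced the tree.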

> This "single tree structure" approach might work if, at the start of
> the last phase, you essentially have the entire Java program existing
> in bits and pieces as annotations on your COBOL tree. But I don't see
> how you could do that, because the tree structures are quite
> different. For example, in COBOL, you typically have a "variable
> representing a file" declared in one place, its mapping to a filename
> in another place, and an "open" call in a third place. Those three
> should map to a single
> "FileReader reader = new FileReader("myfile");" Java statement.

An interesting and difficult problem; thanks for bringing this up.
I'd have to think about it more. Clearly some kind of non-text data
structure is needed for this. I guess you'd build the Java template or
AST first and then fill in the bits as you find them while traversing
the COBOL.
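One way to read that suggestion for the file example: keep a small pending record per COBOL file variable, fill it in as the traversal reaches the filename mapping, the declaration and the OPEN, and build the Java statement only once all the pieces are present. A minimal sketch in Python; the flattened `(kind, file_var, value)` event tuples, the `PendingFile` record and the `java_var_name` naming rule are hypothetical simplifications, not real COBOL parser output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class PendingFile:
    """Bits of one Java file declaration, collected from separate COBOL spots."""
    declared: bool = False
    filename: Optional[str] = None
    opened_for_input: bool = False

    def complete(self) -> bool:
        return self.declared and self.filename is not None and self.opened_for_input

def java_var_name(cobol_name: str) -> str:
    """Hypothetical naming rule: IN-FILE -> inFile."""
    parts = cobol_name.lower().split("-")
    return parts[0] + "".join(p.capitalize() for p in parts[1:])

def translate_files(events: List[Tuple[str, str, Optional[str]]]) -> List[str]:
    """events are (kind, file_var, value) in traversal order; kind is
    'ASSIGN' (filename mapping), 'FD' (declaration) or 'OPEN'."""
    pending: Dict[str, PendingFile] = {}
    for kind, var, value in events:
        info = pending.setdefault(var, PendingFile())
        if kind == "FD":
            info.declared = True
        elif kind == "ASSIGN":
            info.filename = value
        elif kind == "OPEN" and value == "INPUT":
            info.opened_for_input = True
    # Emit one Java statement per file whose pieces were all found.
    return [f'FileReader {java_var_name(var)} = new FileReader("{info.filename}");'
            for var, info in pending.items() if info.complete()]

# Simplified events from three different places in a COBOL program.
events = [
    ("ASSIGN", "IN-FILE", "myfile"),  # filename mapping
    ("FD", "IN-FILE", None),          # file declaration
    ("OPEN", "IN-FILE", "INPUT"),     # open call in the procedure code
]
print(translate_files(events))
# ['FileReader inFile = new FileReader("myfile");']
```

The COBOL tree is never restructured here; the scattered facts live in a side table keyed by the file variable until the Java statement can be built.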

My main point is that it's OK to have multiple tree structures, L and
L', but I've found that a union of the two, and/or slowly morphing one
into the other, is a total pain.

> You might think "well, I can use multiple AST structures through
> inheritance or heterogeneous trees", but that just seems messy to me.
> I prefer an approach where you have, say, 100 phases. Each phase
> translates a small piece (e.g. a single phase might handle the file
> example above). So the code gradually transforms from COBOL to Java,
> one small step at a time.
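To make the many-small-phases style concrete, here is a minimal sketch in Python with hypothetical node kinds (`MOVE`, `DISPLAY`, `ASSIGN`, `CALL`) and trees as nested tuples. Each phase rewrites one construct bottom-up, so after only some phases have run, the tree mixes COBOL and Java-like nodes: the kind of intermediate "union" structure discussed in this thread.

```python
from typing import Callable, Optional, Union

# Hypothetical tree shape: (kind, *children); leaves are plain strings.
Tree = Union[str, tuple]
Rule = Callable[[tuple], Optional[tuple]]

def rewrite(tree: Tree, rule: Rule) -> Tree:
    """Apply rule bottom-up; rule returns a replacement node or None."""
    if isinstance(tree, str):
        return tree
    kind, *kids = tree
    node = (kind, *[rewrite(k, rule) for k in kids])
    replaced = rule(node)
    return node if replaced is None else replaced

def move_phase(node: tuple) -> Optional[tuple]:
    # COBOL "MOVE src TO dst" -> Java-like assignment dst = src
    if node[0] == "MOVE":
        return ("ASSIGN", node[2], node[1])
    return None

def display_phase(node: tuple) -> Optional[tuple]:
    # COBOL "DISPLAY x" -> Java-like call System.out.println(x)
    if node[0] == "DISPLAY":
        return ("CALL", "System.out.println", node[1])
    return None

def run_phases(tree: Tree, phases) -> Tree:
    for phase in phases:
        tree = rewrite(tree, phase)
    return tree

program = ("PROCEDURE", ("MOVE", "A", "B"), ("DISPLAY", "B"))
after_one = run_phases(program, [move_phase])
print(after_one)  # ('PROCEDURE', ('ASSIGN', 'B', 'A'), ('DISPLAY', 'B'))
```

Each phase is small and easy to test, but every phase after the first must accept a tree that is partly COBOL and partly Java.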

Yep. I just prefer collecting the info and sticking it somewhere that
doesn't force me to have different tree structures. A change in one
phase causes so many ripple-effect changes that they can't be
propagated manually. If the grammar is the same throughout, then you
can auto-ripple changes to the structure.

What if we have a COBOL AST to read from and a Java AST to write to
and update, and then walk the Java AST at the end to find try/catch
and import needs?
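A minimal sketch of that two-tree arrangement, in Python: the COBOL side is read-only (frozen dataclasses), translation only appends to the Java side, and a final walk over the finished Java tree reports which checked exceptions need try/catch handling. The node classes, the `OPEN-INPUT` verb with pre-resolved arguments and the `CHECKED` table are hypothetical; the table entry reflects that `java.io.FileReader(String)` is declared to throw `FileNotFoundException`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class CobolStmt:
    """Read-only COBOL-side node; frozen, so translation cannot mutate it."""
    verb: str
    args: Tuple[str, ...] = ()

@dataclass
class JavaStmt:
    """Writable Java-side node: source text plus the classes it instantiates."""
    text: str
    news: Tuple[str, ...] = ()

@dataclass
class JavaMethod:
    name: str
    body: List[JavaStmt] = field(default_factory=list)

# Hypothetical table: constructors that throw a checked exception.
CHECKED = {"FileReader": "java.io.FileNotFoundException"}

def translate(cobol: List[CobolStmt], method: JavaMethod) -> None:
    """Read from the COBOL tree, write to the Java tree; no union structure."""
    for stmt in cobol:
        if stmt.verb == "DISPLAY":
            method.body.append(JavaStmt(f"System.out.println({stmt.args[0]});"))
        elif stmt.verb == "OPEN-INPUT":  # simplified: var and path already resolved
            var, path = stmt.args
            method.body.append(JavaStmt(
                f'FileReader {var} = new FileReader("{path}");', ("FileReader",)))

def try_catch_needs(method: JavaMethod) -> List[str]:
    """Final walk over the finished Java tree: checked exceptions to handle."""
    needs = set()
    for stmt in method.body:
        for cls in stmt.news:
            if cls in CHECKED:
                needs.add(CHECKED[cls])
    return sorted(needs)

cobol = [CobolStmt("OPEN-INPUT", ("inFile", "myfile")),
         CobolStmt("DISPLAY", ('"done"',))]
method = JavaMethod("run")
translate(cobol, method)
print(try_catch_needs(method))  # ['java.io.FileNotFoundException']
```

The same final walk is a natural place to collect import needs too, since it sees only Java-side nodes.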

Thanks for your excellent problem statement!

Ter
