[antlr-interest] philosophy about translation
antlr at jazillian.com
Fri Oct 6 07:29:23 PDT 2006
> An interesting and difficult problem..thanks for bringing this up.
> I'd have to think more. Clearly some kind of non-text data structure
> is needed for this. I guess you'd build the Java template or AST and
> then add the bits as you find them while traversing the COBOL.
This is the key to the difference in the two approaches. Using an AST, I
kept finding myself gathering bits of information from
around the AST. For example, say we're doing C to Java and I see "if
(a)". We first look for the declaration of a to see whether
it's an int or not (it may not be because our "goto removal" phase
already ran, and it injects booleans). Next, we look at
all references to "a", to see if it will be possible to change all of
them from "int" usages to "boolean" usages. If not, just change it
to "if (a != 0)", but if so, go ahead and change the type to boolean,
and make whatever changes are needed at each reference.
If you try to do that sort of thing in a tree-walking way, it will be a
mess, I think.
> My main point is that it's ok to have multiple tree structures, L and
> L', but the union and/or slow morphing of one to the other is a total
> pain I've found.
Yes, it's a royal pain, but if you start with the requirement that you
will produce "natural" code, there's no choice.
>> You might think "well, I can use multiple AST structures through
>> inheritence or heterogeneous trees", but that
>> just seems messy to me. I prefer an approach where you have, say, 100
>> phases. Each phase translates a small piece
>> (e.g. a single phase might handle the file-example above). So the code
>> gradually transforms from COBOL to Java,
>> one small step at a time.
> Yep, I just prefer collecting the info and sticking somewhere that
> doesn't force me to have different tree structures. A change in one
> phase has so many ripple-effect changes that can't be propagated
> manually. If grammar is the same throughout then you can auto-ripple
> changes to structure.
I think just this simple example that I brought up before actually
brings the problem to the surface:
String hello = "hello";
String world = "world";
printf("%s %s\n", hello, world);
I can't see how that can be done by treewalking. By the time the code is
actually written to implement "printf to System.out.println",
there will be almost no "tree-transformation" or even "tree walking" logic.
> What if we have COBOL AST to read from and Java AST to write to and
> update. THen we walk Java AST at end to find try/catch and import
Yea, now that I think harder about it, I guess try/catch and import
aren't good examples. By the time you've got a Java AST,
everything's pretty easy.
But look at it this way: on the "import", all the work is in knowing all
the built-in Java classes and
where they go (I created a doclet to gather that info), and combining
"java.io.IOException" and "java.io.InputStream"
into "java.io.*", and leaving out "java.lang", and stuff like that.
Whether the Java is in an AST or a sequence of Tokens
really doesn't matter one way or the other.
As for the try/catch, all the work is in finding a good "level" to
insert the try/catch. For example, if we have three consecutive
read() calls, best to put them into a single try/catch. If we need to
catch both FileNotFoundException and IOException
for one statement, and just IOException for the following statement,
what do we do?
My point being that almost all the logic (in my translator, maybe not
others') is non-tree-related, where it doesn't matter
much whether it's in an AST or a sequence of tokens.
> Thanks for your excellent problem statement!
Thanks for your patience - guess I'm a natural contrarian :)
"Better to be provocative than agreeable" is my new motto :)
More information about the antlr-interest