[antlr-interest] philosophy about translation

Sat Oct 7 10:32:58 PDT 2006

Terence Parr wrote:

>
> On Oct 6, 2006, at 7:29 AM, Andy Tripp wrote:
>
>>
>>>
>>> An interesting and difficult problem..thanks for bringing this  
>>> up.   I'd have to think more.  Clearly some kind of non-text data  
>>> structure  is needed for this.  I guess you'd build the Java  
>>> template or AST and  then add the bits as you find them while  
>>> traversing the COBOL.
>>
>>
>> This is the key to the difference in the two approaches. Using an  
>> AST, I kept finding myself gathering bits of information from
>> around the AST. For example, say we're doing C to Java and I see  "if 
>> (a)". We first look for the declaration of a to see whether
>> it's an int or not (it may not be because our "goto removal" phase  
>> already ran, and it injects booleans). Next, we look at
>> all references to "a", to see if it will be possible to change all  
>> of them from "int" usages to "boolean" usages. If not, just change it
>> to "if (a != 0)", but if so, go ahead and change the type to  
>> boolean, and make whatever changes are needed at each reference.
>>
>> If you try to do that sort of thing in a tree-walking way, it will  
>> be a mess, I think.
>
>
>   Aren't these standard operations and data structures?    Symbol  
> table, use-def chains, flow analysis.  The tree walk can simply ask  
> questions of these data structures.

Yes. Sorry. My mistake. I shouldn't have said the treewalker way would
be a mess, just
that I don't think it's helping. So the treewalk approach looks
something like:

void doTheCheck() {
// here I am at the conditional node of an "if", "while", or "for"
if (symbolTable.getType(varName) != "boolean") {
   boolean canBeChanged = true;
   for (Use use: UseDefChains.getAllUsagesOf(varName)) {
         if (!use.canBeChangedToBoolean()) {
             canBeChanged = false;
        }
    }
   if (canBeChanged) {
      SymbolTable.changeType(varName, "boolean");
      for (Use use: UseDefChains.getAllUsagesOf(varName)) {
           use.change();
     }
   }
}
}

The thing is, you'll need to write this code whether you're doing
treewalking or "token-stream-matching".
All the treewalking does for you is to have this code kick-in in the
right place. And that's just as easy to
do with "token-stream-matching". In my case, it looks something like:

public boolean match(Source source) {
    if (source.currentToken.getText().equals("if") ||
source.currentToken.getText().equals("while")) {
       doTheCheck(source.getTokenAt(+2));
  } else if (source.currentToken.getText().equals("for")) {
       Token semi = source.findToken(";");
       doTheCheck(source.getTokenAt(semi, +1));
  }
}

>
>>> My main point is that it's ok to have multiple tree structures, L  
>>> and  L', but the union and/or slow morphing of one to the other is  
>>> a total  pain I've found.
>>
>>
>> Yes, it's a royal pain, but if you start with the requirement that  
>> you will produce "natural" code, there's no choice.
>
>
>   Well, I suppose anything is possible, it's just a matter of how  
> convenient.  And you are saying it's inconvenient really not impossible.

Right. I found I just couldn't get cranking on writing transformation
"rules" because I was spending all my time
trying to picture AST mappings in my head. Spending all my time on "how"
to do it, not "what" do to.

>
>> I think just this simple example that I brought up before actually  
>> brings the problem to the surface:
>>
>> String hello = "hello";
>> String world = "world";
>> printf("%s %s\n", hello, world);
>>
>> ...becomes...
>>
>> System.out.println("Hello World");
>>
>> I can't see how that can be done by treewalking. By the time the  
>> code is actually written to implement "printf to System.out.println",
>> there will be almost no "tree-transformation" or even "tree  walking" 
>> logic.
>
>
>   The logic is identifying that you have entered a list of  statements 
> and you see a print statement.  The translation logic is a  simple 
> one-to-one mapping from printf to println just as you would do  in a 
> rule right?  

It may map to println() or print(), and I don't recall whether I
actually may combine several printf() calls into a single println() call,
but that's certainly the type of thing that I do.

> The only problem is discovering what should be the  expression.  
> Either in a previous phase I would have done constant  propagation or 
> in your case you ask the question or something in one  of your 
> declaration rules.  Do you insert actions in your rules to go  check 
> data structures?  Surely you don't write a rule that has a   
> context-sensitive pattern  asking if there have been all possible  
> variable declarations before the print, right?

Right, I just "write code" to go ahead and parse the format string,
match it up with the printf() arguments, and replace it
all with a single "+"-separated Java expression.

>
>> As for the try/catch, all the work is in finding a good "level" to  
>> insert the try/catch. For example, if we have three consecutive
>> read() calls, best to put them into a single try/catch. If we need  
>> to catch both FileNotFoundException and IOException
>> for one statement, and just IOException for the following  statement, 
>> what do we do?
>
>
>  how do you handle that?  I very is to to learn more about your  
> approach; I see you talking about how the tree walking won't work,  
> but I don't see how yours will work.  It is very interesting and I  
> want to learn more.

I just decide what the most "natural" thing to do is, and then "just
write code" to do it. I can't remember exactly how I
do it for exceptions, but I'd do something like "If two consecutive
calls throw exactly the same set of exceptions, put
them both in the same try block, otherwise, each call is in its own try
block". Then, if that produces fairly ugly code in
lots of cases, I might make it more complicated.

>
>> Thanks for your patience - guess I'm a natural contrarian :)
>
>
>   Always good for an excellent discussion and to shake things up...
> Ter
>
Thanks,
Andy