[antlr-interest] philosophy about translation

Wed Oct 11 12:10:36 PDT 2006

On 11. Oct 2006, at 20:29 Uhr, Andy Tripp wrote:

>  Go ahead and put
> "Woods Eyes Masters" into a tree and then convert to Spanish.  
> You'll come back later
> and say "...but my program would have to know the context to even  
> see that it's talking
> about Tiger!" and I'll grin and say "that's right."

Sorry, but I cannot resist:

    Lexer                 ->  Parser          ->  AST
    "Woods Eyes Masters"  ->  NOUN VERB NOUN  ->  (VERB["Eyes"] SUBJECT["Woods"] OBJECT["Masters"])

    Lexer                 ->  Token stream    ->  Translation
    "Woods Eyes Masters"  ->  NOUN VERB NOUN  ->  Translation ("Las maderas echan miradas a los amos")

Even leaving my horrible Spanish aside, I fail to see how
any approach could do this without knowing the context.

I agree with you that the mechanics of how you organize your
translation matter less the further you get into the project.
You will eventually have to build up a lot of support code to do the
job. Take a look at compilers: the actual lexing/parsing is most often
the smallest part of generating the output. Much more work goes into
the type system, building up graphs for optimizations, semantic checks,
checking for invalid operations, etc.

Don't you have to know the types of the variables used in the source  
and destination language? Can you actually do without
a type system and/or symbol table? I find it hard to picture doing  
the right thing without trees, but then again I might
be missing a lot.
I would expect the structure of the input (tree vs. flat stream) not
to have much influence on your ability to produce "natural" code. Both
approaches force you to look all over the place to determine how, say,
the malloc() family is being used: is it reallocating an array,
buffering up some characters, etc.? These would obviously be coded
quite differently in Java.

When I see a "rule" like
	v1 = v1 + v2 => v1 += v2
I cannot help but see a tree. In the end, it's just another way to
specify a transformation, is it not? I mean, what is the fundamental
difference? Tokens that are close to each other in a token stream most
often end up close to each other in a tree by some metric, don't they?
OK, they might end up in different branches under a common interior
node, but for really nasty stuff like variable declaration vs. usage
you have symbol tables. I feel like I'm missing some important
information here.
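
Here is roughly what I mean by "seeing a tree", as a small sketch. It
uses Python's standard ast module on Python source rather than C, only
because that gives a ready-made parser; the class name PlusAssign and
the helper rewrite are my own. It is one possible tree formulation of
the rule above, not a claim about how any existing translator does it.

```python
import ast


class PlusAssign(ast.NodeTransformer):
    """Tree version of the rule `v1 = v1 + v2 => v1 += v2`."""

    def visit_Assign(self, node):
        self.generic_visit(node)
        # Match: one plain-name target, and an addition on the right
        # whose LEFT operand is that same name.
        if (len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.BinOp)
                and isinstance(node.value.op, ast.Add)
                and isinstance(node.value.left, ast.Name)
                and node.value.left.id == node.targets[0].id):
            new = ast.AugAssign(
                target=ast.Name(id=node.targets[0].id, ctx=ast.Store()),
                op=ast.Add(),
                value=node.value.right)
            return ast.copy_location(new, node)
        return node


def rewrite(source):
    """Parse, apply the rule everywhere it matches, and print back."""
    tree = PlusAssign().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse needs Python 3.9 or later


print(rewrite("total = total + price"))
print(rewrite("x = y + x"))  # left operand differs, so no match
```

The pattern in the rule and the isinstance checks in the visitor
describe the same shape; that is why I find it hard to see a
fundamental difference between the two notations.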

Like the others, I don't want to pick on your approach, but am very
interested in seeing it from a different angle. The success of your
translators shows that you have found something that works, and seems
to work quite well.
I must add, though, that I also worry about implicit loops in the
rules. While the rules look pretty simple algebraically, detecting
infinite loops is a surprisingly hard problem. Proving a rule set to be
well-formed and closed can be non-trivial, to say the least.
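
A toy illustration of the loop worry, again only a sketch. The term
encoding (nested tuples), the rule functions, and the name
rewrite_to_fixpoint are all made up for this example, and rules are
applied at the root of the term only. A driver can notice that it has
come back to a term it has already seen, but a rule that keeps
producing new terms can only be cut off by a step budget, which proves
nothing either way.

```python
def rewrite_to_fixpoint(term, rules, max_steps=1000):
    """Apply the first matching rule repeatedly at the root of `term`.

    Returns (term, status), where status is "fixpoint" if no rule
    matches, "cycle" if a term repeats, or "gave up" if the step
    budget runs out.
    """
    seen = {term}
    for _ in range(max_steps):
        for rule in rules:
            new = rule(term)
            if new is not None:
                break
        else:
            return term, "fixpoint"  # no rule applies any more
        if new in seen:
            return new, "cycle"      # we have been here before
        seen.add(new)
        term = new
    return term, "gave up"


def drop_zero(t):
    """x + 0 => x"""
    if isinstance(t, tuple) and t[0] == "+" and t[2] == 0:
        return t[1]
    return None


def commute(t):
    """x + y => y + x (harmless-looking, but loops forever on its own)"""
    if isinstance(t, tuple) and t[0] == "+":
        return ("+", t[2], t[1])
    return None


def grow(t):
    """x => x + 0 (never repeats a term, so no cycle is ever seen)"""
    return ("+", t, 0)


print(rewrite_to_fixpoint(("+", "a", 0), [drop_zero]))
print(rewrite_to_fixpoint(("+", "a", "b"), [commute]))
print(rewrite_to_fixpoint("a", [grow], max_steps=50)[1])
```

The third case is the uncomfortable one: the driver cannot tell a rule
set that would eventually settle from one that never will.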

cheers,

-k