[antlr-interest] Re: Translators Should Use Tree Grammars

Tue Nov 23 13:50:52 PST 2004

atripp54321 wrote:
> 
> Anakreon,
> 
> Thanks for the help. Thanks to your code, I think I
> finally see why I'm seeing things differently....
> 
> You basically invoke code at the start and/or end of
> each visit to each tree node. For example, you have this
> in js_tree.g:
> 
>   | #(WHILE <pre_while> expr statement <post_while>)
> 
> ...and this in js_tree_php.act:
> 
> @pre_while : { incLabels(); }
> @post_while : { decLabels(); }
This is used for the break statement with a label.
In PHP it is replaced with "break aNumber".
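
To make the idea concrete, here is a minimal Java sketch of how a nesting
counter could turn a labeled break into PHP's "break aNumber". It is not the
actual ASPA code: only the names incLabels() and decLabels() come from the
action file; the class, declareLabel() and translateBreak() are hypothetical,
and the sketch assumes a label is registered right after its loop is entered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks loop nesting so "break label" can become "break N;".
final class LabelDepthTracker {
    private int depth = 0; // number of loops currently open
    private final Map<String, Integer> labelDepth = new HashMap<>();

    // Fired when the walker enters a loop (the <pre_while> hook).
    void incLabels() {
        depth++;
    }

    // Fired when the walker leaves a loop (the <post_while> hook).
    void decLabels() {
        // Labels attached to the loop being closed go out of scope.
        labelDepth.values().removeIf(d -> d == depth);
        depth--;
    }

    // Assumption: called right after incLabels() for a loop that carries a label.
    void declareLabel(String label) {
        labelDepth.put(label, depth);
    }

    // PHP "break N" leaves N enclosing loops, counted from the innermost one.
    String translateBreak(String label) {
        Integer target = labelDepth.get(label);
        if (target == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return "break " + (depth - target + 1) + ";";
    }
}
```

With a loop labeled "outer" and one unlabeled loop nested inside it, a break
to "outer" issued from the inner loop has to leave two loops.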
> 
> Most of these chunks of code are just a few lines, with
> a few a bit larger (@assign is 50 lines, for example).
> About 800 lines in total of automatically-fired-by-treewalker
> code.
> 
> Why am I seeing things differently from (most) everyone else here?
> When I look at my rules and ask "how would he do that?",
> the answer is almost always "he wouldn't". My translator
> is not just translating the core language, but the core libraries
> too. And the translations are not just simple syntactics,
> they're high-level rewrites.
ASPA translates core libraries too. The built-in functions and classes
of Jscript and VBScript and ActiveX components are supported.
This should be obvious from the prior post containing the steps
to translate someString.length into strlen($someString).
The example I chose was deliberately simple.
> 
> Just to pick one example, many C functions return error codes.
> For example, fopen() returns a 0 if it can't open a file.
> That needs to be replaced with exception handling in Java.
> So I have a list of the functions and the values they return
> on error. I check for calls to the functions, and look for
> various patterns of error checking, such as:
> -----------------------------
> if (fopen(xxx) == 0) { // return value checked
> // error code
> }
> else {
> // non-error code
> }
> -----------------------------
> v = fopen(xxx); // return value stored and checked later
> ...
> if (v != 0) {
> // non-error code
> }
> -----------------------------
This is in my TODO list for ASPA.
I had limited time for the development (was an assignment
from the university) and didn't consider this issue
as top priority.
I have some ideas of how to implement this, but will check
the way you handled this problem for inspiration.
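
As a starting point, the first pattern (return value checked directly in the
if condition) could be sketched in Java roughly as follows. This is only an
illustration under simplifying assumptions: the table of error values, the
class and method names, and the use of plain strings instead of AST fragments
are all hypothetical, and IOException is taken from the try/catch example in
this thread.

```java
import java.util.Map;

// Hypothetical sketch: rewrite "if (fn(...) OP value) A else B" into try/catch text.
final class ErrorCheckRewriter {
    // Assumed table: C library function -> value it returns on failure.
    static final Map<String, String> ERROR_VALUE = Map.of("fopen", "0");

    // Returns the Java text, or null when the condition is not a known error check.
    static String rewrite(String fn, String op, String value, String javaCall,
                          String thenBlock, String elseBlock) {
        String err = ERROR_VALUE.get(fn);
        if (err == null || !err.equals(value)) {
            return null; // not a listed function, or compared against another value
        }
        String errorCode;
        String normalCode;
        if (op.equals("==")) {        // if (f() == err) { error } else { normal }
            errorCode = thenBlock;
            normalCode = elseBlock;
        } else if (op.equals("!=")) { // if (f() != err) { normal } else { error }
            errorCode = elseBlock;
            normalCode = thenBlock;
        } else {
            return null;              // other comparisons are not handled here
        }
        return "try {\n    " + javaCall + ";\n    " + normalCode
             + "\n} catch (IOException e) {\n    " + errorCode + "\n}";
    }
}
```

The second pattern (value stored in a variable and checked later) would need
some tracking of the variable between the call and the check, which this
sketch does not attempt.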
> 
> And once I've found one of these patterns, I store the
> "error code" and "non-error code" (both of which may be more
> than just an AST, they are a sequence of statements),
> and produce the corresponding try/catch block in Java.
> And, if there are several statements that may throw
> the same exception, we don't want this:
> try {
>   open();
> } catch(IOException e) {
>   // error code
> }
> try {
>   read();
> } catch(IOException e) {
>   // error code again!
> }
> 
> So, I've got to do some real work to figure out where to
> put the try/catch stuff.
> 
> Correct me if I'm wrong, but I don't think your translator
> is doing anything nearly that complex. I have many simple
I can't tell if it does something this complex. One complex
translation is the example I sent about class definition in
JScript. Error handling of this kind is not implemented,
but I don't think it is unachievable with a TreeParser.
I just did not have enough time to implement it.
> syntactic rules as you do, but I also have many complex 
> rules like this one.
> 
> So, now that I think about it,
> maybe even this one rule involves several things that you probably
> wouldn't see in your typical language-to-language translator:
> 
> * handling of library functions, not just core language
This is done in ASPA.
> * replacing whole mechanisms/paradigms (error codes from 
>   library functions being replaced by exception handling)
No
> * complex pattern matching (e.g. checking for various comparisons
>   the return value like ==, !=, <, etc. and even checking for
>   storage in a variable and then usage of that variable)
I don't understand the point about the operators.
Operators which are handled differently than others in ASPA
are:
JScript: The '+' operator. If the operands are numbers the translation
is '+', otherwise the operator is considered to perform string concatenation.
VB: The logical operators (and, or, not ...), when they have int operands, are
taken to perform bitwise operations and are handled accordingly.
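
The two operator rules can be sketched in Java like this. It is a simplified
illustration, not the ASPA code: the class, the Type enum and the method names
are hypothetical, and the operand types are assumed to have been worked out
already by the time the operator node is visited. The PHP operators themselves
('.' for concatenation, '&', '|', '~' for bitwise and '&&', '||', '!' for
logical operations) are standard PHP.

```java
import java.util.Locale;

// Hypothetical sketch of type-dependent operator translation to PHP.
final class OperatorTranslator {
    enum Type { NUMBER, STRING, UNKNOWN }

    // JScript '+': stays '+' only when both operands are numbers,
    // otherwise it is treated as string concatenation (PHP '.').
    static String translatePlus(Type left, Type right) {
        return (left == Type.NUMBER && right == Type.NUMBER) ? "+" : ".";
    }

    // VB and/or/not: bitwise PHP operator for int operands, logical otherwise.
    static String translateLogical(String vbOperator, boolean intOperands) {
        switch (vbOperator.toLowerCase(Locale.ROOT)) {
            case "and": return intOperands ? "&" : "&&";
            case "or":  return intOperands ? "|" : "||";
            case "not": return intOperands ? "~" : "!";
            default:
                throw new IllegalArgumentException("not handled in this sketch: " + vbOperator);
        }
    }
}
```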
> 
> In case you think that this rule is just an exceptionally
> complex one, here are a few other examples:
> 
> * structs, unions, and enums become whole Java classes, including
>   constructors and changes at each reference
The same is true for JScript classes. The functions are placed inside
the class body. If the variables (members) are defined and assigned a value,
only the declaration exists in the PHP class and the members are initialized
inside the constructor body, etc.
> * memory management, which is done "by hand" in C, must be changed to
>   use Java objects.
This is a problem specific to your project. A complex problem
indeed.
> * I handle multiple input files, and change C file names
>   to Java ones (including combining "hello.c" and "hello.h" into
>   "Hello.java")
This is true for ASPA too. Other files are included in a file
with the #include directive.
Also, because ASP contains two languages, there exists a mechanism
to share information among different parsers and TreeParsers, but this
is specific to my application. 
> * There are different rules in Java and C for where an array
>   can be initialized.
> * The syntax and semantics for array declaration are different
>   (In C, it's "struct person a[3];", in Java it's 
>    "Person a[] = new Person[3];" plus a loop to initialize it)
None of these problems is unachievable with a TreeParser, in my opinion.
A similar issue about arrays exists in VB.
Example:
dim a(12, 3) 'multi dimension array
a(0, 1) = 0
a(0, 2) = 2
PHP:
$a = array(array()); //emulate multi dimension arrays with nested ones
$a[0][1] = 0;
$a[0][2] = 2;
I give this example to illustrate that the 2 problems you refer to
are application specific, but a similar problem can be solved
with the TreeParser approach.
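
The index mapping behind the nested-array emulation, where a VB access such
as a(i, j) becomes $a[i][j] in PHP, can be sketched in Java as below. The
class and method names are hypothetical, and the index expressions are assumed
to have been translated to PHP text before this method is called.

```java
import java.util.List;

// Hypothetical sketch: VB multi dimension access a(i, j) -> PHP nested access $a[i][j].
final class ArrayAccessTranslator {
    static String toPhp(String name, List<String> indices) {
        StringBuilder php = new StringBuilder("$").append(name);
        for (String index : indices) {
            // One pair of brackets per dimension.
            php.append('[').append(index).append(']');
        }
        return php.toString();
    }
}
```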
> Now I'm really starting to wonder about how much all the
> language-to-language translators out there are really doing.
> I know for a fact that the C-to-Java ones (other than Jazillian)
> are only doing trivial syntactic changes
> (see http://jazillian.com/competition.html for details).
It should feel good to know that you have built the best tool
available for a problem.

What this thread should make clear is that there are many
approaches to solving the same domain of problems. I consider
your work important because it offers an alternative methodology
for solving the translation problem. But personally I prefer
the TreeParser methodology. You point out that there are 800 lines
of code fired by the treewalker. I don't know, because I never counted
them. Anyway, the help the treewalker gave me was that it would
fire the appropriate method and I didn't have to worry about it.
For example in the rule:
#(DOT (d1:expression | dt:type | dthis:THIS) d2:expression <dot>)
the code:
    if (#d1 != null) { //expression.expression
        #expression = resolveDot(#d1, #d2);
    } else if (#dt != null) { //Array.prototype
        #expression = resolveClassAttribute(#dt, #d2);
    } else { //this.something
        #expression = resolveObjectAttribute(#d2);
    }
is fired.
This is very important because from the grammar definition
I can be sure that only those 3 cases can occur. Additionally
it was simple to model the methods which should offer the functionality
required for each case. I did not have to worry about how the expressions
d1 and d2 should be translated because the treewalker had previously fired
the appropriate code sections to take care of that.
So the problem is partitioned effectively into smaller problems by the treewalker
for me. This is the greatest benefit from the TreeParser in my opinion.
It doesn't matter if there are 800 lines of code or less activated by the
treewalker. What matters is that the treewalker offers entry points for specific
sub-problems. 
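
The same partitioning can be shown outside ANTLR with a small hand-written
Java walker. It is only a sketch: the Node class and the node type names are
hypothetical stand-ins for the AST, the type case (Array.prototype) is left
out, and every ".length" is treated as a string length, which the real
translator would decide from type information. It reproduces the earlier
example of someString.length becoming strlen($someString).

```java
// Hypothetical sketch of the DOT entry point in a hand-written tree walker.
final class DotWalker {
    // Minimal stand-in for an AST node: token type, token text, children.
    static final class Node {
        final String type;
        final String text;
        final Node[] kids;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            this.kids = kids;
        }
    }

    static String translate(Node n) {
        switch (n.type) {
            case "IDENT":
                return "$" + n.text;
            case "DOT": {
                Node left = n.kids[0];
                Node right = n.kids[1];
                if (left.type.equals("THIS")) {       // this.something
                    return "$this->" + right.text;
                }
                // expression.expression: the left side is translated first,
                // so this case never needs to know how that was done.
                String target = translate(left);
                if (right.text.equals("length")) {
                    return "strlen(" + target + ")";
                }
                return target + "->" + right.text;
            }
            default:
                throw new IllegalArgumentException("unexpected node type: " + n.type);
        }
    }
}
```

Each case only deals with its own sub-problem; the recursive call plays the
role of the code sections the treewalker has already fired for d1.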

Anyway, I think that the choice of methodology (treewalker or other)
is up to the developer. It can just be a matter of taste if all
the available methodologies can offer a solution to the problem.

What ASPA (and other TreeParser based translators) and Jazillian
prove is that both methodologies are capable of offering a solution.
> 
> What's the most complex translator that people
> have seen? (Complex meaning functionality, not internals).
I can't tell.

Anakreon
