[antlr-interest] Re: Translators Should Use Tree Grammars

Wed Nov 24 06:00:41 PST 2004

atripp54321 wrote:
> 
> 
>>ASPA translates core libraries too. The built-in functions and classes
>>of Jscript and VBScript and ActiveX components are supported.
>>This should be obvious from the prior post containing the steps
>>to translate someString.length into strlen($someString).
> 
> 
> I looked again and I don't see anything that it's doing that's
> complex.
> changing "someString.length" to "strlen(#someString)" is
> pretty simple. I realize you support more complex stuff,
> so could you give me an example?
Well, this transformation is not hard-coded. The application knows
nothing about the String class. The information is provided via the
XML files. I have already described the method of translation.
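To illustrate what "not hard-coded" means here, a minimal sketch follows. The XML layout, the attribute names and the helper functions are all assumptions made for illustration; ASPA's real XML schema is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; the element and attribute names are invented
# for this sketch and are not ASPA's actual schema.
MAPPING_XML = """
<class name="String">
  <property name="length" php="strlen({target})"/>
</class>
"""

def load_mappings(xml_text):
    """Return {(class name, member name): PHP template} read from the XML."""
    root = ET.fromstring(xml_text)
    return {
        (root.get("name"), prop.get("name")): prop.get("php")
        for prop in root.findall("property")
    }

def translate_member(mappings, class_name, target, member):
    """Translate target.member by table lookup instead of hard-coded rules."""
    template = mappings[(class_name, member)]
    return template.format(target="$" + target)

mappings = load_mappings(MAPPING_XML)
result = translate_member(mappings, "String", "someString", "length")
print(result)
```

Supporting another class would then mean writing another XML description, with no change to the translator code.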
> 
> One example that I do:
> printf("i=%d c=%c\n", i, c);
> ...becomes...
> System.out.println("i=" + i + " c=" + c);
I did not have to make this kind of translation for ASP.
One somewhat similar case is the opening of a connection
from the ADODB library.
con.open "Connection String"
I chose to transform this into openConnection($con, "Connection String").
The method openConnection is defined in PHP in a file I called runtime.php,
which takes the appropriate actions to get a connection based on that
string.
> So I parse the format string, check for "\n" at the end, 
> and replace the various placeholders ("%d" and "%c") with
> arguments (i and c), using the "+" operator. How would I
> specify that (or something else that complicated) using
> your system?
Something this complex? No.
The values of literal strings do not influence translation decisions
in ASPA.
> Hmm...anyone have a feel for the size of the
> ASP libraries vs. the C libraries
> (http://www.gnu.org/software/libc/manual/html_node/)?
Why would size be a problem? One assumption made for ASPA
is that any method or property of an ActiveX component or built-in
class can be emulated with a PHP code block or method call.
ASPA offers a way to bind an ASP method to a PHP method
or code block. If someone wishes to support yet another
ActiveX component, he just has to write one more XML file.

> My point was that when looking for various patterns of
> checking for error conditions in C code, you've
> got to also check for things like
> if (fopen < 0)
> if (fopen != 0)
> 
> ...and not just
> if (fopen == 0).
> that's all.
> 
I see. Error handling is not implemented.
>>The same is true for jscript classes. The functions are placed inside
>>the class body, if the variables(members) are defined and assigned a value
>>in the PHP class only the declaration exists and the members are initialized
>>inside the constructor body, etc
> 
> 
> Can you give me an example of where in your code you produce
> constructor code?
>  
Look at gr.omadak.leviathan.asp.objects.JsUserDefinedMethod and also
the rule @end in tree_js_php.act and the method transformToClass.
You might argue that the transformation is done with hand-written code.
But the information the code I wrote relies on is provided by the
tree walker as the tree is traversed.
> 
>>>* I handle multiple input files, and change C file names
>>>  to Java ones (including combining "hello.c" and "hello.h" into
>>>  "Hello.java"
>>
>>This is true for ASPA too. Other files are included in a file
>>with the #include directive.
> 
> 
> And I assume that variables declared in one file can be referenced
> in another?
Yes.
> Do you put all files in one large AST?
No
> If not,
> how do you handle moving things from one AST to another?
Nothing is moved. For each ASP file a file with PHP code is generated.
But the symbol table of the TreeParser of the included file is available
to the TreeParser of the file which included it. Only the information
is shared, not the variables themselves.
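A rough sketch of that arrangement, assuming one symbol table per file with read access to the tables of the files it includes. The class and method names are hypothetical and are not taken from ASPA's code.

```python
class SymbolTable:
    """Per-file symbol table that can also consult included files' tables."""

    def __init__(self, filename):
        self.filename = filename
        self.symbols = {}    # name -> guessed type, declared in this file
        self.included = []   # tables of files pulled in with #include

    def declare(self, name, guessed_type):
        self.symbols[name] = guessed_type

    def include(self, other):
        # Nothing is copied: we only keep a reference to the other table.
        self.included.append(other)

    def lookup(self, name):
        if name in self.symbols:
            return self.symbols[name]
        for table in self.included:
            found = table.lookup(name)
            if found is not None:
                return found
        return None

common = SymbolTable("common.asp")
common.declare("con", "ADODB.Connection")

page = SymbolTable("page.asp")
page.include(common)
print(page.lookup("con"))
```

Each file still gets its own tree and its own output file; only the lookup crosses file boundaries.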
> Sure, but what I'm trying to do is to bring up the things that
> I do that don't seem (to me) like they'd be easy using a
> treewalker, and asking you (or anyone else) to explain how
> you use a treewalker to implement them. You mentioned
> that you use a treewalker, so I'm still struggling to
> understand how you use it for anything nontrivial.
Trivial is a subjective concept. What is nontrivial for
me might be trivial for you and vice versa.
Perhaps you could look at the code generated for each
of the files available in the tests/sources directory and
decide if the transformations are trivial.
If you are using Linux, from the ASPA directory type:
./parse.sh -b tests -o out
and the translated files will be placed inside the "out" dir.
Some of the files will not have equivalent PHP files because
they were written in order to examine the behavior upon failure.
>>What this thread should make clear is that there are many
>>approaches to solve the same domain of problems. 
> 
> 
> I'm sorry to be so hardheaded and I mean no disrespect, but
> I need to understand more about what you're doing to
> convince myself that we really are dealing with the same domain.
It's OK.
I think the domain is source->source translation.
> 
> If you could just briefly describe, say, your most complex
> transformation, and point me to the right place in your
> code so I can investigate more, I'd appreciate it.
As I wrote before, complex and trivial are somewhat subjective.
I consider the examples below to be complex transformations:
1) ActiveX components which contain Collections
	a = Request.Cookies 'this provides the raw cookies string
	a = Request.Cookies("key") 'the value of a key
	a = Request.Cookies("key")("subkey")
The problem was that in the first case Cookies looks like
a property of the Request class,
in the second like a method, and in the third ??
The problem was solved by defining nested classes inside the Request
class (I am referring to the XML file) which provide default properties.
2) The transformation of Jscript classes into PHP classes.
3) If a variable used inside a function exists outside of its scope,
it should be declared with the keyword "global" in PHP. The "algorithm"
is described in the file etc/function_notes.txt
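For item 3, the general idea can be sketched as a set computation. This is only an illustration under assumed inputs (lists of names collected while walking the function body); it does not reproduce the algorithm in etc/function_notes.txt, and the function names are hypothetical.

```python
def globals_needed(used, params, local_decls, outer_scope):
    """Names used in a function that are neither parameters nor locals
    but do exist in the enclosing (file-level) scope."""
    local = set(params) | set(local_decls)
    return sorted(n for n in set(used) if n not in local and n in outer_scope)

def emit_global_line(names):
    """Build the PHP 'global' statement, or an empty string if none is needed."""
    if not names:
        return ""
    return "global " + ", ".join("$" + n for n in names) + ";"

needed = globals_needed(
    used=["counter", "i", "msg"],
    params=["msg"],
    local_decls=["i"],
    outer_scope={"counter", "title"},
)
print(emit_global_line(needed))
```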

> Hmm...so you're saying "here are all the transformations
> that need to be done whenever I encounter a DOT inside an
> expression".
Well, yes. Note however that the transformation decisions
are influenced by the code which was parsed before the
DOT was encountered. If we have a.length, but "a" is not
a reference to an instance of a class which has
a property called length, it would be an error.

> Boy, it really is hard for me to think about
> things that way, I'm so used to my other way. Let me
> see if I can think of some of the situations that I
> handle that deal with a DOT inside an expression.
The way of thinking is a matter of idiosyncrasy (I hope
this is the English word)
> 
> Here's one: in C, a field can be a function pointer, so you
> can have a function call: "a.f()" (the syntax is far
> more complicated, but if I have to write it down,
> I'll throw up...but you get the idea). 
> Java doesn't have function pointers,
> so I check for various function-pointer patterns
>  and replace all the function pointer
> fields and use Java reflection in its place. It's all very
> involved. 
Hard problem but application specific. One simpler check
which ASPA performs on variable usage is for arrays in VB.
If we have the code "c = a(expr)" this could be a call to
method "a" or the element with index "expr" if "a" is an
array. So ASPA has to know if "a" was declared as an array
or if the result of a method which returns an array was assigned
to identifier "a".
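A minimal sketch of that check, assuming the symbol information gathered so far is available as a plain dictionary. The representation and the function name are hypothetical, not ASPA's.

```python
def translate_paren(name, arg, symbols):
    """Translate VB 'name(arg)' to PHP using what is known about 'name'.

    'symbols' maps identifiers to a guessed kind; an entry of "array" means
    the name was declared as an array or was assigned the result of a
    method known to return one.
    """
    if symbols.get(name) == "array":
        return "$%s[%s]" % (name, arg)   # element access
    return "%s(%s)" % (name, arg)        # otherwise treat it as a call

symbols = {"a": "array"}
print(translate_paren("a", "$i", symbols))
print(translate_paren("f", "$i", symbols))
```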
> 
> Here's another usage of DOT: one rule might replace a sort
> by a call to "Collections.sort(a)". I have a rule that
> looks for usage of classes (such as Collections) that
> require Java "import" statements. Do I really want to do
An import may be required for the translated code in ASPA
too. In the XML file, a requirement for an import can be defined
for a class (if any member of the class is called, a require 'file'
statement is generated), for a member (the 'require' is generated
only if that member is called) or for a function.
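Roughly, that could work as in the sketch below. The two tables stand in for what the XML files would declare; their contents (including the file name cookies.php) and the function name are assumptions for illustration only.

```python
# Hypothetical metadata standing in for the XML declarations.
CLASS_REQUIRES = {"Connection": "runtime.php"}             # any member used
MEMBER_REQUIRES = {("Request", "Cookies"): "cookies.php"}  # only this member

def collect_requires(member_uses):
    """Return PHP require statements for the (class, member) pairs seen
    during the tree walk, each file listed once, in first-use order."""
    files = []
    for class_name, member in member_uses:
        candidates = (
            CLASS_REQUIRES.get(class_name),
            MEMBER_REQUIRES.get((class_name, member)),
        )
        for f in candidates:
            if f and f not in files:
                files.append(f)
    return ["require '%s';" % f for f in files]

uses = [("Connection", "open"), ("Request", "Form"),
        ("Request", "Cookies"), ("Connection", "close")]
requires = collect_requires(uses)
print("\n".join(requires))
```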
> a check at the DOT node that says "if the left side
> is the name of a class, then import that class"?
> That logic has nothing at all to do with the
> function-pointer logic - why should they be in
> the same place in the code? Just because they both
> happen to involve a DOT?
This is a matter of code organization. What I do
is to call a method which then calls other methods
to handle cases of a different logic. They are not
in the same place.
> 
> I would
> guess there are a few more situations where I have to
> do various transformations involving DOT. If so, I'd
> have to add even more unrelated cases to this one "DOT" place in
> the code. That's slicing the problem the wrong way.
How can we discuss the right way of slicing a problem
in an objective way?
> 
> 
>>This is very important because from the grammar definition
>>I can be sure that only those 3 cases can occur.
> 
> 
> Yea, I can appreciate that. You're sure you've handled
> every possible input. But on the other hand, you're not
> at all sure that you've handled all the cases.
> For example, you surely have one place in the grammar
> that handles the "+" binary operator. You know you
> have things covered by covering that one case with, say,
> a call to a handleBinaryPlus() method. But what does that
> method do? Does it:
> * remove redundant zeros (x+0 and 0+x become x)
> * simplify expressions (x + -1 becomes x - 1)
No. ASPA is not a code optimizer but a translator.
> * record the fact that each operand is involved in an arithmatic
>   operation (and thus better not get it's type changed to boolean)
ASP and PHP (fortunately) are loosely typed languages. But still some
translation decisions are based on the types of the operands.
The way to handle this is based on heuristic methods and is not guaranteed
to be always successful.
> * combine consecutive string concatenation where possible ("a" + "b" 
>    becomes just "ab")
This can be done but I didn't do it. Another case is expressions
like
a = " there"
Response.write "hello" & a 'prints hello there
which in PHP can be simplified into:
print("hello$a")
I didn't do this because I'm not sure a user of the
program would think of it as a feature. He might prefer
to have print("hello" . $a) instead. But it can be easily done.

> Isn't it more natural to have a separate rule for each of the
> above items? That way, 1) we avoid having this handleBinaryPlus()
> method performing 4 completely unrelated functions, and 2)
> we avoid having the "change x from int to boolean" logic
> split across handleBinaryPlus() and other functions.
A matter of taste I'm afraid.

> I just don't see the advantage of this "fire a rule at each
> node" approach. As I look through my rules, almost none
> of them involve a single node in the tree.
Example:
a = "one" + true + "two" + new Date() + "three" + 5
(EXPR (ASSIGN a (PLUS (PLUS (PLUS (PLUS (PLUS one true) two) (NEW DATE ELIST)) three) 5)))
The translated one:
(EXPR (ASSIGN a (CONCAT (CONCAT (CONCAT (CONCAT (CONCAT one true) two) [METHOD_CALL, getdate]) three) 5)))
Except for the deepest PLUS, there are single node operands involved.
But each time the PLUS rule is called it only cares about the [guessed] type
of its operands.
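The same idea as a small sketch, with trees written as nested tuples. The type guess used here (a string literal or a CONCAT result counts as a string, everything else does not) is a deliberate simplification and an assumption; ASPA's actual heuristics are not shown in this thread.

```python
def translate(node):
    """Return (translated node, guessed type), working bottom-up."""
    if not isinstance(node, tuple):
        # Leaf: guess the type from the literal itself.
        return node, ("string" if isinstance(node, str) else "other")
    if node[0] == "PLUS":
        left, ltype = translate(node[1])
        right, rtype = translate(node[2])
        # The rule only looks at the guessed types of its two operands.
        if "string" in (ltype, rtype):
            return ("CONCAT", left, right), "string"
        return ("PLUS", left, right), "other"
    if node[0] == "NEW" and node[1] == "DATE":
        return ("METHOD_CALL", "getdate"), "other"
    return node, "other"

tree = ("PLUS", ("PLUS", ("PLUS", ("PLUS", ("PLUS", "one", True), "two"),
        ("NEW", "DATE", "ELIST")), "three"), 5)
translated, guessed = translate(tree)
print(translated)
```

A PLUS over two numbers is left alone, so the rule never needs to look beyond its own operands.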
> And even when 
> rules do involve a single node, I don't want to mix
> them together. For example, one rule removes the "u"
> in "123u" (java doesn't have unsigned types). And another
> removes the "L" in "123L" (because a C long is [usually]
> an Java int). Yes, I can have one handleNumber() method
> fire at the NUMBER node that does both of these. But
> I'd rather not slice the problem that way. Instead, I'd
> like the NumberWithURule to traverse the AST and make its
> changes, and the NumberWithLRule to traverse the AST and
> make its changes.
ASPA is single pass, but you could do those transformations
mentioned above in many passes.

> My point is that as that 800 lines grows to tens of thousands
> of lines, most of the code will start to deal with whole sections
> of the tree rather than individual nodes.
My experience does not confirm that.
The code deals with individual nodes and any other information
stored about those nodes. The information is gathered from preceding
code by examining individual nodes.
