[antlr-interest] Jazillian C to Java translator

=?GB2312?B?0MLC8g==?= inshua at gmail.com
Fri Dec 16 06:24:42 PST 2005


Hi , Andy

it's a great work you make, very nice.

I'm doing similiar work, .... may you spend some time to make a demo of
translating for newer? such as me. :)

a simple demo as translating

1+1

to

1 add 1

thank you very much~~

2005/12/16, Andy Tripp <atripp at jazillian.com>:
>
> Hi guys,
>
> Thanks for your interest in Jazillian.
> This page: http://www.jazillian.com/does.html gives a flavor for what it
> does, at a high level.
> This page: http://www.jazillian.com/how.html gives a fairly complete
> list of the various transformation
> rules.
>
> As for a design/implementation document, I don't have a complete one yet
> so I'll try to explain a bit here.
> I apply a series of about 150 transformation rules. Each rule consists
> of a match() method that
> does a pattern match, and an apply() with does the transformation. For
> example, the PrintfRule.match()
> method just looks for a "printf" token, and the PrintfRule.apply()
> method replaces all the tokens in the
> printf statement with tokens for an equivalent Java System.out.println()
> call.
>
> I have a full class hierarchy of rules. At the top are various abstract
> classes, but also things like
> FileRenameRule, which just renames files to match Java naming
> conventions. At the bottom of the
> rule hierarchy are things like SnippetsRule, which reads a file
> containing simple text transformation rules like:
>
> NULL --> 0   // replace NULL with 0
> unsigned -->         // Java has no unsigned keyword, just delete it
> v1 = sizeof v2 / sizeof (char *); --> v1 = v2.length;  // just one of
> many ways to get string length in C
>
> Note that the pattern matcher allows placeholders such as "v1" and "v2"
> above, for which any variable
> can be matched. There is also an "x" placeholder which will match
> anything:
> if (false) { x } -->      // just delete any code that will never get
> used.
>
> In addition to the 150 rules written in Java, there are hundreds of
> these simple pattern-match-and-replace rules.
> All of this is done with streams of tokens on a set of C files. I have
> built a library of various functions for
> working with these token streams for:
>
> matching patterns and doing replacements, inserts, deletes, moves
> symbol table stuff: finding variable declarations, types, references, etc.
> All the stuff I would have gotten for free if I where doing the
> traditional tree-transform approach: function scopes,
> start and end of variables and statements, etc
>
> I did pay a heavy price to implement my transformations on token streams
> rather than ASTs, but I am still happy
> with the result. For example, I have an elaborate isDeclaration() method
> that tries to figure out if a given token is
> a variable or function declaration by examining nearby tokens. The
> payoff is that I can simply write hundreds of
> rules like the ones above without having to think about ASTs.
>
> My GotoRemoverRule is just an implementation of one of the well-known
> goto removal algorithms, adopted to use
> my library of token-stream functions. I handle preprocessor stuff by
> cheating: I've added rules to Monty's
> cgram grammar to handle cpp constructs, but only when they appear where
> people usually put them. In this way,
> I can be smart and make the same changes a human would do (as opposed to
> forcing a user to run cpp before running
> Jazillian). So I can make changes like:
> #define X 1    becomes   public final static int X = 1;
> #define f(x) x+1   becomes  static int f(int x) { return x+1}
>
> (This sharp crowd is now shaking their collective head. "How do you even
> know that x is an int? This
> macro works on any type!" Answer: I looked at every reference of x. "But
> what if you weren't so lucky
> and  you need a "float" version of f? Answer: I'm doing the best I can.
> I either handle that today, or
> will get to it soon :)
>
> C structs, unions, and enums become their own Java classes. I generate
> Java 1.4 code - not generating
> any 1.5 stuff (like enums) yet. I do no magical "make it object
> oriented" tricks: each C file becomes a Java
> class. And I generate the code that a human would generate. For example:
> printf("i=%d  s=%s\n");   becomes System.out.println("i=" + i + " s=" +
> s);
> (i.e. I see that you're ending with a newline, and so calling println. I
> will never generate a Java printf call.
> That would work, but it's not what a real person would do).
>
> Pointers are the toughest issue. I basically analyze all pointer usage,
> looking for the three most common
> usages of pointers:
> As an array index (in which case, I change the pointer to a Java int,
> and make the pointee an array
> As some sort of pointer-based data structure (linked list, tree,
> whatever), which requires just minor syntax changes
> As a way of doing pass-by-reference
>
> I use ANTLR for all C lexing. The only place I use ANTLR for parsing and
> AST generation is for handling
> expressions. Java's boolean type is required in certain places, where C
> programs use an int because
> there's no built-in boolean type. So when we have "if (x)", I really
> have no choice other than to go ahead
> and build an AST from "x" (which could be a large, complex expression),
> and then walk thru the AST
> replacing various "int" nodes with "boolean" ones.
>
> I guess that's it off the top of my head. I'll be happy to give more
> specifics if you have questions.
> I'm currently working on a C++ version. I'm proud still have much of my
> sanity left after trying to map
> templates to generics. That's not to say that I'm ever going to get it
> to work, but I see keeping my sanity
> as a big plus :)
>
> Andy
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20051216/557ac7f5/attachment-0001.html


More information about the antlr-interest mailing list