[antlr-interest] starting with language translation

Thu Mar 13 07:37:06 PDT 2008

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of benzo
> Sent: Thursday, March 13, 2008 4:48 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] starting with language translation
> 
> hi
> this is my first post - thus "hello world!"
> 
> I've been interested in ANTLR for a long time and would finally like to
> make an attempt with it.
> What I'd like to achieve is to write a translator in the form of "Java to
> ActionScript".
> 
> From what I understand, there are basically 4 layers/modules/steps
> involved:
> 1. parsing the source language (Java)

Yes, but this splits into (in ANTLR terms):

a) Creating a character stream with ANTLRFileStream or ANTLRStringStream;
b) Lexing that stream into tokens (all in one go)
c) Taking the token stream and parsing it ...
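To make steps a) and b) concrete, here is a toy lexer written by hand in Java. It is not ANTLR-generated code, and the class name ToyLexer and its three token kinds are made up for illustration. It takes the whole character input and turns it into a complete token list before any parsing starts; that list is what step c) would then consume.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer (not ANTLR output): characters in, full token list out.
class ToyLexer {
    enum Kind { IDENT, NUMBER, SYMBOL }

    record Token(Kind kind, String text) {
        @Override
        public String toString() { return kind + ":" + text; }
    }

    // Lex the entire input in one go, before any parsing happens.
    static List<Token> lex(CharSequence input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }  // whitespace is discarded
            int start = i;
            if (Character.isJavaIdentifierStart(c)) {
                while (i < input.length() && Character.isJavaIdentifierPart(input.charAt(i))) i++;
                tokens.add(new Token(Kind.IDENT, input.subSequence(start, i).toString()));
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, input.subSequence(start, i).toString()));
            } else {
                // Any other character becomes a one-character operator/punctuation token.
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [IDENT:int, IDENT:x, SYMBOL:=, NUMBER:42, SYMBOL:;]
        System.out.println(lex("int x = 42;"));
    }
}
```

A real Java lexer also has to deal with keywords, multi-character operators, string literals and comments; the point here is only the shape of the stage.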

> 2. persist/serialize the structure

This is called producing an Abstract Syntax Tree (AST). Once the parser is working, you add annotations to the grammar that tell the parser how to build the AST. The AST is a tree structure that encapsulates the 'program' you have just parsed, with ambiguities resolved, so that you end up with a nice, easily walkable tree.
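As a rough illustration of what "building an AST" means, here is a small recursive-descent parser in plain Java that builds a tree for a made-up expression grammar. This is not how ANTLR does it (ANTLR builds the tree from the grammar annotations); the class ToyAstBuilder and its Node shape are assumptions for the example. Note how operator precedence stops being a question once the tree exists: "a + b * c" becomes a '+' node whose right child is the '*' node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for:  expr : term ('+' term)* ;  term : atom ('*' atom)* ;  atom : ID | INT ;
class ToyAstBuilder {
    // Generic AST node: operator or leaf text, plus children.
    record Node(String text, List<Node> children) {
        Node(String text) { this(text, List.of()); }

        // LISP-style rendering: (op child1 child2)
        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ToyAstBuilder(String input) {
        // Crude tokenizer: integers, identifiers, or any single non-space character.
        Matcher m = Pattern.compile("\\d+|[A-Za-z_]\\w*|\\S").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // '+' nodes are built at the outer level, so they end up nearer the root.
    private Node expr() {
        Node left = term();
        while ("+".equals(peek())) {
            pos++;
            left = new Node("+", List.of(left, term()));  // left-associative
        }
        return left;
    }

    // '*' binds tighter, so its nodes sit deeper in the tree.
    private Node term() {
        Node left = atom();
        while ("*".equals(peek())) {
            pos++;
            left = new Node("*", List.of(left, atom()));
        }
        return left;
    }

    private Node atom() {
        String t = peek();
        if (t == null || !t.matches("\\w+")) {
            throw new IllegalStateException("expected ID or INT at token " + pos);
        }
        pos++;
        return new Node(t);
    }

    static Node parse(String input) {
        ToyAstBuilder p = new ToyAstBuilder(input);
        Node tree = p.expr();
        if (p.peek() != null) throw new IllegalStateException("unexpected token: " + p.peek());
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("a + b * c"));  // prints (+ a (* b c))
    }
}
```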

> 4. apply the translation rules

For this, you can either walk the tree manually or write a tree-walking grammar. You walk through the nodes one or more times, and it usually splits out into something like:

a) A walk to verify the semantics of the input program (otherwise you will translate bad input into bad output);
b) A walk to build a symbol table (you might be able to do this in a) if the language allows);
c) A walk to generate the target (your step 5 below).

Some languages allow you to do all of this in one pass; others require more passes than this to resolve type information and so on.
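Here is a minimal sketch of separate walks over one tree, using a tiny hypothetical statement AST (Decl and Assign) rather than a real Java AST, and a made-up Java-to-ActionScript type mapping. In this toy the symbol-table walk runs first, because the semantic check needs to know what has been declared; generation only runs when the check reports no errors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToyPasses {
    // Tiny, hypothetical statement AST.
    interface Stmt {}
    record Decl(String name, String javaType) implements Stmt {}
    record Assign(String name, String value) implements Stmt {}

    // Walk 1: build the symbol table (name -> declared Java type).
    static Map<String, String> buildSymbols(List<Stmt> program) {
        Map<String, String> symbols = new LinkedHashMap<>();
        for (Stmt s : program) {
            if (s instanceof Decl d && symbols.put(d.name(), d.javaType()) != null) {
                throw new IllegalStateException("duplicate declaration: " + d.name());
            }
        }
        return symbols;
    }

    // Walk 2: semantic check, so bad input is rejected instead of translated.
    static List<String> check(List<Stmt> program, Map<String, String> symbols) {
        List<String> errors = new ArrayList<>();
        for (Stmt s : program) {
            if (s instanceof Assign a && !symbols.containsKey(a.name())) {
                errors.add("undeclared variable: " + a.name());
            }
        }
        return errors;
    }

    // Walk 3: generate ActionScript 3 from a program that passed the check.
    static String generate(List<Stmt> program) {
        StringBuilder out = new StringBuilder();
        for (Stmt s : program) {
            if (s instanceof Decl d) {
                out.append("var ").append(d.name()).append(':').append(asType(d.javaType())).append(";\n");
            } else if (s instanceof Assign a) {
                out.append(a.name()).append(" = ").append(a.value()).append(";\n");
            }
        }
        return out.toString();
    }

    // Assumed mapping of a few Java types to ActionScript 3 types.
    static String asType(String javaType) {
        switch (javaType) {
            case "double": return "Number";
            case "boolean": return "Boolean";
            default: return javaType;  // int and String keep their names
        }
    }

    public static void main(String[] args) {
        List<Stmt> program = List.of(new Decl("x", "int"), new Assign("x", "42"));
        Map<String, String> symbols = buildSymbols(program);
        List<String> errors = check(program, symbols);
        if (errors.isEmpty()) {
            System.out.print(generate(program));  // "var x:int;" then "x = 42;"
        } else {
            errors.forEach(System.err::println);
        }
    }
}
```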

> 5. generate the target language (ActionScript)

Try to keep the code that actually generates target code in separate code units, and invoke it from your code-generating tree walk(s).
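One way to read that advice, sketched in plain Java: the walker decides what to emit and in which order, while a separate emitter class owns all of the ActionScript syntax. The Emitter interface, ActionScriptEmitter, the class/field AST and the type mapping are all hypothetical names for this example, not anything provided by ANTLR.

```java
import java.util.List;
import java.util.Map;

class ToyEmitterDemo {
    // Code-generation interface the walker depends on (hypothetical).
    interface Emitter {
        String classHeader(String name);
        String field(String name, String javaType);
        String classFooter();
    }

    // ActionScript 3 implementation; all target-specific syntax lives here.
    static final class ActionScriptEmitter implements Emitter {
        private static final Map<String, String> TYPES =
            Map.of("int", "int", "double", "Number", "boolean", "Boolean");

        public String classHeader(String name) {
            return "package {\n    public class " + name + " {\n";
        }

        public String field(String name, String javaType) {
            return "        public var " + name + ":" + TYPES.getOrDefault(javaType, javaType) + ";\n";
        }

        public String classFooter() {
            return "    }\n}\n";
        }
    }

    // Minimal AST for a Java class with fields (hypothetical shape).
    record Field(String name, String javaType) {}
    record ClassNode(String name, List<Field> fields) {}

    // The walker decides WHAT to emit and in which order, never HOW it is spelled.
    static String walk(ClassNode cls, Emitter out) {
        StringBuilder sb = new StringBuilder(out.classHeader(cls.name()));
        for (Field f : cls.fields()) sb.append(out.field(f.name(), f.javaType()));
        return sb.append(out.classFooter()).toString();
    }

    public static void main(String[] args) {
        ClassNode point = new ClassNode("Point",
            List.of(new Field("x", "double"), new Field("y", "double")));
        System.out.print(walk(point, new ActionScriptEmitter()));
    }
}
```

Because the walker only sees the interface, changing the output format (or plugging in a test emitter) does not touch the tree-walking code.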

> 
> some questions to these points:
> 
> 1. are there any existing Java language parsers (Java grammar), maybe
> even with some (basic) documentation (tutorials, etc.)

Download the example grammars from the download page and read through them. At first, don't try to understand everything all at once; just take note of how things are plugged together and what the tree examples do to build the tree. Then take one of the grammars (there is one for Java) and try to add something to it. Once you can do that, you are on your way to a reasonable understanding.

You would be well advised to buy the ANTLR book (The Definitive ANTLR Reference): http://www.pragprog.com/news/new-in-print-and-shipping-the-definitive-antlr-reference-building-domain-specific-languages and to read up a little on how parsers, trees and related things work in general.

> 2. what do you suggest: writing the structure down to xml? persist in
> memory?

Use the AST building capabilities of ANTLR.

> 4. are there any good examples for doing that the right (clean) way?

The book and the examples together, plus the search facility of the Wiki.

> 5. what is your personal favorite way doing that? stringtemplate? are
> there simpler methods?

StringTemplate is fine for producing your final code. You can of course also make a code generation interface and invoke its methods from your AST walker.
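To show the idea behind template-driven output without depending on any library, here is a hand-rolled stand-in in plain Java. It is NOT the StringTemplate API: the $name$ placeholder syntax, the render() helper and the METHOD template are assumptions made for this sketch. The point is that the target-language text lives in a template, and the tree walker only supplies named attributes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mini template filler; not StringTemplate.
class ToyTemplates {
    private static final Pattern HOLE = Pattern.compile("\\$(\\w+)\\$");

    // Hypothetical template for an ActionScript 3 method.
    static final String METHOD =
        "public function $name$($params$):$returnType$ {\n    $body$\n}\n";

    // Replace each $attr$ with its value; a missing attribute is an error.
    static String render(String template, Map<String, String> attrs) {
        Matcher m = HOLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = attrs.get(m.group(1));
            if (value == null) throw new IllegalArgumentException("missing attribute: " + m.group(1));
            // quoteReplacement stops '$' or '\' inside values being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(METHOD, Map.of(
            "name", "area",
            "params", "r:Number",
            "returnType", "Number",
            "body", "return Math.PI * r * r;")));
    }
}
```

A real template engine adds much more (nested templates, iteration over lists, formatting), which is why reaching for StringTemplate is usually better than growing something like this.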

File -> CharStream -> lexer -> TokenStream -> parser -> TreeNodeStream -> tree parser (x n passes) -> code generator.
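For a feel of the whole chain end to end, here is a toy version in plain Java that translates a single Java declaration such as "int x = 1 + 2 * y;" into ActionScript. No ANTLR classes are used: the grammar subset, the Node record, the Parser class and the type mapping are all assumptions for the sketch, and each stage is deliberately tiny.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyPipeline {
    record Node(String text, List<Node> kids) {
        Node(String text) { this(text, List.of()); }
    }

    // chars -> tokens: lex the whole input up front.
    static List<String> lex(String src) {
        List<String> toks = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+|[A-Za-z_]\\w*|\\S").matcher(src);
        while (m.find()) toks.add(m.group());
        return toks;
    }

    // tokens -> AST.  decl : TYPE ID '=' expr ';' ;  expr : term ('+' term)* ;  term : atom ('*' atom)* ;
    static final class Parser {
        private final List<String> toks;
        private int pos = 0;

        Parser(List<String> toks) { this.toks = toks; }

        private boolean at(String t) { return pos < toks.size() && toks.get(pos).equals(t); }

        private String next() {
            if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
            return toks.get(pos++);
        }

        private void expect(String t) {
            String got = next();
            if (!got.equals(t)) throw new IllegalStateException("expected '" + t + "' but got '" + got + "'");
        }

        Node decl() {
            Node type = new Node(next());
            Node name = new Node(next());
            expect("=");
            Node value = expr();
            expect(";");
            if (pos != toks.size()) throw new IllegalStateException("trailing input");
            return new Node("DECL", List.of(type, name, value));
        }

        private Node expr() {
            Node left = term();
            while (at("+")) { pos++; left = new Node("+", List.of(left, term())); }
            return left;
        }

        private Node term() {
            Node left = new Node(next());
            while (at("*")) { pos++; left = new Node("*", List.of(left, new Node(next()))); }
            return left;
        }
    }

    // AST walk + code generation for ActionScript 3 (assumed type mapping).
    private static final Map<String, String> AS_TYPES =
        Map.of("int", "int", "double", "Number", "boolean", "Boolean");

    // No parentheses needed: this grammar has none, so the tree always matches
    // the usual '*'-before-'+' precedence that ActionScript also uses.
    static String genExpr(Node n) {
        if (n.kids().isEmpty()) return n.text();
        return genExpr(n.kids().get(0)) + " " + n.text() + " " + genExpr(n.kids().get(1));
    }

    static String genDecl(Node decl) {
        String javaType = decl.kids().get(0).text();
        String name = decl.kids().get(1).text();
        return "var " + name + ":" + AS_TYPES.getOrDefault(javaType, javaType)
            + " = " + genExpr(decl.kids().get(2)) + ";";
    }

    static String translate(String javaSource) {
        return genDecl(new Parser(lex(javaSource)).decl());
    }

    public static void main(String[] args) {
        System.out.println(translate("double area = 3 * r * r;"));  // var area:Number = 3 * r * r;
    }
}
```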

Just attack each one in stages and steal from the free examples :-)

Jim