[antlr-interest] AST rewrite "requirements"

Thu Jul 24 13:18:51 PDT 2008

AST-rewrite fans,

As Terence and Leon Su are working on a new term-rewriting system, I'd
like to approach the same problem from the other end. I want to define
what features I think are needed for a term-rewrite system to be
useful for certain translation projects. By "certain", I mean 
translators that do things like C-to-Java, C++-to-Java, and COBOL-to-Java.

So I've started by taking one of my current "translation rules", and
writing down pseudocode (actually, fairly reasonable Java code) that
does what needs to be done. If people find this useful, then I'll repeat this
for the many other "rules".

Each "rule" modifies an AST in some way.
The "rule" that I chose modifies a C++/Java AST to change the
 access modifiers ("public", "private", "protected") from being "C++ - like"
to "Java like". For example, this C++ code... 

class C {
    int a;
    public:
        int b;
        int c;
    private:
        int d;
}

...might produce a "C++-ish" AST that looks like this...

CLASS_SPECIFIER "C"
    DECL
        TYPE
            LITERAL_int 
        NAME
            ID "a"
    DECL 
        MODIFIERS
            LITERAL_public 
        TYPE
            LITERAL_int 
        NAME
            ID "b"
    DECL 
        TYPE
            LITERAL_int 
        NAME
            ID "c"
    DECL 
        MODIFIERS
            LITERAL_private 
        TYPE
            LITERAL_int 
        NAME
            ID "d"

...and we want to transform that AST to this "Java-ish" AST...

CLASS_SPECIFIER "C"
    DECL
        MODIFIERS
            LITERAL_private
        TYPE 
            LITERAL_int 
        NAME
            ID "a"
    DECL 
        MODIFIERS
            LITERAL_public 
        TYPE
            LITERAL_int 
        NAME
            ID "b"
    DECL 
        MODIFIERS
            LITERAL_public 
        TYPE
            LITERAL_int 
        NAME
            ID "c"
    DECL 
        MODIFIERS
            LITERAL_private 
        TYPE
            LITERAL_int 
        NAME
            ID "d"

...which will produce this Java code...
class C {
    private int a;
    public int b;
    public int c;
    private int d;
}

I won't cover the "big picture" here, all I want to do is to
outline what logic is needed to make the AST transformation that
we see above.

So here is some rough Java code that would do what needs to be done:

    private final static List<Integer> ALL_C_MODIFIERS = ASTFactory.createList(
        LITERAL_private, LITERAL_public, LITERAL_protected);

    AST currentModifier = ASTFactory.create(LITERAL_private);

    // Loop through each of the CLASS_SPECIFIER ASTs, checking
    // each of its children to see if it has any access modifier
    // ("public", "private", or "protected").
    for (AST classSpecifier: root.getASTs(CLASS_SPECIFIER)) {
        // If there is any "friend" keyword in this class, we'll need
        // to just make everything "public".
        boolean hasFriend = classSpecifier.contains(LITERAL_friend);
        if (hasFriend) {
            currentModifier = ASTFactory.create(LITERAL_public);
        }
        for (AST child: classSpecifier.getChildren()) {
            AST foundModifier = child.findAny(ALL_C_MODIFIERS);
            if (foundModifier != AST.NOT_FOUND) {
                // Once we've found an access modifier, that's the one
                // we'll use until we find another (unless we have a friend,
                // in which case leave it "public".
                if (!hasFriend) {
                    currentModifier = foundModifier;
                }
            } else {
                // This child does not yet have an access modifier,
                // as required by Java. Insert a COPY of the last modifier 
                // that we encountered, as the first child
                child.insert(0, currentModifier);
            }
        }
    }

>From that code, we see the following features being used:    
    ASTFactory class:
        create() static method that creates a single AST

    AST class:
        getASTs() method which gets a List of ancestor ASTs of the given type
        contains() method which returns true iff there is any ancestor AST of the given type
        getChildren() method returns the direct children of the AST
        findAny() method returns any one ancestor AST that has one of the given types
        NOT_FOUND final static AST flag that indicates no AST was found 
        insert() method that inserts a copy of the given AST as a child.

Obviously, I haven't shown a "real term-rewrite system", I've just shown some
Java code with an AST class that has some reasonable tree-data-structure functionality.
For any term-rewrite-system to be useful to me, it's got to be easier to
develop and maintain that what I show here. I'm doubtful that that can be done,
but I'd love to be proven wrong.

My point here is not to start a long "Which is better for AST transformation:
a DSL or plain Java code?" debate (although I'd be happy to have that
debate, too). Instead, what I want to know is:
Is it worth my time to write up perhaps a couple hundred more examples of
this sort of Java code that does tree transformations? Would you guys 
who are buildinga currently-unnamed-rewrite-system actually be willing 
to go through such "requirements" and make sure your system provides 
that functionality?

The reason I ask is that a long time ago when I did look into the various tools 
(SF+SDF, Stratego, TXL, and later Tom), it seemed to me that they got to
be a real mess to do anything non-trivial. Maybe I'm wrong, but I think
if you implemented the same logic shown above in any of these systems,
it would be harder to write and maintain than the "vanilla Java" shown above.

Andy