[antlr-interest] Must all grammar rules be implemented for tree parsers?

Bryan Ewbank ewbank at gmail.com
Tue Dec 6 02:06:36 PST 2005


Hi Rob,

You asked, in part...

On 12/5/05, Rob Greene <robgreene at gmail.com> wrote:
> 1) From my experimenting, I cannot just pick nodes that are
> optimizable, can I? From what I'm seeing, I think I need to build
> enough of the language into the tree parser to get to the optimizable
> parts. As a simple example, numeric constants that are operated on
> together can be computed by the compiler instead of the resulting code
> (ie, "a = 24 * 60" can be replaced with "a = 1440"). Am I correct in
> this? I cannot just write a rule for the "*" in that example. I need
> to write the rule for the "=" and anything else preceding it?

Well, yes and no.  If you're careful, you can write a very simply tree
parser that wanders through the tree and hits only those nodes that
are "interesting"...

For example:

    program : #(PROGRAM ( expr )* ) ;

    expr
    { bool sawmatch = false; }
    :
        (
            // candidates for optimization
            ( #( op:ASGN do_binary_optimization[#op] )
            | #( op:PLUS do_binary_optimization[#op] )
            )
            { sawmatch = true; }
        )?
        ( {sawmatch == false}?
            #( . ( expr )* )
        )?
    ;

    do_binary_optimization
    [ RefAST &parent ]
    :
        ... whatever ...
    ;

This will "stop" only at ASGN and PLUS, and will dash through the rest
of the tree.  Using the "sawmatch" flag takes care of an ANTLR warning
if you have "( A | B | C | . )", because "A" and "." both match "A". 
Ugly at first, but it's an idiom we use for this type of operation.

Note that, if you are careful, you /can/ do operations in place as you
ask.  It just requires that you operate on the existing node, rather
than create a new node...

    replace_plus :
        #( #op:PLUS e1:expr e2:expr )
            {
                if (e1->getType() == LITERAL_INTEGER && e2->getType()
== LITERAL_INTEGER)
                {
                    // assuming a stack
                    int v1 = top(); pop();
                    int v2 = top(); pop();
                    push(v2+v1);

                    // tree surgery.  can't use "#op = #[LITERAL_INTEGER]"
                    // because this would not preserve the relationship to the
                    // parent node, whatever that may be.
                    #op->setType(LITERAL_INTEGER);
                    #op->setText("");
                    #op->setIntegerValue(top());
                    #op->setFirstChild(NullAST);
                }
            }
        ;


More information about the antlr-interest mailing list