[antlr-interest] Interesting trick for forcing position of child nodes in ASTs.

Sat Jun 30 08:01:10 PDT 2007

> List<AST> getChildrenOfType(int type) {
>    List<AST> matches = new ArrayList<AST>();
>    for (AST child: children) {
 >      if (child.getType() == type) {
 >        matches.add(child);
 >     }
 >   }
 >   return matches;
>}

But that assumes that all the children are of the same type. In a heterogenous AST, in a real grammar, an AST nodes that match an "expression" rule, for example, could be any 40 or 50 different node types (either the AST type or the C#/Java type of the heterogenous node). There are at least half a dozen rules that will have AST nodes of very diverse types (global_statement, statement, expression, typereference, &c). And it's not at all clear to me at this point, that I can unambiguously determine the grouping of child nodes based solely on type in a large realworld grammar.

And what about rules like this:

    class_declaration
        :  attributes* modifiers* 'partial'? 'class' ID type_argument_list? base_classes? type_constraints? class_body
            -> ^(CLASS_DECLARATION attributes* modifiers* 
                    'partial'? ID type_argument_list? base_classes? 
                    type_constraints? class_body)
        ;

where 'partial' is a raw token?

You might be able to write:
        bool partialClass = (getChildrenOfType(typeof(TokenNode))[0].Text == "partial");

(what *is* the unambigous type of a token?) 

or you could modify the rule as follows:

    class_declaration
        :  attributes* modifiers* 'partial'? 'class' ID type_argument_list? base_classes? type_constraints? class_body
            -> ^(CLASS_DECLARATION 
                        ^(LIST attributes*)
                        ^(LIST modifiers*)
                        ^(OPTIONAL 'partial'?)
                        ID 
                        ^(OPTIONAL type_argument_list?) 
                        ^(OPTIONAL base_classes?)
                        ^(OPTIONAL type_constraints?)
                         class_body)

and write

        bool partialClass = IsOptionalChildPresent(3);

Although it is possible in general to generate artificial superclasses for groups nodes of nodes that bear functional resemblance to each other ("ExpressionNode", "TypeNameNode", "StatementNode", etc), it's also not clear to me that I can generate an unambiguos inheritance structure in single-inheritance languages like C# or Java. 

Certainly, there are many cases where I would have to refactor the grammar to allow the introduction of artifial nodes to provide unique classes. e.g.:
        local_variable_declarator
            : ID '=' initializer 
            | ID
            ;

        initializer:
            :    array_initializer
            |    expression  ->  ^(EXPRESSION_INITIALIZER expression)  // generates a subtype of InitiializerNode
            ;

It also assumes that the overhead of getChildrenOfType is preferrable to overhead at AST construction time. I'm not sure whether performance will be an issue yet or not, but it seems preferrable to take a minor performance hit at tree construction time than to take a multiple peformance hits at code generation time.

The main problem with NOT using structural nodes is that you need to think very carefully about how to differentiate nodes in an AST. And having to thinking carefully is a bad thing, from a programming point of view.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070630/d368ba74/attachment-0001.html