[antlr-interest] Re: trees with payloads??

Wed Nov 10 15:23:06 PST 2004

[If I'm lucky, John will appreciate the content of this enough to keep
him from making a "Netiquette nanny" statement--this is a bit long.]

Actually, performance is part of the reason for payloads--payloads can
be carried from pass to pass without object creation, and--even in
Java--it may be possible to optimize memory management of the
"carrier" objects.  However, I also have strong objections to adding
direct support for either doubly-linked AST nodes or using arrays for
building syntax trees.  Doubly-linked ASTs are a red herring, and
arrays are memory hogs.

If you really need doubly-linked nodes in an ANTLR application, then
the following is almost all of the machinery needed:

Class LinkedAST extends CommonAST {
     LinkedAST back = null;

     public void link(LinkedAST ast) {
         back = ast;
     }

     public LinkedAST prev() {
         return back;
     }

     public boolean hasParent() {
         return back.getFirstChild() == this;
     }

     public boolean hasSibling() {
         return back.getNextSibling() == this;
     }
}

Class ASTLinker {
     LinkedAST current = null;

     public void visit(LinkedAST ast) {
         current = ast;

         if (ast.getFirstChild() != null)
               visitChild();

         if (ast.getNextSibling() != null)
               visitSibling();
      }

     public void visitChild() {
          LinkedAST child = current.getFirstChild();
          LinkedAST context = current;
          child.link(context);
          current = child;
          if (current.getFirstChild() != null)
                visitChild();

          if (current.getNextSibling() != null)
                visitSibling();

          current = context;
      }

      public void visitSibling() {
      // similar to visitChild()
      }
}

ASTLinker is a Visitor which does the double linking; basically, you
would create an ASTLinker and then invoke visit() on the root of the
tree you wanted to doubly-link.

Arrays, besides being memory hogs, do not address the root cause of
the cyclic reference problem.  The cyclic reference problem is due to
the current brute force tree construction algorithm; a better approach
is to build a stream of instruction/payload pairs and then construct
the tree after all instructions have been gathered.  This approach
allows optimized tree construction with no unnecessary object creation
and deletion.

Getting back to payloads:  one of the big advantages is that payloads
can be used throughout the processing passes, starting with tokens. 
By separating navigation information (sibling/child references and
type) from attribute information (text descriptor, line/column
information, and whatever else might be desired), there is no a priori
need to copy attribute information from one processing phase to the
next.  Instead, a creation cost is only incurred when new attributes
need to be added.  [And maybe not then, if one of the attributes is a
Hashmap to support Micheal's suggestion.  Hashmap usage isn't free,
however:  custom payloads would be preferable for both speed and
minimized memory usage.]

One other note:  with payloads, there is no reason for Tokens to
differ from ASTs, except that an AST may have down and right fields
set.  That is:

Class Carrier {
    Payload attr;
    int type = 0;
    Carrier down = null;
    Carrier right = null;

    public Carrier(int type0, Payload att) {
        type = type0;
        attr = att;
    }

    ... // all of the usual AST ops, plus get/set attr
}

Both ASTs and Tokens are then replaced by Carriers (probably with a
better name).  This would have a performance advantage for tree
construction in the Parser since the token Carriers can be linked to
form a syntax tree in the parser.

--Loring

--- In antlr-interest at yahoogroups.com, "John D. Mitchell"
<johnm-antlr at n...> wrote:
> >>>>> "Terence" == Terence Parr <parrt at c...> writes:
> [...]
> 
> > After Loring bitched at me on the phone yesterday <snicker>, I'm
> > beginning to think he's right: if separating the node data from the
> > navigation is the right concept, then trees should be a single
> > implementation that simply carry a "payload" object defined by the
user.
> > This is like Sun's MutableTreeNode.
> 
> Go Loring!  :-? :-)
> 
> I strongly concur.
> 
> 
> [...]
> 
> > As a bonus to the payload strategy, we can enhance the tree
functionality
> > later w/o forcing alterations in people's application; their payload
> > objects still fit in our nodes.
> 
> Yea!
> 
> > This all comes at the cost of an additional object creation (payload
> > creation + node creation).
> 
> In Java, who cares.
> 
> For people who care (embedded C), the code gen could be made smart
enough
> to do just one single allocation for both "objects".
> 
> 
> > Side note: Mitchell suggested Tokens and Tree nodes should have
not only
> > fixed fields like this, but the ability to dynamically acquire
> > "attributes"; this would basically be a hashmap.  It cuts down on a
> > bazillion subclasses.  It would be useful when parsing xml for
example.
> > The TAG token could have a list of tag attributes w/o creating
subclasses
> > etc...
> 
> Yeah, that's a damn good idea!  :-?
> 
> LinkedHashSet is your friend.
> 
> Have fun,
> 	John

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/