[antlr-interest] Re: trees with payloads??
lgcraymer
lgc at mail1.jpl.nasa.gov
Wed Nov 10 15:23:06 PST 2004
[If I'm lucky, John will appreciate the content of this enough to keep
him from making a "Netiquette nanny" statement--this is a bit long.]
Actually, performance is part of the reason for payloads--payloads can
be carried from pass to pass without object creation, and--even in
Java--it may be possible to optimize memory management of the
"carrier" objects. However, I also have strong objections to adding
direct support for either doubly-linked AST nodes or using arrays for
building syntax trees. Doubly-linked ASTs are a red herring, and
arrays are memory hogs.
If you really need doubly-linked nodes in an ANTLR application, then
the following is almost all of the machinery needed:
Class LinkedAST extends CommonAST {
LinkedAST back = null;
public void link(LinkedAST ast) {
back = ast;
}
public LinkedAST prev() {
return back;
}
public boolean hasParent() {
return back.getFirstChild() == this;
}
public boolean hasSibling() {
return back.getNextSibling() == this;
}
}
Class ASTLinker {
LinkedAST current = null;
public void visit(LinkedAST ast) {
current = ast;
if (ast.getFirstChild() != null)
visitChild();
if (ast.getNextSibling() != null)
visitSibling();
}
public void visitChild() {
LinkedAST child = current.getFirstChild();
LinkedAST context = current;
child.link(context);
current = child;
if (current.getFirstChild() != null)
visitChild();
if (current.getNextSibling() != null)
visitSibling();
current = context;
}
public void visitSibling() {
// similar to visitChild()
}
}
ASTLinker is a Visitor which does the double linking; basically, you
would create an ASTLinker and then invoke visit() on the root of the
tree you wanted to doubly-link.
Arrays, besides being memory hogs, do not address the root cause of
the cyclic reference problem. The cyclic reference problem is due to
the current brute force tree construction algorithm; a better approach
is to build a stream of instruction/payload pairs and then construct
the tree after all instructions have been gathered. This approach
allows optimized tree construction with no unnecessary object creation
and deletion.
Getting back to payloads: one of the big advantages is that payloads
can be used throughout the processing passes, starting with tokens.
By separating navigation information (sibling/child references and
type) from attribute information (text descriptor, line/column
information, and whatever else might be desired), there is no a priori
need to copy attribute information from one processing phase to the
next. Instead, a creation cost is only incurred when new attributes
need to be added. [And maybe not then, if one of the attributes is a
Hashmap to support Micheal's suggestion. Hashmap usage isn't free,
however: custom payloads would be preferable for both speed and
minimized memory usage.]
One other note: with payloads, there is no reason for Tokens to
differ from ASTs, except that an AST may have down and right fields
set. That is:
Class Carrier {
Payload attr;
int type = 0;
Carrier down = null;
Carrier right = null;
public Carrier(int type0, Payload att) {
type = type0;
attr = att;
}
... // all of the usual AST ops, plus get/set attr
}
Both ASTs and Tokens are then replaced by Carriers (probably with a
better name). This would have a performance advantage for tree
construction in the Parser since the token Carriers can be linked to
form a syntax tree in the parser.
--Loring
--- In antlr-interest at yahoogroups.com, "John D. Mitchell"
<johnm-antlr at n...> wrote:
> >>>>> "Terence" == Terence Parr <parrt at c...> writes:
> [...]
>
> > After Loring bitched at me on the phone yesterday <snicker>, I'm
> > beginning to think he's right: if separating the node data from the
> > navigation is the right concept, then trees should be a single
> > implementation that simply carry a "payload" object defined by the
user.
> > This is like Sun's MutableTreeNode.
>
> Go Loring! :-? :-)
>
> I strongly concur.
>
>
> [...]
>
> > As a bonus to the payload strategy, we can enhance the tree
functionality
> > later w/o forcing alterations in people's application; their payload
> > objects still fit in our nodes.
>
> Yea!
>
> > This all comes at the cost of an additional object creation (payload
> > creation + node creation).
>
> In Java, who cares.
>
> For people who care (embedded C), the code gen could be made smart
enough
> to do just one single allocation for both "objects".
>
>
> > Side note: Mitchell suggested Tokens and Tree nodes should have
not only
> > fixed fields like this, but the ability to dynamically acquire
> > "attributes"; this would basically be a hashmap. It cuts down on a
> > bazillion subclasses. It would be useful when parsing xml for
example.
> > The TAG token could have a list of tag attributes w/o creating
subclasses
> > etc...
>
> Yeah, that's a damn good idea! :-?
>
> LinkedHashSet is your friend.
>
> Have fun,
> John
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
<*> To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list