[antlr-interest] Re: trees with payloads??

Thu Nov 11 09:48:34 PST 2004

>>>>> "lgcraymer" == lgcraymer  <lgc at mail1.jpl.nasa.gov> writes:
[...]

> Actually, performance is part of the reason for payloads--payloads can be
> carried from pass to pass without object creation, and--even in Java--it
> may be possible to optimize memory management of the "carrier" objects.

Indeed.  

[I didn't bring this up with Ter because his brain was already overflowing
with all of the lexer stuff. :-)]

> However, I also have strong objections to adding direct support for
> either doubly-linked AST nodes or using arrays for building syntax trees.
> Doubly-linked ASTs are a red herring, and arrays are memory hogs.

Philosophically, I think a key point about Antlr v3 is that true, powerful
simplicity is driving things.

So, from that standpoint, what matters to me is that the base model of
Antlr's AST support fundamentally provides the necessary and sufficient
building blocks of tree identity, construction, traversing, searching, and
manipulation.

In terms of performance, there's always the tension between speed and
space.  If the AST model is fundamentally powerful enough than I'm quite
fine with having the different code generators / runtime's choose however
they want to actually implement that model.

[...]
> Arrays, besides being memory hogs, do not address the root cause of the
> cyclic reference problem.  The cyclic reference problem is due to the
> current brute force tree construction algorithm; a better approach is to
> build a stream of instruction/payload pairs and then construct the tree
> after all instructions have been gathered.  

Yes, the cyclic reference problem is crucial to deal with cleanly.  Antlr's
existing model of tree (re-)construction is brittle.  A linguistic approach
(create stream of instructions, sort & unique'ify, and apply) is both more
powerful and simpler.

> This approach allows optimized tree construction with no unnecessary
> object creation and deletion.

Yes and no.  You still have to e.g., create the objects for the stream of
instructions and it takes time and space to sort them, etc.  Of course,
they are small and well contained temporally and locality-wise and so can
be aggressively recycled.

> Getting back to payloads: one of the big advantages is that payloads can
> be used throughout the processing passes, starting with tokens.  By
> separating navigation information (sibling/child references and type)
> from attribute information (text descriptor, line/column information, and
> whatever else might be desired), there is no a priori need to copy
> attribute information from one processing phase to the next.  Instead, a
> creation cost is only incurred when new attributes need to be added.
> [And maybe not then, if one of the attributes is a Hashmap to support
> Micheal's suggestion.  Hashmap usage isn't free, however: custom payloads
> would be preferable for both speed and minimized memory usage.]

I think you're talking about my demand, ahem :-), that there be a standard
slot for a set of dynamic attributes.  The actual set would only be created
if needed so the only required overhead is the reference/pointer.

Yes, the notion of payload fits what Ter's already doing with e.g., not
copying the data out of the underlying data buffer.  AST's will refer to
Tokens.

> One other note: with payloads, there is no reason for Tokens to differ
> from ASTs, except that an AST may have down and right fields set.

Interesting.   I haven't thought through all of the details but it's
certainly conceivable.

Take care,
	John

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/