[antlr-interest] Re: trees with payloads??

Thu Nov 11 19:25:40 PST 2004

--- In antlr-interest at yahoogroups.com, "John D. Mitchell"
<johnm-antlr at n...> wrote:
> >>>>> "lgcraymer" == lgcraymer  <lgc at m...> writes:

> > However, I also have strong objections to adding direct support for
> > either doubly-linked AST nodes or using arrays for building syntax
trees.
> > Doubly-linked ASTs are a red herring, and arrays are memory hogs.
> 
> Philosophically, I think a key point about Antlr v3 is that true,
powerful
> simplicity is driving things.
> 
> So, from that standpoint, what matters to me is that the base model of
> Antlr's AST support fundamentally provides the necessary and sufficient
> building blocks of tree identity, construction, traversing,
searching, and
> manipulation.

We are agreed there.  However, my concerns are that there are a number
of feeping creaturitis requests being made without any understanding
of the utility (or lack thereof, to be more precise), the performance
costs, or the available application-level solutions.

> In terms of performance, there's always the tension between speed and
> space.  If the AST model is fundamentally powerful enough than I'm quite
> fine with having the different code generators / runtime's choose
however
> they want to actually implement that model.

As a point of abstract philosophy, I agree whole-heartedly.  I'll even
agree that there are cases where arrays of pointers are useful and
that there might be ones where doubly-linked lists are useful
(although I can't think of any--it is my experience that there are
better alternatives for tree navigation).  If the ANTLR 3 interface
model allows these alternatives, that is ok by me.  However, both are
inferior in terms of both speed and memory usage to the current AST
implementation model.

> [...]
> > Arrays, besides being memory hogs, do not address the root cause
of the
> > cyclic reference problem.  The cyclic reference problem is due to the
> > current brute force tree construction algorithm; a better approach
is to
> > build a stream of instruction/payload pairs and then construct the
tree
> > after all instructions have been gathered.  
> 
> Yes, the cyclic reference problem is crucial to deal with cleanly. 
Antlr's
> existing model of tree (re-)construction is brittle.  A linguistic
approach
> (create stream of instructions, sort & unique'ify, and apply) is
both more
> powerful and simpler.
> 
> > This approach allows optimized tree construction with no unnecessary
> > object creation and deletion.
> 
> Yes and no.  You still have to e.g., create the objects for the
stream of
> instructions and it takes time and space to sort them, etc.  Of course,
> they are small and well contained temporally and locality-wise and
so can
> be aggressively recycled.

Sorting only becomes an issue if you shift to another implementation
model for AST nodes (like arrays for child lists).  And the stream of
instructions can be implemented as arrays of objects that contain two
values--an integer instruction and an AST/Token reference.  You would
need one array for each level of rule nesting; for most grammars, that
would mean about 10-20 arrays.

> > Getting back to payloads: one of the big advantages is that
payloads can
> > be used throughout the processing passes, starting with tokens.  By
> > separating navigation information (sibling/child references and type)
> > from attribute information (text descriptor, line/column
information, and
> > whatever else might be desired), there is no a priori need to copy
> > attribute information from one processing phase to the next. 
Instead, a
> > creation cost is only incurred when new attributes need to be added.
> > [And maybe not then, if one of the attributes is a Hashmap to support
> > Micheal's suggestion.  Hashmap usage isn't free, however: custom
payloads
> > would be preferable for both speed and minimized memory usage.]
> 
> I think you're talking about my demand, ahem :-), that there be a
standard
> slot for a set of dynamic attributes.  The actual set would only be
created
> if needed so the only required overhead is the reference/pointer.

We part company here.  I can see that as part of an AST library
facility (use this if you want to), but not as a default.  It adds a
library requirement for Hashmap which complicates porting.  On the
other hand, if that requirement were added, I'd be willing to see
syntactic support for symbol table manipulation in ANTLR--that would
be a relatively small next step.

> Yes, the notion of payload fits what Ter's already doing with e.g., not
> copying the data out of the underlying data buffer.  AST's will refer to
> Tokens.
> 
> > One other note: with payloads, there is no reason for Tokens to differ
> > from ASTs, except that an AST may have down and right fields set.
> 
> Interesting.   I haven't thought through all of the details but it's
> certainly conceivable.

Not much to think through.  It shifts the cost of AST allocation from
the parser to the lexer, cuts down on data manipulation in multi-pass
translators, and Carrier recycling could be fairly efficient.

--Loring

> Take care,
> 	John

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/