[antlr-interest] Re: trees with payloads??

Wed Nov 10 17:01:24 PST 2004

--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:

> Actually, performance is part of the reason for payloads--payloads can
> be carried from pass to pass without object creation, and--even in
> Java--it may be possible to optimize memory management of the
> "carrier" objects.  However, I also have strong objections to adding
> direct support for either doubly-linked AST nodes or using arrays for
> building syntax trees.  Doubly-linked ASTs are a red herring, and
> arrays are memory hogs.

I have my doubts about doubly-linked ASTs too. I don't understand
however how you formed the opinion that arrays are memory hogs. I
would have though they were otherwise. And, if performance is
important, they [might] offer superior locality properties which makes
for very efficient cache usage. I say might only because this also
depends on the memory location of the objects in the array. 

> Arrays, besides being memory hogs, do not address the root cause of
> the cyclic reference problem.  The cyclic reference problem is due to
> the current brute force tree construction algorithm; a better approach
> is to build a stream of instruction/payload pairs and then construct
> the tree after all instructions have been gathered.  This approach
> allows optimized tree construction with no unnecessary object creation
> and deletion.

This is a sorely needed improvement. And getting rid of the cost of
classes like ASTPair (easier in C#/C/C++ than Java I'd admit).

> Getting back to payloads:  one of the big advantages is that payloads
> can be used throughout the processing passes, starting with tokens. 
> By separating navigation information (sibling/child references and
> type) from attribute information (text descriptor, line/column
> information, and whatever else might be desired), there is no a priori
> need to copy attribute information from one processing phase to the
> next.  Instead, a creation cost is only incurred when new attributes
> need to be added.  [And maybe not then, if one of the attributes is a
> Hashmap to support Micheal's suggestion.  Hashmap usage isn't free,
> however:  custom payloads would be preferable for both speed and
> minimized memory usage.]
> 
> One other note:  with payloads, there is no reason for Tokens to
> differ from ASTs, except that an AST may have down and right fields
> set.  That is:
> 
> Class Carrier {
>     Payload attr;
>     int type = 0;
>     Carrier down = null;
>     Carrier right = null;
> 
>     public Carrier(int type0, Payload att) {
>         type = type0;
>         attr = att;
>     }
> 
>     ... // all of the usual AST ops, plus get/set attr
> }
> 
> Both ASTs and Tokens are then replaced by Carriers (probably with a
> better name).  This would have a performance advantage for tree
> construction in the Parser since the token Carriers can be linked to
> form a syntax tree in the parser.

Hmm, I am more inclined to eliminate [the cost of] Token objects
[almost] entirely. I say almost because a mechanism is still needed to
transfer attributes like lexeme/line/col info to AST node objects.

The TokenStream interface in it's current form might not be the [best]
way to go. Unless Token objects can be recycled efficiently.

So how can lexical tokens be modelled as an integer value (or enum
value) usually and yet parsers are able to extract additional info
such as lexeme/line/col on-demand and, inexpensively?

The concept of a Carrier objects could play a part but I don't know
how 'cause I haven't found an answer to my conundrum yet  ;-)

Cheers,

Micheal
ANTLR/C#

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/