[antlr-interest] Re: trees with payloads??

micheal_jor open.zone at virgin.net
Thu Nov 11 08:37:52 PST 2004



--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> 
> > ArrayLists-type objects on the other hand don't have to waste space.
> > They become one pointer to the ArrayList which holds a size field and
> > an array with [approx] one pointer per child. The vtable is per class
> > not per instance.
> 
> Each object has a pointer to the vtable.

Didn't I say that?. Oh, I see I didn't.

> > For leaf nodes (where right == down == null) - one pointer is saved
> > since only one is needed for the ArrayList pointer (which would be
> > null too).
> > 
> > For internal non-leaf nodes - if the ArrayList is of the exact size
> > needed, only the ArrayList's size field is extra.
> 
> Not true--there are hidden fields:  pointer to vtable (vtable
> reference), size of allocated memory (larger than the object) or
> pointer to following memory, and size of object.  There may also be a
> reference counter or other fields for garbage collection (depends on
> the GC algorithm).

Per-object space overhead is typically about 8-16 bytes (C++ may be
lower). To that we add the two size fields and a pointer to the
buffer/array - another 12 bytes. About 5-6 pointers in all.

It seems the sweet spot for arrays are nodes with about 6+ edge-node
children.

> > Aren't there always more leaf nodes?
> 
> Yes, but not all leaf nodes have only null pointers (I'll call them
> edge nodes).  The edge nodes are a fraction of the total.

If they are a small enough fraction, arrays would likely use a little
more memory per node on average. Don't the pluses - iteration, direct
N-th child access etc - outweigh this cost?

> > >  This is offset by a savings
> > >  for terminal (right == down == null) nodes, but that does not help
> > > much with memory usage.  Arrays are memory hogs.
> > 
> > Fixed-length ones would usually be. Not convinced this holds for
> > ArrayLists.
> 
> I should also point out that memory is allocated in "chunks" to avoid
> fragmentation.

<SNIP>

> 64 or 128 bytes is not unusual
> for C/C++; Java and C# would be somewhat similar.

Not so important for GC'ed languages.

> > I can grok Token objects with Carrier objects as payload. Carrier is
> > passed unchanged to AST objects.
> 
> Conceptual block!  AST = Token = Carrier with Payload.

Mistyping block ;-). I meant Token with payload, payload pointer
copied to AST.

> > But Token objects and AST objects replaced by Carrier objects?. There
> > isn't a one-to-one match between Token objects and AST objects in many
> >  cases. How does the unified Carrier concept deal with this?
> 
> Not all Tokens become ASTs, and some ASTs are user-created.  However,
> all non-user-created ASTs have corresponding Tokens.  You gain
> performance efficiency by using Carriers with Payloads for Tokens and
> then reusing them as ASTs.

Aha!. Hey, I just had an "Aha!" moment. Coupled with an efficient
recyling scheme, this might just do away with Token objects as I
wanted (assuming there is a need to build an AST). The only downside
is that the AST is forced to be homogenous. That is a biggie - I'm
uneasy about this because heterogenous nodes still lets me party
[neatly] with those pesky Visitors.

Incidentally, I see some payloads being ignored/lost since not all
Token==>AST. Seems they could do with recycling too.

> True, but a 100KB or even 100 MB memory-mapped file is not a big
> expense--the OS virtual memory management takes care of that.  If you
> only keep a few strings, then you only keep a few blocks of the file
> memory-resident.  On the other hand, 20,000 string copies and creates
> (5 chars per word, which is probably an overestimate) is a performance
> hit.

Well, most systems have rather more than one file and, the original
strings are likely to be spread out all over the files (more VM pages
and paging). Keeping then around memory-mapped is a massive overhead
even with ultra-efficient OS VM management. Copying unique strings to
an in-memory buffer (and discarding the files) is likely to be more
performant.

Cheers,

Micheal
ANTLR/C#





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 





More information about the antlr-interest mailing list