[antlr-interest] nodes, hidden tokens, garbage collection

Fri Apr 5 02:09:46 PST 2002

Hi,

On Thu, Apr 04, 2002 at 11:58:40AM -0800, John Allen Green wrote:
> Regarding Boehm GC: From what I've read - yes, it should handle any
> circular structures fine. I think a bigger question about Boehm GC is: is
> it as portable as Antlr/C++ currently is?

Boehm is pretty portable. At least to common systems. Solaris and many i386
flavoured unixes. Dunno if there's a win32 port.

> I think that for Antlr's purposes, a memory pool makes more sense than a
> full blown GC anyway. (Er, as long as it is implemented allowing Antlr
> users to easily create multiple memory pools - one for each tree required
> in memory.) With GC, there's the overhead of scanning the memory to find
> unreachable objects. With a memory pool - well - when you're done, you're
> done, right?

100% agree. Boehm is nice if you already use it in you app though.

> It's been a while since I read about customizing new/delete in C++. I hope
> you're not talking about a global replacement for new/delete, but just
> building custom new/delete specifically for the objects which should be
> stored in memory pools...? I think I read somewhere that globally replacing
> new/delete is "evil".

No was talking about the specific objects. The global one is usually only
necessary for memory debugging tools and tricks.

> Is there an online link to that book's memory pool implementation, or do I
> have to make a trip to the bookstore? :-)

I diligently typed it in from the book. No online version afaik. I'll send
it to you. On note of interest there's a slightly more advanced memory pool
in the Loki library from the Modern C++ Design book from Andrei
Alexandrescu, the source should be at:
(http://www.awl.com/cseng/titles/0-201-70431-5/) this book really rocks.
That one also support multithreading etc.

> > At least RefAST would dissappear it's causing most 'problems' (or kludges)
> > and indeed it would become a plain old pointer. RefToken is less
> > troublesome (no heterogenous tokens and stuff like that).
> 
> If the overhead of a garbage collector is added anyway... it might not make
> sense to have the overhead of ref counting for anything. I'd like to deal
> with RefToken as well, at least eventually, because of the problems with
> hiddenTokens.

It may in some point depend on how many objects are created and the exact
scheme of garbage collection (if any) you use. For the tokens garbage
collection would be quite prudent :), for the AST's you might drop garbage
collection (depending on what you are parsing)

> > Can't you use your own flavour of hidden tokens? I just had a small look
> > at the current implementation and I think it can be changed a little to
> > solve the reference counting problem. (basically play dirty in a few
> > places and store pointers to the objects in stead of using the RefToken) 
> 
> Wouldn't be too tough, I don't think. I'd have to hack or replace
> CommonHiddenStreamToken, as well as CommonHiddenTokenFilter, which refers
> to CommonHiddenStreamToken. 

Why hack/replace just use your own ;)

> Storing dumb pointers instead of storing RefTokens is asking for trouble
> down the road though, I think. For one, you would quickly find that you
> need to pass RefToken to various functions, and a dumb pointer does you no
> good.

It depends a little for the preserveWhiteSpace example you might get away
with it. Real app probably more difficult indeed.

> > Or do it less dirty and use a single linked list of visible tokens and in
> > the visible tokens you store a vector (or single linked list) with all
> > hidden tokens before/after the visible one. I don't see why a double
> > linked list is necessary there (probably there because of direct port
> > from java).
> 
> I thought about using a vector of backlinks instead of a double-linked AST
> structure. However, in designing it, I found that there was a fair bit less
> data overhead involved in just putting the backlinks directly into the AST.

Hmm if you know how many backlinks you have you may try a dirty trick with
a overallocated class with an array as last attribute. I used that memory
pool to do that old C trick in C++.

> http://agora.cubik.org/wiki/view/Main/ReferenceCounting

Thanks!

> My workaround: Don't drop/dereference (what do you call it?!) any nodes. If
> a node is to be removed from the tree, move it to -another- tree, which
> I've called "looseEnds". Then, at cleanup time, clean up the main tree as
> well as the looseEnds tree. (Obviously I've already written functions for
> cleaning up back links in the AST and for cleaning up the hidden tokens
> when AST is deleted.)

Sounds good.

Cheers,

Ric
-- 
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- klaren at cs.utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
	  Innovation makes enemies of all those who prospered under the old
	regime, and only lukewarm support is forthcoming from those who would
					prosper under the new. --- Niccolò Machiavelli

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/