[antlr-interest] Java code generator memory optimization

Akhilesh Mritunjai virtualaspirin at yahoo.com
Sat Sep 24 14:46:33 PDT 2005


Yep, it can, if the program is long-running and
assigns many different identifiers to nodes (and
thus interns all of them).

However, in a typical compiler (like mine), a source
program contains a few identifier declarations and a
LOT of identifier references. My own measurements show
that in a detailed homogeneous AST, the leaf nodes
representing identifiers and constants make up as much
as 40% of all nodes.

The lexer creates a new String for each identifier
token it reads from the char stream, so it ends up
creating many String objects with identical contents.
In the lexer's case, even the char arrays inside those
strings are not shared. These String objects and their
char arrays account for up to 40% of the total memory
used by the program.
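A small illustration of the duplication described above. This is a sketch, not ANTLR's generated code: `TokenTextDemo` and `tokenText` are hypothetical names standing in for however a lexer slices token text out of its character buffer. Two occurrences of the same identifier produce two equal but distinct String objects, each with its own copy of the characters, while `intern()` maps both to one instance.

```java
// Hypothetical stand-in for a lexer extracting token text from its buffer.
class TokenTextDemo {
    // Each call allocates a fresh String with its own copy of the
    // characters, even when the characters are identical to an earlier token.
    static String tokenText(char[] buf, int start, int len) {
        return new String(buf, start, len);
    }

    public static void main(String[] args) {
        char[] source = "count = count + 1;".toCharArray();
        String first = tokenText(source, 0, 5);   // "count"
        String second = tokenText(source, 8, 5);  // "count" again

        System.out.println(first.equals(second)); // true: same characters
        System.out.println(first == second);      // false: two separate objects

        // Interning maps both occurrences to one canonical instance.
        System.out.println(first.intern() == second.intern()); // true
    }
}
```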

So I refine my argument in light of the input given:
unless the lexer is part of a long-running program, on
a JVM that somehow never releases interned strings to
the GC, the lexer should create interned token text
strings.
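One way the proposal could look in a lexer, as a minimal sketch: identifier text is interned once, at token creation. `InterningLexerSketch`, `Token`, `makeToken` and the `IDENT` constant are all hypothetical, not ANTLR's API; the point is only where the `intern()` call sits.

```java
// Hypothetical lexer helper: intern identifier text when the token is built.
class InterningLexerSketch {
    static final class Token {
        final int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
    }

    static final int IDENT = 1; // hypothetical token type for identifiers

    static Token makeToken(int type, char[] buf, int start, int len) {
        String text = new String(buf, start, len);
        if (type == IDENT) {
            // If an equal string was already interned, the fresh copy is
            // dropped and becomes garbage; either way only one instance per
            // distinct name stays reachable from the tokens.
            text = text.intern();
        }
        return new Token(type, text);
    }

    public static void main(String[] args) {
        char[] src = "x = x * x;".toCharArray();
        Token a = makeToken(IDENT, src, 0, 1);
        Token b = makeToken(IDENT, src, 4, 1);
        Token c = makeToken(IDENT, src, 8, 1);
        System.out.println(a.text == b.text && b.text == c.text); // true
    }
}
```

Interning only identifiers (rather than every token) keeps the cost of the `intern()` lookup on the tokens that actually repeat; whether constants should be interned too is a judgment call this sketch does not settle.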

What does everybody think?

PS: It never occurred to me that interned strings are
never GC'ed. There is no excuse for that kind of poor
implementation today, when Java has supported weak
references since 1.2. But if it is true, the lexer
could use its own string pool rather than creating
millions of identical String objects.
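A sketch of the string pool the PS suggests, built on weak references so that names no longer referenced from any token or AST node can still be collected. `StringPool` and `canonical` are hypothetical names, not part of ANTLR. Whether a VM's own intern table is collectable varies by implementation; this pool does not depend on it. The value is wrapped in a `WeakReference` because a strong value would keep its own key reachable and the entry would never be cleared.

```java
import java.lang.ref.WeakReference;
import java.util.WeakHashMap;

// Hypothetical lexer-private pool: keys are held weakly by WeakHashMap,
// and values are weak too, so the map alone never keeps a string alive.
final class StringPool {
    private final WeakHashMap<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the pooled instance equal to s, adding s if none exists yet.
    synchronized String canonical(String s) {
        WeakReference<String> ref = pool.get(s);
        String existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        pool.put(s, new WeakReference<>(s));
        return s;
    }

    synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool p = new StringPool();
        String a = p.canonical(new String("count"));
        String b = p.canonical(new String("count"));
        System.out.println(a == b);   // true: second lookup returns the first instance
        System.out.println(p.size()); // 1
    }
}
```

The lexer would call `canonical(...)` where it would otherwise call `intern()`; discarding the pool along with the lexer also releases every entry at once.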

- Akhilesh

--- Martin Probst <mail at martin-probst.com> wrote:

> Hi,
>
> > If your program lives a long time (like on a server), it's very
> > very bad, as strings that are interned are never GC'd.
>
> And it gets even worse if you just take it from a String which is a
> substring. E.g. longString.substring(5,10).intern() will, in some
> VMs, make the whole character array of the long string interned,
> i.e. unavailable to GC. The only thing you should do is
> new String(longString.substring(...)).intern(), as this forces a
> copy of the character array.
>
> Martin
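The defensive idiom from Martin's message, wrapped in a helper as a sketch. `SubstringInternDemo` and `internSlice` are hypothetical names. On VMs where `substring()` shares the parent's char array, `new String(String)` produces a string with its own right-sized array, so the interned instance cannot pin the larger source; on newer JDKs, where `substring()` already copies, the extra copy is merely redundant.

```java
// Hypothetical helper applying "copy, then intern" to a slice of a longer string.
final class SubstringInternDemo {
    static String internSlice(String source, int begin, int end) {
        // The explicit copy guarantees the interned string does not share
        // the source's backing array, regardless of how substring() works.
        return new String(source.substring(begin, end)).intern();
    }

    public static void main(String[] args) {
        String longString = "0123456789abcdefghij";
        String id = internSlice(longString, 5, 10);
        System.out.println(id);                     // 56789
        System.out.println(id == "56789".intern()); // true
    }
}
```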



		