[antlr-interest] Suggestion for List size in BaseTree

Martin Probst mail at martin-probst.com
Tue Dec 2 01:05:39 PST 2008


Hi all,

I have a suggestion for the java.util.List children in
org.antlr.runtime.tree.BaseTree.

It appears to be constructed with the default ArrayList constructor,
which will give you an empty list with space for 10 elements.

In my personal experience, for 99% of my language's statements this is
complete overkill: most have either two or three children, very few
ones have more. Constructing the tree with smaller array sizes might
have a noticeable impact on memory usage (and thus performance) of the
system.

A CommonTree currently has 3 int fields, plus a reference to parent, a
reference to token, and the reference to the children array, which in
turn has a reference to the array, a size and a modcount field. So for
each AST node we have 4x reference size plus 5x int size, e.g., on a
64 bit system about 52 bytes, plus the object and array overhead of
the JVM (something like 32 bytes?). So that's 84 bytes not counting
the contents of the array. This calculation is entirely done on a
rumours and phantasies basis, but I think it should be roughly giving
the right direction.

Now if the array is size == 3 by default, we have 24 bytes addition,
if it's size 10 by default, we have 80 bytes, giving 108 bytes against
168 bytes. If you're AST is entirely 3 children or less, you'll save
~60% of memory (not counting token objects).

Of course this is a difficult tradeoff, and it totally depends on the
application in case. Functional languages will probably get much
larger savings than procedural, where you commonly have many
statements as children of a function body. Still, all your typical
expressions (arithmetic, boolean, ...) will usually have two children,
most function calls have less than ten parameters, and so forth. Also,
the "global" performance characteristics depend on how expensive it is
to increase the array size (no clue about that), relative to the cost
of the memory.

So maybe there should be some way to change BaseTree's behaviour here?
Or a reasonable default that's significantly lower?

Martin


More information about the antlr-interest mailing list