[antlr-interest] C runtime Memory Usage
Gavin Lambert
antlr at mirality.co.nz
Sat Jan 24 16:34:27 PST 2009
At 13:01 25/01/2009, Jim Idle wrote:
>> Strings in Java and C# are immutable; in C/C++ they're not, but
>> they should be treated as if they were
>err, not really. That's a completely arbitrary decision that you
>just made up.
No, I didn't. In fact, most implementations of the C++ STL do
something similar with std::string (copy-on-write), so it's not
even unique to managed code.
>But, my code has to take care of the fact that people will and DO
>do this, and then they will wonder why the next time they ask for
>the string for that token, they got the last change that they made.
>In fact I would hazard a guess that if in fact you reference the
>$text, you are much more likely to want to change it than you
>would be in an ordinary C program.
If they use $text = foo, then sure, they're trying to replace the
text for the token. But replace is not the same as
modify. Modifying the result of $text directly should be
forbidden, which is easy if you just make it const. And most of
the time this sort of thing is confined to the lexer/parser,
anyway. It'd be less common at the tree parser level.
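A minimal sketch of the replace-versus-modify distinction, in C. The
`Token` type and the `token_get_text`/`token_set_text` functions are
hypothetical illustrations, not the ANTLR C runtime's actual API:
reading hands back a `const char *`, so writing through it fails to
compile, while replacing the text (the `$text = foo` case) goes
through a setter that installs a fresh copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token type; not the ANTLR C runtime's real structure. */
typedef struct {
    char *text;   /* owned copy of the token text, or NULL */
} Token;

/* Read access: the const return type makes in-place edits a compile error. */
static const char *token_get_text(const Token *tok)
{
    return tok->text;
}

/* Replacement ($text = foo): store a private copy of new_text.
 * The old text is freed, so pointers previously returned by
 * token_get_text are invalid after this call. Returns 0 on success. */
static int token_set_text(Token *tok, const char *new_text)
{
    assert(tok != NULL && new_text != NULL);
    size_t len = strlen(new_text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, new_text, len + 1);  /* includes the terminating NUL */
    free(tok->text);
    tok->text = copy;
    return 0;
}
```

With this, `token_get_text(&t)[0] = 'x';` is rejected by the compiler
as an assignment to a read-only location. One caveat of the sketch:
because the setter frees the old text, a pointer obtained earlier from
`token_get_text` must not be used after a replacement.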
Tokens already have to support the idea of drawing their text
either from the input stream (if they haven't been replaced as
above) or from arbitrary text set by embedded code in the
grammar. So all you would need to do is set the text the first
time it is queried. That way you get a performance benefit either
way: if they never ask, it never needs to query the token stream
and allocate the memory, and if they ask multiple times it only
needs to do so once and doesn't waste additional memory.
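A sketch of the lazy, cache-on-first-query idea, assuming a
hypothetical `LazyToken` that stores start/stop offsets into a shared
input buffer (the real runtime's token layout may differ). Nothing is
allocated until the text is asked for, and repeated queries return the
same buffer. A `$text = foo` replacement would simply install its own
string in `cached`; that setter is omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: it records where its text lies in the input
 * buffer and materializes a string only when someone asks for it. */
typedef struct {
    const char *input;  /* start of the shared input buffer */
    size_t start;       /* offset of the token's first character */
    size_t stop;        /* offset one past its last character */
    char *cached;       /* NULL until the text is first queried */
} LazyToken;

/* Returns the token text, copying it out of the input on first use
 * only. Returns NULL if the allocation fails. */
static const char *lazy_token_text(LazyToken *tok)
{
    assert(tok != NULL && tok->stop >= tok->start);
    if (tok->cached == NULL) {
        size_t len = tok->stop - tok->start;
        char *buf = malloc(len + 1);
        if (buf == NULL)
            return NULL;
        memcpy(buf, tok->input + tok->start, len);
        buf[len] = '\0';
        tok->cached = buf;  /* later calls return this same buffer */
    }
    return tok->cached;
}
```

Calling `lazy_token_text` twice on the same token returns the identical
pointer, so the second query costs neither a copy nor extra memory.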
Or perhaps another approach would be to more closely model how
std::string works. When retrieving the text as an ANTLR string
and manipulating it with the ANTLR string manipulation functions,
it's writable but performs a copy-on-write as needed to preserve
referential integrity of other strings. When retrieving a raw
char* it only gives you a const one, to let you know that you
should be using the ANTLR functions to modify it instead.
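A single-threaded sketch of the copy-on-write scheme, loosely modelled
on a reference-counted `std::string`. `StrBuf`, `CowStr` and the
`cow_*` functions are hypothetical names for illustration only, not
ANTLR's string API. Sharing a string just bumps a count; a mutating
call first detaches (copies) the buffer if anyone else still holds it,
and the raw accessor only returns `const char *`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer with a reference count (single-threaded). */
typedef struct {
    size_t refs;
    size_t len;
    char data[];   /* flexible array member, NUL-terminated */
} StrBuf;

/* Hypothetical string handle; several handles may share one StrBuf. */
typedef struct {
    StrBuf *buf;
} CowStr;

static int cow_from(CowStr *s, const char *text)
{
    size_t len = strlen(text);
    StrBuf *b = malloc(sizeof *b + len + 1);
    if (b == NULL)
        return -1;
    b->refs = 1;
    b->len = len;
    memcpy(b->data, text, len + 1);
    s->buf = b;
    return 0;
}

/* Copying a handle only bumps the count; no character data is copied. */
static CowStr cow_share(const CowStr *s)
{
    s->buf->refs++;
    CowStr copy = { s->buf };
    return copy;
}

static void cow_release(CowStr *s)
{
    if (s->buf != NULL && --s->buf->refs == 0)
        free(s->buf);
    s->buf = NULL;
}

/* Raw access is read-only; modification goes through cow_* calls. */
static const char *cow_cstr(const CowStr *s)
{
    return s->buf->data;
}

/* Before a write, detach from other holders by copying if shared. */
static int cow_make_unique(CowStr *s)
{
    if (s->buf->refs == 1)
        return 0;
    CowStr fresh;
    if (cow_from(&fresh, s->buf->data) != 0)
        return -1;
    s->buf->refs--;      /* other holders keep the original buffer */
    s->buf = fresh.buf;
    return 0;
}

/* Example mutating operation: overwrite the character at index i. */
static int cow_set_char(CowStr *s, size_t i, char c)
{
    assert(i < s->buf->len);
    if (cow_make_unique(s) != 0)
        return -1;
    s->buf->data[i] = c;
    return 0;
}
```

Making this safe across threads would need at least atomic updates to
`refs`, which is outside the scope of the sketch.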
This isn't hard to do (especially single-threaded) and shouldn't
impose much of a CPU performance penalty, and should improve
memory performance dramatically whenever the text is being
accessed. (And in any non-trivial program, the text is bound to
be accessed quite a bit.)
>It IS trivial to take text from the token stream - it is just
>pointers.
Which you have to iterate and assemble (for rule text,
anyway). So no, it's not trivial. (It might be *easy*, but not
trivial.)
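To illustrate why rule text is easy but not trivial, here is a sketch
that assembles it from a run of tokens. It assumes a hypothetical
`TokenSlice` (pointer plus length into the input) and that the array
includes every token in the range, off-channel whitespace included, so
that concatenating them reproduces the source text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: a slice of the input, as pointer plus length. */
typedef struct {
    const char *start;
    size_t len;
} TokenSlice;

/* Builds the text of a rule spanning tokens[first..last] (inclusive).
 * Even with plain pointers this needs one pass to size the result,
 * an allocation, and a second pass to copy each slice.
 * Returns a NUL-terminated string the caller must free, or NULL. */
static char *rule_text(const TokenSlice *tokens, size_t first, size_t last)
{
    assert(tokens != NULL && first <= last);
    size_t total = 0;
    for (size_t i = first; i <= last; i++)
        total += tokens[i].len;

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = first; i <= last; i++) {
        memcpy(p, tokens[i].start, tokens[i].len);
        p += tokens[i].len;
    }
    *p = '\0';
    return out;
}
```

If every token were guaranteed to sit contiguously in one input buffer,
a single `memcpy` from the first token's start to the end of the last
would suffice; the loop covers the general case where that can't be
assumed.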