[antlr-interest] C runtime Memory Usage

Jim Idle jimi at temporal-wave.com
Sat Jan 24 16:01:47 PST 2009


Gavin Lambert wrote:
> At 06:47 25/01/2009, Jim Idle wrote:
> >It is a convenience function that creates (and tracks) a memory
> >structure that copies the string from the input every time you
> >call getText(). It is this way because (and I think this is in
> >the docs) sometimes you want a new copy of the input string
> >and sometimes you want to modify the string you get back but
> >not modify the input.
>
> I don't think that's a good argument.
You're entitled to your opinion, but believe me, I have written lots of 
parsers with this runtime and this is the best way. Because you can pin 
it yourself by setting the text pointer in the token, it is not really 
an inconvenience unless you don't read the examples or documentation.

> Strings in Java and C# are immutable; in C/C++ they're not, but they 
> should be treated as if they were 
err, not really. That's a completely arbitrary decision that you just 
made up.
> (reading a string is a far more common operation than modifying one)
But, my code has to take care of the fact that people will and DO do 
this, and then they will wonder why the next time they ask for the 
string for that token, they got the last change that they made. In fact 
I would hazard a guess that if in fact you reference the $text, you are 
much more likely to want to change it than you would be in an ordinary C 
program.

> .  So the getText() function should return a const string and should 
> only construct it once.  (For performance reasons, it should probably 
> keep the existing behaviour of not constructing the string until first 
> requested.)
Tried that - it just filled my inbox with questions, whereas this way, 
it is the first time anyone has asked me about it.
>
> For another argument: it's trivial to take a read-only string and 
> convert it into a writable one (without affecting the original).  

> It's non-trivial to extract text from the token stream.  
It IS trivial to take text from the token stream - it is just pointers.
> So the latter function should be implemented by the runtime in such a 
> way that the former can be applied afterwards in the unlikely event 
> that it's needed.
Well, that's the way I wrote it and that's the way it is staying. It is 
documented in the docs and the examples and if I could get anyone to 
read these, it would not be an issue :-) I could make it store the 
string reference, but then if someone changes it, they will need to 
reset it and so on. I went through this thought process a bunch of times.

For the record then:

$x.text always provides a NEW copy of the token's text from the input 
stream, which you can then manipulate, such as by deleting characters 
and so on.
If you want to get the same instance every time, then store it 
somewhere, or set it in the token because if you set it back in the 
token, you will get that same instance back.
Remember that these methods are really used only by the runtime but it 
seemed sensible to let users use them for trivial things. They are not 
meant to be used all the time for interpreters and so on; as with 
everything else in C you should create your own system, optimized to do 
just what you want. If you don't need the speed, then use Java.
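
To make the copy-versus-pin behaviour described above concrete, here is a 
minimal sketch in plain C. It is a toy model only, not the runtime's actual 
code: ToyToken, toy_get_text and toy_set_text are hypothetical names made 
up for illustration, and the ownership rule (the caller frees fresh copies) 
is an assumption of the sketch, whereas the real runtime tracks the strings 
it creates for you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a token; NOT the ANTLR3 C runtime's struct. */
typedef struct {
    const char *start;   /* first character of the token in the input */
    const char *stop;    /* last character of the token (inclusive)   */
    char       *pinned;  /* text set back into the token, or NULL     */
} ToyToken;

/* If text has been pinned, return that same instance every time.
 * Otherwise return a fresh heap copy of the input span, so edits to it
 * never touch the input. Returns NULL if allocation fails.
 * In this toy, the caller owns (and frees) unpinned copies. */
char *toy_get_text(const ToyToken *t)
{
    if (t->pinned != NULL) {
        return t->pinned;
    }
    size_t len = (size_t)(t->stop - t->start) + 1;
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, t->start, len);
    copy[len] = '\0';
    return copy;
}

/* Pin a string in the token; later toy_get_text calls return it as-is. */
void toy_set_text(ToyToken *t, char *text)
{
    t->pinned = text;
}
```

With this model, two calls to toy_get_text on an unpinned token give two 
distinct buffers, and changing one leaves both the other and the input 
alone. After toy_set_text(&tok, s), every call hands back s itself, 
including any changes made to it.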

Jim

