[antlr-interest] Recursive Tree Walking C Target

Jim Idle jimi at temporal-wave.com
Fri Sep 10 10:36:20 PDT 2010



> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Kenneth Domino
> Sent: Friday, September 10, 2010 9:15 AM
> To: Thomas Davis; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Recursive Tree Walking C Target
> 
> > Just wondering if anyone had any tips for recursively walking an
> > ANTLR_BASE_TREE produced from a parser. I seem to be getting some
> > memory issues.
> 
> FYI, I transform the Antlr tree into my own C++ data structure for tree
> walking.

Not sure why you would need to do this; you are just adding an extra layer
and I don't see that you are getting much for your added complexity.

> E.g.,
> 
> #pragma once
> #include <vector>


> With this conversion, I can now do things more easily, because I don't
> use the Antlr C runtime data structures, which are hard for me to
> understand and debug.  (I still cannot understand why the target isn't
> just C++.)  
 
For a start, a C++ target will generally have more overhead as it isn't
quite as close to the metal. Secondly though, C++ compilers are not
universally available, whereas almost every platform has a C compiler.
Thirdly, many professional software companies do not allow C++ because not
enough people understand it properly and they end up with unfathomable,
uncommented C++. Hence C is the basis of everything and in this case
deliberately so.

> I can now add an iterator for tree walking, or change the
> behavior of getText(), which allocates a new copy of the string
> every time it is called.

Unless you install your own function, which is why all the structures use
pointers to functions. But as I have said many times, getText() is really
not meant for hard core work. I also explained to you that I can't know what
you want to do with the text, so if you call getText, you will get another
copy. If I don't do that, then you would manipulate what I give you and it
would become the text for the token as a byproduct of using it, which is not
what you want (generally). If you want to preserve the text, don't call
getText - cache it. There is even a pointer that you can use in the token
structure. If I made the default be what you want, then it would be
incorrect for most purposes. You also misunderstood the code as you were
looking at the code that decides if the lexer has overridden the default
text or not, but don't let that stop you commenting. Finally, if you are not
changing the text, then don't copy it at all, just use the pointer to the
input, which is stored in the token.
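
As a rough illustration of "just use the pointer to the input", here is a
minimal sketch. The demo_token type and its start, stop and custom fields are
hypothetical stand-ins made up for this example, not the real ANTLR3 token
structure; check the runtime headers and doxygen for the actual field names.
It only shows that text can be measured and compared in place, with no copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a token: NOT the real ANTLR3 token structure.
 * It only models "the token records where its text sits in the input". */
typedef struct demo_token {
    const char *start;   /* first character of the token in the input buffer */
    const char *stop;    /* last character of the token (inclusive)          */
    void       *custom;  /* user slot, e.g. for a cached private copy        */
} demo_token;

/* Length of the token text, computed from the pointers; nothing is copied. */
static size_t demo_token_length(const demo_token *t)
{
    assert(t != NULL && t->start != NULL && t->stop >= t->start);
    return (size_t)(t->stop - t->start) + 1;
}

/* Compare the token text with a C string without allocating anything. */
static int demo_token_equals(const demo_token *t, const char *s)
{
    size_t n = demo_token_length(t);
    return strlen(s) == n && memcmp(t->start, s, n) == 0;
}
```

Because the text is only ever read through start and stop, the input buffer
stays untouched. If a modifiable copy is needed, it could be made once and
parked in the user slot rather than rebuilt on every call.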

The C code is completely flexible, but it is raw C, aimed at being as fast
as it can be and does not come for free. I can't help thinking that you have
done a lot more work here than you would have done if you had read through
the docs or asked a few more questions even. 
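
The "install your own function" point can be sketched in plain C as below.
Everything here (demo_node, demo_get_text_copy, demo_get_text_shared) is a
hypothetical stand-in invented for illustration and is not the ANTLR3 API; it
only shows the general pattern of a structure that reaches its behaviour
through a function pointer, so that a caller can swap the default for
something cheaper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in, NOT a real ANTLR3 structure. */
typedef struct demo_node demo_node;
struct demo_node {
    char *text;                          /* text owned elsewhere             */
    char *(*getText)(demo_node *self);   /* behaviour is a replaceable slot  */
};

/* Default in this sketch: return a fresh copy on every call, so the caller
 * may modify it freely. The caller must free() the result. */
static char *demo_get_text_copy(demo_node *self)
{
    size_t n = strlen(self->text) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, self->text, n);
    return copy;
}

/* Replacement: return the stored text itself. The caller must treat it as
 * read-only and must not free() it. */
static char *demo_get_text_shared(demo_node *self)
{
    return self->text;
}

/* Set up a node with the copying default installed. */
static void demo_node_init(demo_node *n, char *text)
{
    assert(n != NULL && text != NULL);
    n->text = text;
    n->getText = demo_get_text_copy;
}
```

Switching behaviour is then a single assignment, n.getText =
demo_get_text_shared, and every caller that goes through the pointer picks up
the new function without any other change.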

> In addition, in my tree walker I need to associate some data
> with each node. I could create a std::map<pANTLR3_BASE_TREE, DATA *> but
> this was slow because of all the thousands of nodes.

Yes, because you are performing thousands of new(), another reason I did not
write this in C++. You say you don't understand why it isn't C++ but in the
next breath, you immediately run across one of the problems of doing that in
C++ or trying to make the runtime be all encompassing for all purposes; it
deliberately isn't. Reading the comments, you would have seen that I thought
of all that and that is why there is a void * that you can use for anything
you like. 
 

> Alternatively, I could have tried to modify the default node type in
> tree construction, but I could not find an example to make my life
> easier, and I am not motivated enough to read and understand
> "newPoolTree (pANTLR3_ARBORETUM factory)" in antlr3commontree.c.

Well, if you are not going to read the code and comments and doxygen, you
won't see that there are fields in the default node that are specifically
reserved for holding data. They are also documented in the doxygen docs.

----
void * u
  Generic void pointer allows the grammar programmer to attach any structure
  they like to a tree node, in many cases saving the need to create their own
  tree and tree adaptors.
----
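
To make the idea concrete, here is a minimal sketch of a recursive walk that
hangs per-node data off a void * slot. The demo_tree type is a hypothetical
stand-in written for this example and is not ANTLR3_BASE_TREE; only the idea of
a user-owned "u" pointer comes from the doxygen text above. The node_info
structure and the single-block allocation are likewise assumptions, chosen to
avoid one allocation per node and a separate map lookup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree node, NOT the real ANTLR3_BASE_TREE. */
typedef struct demo_tree {
    struct demo_tree **children;
    size_t             child_count;
    void              *u;           /* user data slot */
} demo_tree;

/* Whatever the walker wants to remember about a node (made up here). */
typedef struct node_info {
    int depth;
} node_info;

/* Count nodes recursively so that one allocation can cover them all. */
static size_t demo_count(const demo_tree *t)
{
    size_t n = 1;
    for (size_t i = 0; i < t->child_count; i++)
        n += demo_count(t->children[i]);
    return n;
}

/* Pre-order walk: give each node the next free slot from the block. */
static void demo_attach(demo_tree *t, node_info *block, size_t *next, int depth)
{
    node_info *info = &block[(*next)++];
    info->depth = depth;
    t->u = info;
    for (size_t i = 0; i < t->child_count; i++)
        demo_attach(t->children[i], block, next, depth + 1);
}

/* Allocate one block for the whole tree and attach it.
 * Returns NULL on allocation failure; the caller frees the block. */
static node_info *demo_annotate(demo_tree *root)
{
    assert(root != NULL);
    size_t total = demo_count(root);
    node_info *block = malloc(total * sizeof *block);
    if (block != NULL) {
        size_t next = 0;
        demo_attach(root, block, &next, 0);
    }
    return block;
}
```

After demo_annotate() returns, the data for any node is one pointer
dereference away, ((node_info *)node->u)->depth, and a single free() of the
block releases everything once the tree is no longer in use.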


I am happy that you have something working that fits your needs, but you say
that you don't understand the code, and you don't have the background
information for the WHYs (though that is in the comments and doxygen), and
you say you are not motivated enough (I hope your employer doesn't read this
list :-) to put the time in to read the docs... 

  ...so perhaps you should reserve your opinions as to why I put things
together the way I did instead of trying to lead others down the same path
as yourself? I don't mind you finding your own way through it, but...

Jim 



