[antlr-interest] advocacy of C++ support in ANTLR 3.x

Jim Idle jimi at temporal-wave.com
Wed Apr 2 10:23:58 PDT 2008


Tom,

 

This is the C version of the runtime, not the Java version. Until this week I have been very busy and not had a chance to look at C++ tree building. However, you CAN use it, but it will take a bit more ingenuity. 

 

The adaptor is an adaptor, or the default tree nodes would not work. But in C there is no class inheritance, and I cannot protect functions/pointers that must be public from being used anywhere. The Java version of the tree node also has addChild etc.; it needs to, because the tree adaptor needs to ask it to add children. However, if you don't call the tree node methods, then they won't be used directly in C, Java, or any other target.

 

If you want a different tree node and different adaptor, then you must do it the C way for the moment. This means creating a set of functions that have C linkage and providing a pointer to this interface as your adaptor. The internals will not call the node functions directly, and if they do, then it is merely a mistake that I will fix. At some point, I will provide some C++ classes to make this easier for C++ programmers. At that point, there will be inheritance and so on available, which I do not have for C. Hence, it isn't 'incorrect', it is just C.
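
A minimal sketch of "a set of functions that have C linkage" behind a pointer table, assuming invented names throughout: DemoTree, DemoAdaptor, OurNode and the our_* functions are hypothetical and only mimic the shape described above, not the real runtime structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical existing C++ AST class (stands in for your own node type).
class OurNode {
public:
    explicit OurNode(int type) : type_(type) {}
    void add(OurNode* child) { children_.push_back(child); }
    std::size_t childCount() const { return children_.size(); }
    int type() const { return type_; }
private:
    int type_;
    std::vector<OurNode*> children_;
};

extern "C" {

// Invented node handle: 'super' points at the C++ object behind it.
struct DemoTree { void* super; };

// Invented adaptor interface: a table of function pointers.
struct DemoAdaptor {
    DemoTree* (*create)(DemoAdaptor* self, int tokenType);
    void (*addChild)(DemoAdaptor* self, DemoTree* parent, DemoTree* child);
    void (*release)(DemoAdaptor* self, DemoTree* node);
};

// Functions with C linkage that forward into the C++ class.
DemoTree* our_create(DemoAdaptor*, int tokenType) {
    DemoTree* t = new DemoTree;
    t->super = new OurNode(tokenType);
    return t;
}
void our_addChild(DemoAdaptor*, DemoTree* parent, DemoTree* child) {
    static_cast<OurNode*>(parent->super)
        ->add(static_cast<OurNode*>(child->super));
}
void our_release(DemoAdaptor*, DemoTree* node) {
    delete static_cast<OurNode*>(node->super);
    delete node;
}

}  // extern "C"

// The caller (in the real setting, generated code) sees only the table.
std::size_t demoBuildTree() {
    DemoAdaptor adaptor = { our_create, our_addChild, our_release };
    DemoTree* root  = adaptor.create(&adaptor, 1);
    DemoTree* left  = adaptor.create(&adaptor, 2);
    DemoTree* right = adaptor.create(&adaptor, 3);
    adaptor.addChild(&adaptor, root, left);
    adaptor.addChild(&adaptor, root, right);
    std::size_t n = static_cast<OurNode*>(root->super)->childCount();
    adaptor.release(&adaptor, left);
    adaptor.release(&adaptor, right);
    adaptor.release(&adaptor, root);
    return n;
}
```

In this sketch the caller never names OurNode when building the tree; it only goes through the function pointers, which is the property the paragraph above relies on.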

 

So, you need two C++ classes, one for an adaptor and one for a tree node. The internals don't matter, but within them you need to be able to provide a pointer to a structure, within which there are pointers to the 'method' set, which the generated code will call without caring what you do with the call. There is also a pointer to a 'super' class (which is void and so can point to the object if you want, though there is another user pointer that can be used for that). The tree adaptor uses the interface pANTLR3_BASE_TREE to play with nodes; the tree grammar uses the interface pANTLR3_BASE_TREE_ADAPTOR. As there is no inheritance and there are no interfaces, I cannot simply make all the pointers void and cast them: all that happens is that each one becomes a pointer to a function anyway, so you would then have to build your own set of pointers to functions and basically reproduce all the work I already did for you. So in C you define your own structure, and within that hold a pointer to, or embed, the base structure, which your methods return a pointer to. What exactly contains this structure doesn't matter, hence it is an interface.
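
The embedding described above might look roughly like the following sketch. DemoBaseTree is a hypothetical stand-in for the base interface (function pointers plus a void 'super' pointer), and OurAstNode is an invented C++ class; neither is the real runtime definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base interface: 'method' pointers plus a void 'super'.
struct DemoBaseTree {
    void (*addChild)(DemoBaseTree* self, DemoBaseTree* child);
    unsigned (*getChildCount)(DemoBaseTree* self);
    void* super;  // points back at the owning C++ object
};

// Invented C++ node class that embeds the base structure.
class OurAstNode {
public:
    explicit OurAstNode(const std::string& text) : text_(text) {
        base_.addChild = &OurAstNode::addChildThunk;
        base_.getChildCount = &OurAstNode::childCountThunk;
        base_.super = this;
    }
    // Copying would leave 'super' pointing at the source object.
    OurAstNode(const OurAstNode&) = delete;
    OurAstNode& operator=(const OurAstNode&) = delete;

    // What callers receive: a pointer to the embedded interface.
    DemoBaseTree* asBase() { return &base_; }
    // The way back from the interface to the C++ object (unchecked).
    static OurAstNode* fromBase(DemoBaseTree* b) {
        return static_cast<OurAstNode*>(b->super);
    }
    const std::string& text() const { return text_; }

private:
    static void addChildThunk(DemoBaseTree* self, DemoBaseTree* child) {
        fromBase(self)->children_.push_back(fromBase(child));
    }
    static unsigned childCountThunk(DemoBaseTree* self) {
        return static_cast<unsigned>(fromBase(self)->children_.size());
    }
    DemoBaseTree base_;
    std::string text_;
    std::vector<OurAstNode*> children_;
};

// Uses the nodes only through the interface pointer.
unsigned demoEmbeddedBase() {
    OurAstNode plus("+"), one("1"), two("2");
    DemoBaseTree* root = plus.asBase();
    root->addChild(root, one.asBase());
    root->addChild(root, two.asBase());
    return root->getChildCount(root);
}
```

The static member functions play the role of the 'method' set; the storage for children stays in the C++ class, and the interface structure carries only the pointers.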

 

Is it more of a pain than C++ inheritance? Yes, it certainly is; it is C. Will I provide something better for C++ programmers? Yes, but I don't have it right now, so if you need to do something in C++ right now, then it will be a bit more work for you I am afraid. I am sorry about that, but this is all free work and I have asked for examples of what C++ programmers find difficult with the C interface, which I will provide answers for in my own time. 

 

However, your conjecture that the current design isn't 'correct' doesn't make any sense as it is a C design, not a class based design - this is as close as you can get in C and under the covers isn't that far away from what the C++ compilers do anyway (within the limits of not being able to include assembler).
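
To illustrate the comparison, here is a small sketch that puts a hand-built function pointer next to a C++ virtual call. The types are invented for the example, and how a given compiler implements virtual dispatch is an implementation detail, so the C++ half should be read as the common case rather than a guarantee.

```cpp
#include <cassert>

// C style: the object carries its own function pointer.
struct CNode {
    int (*getType)(const CNode* self);
    int type;
};
static int cnodeGetType(const CNode* self) { return self->type; }

// C++ style: the compiler arranges the dispatch for a virtual call.
class Node {
public:
    virtual ~Node() {}
    virtual int getType() const = 0;
};
class NumberNode : public Node {
public:
    explicit NumberNode(int type) : type_(type) {}
    int getType() const override { return type_; }
private:
    int type_;
};

int demoCDispatch() {
    CNode n = { cnodeGetType, 7 };
    return n.getType(&n);   // explicit 'self' argument
}

int demoCppDispatch() {
    NumberNode n(7);
    const Node& base = n;
    return base.getType();  // implicit 'this' argument
}
```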

 

Jim

 

 

From: Tomas Potrusil [mailto:potrto at centrum.cz] 
Sent: Wednesday, April 02, 2008 1:11 AM
To: ANTLR
Cc: Jim Idle; Terence Parr
Subject: RE: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

What was wrong with my ideas below? I know that with the current design I need a new tree adaptor and to "override" the ANTLR_BASE_TREE, which means that I must include the ANTLR_BASE_TREE structure (or better, a MY_TREE structure which will contain COMMON_TREE, which contains the ANTLR_BASE_TREE structure) inside my AST classes. This is completely clear and it will probably work without any problems.

 

All I wanted to point out was that the current design is not very correct (unlike the Java version).

 

One question: why should I (or probably you - you are the parser generator) use such an adaptor when you can call some functions (like addChild) directly through the ANTLR_BASE_TREE interface? You hold a pointer to that interface. But you decline to do this and call the "adaptor" (a function with the same name - addChild), which in turn calls the function directly on the tree. There are a few such functions, right? Why are they there? It doesn't make sense.

 

The Java version (its documentation) makes this clear. These functions are in the TreeAdaptor interface as well, but they have a slightly different semantic meaning: their parameters are Objects, not direct interfaces. Of course, the adaptor should hide the real tree that is used behind the scenes! Users can use the Tree interface (this is what BaseTreeAdaptor does), but they can also use completely different tree classes if they like.

 

Here is an exchange from this list:

> Why does the method create(Token) return an Object? I'm curious why an
> Object and not a Tree. When you manipulate trees, it seems to cause
> quite a bit of (useless?) casts everywhere...

this is because you might want tree nodes that do not implement ANTLR's
tree interface. When I made a TreeAdaptor to output XML (DOM) I was
quite happy that I could work with Objects :)

 

There is one more feature that is hidden in the adaptors: they are object factories. They create new trees so that the runtime doesn't need to know the real type of the tree. In Java this is particularly funny because these "factory methods" (create()) are adaptors as well, and so they return Object. Nevertheless, Terence Parr knows this: "Rather than have a separate factory and adaptor, I've merged them."

 

In the C version these "factory methods" return pANTLR3_BASE_TREE. This is correct, no problem.

 

To sum it up - the ANTLR3_BASE_TREE_ADAPTOR is an object factory, but it is NOT a tree ADAPTOR. It's a pity!

 

Why am I writing all this garbage? Because the idea of tree adaptors is wonderful and I cannot use it. Adaptors would make my implementation much easier. I would just create a new adaptor that would work with our existing C++ AST classes and that's it! Right now I must include a "MY_TREE" structure in our classes that just forwards ANTLR3_BASE_TREE function calls to our methods. This is an overhead that is not necessary.

 

Tom

 

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Tuesday, April 01, 2008 6:54 PM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

Please read the comments in the source for common tree adaptor and base tree adaptor before attempting this, as well as http://www.antlr.org/api/C/index.html. 

 

In the C version, all adaptors and so on should return a pointer (pANTLR_BASE_TREE) to the base tree, which should be contained within your own tree nodes (which can contain anything so long as they have an ANTLR_BASE_TREE interface). That interface contains a pointer to the higher level structure, such as COMMON_TREE, which in turn can point to an even higher level tree. But you need to implement an adaptor, which will handle the tree for you and which the generated code will use. The adaptor needs to provide the methods in the BASE_TREE_ADAPTOR. You can probably create a COMMON adaptor, then install pointers to your own methods for those that won't work as is. To be honest though, I don't know of anyone that is doing this, so you may be pioneering here, though the standard implementation uses the same mechanisms, so it must 'work' ;-)
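
A rough sketch of "create a COMMON adaptor, then install pointers to your own methods": the stock table is built first and a single slot is replaced. All names here (DemoNode, DemoAdaptor, newCommonAdaptor, ourCreate) are invented for illustration and do not correspond to real runtime functions.

```cpp
#include <cassert>

// Invented node and adaptor types for the sketch.
struct DemoNode {
    int type;
    int childCount;
    bool custom;  // true when created by our own factory function
};

struct DemoAdaptor {
    DemoNode* (*create)(DemoAdaptor* self, int type);
    void (*addChild)(DemoAdaptor* self, DemoNode* parent, DemoNode* child);
};

// Stock behaviour, standing in for a "common" adaptor.
static DemoNode* commonCreate(DemoAdaptor*, int type) {
    return new DemoNode{type, 0, false};
}
static void commonAddChild(DemoAdaptor*, DemoNode* parent, DemoNode*) {
    ++parent->childCount;
}
DemoAdaptor newCommonAdaptor() {
    DemoAdaptor a = { commonCreate, commonAddChild };
    return a;
}

// Our replacement for the one method that does not work as is.
static DemoNode* ourCreate(DemoAdaptor*, int type) {
    return new DemoNode{type, 0, true};
}

DemoAdaptor newOurAdaptor() {
    DemoAdaptor a = newCommonAdaptor();
    a.create = ourCreate;  // override one slot, keep the rest
    return a;
}

bool demoOverride() {
    DemoAdaptor a = newOurAdaptor();
    DemoNode* root = a.create(&a, 1);
    DemoNode* kid = a.create(&a, 2);
    a.addChild(&a, root, kid);
    bool ok = root->custom && root->childCount == 1
              && a.addChild == commonAddChild;
    delete kid;
    delete root;
    return ok;
}
```

Replacing a slot in the table is the C counterpart of overriding one virtual method while inheriting the others.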

 

It would seem that in your case you will want both an adaptor and a tree implementation. You might find it just as easy to implement the standard tree, then use a tree grammar to construct your own tree, though you should not HAVE to do this.

 

Jim

 

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Tomas Potrusil
Sent: Tuesday, April 01, 2008 3:58 AM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

I was wrong. I do not need to "override" a tree, but a tree adaptor! Investigating the mailing-list and the source code I've found that the generated parser uses just the adapter and not the tree directly. But then there is something strange in the current C runtime:

 

In the Java runtime the tree adaptor interface works with "Object" objects only. Of course it must abstract access to real tree nodes - it is an adaptor, not just an object factory. Terence Parr says in the documentation: "Rather than have a separate factory and adaptor, I've merged them."

 

The C runtime mirrors the Java version, but it works directly with ANTLR3_BASE_TREE rather than with void* ("Object" in C). It is not an adaptor anymore, it is just an object factory. Methods like

ANTLR3_TREE_ADAPTOR::addChild(...adaptor, pANTLR3_BASE_TREE t, pANTLR3_BASE_TREE child)

are useless, because everyone can call t->addChild(child) directly.
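
The forwarding in question can be sketched as follows. The types are invented, and whether the real common adaptor forwards in exactly this way is an assumption made for the illustration.

```cpp
#include <cassert>

// Invented node interface with its own addChild pointer.
struct DemoBaseTree {
    void (*addChild)(DemoBaseTree* self, DemoBaseTree* child);
    int childCount;
};

// Invented adaptor interface with an addChild of the same name.
struct DemoTreeAdaptor {
    void (*addChild)(DemoTreeAdaptor* self, DemoBaseTree* t,
                     DemoBaseTree* child);
};

static void nodeAddChild(DemoBaseTree* self, DemoBaseTree*) {
    ++self->childCount;
}

// The adaptor version simply forwards to the node's own function pointer.
static void adaptorAddChild(DemoTreeAdaptor*, DemoBaseTree* t,
                            DemoBaseTree* child) {
    t->addChild(t, child);
}

int demoForwarding() {
    DemoBaseTree root = { nodeAddChild, 0 };
    DemoBaseTree a = { nodeAddChild, 0 };
    DemoBaseTree b = { nodeAddChild, 0 };
    DemoTreeAdaptor adaptor = { adaptorAddChild };
    adaptor.addChild(&adaptor, &root, &a);  // through the adaptor
    root.addChild(&root, &b);               // directly on the node
    return root.childCount;
}
```

Both calls end up in the same node function here, which is the redundancy being pointed out; the adaptor slot only makes a difference once it is replaced with something that does not forward.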

 

This prevents me from using our existing AST C++ classes within ANTLR without "subclassing" them from ANTLR3_COMMON_TREE, doesn't it?

 

Tom

 

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Monday, March 31, 2008 1:39 AM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

You will probably find it best to override pANTLR3_COMMON_TREE by encapsulating this within your own structure, as per the docs. This, as all the structures are, is a set of pointers to functions and you need only override the ones that you have to, just as in Java. Runtime type checking 'can' be an overhead, so I am not sure you would want to do that anyway, but I will contemplate your suggestion of course as it has some merit.

 

Jim

 

From: Tomas Potrusil [mailto:potrto at centrum.cz] 
Sent: Friday, March 28, 2008 5:43 AM
To: Jim Idle
Cc: ANTLR
Subject: RE: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

Oh yes, I know. I've already made a prototype implementation of a part of the grammar based on the idea I presented below (atom returns [OurNode* result] etc.). It is working, but it is a little bit clumsy and I cannot use the resulting AST for tree parsing - of course, since I'm creating my own AST.

 

I've been thinking about the new tree adapter (the one I was talking about below) and you are probably right, a few C++ wrappers could do the work. But there is one inconvenience - there is no "abstract" tree yet. The most abstract tree is ANTLR3_BASE_TREE_struct, which contains a children vector and other attributes. An ANTLR3_TREE_struct with only pointers to functions (something like a Java interface) would suit my needs better. Our existing AST nodes already solve the storage... Could you do it, please?

 

Another problem is safety. When somebody calls ANTLR3_BASE_TREE_struct::addChild(pANTLR3_BASE_TREE tree), for example, I must trust them that the tree argument really is a node of the expected type. I cannot write dynamic_cast<MyTreeWrapper*>(tree->super), because super is a void pointer. This cannot be solved in the current C-based system.
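
The following sketch shows where the unchecked step sits. It is not something proposed in this thread: it assumes, purely for illustration, that every object stored in 'super' derives from one invented polymorphic base (WrapperBase), which narrows the trust to a single static_cast but does not remove it.

```cpp
#include <cassert>

// Invented stand-in for the base tree with its void 'super' pointer.
struct DemoBaseTree { void* super; };

// Assumed common polymorphic base for everything stored in 'super'.
struct WrapperBase { virtual ~WrapperBase() {} };
struct MyTreeWrapper : WrapperBase {};
struct OtherWrapper : WrapperBase {};

// Store the pointer as WrapperBase* so the round trip via void* is valid.
void attach(DemoBaseTree& t, WrapperBase* w) { t.super = w; }

MyTreeWrapper* asMyTree(DemoBaseTree* t) {
    // Unchecked step: we still trust that 'super' holds a WrapperBase*.
    WrapperBase* w = static_cast<WrapperBase*>(t->super);
    // Checked step: yields nullptr for a wrapper of another type.
    return dynamic_cast<MyTreeWrapper*>(w);
}

bool demoCheckedCast() {
    MyTreeWrapper mine;
    OtherWrapper other;
    DemoBaseTree a = { nullptr };
    DemoBaseTree b = { nullptr };
    attach(a, &mine);
    attach(b, &other);
    return asMyTree(&a) == &mine && asMyTree(&b) == nullptr;
}
```

A node whose 'super' was filled in by unrelated code would still defeat the static_cast, so the concern raised above remains in that case.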

 

Tom

 

From: Jim Idle

 

ANTLR 3.1 C target can now incorporate C++ code directly into the grammar and so can easily call your existing C++ code. All you do is compile the C output file as C++ (or rename it to .cpp perhaps). 

Can you try using that and let me know if you think that there is anything that you could do if the runtime was C++ that you can't do right now? I don't really think that there will be.

You need to get the latest 3.1 snapshot from the downloads page and use the ANTLR Tool jar in there. Then build the ANTLR 3.1 C runtime from the tar.gz in the dist directory under the runtime/C directory in the snapshot. 3 or 4 people have successfully integrated their C++ code with the C target now and I think you will have similar success :-)

Jim

 

-----

Hello,

 

I'm new to the list. I'm trying to use ANTLR to generate a SQL parser because our current parser, which was generated by Lex/Yacc, doesn't support Unicode input. We use C++ and we have our own complex AST that is already used by a SQL engine... So my idea is to write a tree adapter that would create our existing AST nodes (they would just inherit the ANTLR tree interface).

 

And here comes the problem: ANTLR 3.x doesn't contain support for a "pure" C++ implementation. I've just found Jim Idle's "promise":

 

> Later I may well produce a complete C++ implementation from scratch,
> however, at this point I am not sure that it buys you anything. Please
> let me know if there are things you cannot do with the system as it
> stands (other than access the tokens and so on using C++ objects, which
> will be done later).

 

I know that the problem could be solved with the current system somehow, but it would probably be very ugly. So yes, a complete C++ implementation will buy us something! Or we can use ANTLR 2.x...

 

Right now we will probably try to build the AST by hand:

 

atom returns [OurNode* result]
@init { $result = NULL; }
    :   NUMBER
        {
            // Copy the token text (pointer plus length) into a std::string.
            std::string str((char*)$NUMBER.text->chars, $NUMBER.text->len);
            $result = new OurNumberNode(str);
        }
    ;

 

Or do you have some other ideas?

 

Thanks

 

Tom

 


