[antlr-interest] advocacy of C++ support in ANTLR 3.x

Tomas Potrusil potrto at centrum.cz
Wed Apr 2 01:10:34 PDT 2008


What was wrong with my ideas below? I know that with the current design I
need a new tree adaptor and must "override" ANTLR3_BASE_TREE, which means
that I must embed the ANTLR3_BASE_TREE structure (or better, a MY_TREE
structure that contains COMMON_TREE, which in turn contains the
ANTLR3_BASE_TREE structure) inside my AST classes. This is completely clear
and it will probably work without any problems.

 

All I wanted to point out was that the current design is not quite right
(unlike the Java version).

 

One question: why should I (or rather you, the parser generator) use such
an adaptor when you can call functions like addChild directly through the
ANTLR3_BASE_TREE interface? You hold a pointer to that interface. Yet you
decline to do this and call the "adaptor" (a function with the same name,
addChild), which in turn calls the function directly on the tree. There are
a few such functions, right? Why are they there? It doesn't make sense.

 

The Java version (its documentation) makes this clear. These functions are
in the TreeAdaptor interface as well, but they have a slightly different
meaning: their arguments are Objects, not tree interfaces. Of course, the
adaptor should hide the real tree that is used behind the scenes! Users can
use the Tree interface (this is what BaseTreeAdaptor does), but they can
also use completely different tree classes if they like.
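
To make the difference concrete, here is a minimal C++ sketch of an adaptor in the Java style, where the caller only ever sees opaque handles. Every name in it (OurNode, OpaqueTreeAdaptor, OurNodeAdaptor, buildSum) is hypothetical and invented for illustration; none of it is part of the ANTLR runtime, and buildSum merely stands in for generated parser code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pre-existing AST class that knows nothing about the parser runtime.
class OurNode {
public:
    explicit OurNode(std::string text) : text_(std::move(text)) {}
    void append(OurNode* child) { children_.push_back(child); }
    std::size_t childCount() const { return children_.size(); }
    const std::string& text() const { return text_; }
private:
    std::string text_;
    std::vector<OurNode*> children_;  // non-owning links to children
};

// Hypothetical adaptor interface in the Java style: nodes are opaque handles
// (void* plays the role of Java's Object).
struct OpaqueTreeAdaptor {
    virtual ~OpaqueTreeAdaptor() = default;
    virtual void* create(const std::string& tokenText) = 0;  // factory role
    virtual void addChild(void* parent, void* child) = 0;    // adaptor role
    virtual std::size_t getChildCount(void* node) = 0;
};

// The only place that knows the concrete node type.
class OurNodeAdaptor : public OpaqueTreeAdaptor {
public:
    void* create(const std::string& tokenText) override {
        owned_.push_back(std::make_unique<OurNode>(tokenText));
        return owned_.back().get();
    }
    void addChild(void* parent, void* child) override {
        assert(parent != nullptr && child != nullptr);
        static_cast<OurNode*>(parent)->append(static_cast<OurNode*>(child));
    }
    std::size_t getChildCount(void* node) override {
        return static_cast<OurNode*>(node)->childCount();
    }
private:
    std::vector<std::unique_ptr<OurNode>> owned_;  // the adaptor owns the nodes
};

// Stand-in for generated parser code: builds "+" with two operands and
// never names a concrete tree type.
inline void* buildSum(OpaqueTreeAdaptor& adaptor) {
    void* root = adaptor.create("+");
    adaptor.addChild(root, adaptor.create("1"));
    adaptor.addChild(root, adaptor.create("2"));
    return root;
}
```

With an interface of this shape, addChild on the adaptor is no longer redundant: the caller could not invoke anything on the handle itself, so the adaptor is the only way to reach the tree.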

 

Here is a related exchange from this list:

> Why does the method create(Token) return an Object? I'm curious why an
> Object and not a Tree. When you manipulate trees, it seems to cause
> quite a bit of (useless?) casts everywhere...

this is because you might want tree nodes that do not implement ANTLR's
tree interface. When I made a TreeAdaptor to output XML (DOM) I was
quite happy that I could work with Objects :)

 

There is one more feature hidden in the adaptors: they are object
factories. They create new trees so that the runtime doesn't need to know
the real type of the tree. In Java this is particularly funny because these
"factory methods" (create()) are adaptor methods as well, and so they return
Object. Terence Parr is aware of this: "Rather than have a separate factory
and adaptor, I've merged them."

 

In the C version these "factory methods" return pANTLR3_BASE_TREE. This is
correct, no problem.

 

To sum it up - the ANTLR3_BASE_TREE_ADAPTOR is an object factory, but it is
NOT a tree ADAPTOR. It's a pity!

 

Why am I writing all this garbage? Because the idea of tree adaptors is
wonderful and I cannot use it. Adaptors would make my implementation much
easier. I would just create a new adaptor that works with our existing
C++ AST classes and that's it! Right now I must embed a "MY_TREE"
structure in our classes that just forwards ANTLR3_BASE_TREE function calls
to our methods. This is unnecessary overhead.
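
The forwarding I mean looks roughly like the following sketch. BaseTreeC only imitates the general shape of a C-style tree interface (a table of function pointers plus a back pointer); it is a hypothetical stand-in and not the real ANTLR3_BASE_TREE layout. OurNode and the thunk names are likewise made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style tree interface: function pointers plus a back pointer.
// NOT the real ANTLR3_BASE_TREE; it only mimics the general idea.
struct BaseTreeC {
    void* super;  // points back at the enclosing object
    void (*addChild)(BaseTreeC* self, BaseTreeC* child);
    std::size_t (*getChildCount)(BaseTreeC* self);
};

// Hypothetical existing AST class that has to carry the embedded struct.
class OurNode {
public:
    OurNode() {
        tree_.super = this;
        tree_.addChild = &OurNode::addChildThunk;
        tree_.getChildCount = &OurNode::getChildCountThunk;
    }
    // Copying would leave tree_.super pointing at the wrong object.
    OurNode(const OurNode&) = delete;
    OurNode& operator=(const OurNode&) = delete;

    BaseTreeC* asTree() { return &tree_; }

    // The "real" methods the AST already had.
    void append(OurNode* child) { children_.push_back(child); }
    std::size_t childCount() const { return children_.size(); }

private:
    // Recover the C++ object from the C-style interface (unchecked cast).
    static OurNode* fromTree(BaseTreeC* t) {
        assert(t != nullptr && t->super != nullptr);
        return static_cast<OurNode*>(t->super);
    }
    // Thunks that do nothing but forward to the methods above.
    static void addChildThunk(BaseTreeC* self, BaseTreeC* child) {
        fromTree(self)->append(fromTree(child));
    }
    static std::size_t getChildCountThunk(BaseTreeC* self) {
        return fromTree(self)->childCount();
    }

    BaseTreeC tree_;
    std::vector<OurNode*> children_;  // non-owning
};
```

Every node pays for the extra struct and every call goes through a thunk, even though the AST already had equivalent methods.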

 

Tom

 

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Tuesday, April 01, 2008 6:54 PM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

Please read the comments in the source for common tree adaptor and base tree
adaptor before attempting this, as well as
http://www.antlr.org/api/C/index.html. 

 

In the C version, all adaptors and so on should return a
pANTLR3_BASE_TREE, which should be contained within your own tree nodes
(which can contain anything so long as they have an ANTLR3_BASE_TREE
interface). That interface contains a pointer to the higher level structure,
such as COMMON_TREE, which in turn can point to an even higher level tree.
But, you need to implement an adaptor, which will handle the tree for you
and which the generated code will use. The adaptor needs to provide the
methods in the BASE_TREE_ADAPTOR. You can probably create a COMMON adaptor,
then install pointers to your own methods for those that won't work as is.
To be honest though, I don't know of anyone that is doing this, so you may
be pioneering here, though the standard implementation uses the same
mechanisms, so it must 'work' ;-)
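
As a rough illustration of "create a COMMON adaptor, then install pointers to your own methods", here is a small sketch of that pattern in general. The types and functions (Node, AdaptorC, makeCommonAdaptor, makeOurAdaptor) are hypothetical and far simpler than the real adaptor structures; the sketch only shows the override-by-function-pointer idea and assumes nothing about the actual runtime API.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type; "customized" records which factory built it.
struct Node {
    std::string text;
    bool customized;
};

// Hypothetical adaptor: a plain struct of function pointers.
struct AdaptorC {
    Node* (*create)(AdaptorC* self, const char* text);
    void (*destroy)(AdaptorC* self, Node* node);
};

// Default ("common") implementations.
static Node* defaultCreate(AdaptorC*, const char* text) {
    assert(text != nullptr);
    return new Node{text, false};
}
static void defaultDestroy(AdaptorC*, Node* node) { delete node; }

// Builds an adaptor with the default function in every slot.
inline AdaptorC makeCommonAdaptor() {
    return AdaptorC{&defaultCreate, &defaultDestroy};
}

// Our replacement for the one slot that does not fit our needs.
static Node* ourCreate(AdaptorC*, const char* text) {
    assert(text != nullptr);
    return new Node{std::string("our:") + text, true};
}

// Start from the common adaptor and overwrite only what has to change.
inline AdaptorC makeOurAdaptor() {
    AdaptorC adaptor = makeCommonAdaptor();
    adaptor.create = &ourCreate;  // destroy keeps the default
    return adaptor;
}
```

Code that calls adaptor.create(&adaptor, "x") does not need to know which of the two adaptors it was handed.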

 

It would seem that in your case you will want both an adaptor and a tree
implementation. You might find it just as easy to implement the standard
tree, then use a tree grammar to construct your own tree, though you should
not HAVE to do this.

 

Jim

 

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Tomas Potrusil
Sent: Tuesday, April 01, 2008 3:58 AM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

I was wrong. I do not need to "override" a tree, but a tree adaptor!
After investigating the mailing list and the source code I found that the
generated parser uses just the adaptor and not the tree directly. But then
there is something strange in the current C runtime:

 

In the Java runtime the tree adaptor interface works with "Object"
references only. Of course it must abstract access to the real tree nodes:
it is an adaptor, not just an object factory. Terence Parr says in the
documentation: "Rather than have a separate factory and adaptor, I've merged
them."

 

The C runtime mirrors the Java version, but it doesn't work with void*
(the C counterpart of "Object") but directly with ANTLR3_BASE_TREE. It is no
longer an adaptor, it is just an object factory. Methods like

ANTLR3_TREE_ADAPTOR::addChild(...adaptor, pANTLR3_BASE_TREE t,
pANTLR3_BASE_TREE child)

are useless, because anyone can call t->addChild(child) directly.

 

This prevents me from using our existing C++ AST classes within ANTLR
without "subclassing" them from ANTLR3_COMMON_TREE, doesn't it?

 

Tom

 

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Monday, March 31, 2008 1:39 AM
To: ANTLR
Subject: Re: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

You will probably find it best to override pANTLR3_COMMON_TREE by
encapsulating this within your own structure, as per the docs. This, as all
the structures are, is a set of pointers to functions and you need only
override the ones that you have to, just as in Java. Runtime type checking
'can' be an overhead, so I am not sure you would want to do that anyway, but
I will contemplate your suggestion of course as it has some merit.

 

Jim

 

From: Tomas Potrusil [mailto:potrto at centrum.cz] 
Sent: Friday, March 28, 2008 5:43 AM
To: Jim Idle
Cc: ANTLR
Subject: RE: [antlr-interest] advocacy of C++ support in ANTLR 3.x

 

Oh yes, I know. I've already made a prototype implementation of a part of
the grammar based on the idea I presented below (atom returns [OurNode*
result] etc.). It works, but it is a little bit clumsy and I cannot use
the resulting AST for tree parsing because, of course, I'm creating my own
AST.

 

I've been thinking about the new tree adaptor (the one I was talking about
below) and you are probably right, a few C++ wrappers could do the work. But
there is one inconvenience: there is no "abstract" tree yet. The most
abstract tree is ANTLR3_BASE_TREE_struct, which contains the children vector
and other attributes. An ANTLR3_TREE_struct with only pointers to functions
(something like a Java interface) would suit my needs better. Our existing
AST nodes already take care of storage. Could you do that, please?

 

Another problem is safety. When somebody calls
ANTLR3_BASE_TREE_struct::addChild(pANTLR3_BASE_TREE tree), for example, I
must trust him that the tree argument really is the kind of tree I expect. I
cannot write dynamic_cast<MyTreeWrapper*>(tree->super). This cannot be solved
in the current C-based system.
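
For comparison, this is the kind of check a C++ runtime could offer if the back pointer were typed instead of untyped. It is only a sketch under that assumption: TreeObject, BaseTreeCxx, ForeignWrapper and addChildChecked are hypothetical names, and nothing like them exists in the current C runtime.

```cpp
#include <cassert>

// Hypothetical polymorphic root, so that dynamic_cast has type information.
struct TreeObject {
    virtual ~TreeObject() = default;
};

// Our wrapper type and an unrelated one that must be rejected.
struct MyTreeWrapper : TreeObject {
    int childCount = 0;
};
struct ForeignWrapper : TreeObject {};

// Hypothetical tree interface whose back pointer is typed, not void*.
struct BaseTreeCxx {
    TreeObject* super;
};

// Adds a child only when both sides really are MyTreeWrapper objects.
// Returns false instead of silently corrupting memory on a type mismatch.
inline bool addChildChecked(BaseTreeCxx* parent, BaseTreeCxx* child) {
    assert(parent != nullptr && child != nullptr);
    auto* p = dynamic_cast<MyTreeWrapper*>(parent->super);
    auto* c = dynamic_cast<MyTreeWrapper*>(child->super);
    if (p == nullptr || c == nullptr) {
        return false;  // wrong kind of tree was passed in
    }
    ++p->childCount;
    return true;
}
```

With an untyped back pointer the same function could only static_cast and hope for the best, which is exactly the trust problem described above.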

 

Tom

 

From: Jim Idle

 

ANTLR 3.1 C target can now incorporate C++ code directly into the grammar
and so can easily call your existing C++ code. All you do is compile the C
output file as C++ (or rename it to .cpp perhaps). 

Can you try using that and let me know if you think that there is anything
that you could do if the runtime was C++ that you can't do right now? I
don't really think that there will be.

You need to get the latest 3.1 snapshot from the downloads page and use the
ANTLR Tool jar in there. Then build the ANTLR 3.1 C runtime from the tar.gz
in the dist directory under the runtime/C directory in the snapshot. 3 or 4
people have successfully integrated their C++ code with the C target now and
I think you will have similar success :-)

Jim

 

-----

Hello,

 

I'm new to the list. I'm trying to use ANTLR to generate a SQL parser
because our current parser, which was generated by Lex/Yacc, doesn't support
Unicode input. We use C++ and we have our own complex AST that is already
used by a SQL engine... So my idea is to write a tree adaptor that would
create our existing AST nodes (they would just inherit the ANTLR tree
interface).

 

And here comes the problem: ANTLR 3.x doesn't contain support for a "pure"
C++ implementation. I've just found Jim Idle's "promise":

 

> Later I may well produce a complete C++ implementation from scratch,
> however, at this point I am not sure that it buys you anything. Please
> let me know if there are things you cannot do with the system as it
> stands (other than access the tokens and so on using C++ objects, which
> will be done later).

 

I know that the problem could be solved with the current system somehow, but
it would probably be very ugly. So yes, a complete C++ implementation would
buy us something! Or we can use ANTLR 2.x.

 

Right now we will probably try to build the AST by hand:

 

atom returns [OurNode* result]
@init { $result = NULL; }
    :   NUMBER
        {
            std::string str((char*)$NUMBER.text->chars, $NUMBER.text->len);
            $result = new OurNumberNode(str);
        }
    ;
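
Outside the grammar, the action above boils down to something like the following sketch. TokenText is a hypothetical stand-in for the runtime's string object (a byte buffer plus a length), makeNumberNode is a made-up helper, and the node classes are reduced to the bare minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the token text: a byte buffer plus a length.
struct TokenText {
    const unsigned char* chars;
    std::size_t len;
};

// Minimal versions of our AST classes.
struct OurNode {
    virtual ~OurNode() = default;
};
struct OurNumberNode : OurNode {
    explicit OurNumberNode(std::string text) : digits(std::move(text)) {}
    std::string digits;
};

// Same steps as the grammar action: copy exactly len bytes into a
// std::string, then build the node from it.
inline std::unique_ptr<OurNode> makeNumberNode(const TokenText& text) {
    assert(text.chars != nullptr);
    std::string str(reinterpret_cast<const char*>(text.chars), text.len);
    return std::make_unique<OurNumberNode>(str);
}
```

Passing the length explicitly means the copy does not rely on the buffer being NUL-terminated, and returning a smart pointer avoids the raw new used in the action.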

 

Or do you have some other ideas?

 

Thanks

 

Tom

 
