[antlr-interest] Summary of ANTLR Issues

Ric Klaren klaren at cs.utwente.nl
Tue Jul 8 04:13:48 PDT 2003


On Mon, Jul 07, 2003 at 01:22:30PM -0400, Tiller, Michael (M.M.) wrote:
> Synthetic tokens:
>
> I find myself using "synthetic" tokens quite often.

This is normal, no worries ;)

> or if perhaps that the language I'm parsing just doesn't suit itself to
> utilizing the existing tokens?

Depends much on the situation, I guess. As long as you can stuff something at
the beginning of a subtree that's nice and recognizable by a treeparser, you'll
be ok. At least I am ok, I think, while I do that :)
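
To make "something recognizable at the beginning of a subtree" concrete, here
is a minimal sketch in plain C++14. It is not generated ANTLR code; the names
Node, TokenType, makeDeclaration and describe are made up for illustration. The
imaginary DECLARATION type never comes out of a lexer, it only labels the root.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token types. DECLARATION is "imaginary": no lexer rule
// produces it, it exists only to label the root of a subtree.
enum TokenType { IDENT, TYPE_NAME, DECLARATION };

struct Node {
    TokenType type;
    std::string text;
    std::vector<std::unique_ptr<Node>> children;
    Node(TokenType t, std::string s) : type(t), text(std::move(s)) {}
};

// Builds the equivalent of #(DECLARATION type name): the synthetic node
// sits at the root, the real tokens become its children.
std::unique_ptr<Node> makeDeclaration(const std::string& type,
                                      const std::string& name) {
    auto root = std::make_unique<Node>(DECLARATION, "DECLARATION");
    root->children.push_back(std::make_unique<Node>(TYPE_NAME, type));
    root->children.push_back(std::make_unique<Node>(IDENT, name));
    return root;
}

// A tree walker only has to inspect the root type to know the shape below.
std::string describe(const Node& n) {
    if (n.type == DECLARATION && n.children.size() == 2)
        return n.children[0]->text + " " + n.children[1]->text;
    return n.text;
}
```

Calling describe(*makeDeclaration("int", "x")) takes the DECLARATION branch
because the walker sees the synthetic root first.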

> It seems to me that it makes sense to introduce nodes that are related to
> *rules* (some rules, not all rules) as well as tokens.

Looking at it pragmatically, with LL(k) grammars you can often wind up in a
situation where your rule(name)s aren't that meaningful. Of course this
depends a lot on what you're parsing. Basically you want some kind of tree
structure rolling out of your parser; antlr is very pragmatic about it and
allows you to shape that tree very much to your liking. There are tools
where you don't have this luxury and you only have the parse tree (those
suck imho :) ).

> So, you are probably saying...ANTLR supports imaginary tokens so what are
> you on about here.  ....
> ... By "direct support", I mean the ability to
> use them for automatic AST construction.

I think Loring is sitting on some nice stuff he'll release when the lawyers
are banished from this universe ;)

> Heterogeneous ASTs:
>
> For my project, I used heterogeneous ASTs.

Quite brave ;)

> Again, you are probably saying "but ANTLR doesn't show any bias for AST
> orientation, what are you complaining about?".

Hmm, looking at how the code for heterogeneous stuff looked in 2.7.1, I'd say
there was a pretty strong bias for homogeneous ASTs (Java and C++). Okay, it
was possible to use them, but you would certainly run into surprises. 2.7.2
improved things a bit and it might now work better in more cases.

> I had several problems with trying to use heterogeneous ASTs in C++.

I'm not surprised. I would like to use them but I won't bother while the
current support library is in place.

> 1) There is a major bug in 2.7.2 that prevents you from cloning hetero ASTs
> in C++ (another indication that most people use homogeneous ASTs).

Do tell ;) I must admit that when I made the cloning stuff I probably did
not think of all possible uses in the support lib, so maybe you
stumbled on something I overlooked.

> 2) Even though I can associate heterogeneous types with tokens, ANTLR
> doesn't respect them for synthetic tokens.  By "respect" I mean that it
> doesn't generate the appropriate factory initialization code (there is a
> workaround for this by creating a dummy rule that utilizes the synthetic
> tokens as terminals) and it doesn't allow you to operate on specific
> members and methods for your heterogeneous ASTs in the production rules
> (because you have to manually create them so it has to use the factory and
> therefore uses a generic interface).  The former is what I was alluding to
> at the end of the previous section on synthetic nodes.

If you could supply me with a small compileable grammar written as you
would expect things to be handled (no workarounds) then I can have a look.

> Currently, you have to really get to know the C++ AST classes and the class
> hierarchy is (in my opinion) pretty awkward, which leads me to....

I hear you, it is a mess.

> C++ AST Classes:
>
> If you look at the C++ hierarchy for AST components, you see all sorts of
> types.  Off the top of my head you have AST, BaseAST, CommonAST, ASTRef,
> RefAST, nullAST (along with several other types I cannot remember).  In
> addition, every time you want to create your own AST, you have to not only
> define your own type but also define several methods *that cannot be
> inherited* and then define a reference type (at least the examples indicate
> you should, but I think that currently this only applies if you are
> associating the type with a terminal token but if synthetic nodes were
> properly supported you'd need to do this in every case).

The C++ hierarchy basically mirrors the java one (although some of the
above classes might be interfaces in java, not sure). It is a very direct
port of the java code, using a reference counter to mimic java's garbage
collection. I'm annoyed with it myself but won't rewrite it (if I have time)
until we get to the ANTLR 3 series. The codebase of ANTLR 2 is not ideal.
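
For readers who have not looked at this kind of code, the following is a rough
sketch of what "a reference counter to mimic garbage collection" means. It is
not the support library's actual code; CountedNode and NodeRef are hypothetical
names, and the live counter exists only so the behaviour can be checked.

```cpp
#include <cassert>

// Hypothetical node that carries its own reference count.
struct CountedNode {
    int refs = 0;
    static int live;              // number of nodes currently alive
    CountedNode() { ++live; }
    ~CountedNode() { --live; }
};
int CountedNode::live = 0;

// Hypothetical handle: copying it bumps the count, destroying it drops the
// count, and the node is deleted when the last handle goes away.
class NodeRef {
    CountedNode* p_;
    void acquire() { if (p_) ++p_->refs; }
    void release() { if (p_ && --p_->refs == 0) delete p_; }
public:
    explicit NodeRef(CountedNode* p = nullptr) : p_(p) { acquire(); }
    NodeRef(const NodeRef& o) : p_(o.p_) { acquire(); }
    NodeRef& operator=(const NodeRef& o) {
        if (p_ != o.p_) { release(); p_ = o.p_; acquire(); }
        return *this;
    }
    ~NodeRef() { release(); }
    int count() const { return p_ ? p_->refs : 0; }
};
```

The convenience is that tree-building code can pass handles around freely. The
cost is that every user-defined node type needs a matching handle type, which
is where much of the clutter in the class list comes from.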

> But I think there is a bigger picture here to keep in mind.  Ric is right,
> much of the complexity comes from the RefAST material.  But what do you
> need reference counting for?!?

The reference counter makes the treebuilding code a lot easier. I myself
would lean towards a construction with a custom allocator or something
similar. It would probably also depend a lot on how Loring's new tree
building stuff works under the hood to see what's most practical.

> So if ASTs cannot, for practical purposes, be shared then why not simply
> reimplement them with a memory management scheme that makes sense for
> non-sharing objects (e.g. the parent explicitly deletes its children when
> it gets deleted).  This kind of setup would greatly simplify the class
> structure and facilitate either templates or (even better) polymorphism.

Automatically deleting children could be hairy... But losing the reference
counter is one of my top priorities in a rewrite.
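
As a point of comparison, the "parent deletes its children" scheme can be
sketched like this. It is an illustration in C++14 with a made-up OwnedNode
type, not a proposal for the support library; the live counter is only there
to make the cleanup observable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node that exclusively owns its children.
struct OwnedNode {
    static int live;              // number of nodes currently alive
    std::vector<std::unique_ptr<OwnedNode>> children;
    OwnedNode() { ++live; }
    ~OwnedNode() { --live; }      // children are destroyed automatically

    // Creates a new child owned by this node and returns a non-owning
    // pointer to it.
    OwnedNode* add() {
        children.push_back(std::make_unique<OwnedNode>());
        return children.back().get();
    }
};
int OwnedNode::live = 0;
```

Two things make this hairy in practice: a subtree can never appear in two
trees at once without being copied, and any non-owning pointer into the tree
dangles as soon as an ancestor is destroyed.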

> Now I recognize that it may not always be possible to get things as simple
> as "class MyNode : public antlr::AST { ... };".  If it isn't possible, then
> it would be ideal to have some facility for having ANTLR using some kind of
> "mix-in" approach where I can just define:

I'm currently thinking along the lines of some templates and trait classes to
customize most of the token/AST behaviour. Nothing is really worked out yet,
but I have some ideas for a direction.
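
To give a flavour of the direction (and nothing more, since none of this is
worked out), here is a small sketch of a trait class customizing node
behaviour. NodeTraits, buildAndLabel and this DeclarationNode are all
hypothetical names invented for the example.

```cpp
#include <cassert>
#include <string>

// Primary template is only declared; each node type supplies a
// specialization describing how the framework should handle it.
template <typename T> struct NodeTraits;

// Hypothetical user-defined node type with no framework base class.
struct DeclarationNode {
    std::string typeName;
    std::string text;
};

template <> struct NodeTraits<DeclarationNode> {
    static DeclarationNode create(const std::string& text) {
        return DeclarationNode{"", text};
    }
    static std::string label(const DeclarationNode& n) {
        return "decl:" + n.text;
    }
};

// Framework-side code is written once against the traits, not against a
// fixed base class.
template <typename T>
std::string buildAndLabel(const std::string& text) {
    T node = NodeTraits<T>::create(text);
    return NodeTraits<T>::label(node);
}
```

The point of the design is that the node type stays a plain struct, and all
the framework-specific knowledge lives in the specialization.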

> ...and then ANTLR does something like what CORBA does where it uses that
> original class and its own specific stuff together to form another class to
> be used in the framework, e.g.
>
> class MyNode_Impl : public MyNode, public antlr::AST_Impl {
>    // Add clone methods, etc.
> };
>
> The idea would be to minimize the work necessary in creating the custom
> types.

I dunno, I personally don't mind doing some simple groundwork; it gives you
a lot more control than a tool that tries to be intelligent. But I agree
that things could be easier. Seeing that I still don't see
heterogeneous ASTs as something 100% ready for production code, I'd not
put a high priority on providing something like this (but in the
future it would be a good idea :) ).
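
For what it's worth, the mix-in idea from the quoted MyNode_Impl example can be
approximated with a template instead of generated code. This is only a sketch
in C++14; AstBase and AstImpl are hypothetical stand-ins and not the
antlr::AST or antlr::AST_Impl named in the quote.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical framework-side interface.
struct AstBase {
    virtual ~AstBase() = default;
    virtual std::unique_ptr<AstBase> clone() const = 0;
};

// Mixes the user's plain type into the framework and supplies clone(),
// so the user never writes it by hand.
template <typename User>
struct AstImpl : AstBase, User {
    AstImpl() = default;
    explicit AstImpl(const User& u) : User(u) {}
    std::unique_ptr<AstBase> clone() const override {
        // Copy-constructs the full AstImpl<User>, including User's members.
        return std::make_unique<AstImpl<User>>(*this);
    }
};

// The user's type: no framework base class, no boilerplate.
struct MyNode {
    std::string typeName;
};
```

An AstImpl<MyNode> can then be cloned through an AstBase pointer and the copy
still carries typeName, which is the part that tends to go wrong with
hand-written clone methods.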

> I think that with the exception of the C++ class hierarchy, much of this is
> easy to address.  Looking at my grammar, things would be greatly simplified
> if the following were possible:
>
> 1) Automatic construction of synthetic nodes via a syntax like:
>
> declaration<AST=DeclarationNode>
>   : type name ";"
>   ;

Personally I'd wait for Loring's stuff before adding something like this (I
don't mind the ## = #( .. ) syntax too much). Maybe I'd go for Monty's
suggestion of a macro facility in antlr; then it would be easy to make some
sugar for this.

> 2) Ability to reference heterogeneous methods and members , e.g.:
>
> declaration<AST=DeclarationNode>
>   : t:type name ";" { ##->setType(t->getTypeName()); }
>   ;

I do not follow, this should already be supported. If it's not working it's
a bug.

> 3) Definitions of custom AST types should involve a minimum of code.

Agree.

> OK, so that is my feedback.  As I said, I've tried to be constructive and
> propose solutions and not just complain about the current functionality.  I
> don't know enough about the backend side of ANTLR to be more specific.

Thank you for the feedback :)

BTW, yesterday I put up a new snapshot that has some heterogeneous AST
fixes. Not sure if they affect your problems.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
  "You can't expect to wield supreme executive power just because some
   watery tot throws a sword at you!"
  --- Monty Python and the Holy Grail


 
