[antlr-interest] Re: Hetero ASTs, Manual Tree Construction, C++, etc...
lgcraymer
lgc at mail1.jpl.nasa.gov
Tue Jun 17 12:39:51 PDT 2003
If you have 31 AST classes and are expecting more, it would probably
be worth your while to write a simple specification language processor
to generate AST classes with get/set methods for the fields. That
might reduce some of your agony.
--Loring
--- In antlr-interest at yahoogroups.com, "Tiller, Michael (M.M.)"
<mtiller at f...> wrote:
> I'm currently working on a project and I wanted to bounce a few
ideas off the people in the list to see what people think of how I'm
approaching this problem.
>
> First, I should state one particular bias...I don't really like
homogenous ASTs. Perhaps I just don't understand enough about how
people use homogenous ASTs or something, but I don't really feel
comfortable with them. I agree that the automatic AST building stuff
in ANTLR is very nice and I really appreciate that some care has been
taken to try and provide the best of both worlds. My main problem
with homogenous ASTs is that while I can very easily build a parser
and even a tree walker built around homogenous ASTs, I don't really
feel comfortable *doing something* with them because they don't seem
structured enough. Here is an example, consider the following
statement in my grammar:
>
> element
> : import_clause
> | extends_clause
> | ("final")?
> ( "inner" | "outer")?
> ( ( class_definition | component_clause )
> | "replaceable" ( class_definition | component_clause )
> (constraining_clause (string_comment)? (annotation)?)?
> )
> ;
>
> Now things like "final" or "inner" or "replaceable" are simple
qualifiers. I just want to note whether they are present or not.
Should I really just leave the lexical token in my AST to indicate
this? I'm assuming that is what you would do if you strictly used
automatic AST construction (perhaps I'm wrong). I don't like this
because they I have to search through the AST (agreed, I can write
macros/functions for much of this) to look for the presence of these
items. I just don't like this because it requires maintaining complex
tree structures, worrying about garbage collection, calling functions
to determine simple properties.
>
> To deal with these situations, I have fields in my heterogeneous AST
like "bool p_final". Now I have exactly what I need at my fingertips
(i.e. a boolean that indicates whether this declaration is final or
not). I can change this boolean very easily (without having to deal
at all with post-parsing AST construction). Now you LISP folks out
there problem think that the "p_" means predicate, but it
doesn't...more on this in a second.
>
> As I build my AST, I sprinkle in a few statements here and there
where I capture the presence of these qualifiers (once and for all) in
members associated with each AST node. ANTLR makes this relatively
easy (could be better though) so I'm pretty happy with that. At
first, this may seem to limit some of the possibilities of my tree
parser because these bits have been taken out of the AST structure
formally (i.e. they don't show up with getFirstChild or
getNextSibling). This is true, but so far I have yet to see a big
impact.
>
> Now, let me talk more generally about this issue. Lets get away
from the parsing and think more about tree processing. My approach
has been (or will be in some cases) to build each heterogeneous AST
out of two fundamental types of information.
>
> The first type (prefixed with a "p_") is information that comes from
the parsing stage. Essentially, these are bits and pieces that are
syntactically present in the source but (for the reasons I outlined
above) I choose not to represent with my AST. My criteria for
deciding this is pretty simple. If something is essentially a
"terminal", I pluck it out since the structure isn't "variable" (i.e.
I don't need to represent its internal structure). I use an explicit
"p_" prefix to indicate that these are syntactic elements and to
remind me that I need to spit them back out in the case of
text-to-text processing.
>
> The second type (prefix with an "m_") is information that I have
gleaned from "passes" over the AST. This might include pointers to
other nodes representing relationships (parent nodes, instance<->type
relationships, etc). This information is semantic and so I don't have
to spit it back out. Furthermore, it usually requires more analysis
to figure out how to fill in these pieces of information.
>
> How does this relate to ANTLR and tree construction? Well, I've
been implementing this scheme with ANTLR and I've found some of the
features useful, but I think there is room for improvement. By my
count, I now have 31 different AST node types. Even after I wrote
several macros and a base class for all my ASTs, I can still tell you
that defining each one was a pain. Here are my major issues with
hetero AST construction:
>
> 1) Defining the AST types is a bit of a pain. You cannot really
leverage either templates or inheritance in this process (at least not
in an easy way I have found). The systems for "cloning" and "garbage
collection", etc. just didn't seem to provide an elegant way of
facilitating reuse. It is entirely possible that people have found
elegant ways around this, but it isn't obvious, documented or directly
supported by ANTLR as far as I can tell.
>
> 2) Even when I define custom AST nodes, there is a problem (at least
in my grammar) that ANTLR doesn't create the appropriate
initializeASTFactory method. I mentioned this in a previous message
so I'm not really going to go into detail here.
>
> 3) Even when I convince ANTLR to create the appropriate
initializeASTFactory (via a hack), it still doesn't treat my nodes as
what they are. Perhaps this is a consequence of #2, but my ASTs are
still just "RefAST" instances as far as ANTLR is concerned so I've had
to write a bunch of casting functions (e.g. "asDeclaration(##).p_final
= true;") to allow me to access the node specific members/methods.
>
> In looking over the work that I've done, it strikes me that perhaps
the best way for ANTLR to deal with heterogeneous ASTs is to use a
"mix-in" sort of idiom. In other words, let me define my node type,
e.g.
>
> struct DeclarationNode {
> bool p_final;
> DeclarationNode() : p_final(false); {}
> };
>
> ...then when I specify that I want one of my tokens to include this
data (e.g. DECLARATION<AST=DeclarationNode>;) have ANTLR use
multiple-inheritance of templates to include this information, e.g.
have ANTLR create a custom data structure that blends the AST with my
type:
>
> class DECLARATION_AST : public commonAST, public DeclarationNode {
> ...
> };
>
> Then ANTLR could take responsibility for create "RefDECLARATION_AST"
and writing down the obvious methods (e.g. clone)?
>
> Now my intention here is not to complain about the lack of support
for heterogeneous ASTs in ANTLR. Overall, I'm happy with ANTLR. I'm
just trying to provide some feedback about what I think is cumbersome
and also to solicit feedback from people on whether they think, based
on looking at what my issues are, I'm going about some things the
wrong way.
>
> --
> Mike
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list