[antlr-interest] postmortem

Thu Mar 13 11:40:45 PDT 2008

On Fri, Mar 14, 2008 at 2:00 AM, Andy Tripp <antlr at jazillian.com> wrote:
>
> > Thomas Brandon wrote:
>  On Thu, Mar 13, 2008 at 7:58 AM, Andy Tripp <antlr at jazillian.com> wrote:
> >  Yes, most ANTLR users want a non-flat tree. But most (if not all) of
> > these users do not want a parse tree.
>  Agreed. But keep in mind that for many applications, simply walking the AST
> by hand is enough.
I'm familiar with your views so I tried to though it's a rather alien
approach to me so forgive any oversights..
>
> >  The parse trees generated by
> > ANTLR are not just like ASTs and cannot be used as such. Parse trees
> > consist of standard AST nodes for all the actual language nodes and
> > special parse tree nodes (of type ParseTree extending CommonTree) for
> > the rule references. These parse tree nodes have a token type of 0.
> > Thus you cannot use a tree parser against a parse tree and manually
> > walking the tree would be complicated. Parse trees (as generated by
> > ANTLR) are more of a debugging aid than a type of AST that
> > automatically adds structure.
>
>  ...I think that in many cases processing the parse tree would be very
> simple:
>
>  if (!(ast instanceof ParseTree)) {
>     doSomethingWith(ast);
>  } else {
>     // ignore ParseTree nodes
>  }
>
Well, that's not going to process the children of parse trees which is
presumably not what you want. And if you're skipping them then what
use are they exactly?
But my point was that with no token type you can switch on token
types. So you'd have to switch on the string name, which gets rather
nasty to maintain, and means changing a rule name will break your tree
walker but give no compile time errors. Not especially easy or
desirable.

<SNIP>A fair bit of going round in circles with proposed heuristics</SNIP>

>  ...but I guess maybe that puts us where we started - generating a full
> parse tree.
>  But even without solving this problem - go ahead and leave the tree as
> being hard to walk -
>  we're still ahead. The goal here is not to generate a "good" AST, but
> rather just produce something
>  that's better than nothing (i.e. a flat tree). So the newbie gets this
> tree, sees that it's hard to
>  distinguish the various cases where he's got an ID node, and starts reading
> up on how to build
>  an AST. He's better off here than the alternative: a blank stare at a flat
> ast.
<SNIP>more circles</SNIP>
>  Thanks for writing this up. Let me know if you think the replies I gave
> would work.
>  As I said, if they work, but only make things somewhat better, but not a
> completely usable AST, that's
>  still a win IMO. If they don't work, then how about simply say "if there
> are no ^'s, I'll just use
>  the existing code to get a parse tree and return that"?
>
>  Andy
(Can you please stop HTML posting or at least use a client with decent
quoting in it's HTML. Gmail can't handle your quoting and I've found
it's generally pretty good at that.)
I could go through your various proposed rules and point out the case
where they produce a structure I can't see anyone wanting but I gather
you admit that point.

The detect ^ and switch off automatic rules doesn't work because the
user may want a flat AST.

And I don't accept your view that producing a bad AST is better than
producing a flat AST. Is anyone going to use a bad AST? I can't see
why they would and don't think they should. Is it easy to allow such
auto-generation to be mixed with explicit construction to generate a
good AST? I don't think so. At best you end up with a grammar that's
messier and harder to understand than if you'd just used manual
construction. So you've spent all the time developing a system that no
one is ever going to use for more than the first grammar run that's
only purpose is to avoid giving someone a flat AST. And if they don't
look at enough docs\examples first and, heaven forbid, get a flat AST
they either go to the docs and quickly recognise their flawed
expectations of what output=AST will do or send a message to the list
that requires all of two or three sentences to answer. How is this in
anyway a good use of time or a justifiable addition of code
complexity?

Tom.