[antlr-interest] postmortem

Thomas Brandon tbrandonau at gmail.com
Thu Mar 13 01:20:35 PDT 2008


On Thu, Mar 13, 2008 at 7:58 AM, Andy Tripp <antlr at jazillian.com> wrote:
>
>  Jim Idle wrote:
> > I think you miss the point. We can't 'know' that they did or didn't want a
> > flat tree. Who is this someone that you are designating tasks like this to?
> Well, we can't "know' anything about what anyone wants, in general. The best
>  we can do is make a best guess. And I think the best guess is that most
> ANTLR users want a non-flat AST.
Yes, most ANTLR users want a non-flat tree. But most (if not all) of
these users do not want a parse tree. The parse trees generated by
ANTLR are not just like ASTs and cannot be used as such. Parse trees
consist of standard AST nodes for all the actual language nodes and
special parse tree nodes (of type ParseTree extending CommonTree) for
the rule references. These parse tree nodes have a token type of 0.
Thus you cannot use a tree parser against a parse tree and manually
walking the tree would be complicated. Parse trees (as generated by
ANTLR) are more of a debugging aid than a type of AST that
automatically adds structure.

Perhaps you could add an output=CST (Concrete Syntax Tree, aka parse
tree) option. But how would this work?
Given a rule like:
myop: modifier MYOP contents;
what should our proposed CST constructor generate? What token types
should rule references automatically generate? You could generate a
token type named after the rule but producing ^(MYOP attributes MYOP
contents) where the first MYOP is our auto-generated one mapping to no
part of the input and the second is an actual token seems bizarre and
likely to cause troubles. Maybe we could do ^(RULE_MYOP modifier MYOP
contents) but do any of the users who don't want flat ASTs want that?
And do they also want:
modifier: PUBLIC | PROTECTED;
to generate ^(RULE_MODIFIER PUBLIC)?
I'd imagine what they really want is not a CST but the AST ^(MYOP
modifier content). With modifier having no dummy parent and contents
having one.
OK, so we don't want output=CST we want to auto-generate ASTs. But
how? Given the above case we might think we could have a rule that if
there's one token reference and other rule references we make the
token the root. That's easy but what if we've got:
method: keywords ID args catch;
we probably don't want ^(ID keywords args catch) as that's very hard
for our tree walker to distinguish from:
field: keywords ID init;
which makes ^(ID keywords init).
And what do we do with:
method: keywords ID LPAREN args RPAREN CATCH catch;
Here I'd probably want the AST ^(METHOD_CALL ID keywords args catch)
but how can a tool know that.
OK, so we want to have some default rules and some syntax to disable
automatic generation. But how often is this auto generation actually
going to be used? I think you're very often going to want to disable
any such automatic generation. OK, so any use of AST rebuild operators
disables the automatic generation. But what about your "modifiers:
PUBLIC | PROTECTED;" rule? Adding "options { autoAST=false; }" to all
such rules is going to be pretty annoying and "modifiers!: PUBLIC |
PROTECTED;" isn't going to be very understandable for new users.

I think if you spend a bit of time actually thinking about how you'd
manage to implement what you want you'll see it really doesn't work.

Tom.


More information about the antlr-interest mailing list