[antlr-interest] postmortem

Thu Mar 13 13:05:47 PDT 2008

Actually, getting ANTLR to generate real parse trees with unique labels for the non-terminals (rule tokens) is fairly easy to do--I did that for ANTLR 2 with about 10 liines of code.  ANTLR 3 is structured a little bit differently than ANTLR 2--it might be doable in 5.  You need to create a token type for every rule name--I used a RULE_ prefix--and then just add the appropriate token as the root of the subtree produced by a rule.

I did not do this because I thought I needed parse trees for routine tasks, but rather to experiment with the idea of "reversible" translation--is there a way to translate language B into language A with minimal effort given a translator from A to B?  Parse trees seemed to provide a useful output form for that purpose since one can just output token text given a parse tree to regenerate the source.  One of these days, I will revisit that idea.

For practical translation, though, parse trees are not particularly helpful--they capture preconceptions about the input language and contribute little to the analysis phase(s) of a translator.  Failure to design a tree structure that helps in performing analysis seems invariably to lead to "action-packed" grammars that become less and less maintainable as they are asked to do more and more.  Eventually, you get to the point where the grammar itself is virtually unmaintainable because it is hidden by a vast amount of target language code.

--Loring

----- Original Message ----
> From: Thomas Brandon <tbrandonau at gmail.com>
> To: Andy Tripp <antlr at jazillian.com>
> Cc: antlr-interest <antlr-interest at antlr.org>
> Sent: Thursday, March 13, 2008 1:20:35 AM
> Subject: Re: [antlr-interest] postmortem
> 
> On Thu, Mar 13, 2008 at 7:58 AM, Andy Tripp  wrote:
> >
> >  Jim Idle wrote:
> > > I think you miss the point. We can't 'know' that they did or didn't want a
> > > flat tree. Who is this someone that you are designating tasks like this to?
> > Well, we can't "know' anything about what anyone wants, in general. The best
> >  we can do is make a best guess. And I think the best guess is that most
> > ANTLR users want a non-flat AST.
> Yes, most ANTLR users want a non-flat tree. But most (if not all) of
> these users do not want a parse tree. The parse trees generated by
> ANTLR are not just like ASTs and cannot be used as such. Parse trees
> consist of standard AST nodes for all the actual language nodes and
> special parse tree nodes (of type ParseTree extending CommonTree) for
> the rule references. These parse tree nodes have a token type of 0.
> Thus you cannot use a tree parser against a parse tree and manually
> walking the tree would be complicated. Parse trees (as generated by
> ANTLR) are more of a debugging aid than a type of AST that
> automatically adds structure.
> 
> Perhaps you could add an output=CST (Concrete Syntax Tree, aka parse
> tree) option. But how would this work?
> Given a rule like:
> myop: modifier MYOP contents;
> what should our proposed CST constructor generate? What token types
> should rule references automatically generate? You could generate a
> token type named after the rule but producing ^(MYOP attributes MYOP
> contents) where the first MYOP is our auto-generated one mapping to no
> part of the input and the second is an actual token seems bizarre and
> likely to cause troubles. Maybe we could do ^(RULE_MYOP modifier MYOP
> contents) but do any of the users who don't want flat ASTs want that?
> And do they also want:
> modifier: PUBLIC | PROTECTED;
> to generate ^(RULE_MODIFIER PUBLIC)?
> I'd imagine what they really want is not a CST but the AST ^(MYOP
> modifier content). With modifier having no dummy parent and contents
> having one.
> OK, so we don't want output=CST we want to auto-generate ASTs. But
> how? Given the above case we might think we could have a rule that if
> there's one token reference and other rule references we make the
> token the root. That's easy but what if we've got:
> method: keywords ID args catch;
> we probably don't want ^(ID keywords args catch) as that's very hard
> for our tree walker to distinguish from:
> field: keywords ID init;
> which makes ^(ID keywords init).
> And what do we do with:
> method: keywords ID LPAREN args RPAREN CATCH catch;
> Here I'd probably want the AST ^(METHOD_CALL ID keywords args catch)
> but how can a tool know that.
> OK, so we want to have some default rules and some syntax to disable
> automatic generation. But how often is this auto generation actually
> going to be used? I think you're very often going to want to disable
> any such automatic generation. OK, so any use of AST rebuild operators
> disables the automatic generation. But what about your "modifiers:
> PUBLIC | PROTECTED;" rule? Adding "options { autoAST=false; }" to all
> such rules is going to be pretty annoying and "modifiers!: PUBLIC |
> PROTECTED;" isn't going to be very understandable for new users.
> 
> I think if you spend a bit of time actually thinking about how you'd
> manage to implement what you want you'll see it really doesn't work.
> 
> Tom.
> 

      ____________________________________________________________________________________
Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080313/8e3405e0/attachment.html