[antlr-interest] antlr vs. javacc+jjtree tree construction

Wed Mar 27 10:10:24 PST 2002

On Wednesday, March 27, 2002, at 03:24  AM, Robert Enyedi wrote:

> Hi, everyone
>  
> I have worked previously with JavaCC and I used JJTree to build syntax 
> trees. By default, it creates a root node for each grammar rule 
> (identified by the constant JJT<RULENAME>) and attaches to it as child 
> nodes the nodes that come from its sub rules.
>  
> When I got to use buildAST in ANTLR for the first time, I noticed a 
> fundamental difference: by default it creates a list of token nodes. It 
> is true that if I want the same behavior as in JavaCC, I can use 
> the action {#ruleName=#([NODE_TYPE_NAME],#ruleName);} and define 
> NODE_TYPE_NAME in the tokens section.
>  
> But why do I have to do this manually? Isn't there a workaround to 
> automate this task? And, somehow, isn't JJTree's behavior the natural 
> one?

Hi Robert,

There is a fundamental difference in the kind of tree generated as you 
describe from jjtree vs ANTLR that wasn't clear from the other 
responders.  They focused on how you create trees in ANTLR, but the 
biggest issue is that you are used to creating parse trees not syntax 
trees (ASTs).

A parse tree is a record of the rules (and tokens) used to match some 
input text whereas a syntax tree records the structure of the input and 
is insensitive to the grammar that produced it.  Note that there are an 
infinite number of grammars for any single language and hence every 
gramar will result in a different tree form because of all the 
intermediate rules.  An abstract syntax tree is a far superior 
intermediate form precisely because of this insensitivity and because it 
highlights the structure of the language not the grammar.

It's best to look at an example.  For input 3+4 you really want to use 
the following intermediate form:

+
|  \
3  4

That is, the operator at the root and 3 and 4 as operands (children).  
In ANTLR child-sibling form, you'd have

+
|
3 -- 4

Ok, so now a parse tree.  I'll pick an extremely simple one out of the 
infinite number:

expr
   |
plus
   |  \  \
  3  +  4

Of course, in a real grammar, you'd have about 8 more levels of nesting 
in the grammar and hence in the tree.  All of this is noise and still 
doesn't tell you anything about structure of the input--it only tells 
you what rules are applied to match it (sometimes this is enough to work 
with for simple translations, but they always are simple syntax-directed 
translations.)

The AST approach is better because it is MUCH easier to manipulate them 
not to mention the fact that they are way smaller.

Please let me know if the value of the AST over the parse tree is not 
clear.  I would like to make this into a FAQ entry with a convincing 
argument.

Best regards,
Terence
--
Chief Scientist & Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/