[antlr-interest] antlr vs. javacc+jjtree tree construction
Terence Parr
parrt at jguru.com
Wed Mar 27 10:10:24 PST 2002
On Wednesday, March 27, 2002, at 03:24 AM, Robert Enyedi wrote:
> Hi, everyone
>
> I have worked previously with JavaCC and I used JJTree to build syntax
> trees. By default, it creates a root node for each grammar rule
> (identified by the constant JJT<RULENAME>) and attaches to it as child
> nodes the nodes that come from its sub rules.
>
> When I got to use buildAST in ANTLR for the first time, I noticed a
> fundamental difference: by default it creates a list of token nodes. It
> is true that if I want the same behavior as in JavaCC, I can use
> the action {#ruleName=#([NODE_TYPE_NAME],#ruleName);} and define
> NODE_TYPE_NAME in the tokens section.
>
> But why do I have to do this manually? Isn't there a workaround to
> automate this task? And, somehow, isn't JJTree's behavior the natural
> one?
Hi Robert,
There is a fundamental difference in the kind of tree generated as you
describe from jjtree vs ANTLR that wasn't clear from the other
responders. They focused on how you create trees in ANTLR, but the
biggest issue is that you are used to creating parse trees not syntax
trees (ASTs).
A parse tree is a record of the rules (and tokens) used to match some
input text whereas a syntax tree records the structure of the input and
is insensitive to the grammar that produced it. Note that there are an
infinite number of grammars for any single language and hence every
gramar will result in a different tree form because of all the
intermediate rules. An abstract syntax tree is a far superior
intermediate form precisely because of this insensitivity and because it
highlights the structure of the language not the grammar.
It's best to look at an example. For input 3+4 you really want to use
the following intermediate form:
+
| \
3 4
That is, the operator at the root and 3 and 4 as operands (children).
In ANTLR child-sibling form, you'd have
+
|
3 -- 4
Ok, so now a parse tree. I'll pick an extremely simple one out of the
infinite number:
expr
|
plus
| \ \
3 + 4
Of course, in a real grammar, you'd have about 8 more levels of nesting
in the grammar and hence in the tree. All of this is noise and still
doesn't tell you anything about structure of the input--it only tells
you what rules are applied to match it (sometimes this is enough to work
with for simple translations, but they always are simple syntax-directed
translations.)
The AST approach is better because it is MUCH easier to manipulate them
not to mention the fact that they are way smaller.
Please let me know if the value of the AST over the parse tree is not
clear. I would like to make this into a FAQ entry with a convincing
argument.
Best regards,
Terence
--
Chief Scientist & Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list