[antlr-interest] Re: antlr vs. javacc+jjtree tree construction
lgcraymer
lgc at mail1.jpl.nasa.gov
Wed Mar 27 15:29:40 PST 2002
--- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
> Of course, in a real grammar, you'd have about 8 more levels of
nesting
> in the grammar and hence in the tree. All of this is noise and
still
> doesn't tell you anything about structure of the input--it only
tells
> you what rules are applied to match it (sometimes this is enough to
work
> with for simple translations, but they always are simple
syntax-directed
> translations.)
>
> The AST approach is better because it is MUCH easier to manipulate
them
> not to mention the fact that they are way smaller.
>
> Please let me know if the value of the AST over the parse tree is
not
> clear. I would like to make this into a FAQ entry with a convincing
> argument.
Parse trees with rule labels are probably a necessity for LR parsers
where context is identified from the bottom up, and have value for
attribute grammar approaches where the nonterminals (rule identifier
nodes) provide context information for generating output from the
parse tree. Basically, JavaCC went with the classic computer science
approach to tree building while Ter took an alternative route which
emphasizes tree transformation as a key step in source-to-source
language translation.
ANTLR's annotation approach for constructing trees is much nicer than
the conventional approach--nonterminals represent language features
(like "if" for conditional statement blocks) rather than the grammar
writer's view of the language. Why should
foo : A B C bar ;
bar : D E F ;
generate a different tree than
foobar : A B C D E F ;
by default?
Worse: it is possible to generate both trees in one grammar.
Recognizing that a #( foo A B C #(bar D E F) ) tree is the same as
#( foobar A B C D E F ) is just an added burden to the programmer.
With ANTLR, "C" can be identified as an important feature for decoding
"A B" and "D E F" semantics via the structuring
foobar : A B^ C^ D E F ;
which results in a tree matched by
tfoo : #( C #( B A ) D E F ) ;
that effectively distinguishes C (root), B A (B as first child of C, A
as a child of B), and D E F (other children).
Because ANTLR matches trees from the top down, the "rule identifier"
nodes are unnecessary--they can be considered implicit in the tree
walker rules. Also, ANTLR trees can be manipulated from one language
structure to another--tree transformation is a very clean approach for
language translation.
--Loring
> Best regards,
> Terence
> --
> Chief Scientist & Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list