[antlr-interest] Re: antlr vs. javacc+jjtree tree construction

Wed Mar 27 15:29:40 PST 2002

--- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
> Of course, in a real grammar, you'd have about 8 more levels of 
nesting 
> in the grammar and hence in the tree.  All of this is noise and 
still 
> doesn't tell you anything about structure of the input--it only 
tells 
> you what rules are applied to match it (sometimes this is enough to 
work 
> with for simple translations, but they always are simple 
syntax-directed 
> translations.)
> 
> The AST approach is better because it is MUCH easier to manipulate 
them 
> not to mention the fact that they are way smaller.
> 
> Please let me know if the value of the AST over the parse tree is 
not 
> clear.  I would like to make this into a FAQ entry with a convincing 
> argument.

Parse trees with rule labels are probably a necessity for LR parsers 
where context is identified from the bottom up, and have value for 
attribute grammar approaches where the nonterminals (rule identifier 
nodes) provide context information for generating output from the 
parse tree.  Basically, JavaCC went with the classic computer science 
approach to tree building while Ter took an alternative route which 
emphasizes tree transformation as a key step in source-to-source 
language translation.

ANTLR's annotation approach for constructing trees is much nicer than 
the conventional approach--nonterminals represent language features 
(like "if" for conditional statement blocks) rather than the grammar 
writer's view of the language.  Why should

foo :  A B C bar ;
bar : D E F ;

generate a different tree than

foobar : A B C D E F ;

by default?

Worse:  it is possible to generate both trees in one grammar.  
Recognizing that a #( foo A B C #(bar D E F) ) tree is the same as
#( foobar A B C D E F ) is just an added burden to the programmer.

With ANTLR, "C" can be identified as an important feature for decoding 
"A B" and "D E F" semantics via the structuring

foobar :  A B^ C^ D E F ;

which results in a tree matched by

tfoo : #( C #( B A ) D E F ) ;

that effectively distinguishes C (root), B A (B as first child of C, A 
as a child of B), and D E F (other children).

Because ANTLR matches trees from the top down, the "rule identifier" 
nodes are unnecessary--they can be considered implicit in the tree 
walker rules.  Also, ANTLR trees can be manipulated from one language 
structure to another--tree transformation is a very clean approach for 
language translation.

--Loring

> Best regards,
> Terence
> --
> Chief Scientist & Co-founder, http://www.jguru.com
> Creator, ANTLR Parser Generator: http://www.antlr.org

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/