[antlr-interest] postmortem

Kenneth Domino kenneth.domino at domemtech.com
Thu Mar 13 10:20:14 PDT 2008


> No. Instead, how about saying "if there is exactly one terminal (i.e. 
> lexer token or literal) in the rule,
> put a ^ after that:
>
> (attributes MYOP^ contents)

Strictly speaking, this syntax creates an AST, not a parse tree.
In a parse tree, the internal nodes correspond to the nonterminals
of the grammar, and the children of an internal node are the RHS
symbols of the production used at that point in the parse
(see section 2.2 of Aho, Sethi, Ullman).
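To make the distinction concrete, here is a minimal Java sketch that builds both trees for the input "a + b" and prints them in the LISP-style notation that ANTLR's toStringTree() also uses. The grammar (expr : term '+' term ; term : ID ;) and the class and method names are hypothetical, made up for illustration. In the parse tree every interior node is a nonterminal. In the AST for the same rule written as expr : term '+'^ term ; the operator becomes the root and the single-child term nodes disappear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParseTreeVsAst {
    // A generic labelled tree node; leaves hold token text.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        // LISP-style rendering: a leaf prints its label, otherwise "(root child ...)".
        String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    // Parse tree for "a + b" under the hypothetical grammar
    //   expr : term '+' term ;   term : ID ;
    // Every interior node is a nonterminal; its children are the RHS symbols.
    static Node parseTree() {
        return new Node("expr",
                new Node("term", new Node("a")),
                new Node("+"),
                new Node("term", new Node("b")));
    }

    // AST for the same input when the rule is written  expr : term '+'^ term ;
    // The operator token is the root; the term nodes are gone.
    static Node ast() {
        return new Node("+", new Node("a"), new Node("b"));
    }

    public static void main(String[] args) {
        System.out.println(parseTree().toStringTree()); // (expr (term a) + (term b))
        System.out.println(ast().toStringTree());       // (+ a b)
    }
}
```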

Sometimes, though, generating a parse tree really is useful. In the
past, for a compiler construction course, I had students modify a
Yacc-generated parser to construct a parse tree for a subset of a
programming language (e.g., Pascal, Turing, ...), then generate code
from that parse tree. Yes, the parse tree. In a structured editor I
once wrote, I stored the parse tree as the internal data structure,
then performed a tree walk to regenerate the text with reformatting.
If you want to write an incremental parser, you probably need a
parse tree to know where to resume the parse; I remember seeing this
in some incremental parsing papers.
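The structured-editor use can be sketched as a left-to-right tree walk that emits only the leaves (tokens) and applies a formatting policy as it goes. This works because the parse tree keeps every token, including punctuation such as '=' and ';' that an AST often drops. The grammar, node type, and formatting rules below are hypothetical choices for illustration, not taken from any real editor.

```java
import java.util.Arrays;
import java.util.List;

public class Unparse {
    // Hypothetical parse-tree node: tokens are leaves, nonterminals are interior nodes.
    static final class Node {
        final String text;        // token text, or rule name for a nonterminal
        final boolean isToken;
        final List<Node> children;

        private Node(String text, boolean isToken, Node... children) {
            this.text = text;
            this.isToken = isToken;
            this.children = Arrays.asList(children);
        }

        static Node tok(String text) { return new Node(text, true); }
        static Node rule(String name, Node... kids) { return new Node(name, false, kids); }
    }

    // Left-to-right walk that emits only tokens. Example formatting policy:
    // one space between tokens, none before ';', and a newline after each ';'.
    static void emit(Node n, StringBuilder out) {
        if (!n.isToken) {
            for (Node c : n.children) emit(c, out);
            return;
        }
        boolean atLineStart = out.length() == 0 || out.charAt(out.length() - 1) == '\n';
        if (n.text.equals(";")) {
            out.append(";\n");
        } else {
            if (!atLineStart) out.append(' ');
            out.append(n.text);
        }
    }

    static String unparse(Node root) {
        StringBuilder out = new StringBuilder();
        emit(root, out);
        return out.toString();
    }

    // Parse tree for the badly spaced input "x=y;  z = w ;" under the hypothetical
    // grammar  prog : stmt+ ;  stmt : ID '=' ID ';' ;
    static Node sample() {
        return Node.rule("prog",
                Node.rule("stmt", Node.tok("x"), Node.tok("="), Node.tok("y"), Node.tok(";")),
                Node.rule("stmt", Node.tok("z"), Node.tok("="), Node.tok("w"), Node.tok(";")));
    }

    public static void main(String[] args) {
        System.out.print(unparse(sample())); // prints "x = y;" and "z = w;" on separate lines
    }
}
```

Changing emit() changes the layout without touching the tree, which is the point of keeping the full parse tree around.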

FYI, if you really want a parse tree from ANTLR, try this hack, and I
really mean hack! Write a script (in Perl, or maybe even as an ANTLR
grammar translator) that inserts the following code after the LHS
symbol of each parser nonterminal rule (in ANTLR parlance, a RULE_REF
but not a TOKEN_REF) and before its ":":

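@init {
 // -999 is an arbitrary token type for the synthetic rule node
 CommonTree realroot = (CommonTree)adaptor.create(-999, "nonterminal");
}
@after {
 // wrap whatever tree the rule built under the synthetic node
 retval.tree = (CommonTree)adaptor.becomeRoot(realroot, retval.tree);
}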

You'd want to substitute the rule's name for the string "nonterminal"
in the @init action. You'd also want to add an options block to the
grammar: "options { output=AST; ASTLabelType=CommonTree; }".
It seems to work fine in ANTLRWorks 1.1.7; the resulting AST is
almost identical to the parse tree, except for one extra node at the root.
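Why the result comes out shaped like a parse tree: ANTLR 3 documents TreeAdaptor.becomeRoot(newRoot, oldRoot) as moving the children of a nil (list) root under newRoot, and otherwise making oldRoot a single child of newRoot. Either way, each rule's tree ends up under one node named for the rule. The toy Java sketch below mimics that rule with its own Node class (not org.antlr.runtime.tree.CommonTree); it omits the documented special case where newRoot is itself nil-rooted, and it assumes a one-child nil root has already been collapsed to that child, as ANTLR's rule post-processing does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BecomeRootDemo {
    // Toy tree type, not ANTLR's CommonTree; a null label marks a "nil" list root.
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label, Node... children) {
            this.label = label;
            kids.addAll(Arrays.asList(children));
        }

        static Node nil(Node... children) { return new Node(null, children); }

        boolean isNil() { return label == null; }

        String str() {
            String self = isNil() ? "nil" : label;
            if (kids.isEmpty()) return self;
            StringBuilder sb = new StringBuilder("(").append(self);
            for (Node k : kids) sb.append(' ').append(k.str());
            return sb.append(')').toString();
        }
    }

    // Simplified becomeRoot: ^(nil a b) under r gives ^(r a b); ^(a b) under r gives ^(r ^(a b)).
    static Node becomeRoot(Node newRoot, Node oldRoot) {
        if (oldRoot == null) return newRoot;                     // rule produced no tree
        if (oldRoot.isNil()) newRoot.kids.addAll(oldRoot.kids);  // adopt the list's children
        else newRoot.kids.add(oldRoot);                          // old tree becomes one child
        return newRoot;
    }

    public static void main(String[] args) {
        // term : ID ; matched "a": the rule's tree is the single token node a.
        Node termA = becomeRoot(new Node("term"), new Node("a"));
        Node termB = becomeRoot(new Node("term"), new Node("b"));
        // expr : term '+' term ; with several children, the rule's tree is nil-rooted.
        Node expr = becomeRoot(new Node("expr"), Node.nil(termA, new Node("+"), termB));
        System.out.println(expr.str()); // (expr (term a) + (term b))
    }
}
```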

Maybe there is a better way to do this, and it is probably fragile.
But in lieu of an "output=CST" option, or something else I don't know
about (I'm not that familiar with ANTLR), this seems to work.
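The rewriting step the hack relies on (the "script" suggested earlier) might look something like this Java sketch, used here in place of Perl. It is deliberately naive, in keeping with the hack: it assumes each parser rule header starts in column 0 with a lowercase name (RULE_REF) and has no arguments, return values, rule options, or scopes, and it adds the options block only when the grammar has none. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseTreeHack {
    // Parser rule header: a lowercase-initial name at column 0, optionally followed
    // on the same line by ':' and the rule body. Lexer rules (uppercase TOKEN_REF)
    // never match; neither do rules with args/returns/options (a sketch assumption).
    private static final Pattern RULE_HEADER =
            Pattern.compile("^([a-z][A-Za-z0-9_]*)\\s*(:.*)?$");
    private static final Pattern GRAMMAR_DECL =
            Pattern.compile("^(parser\\s+)?grammar\\s+\\w+\\s*;\\s*$");

    static String rewrite(String grammar) {
        boolean hasOptions = grammar.contains("options");
        List<String> out = new ArrayList<>();
        for (String line : grammar.split("\n", -1)) {
            Matcher m = RULE_HEADER.matcher(line);
            if (m.matches()) {
                String name = m.group(1);
                out.add(name);
                // Rule actions go between the rule name and its ':'.
                out.add("@init {");
                out.add(" CommonTree realroot = (CommonTree)adaptor.create(-999, \"" + name + "\");");
                out.add("}");
                out.add("@after {");
                out.add(" retval.tree = (CommonTree)adaptor.becomeRoot(realroot, retval.tree);");
                out.add("}");
                if (m.group(2) != null) out.add(m.group(2)); // the ": body" that followed the name
            } else {
                out.add(line);
                if (!hasOptions && GRAMMAR_DECL.matcher(line).matches()) {
                    out.add("options { output=AST; ASTLabelType=CommonTree; }");
                }
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String g = String.join("\n",
                "grammar Expr;",
                "expr : term '+' term ;",
                "term : ID ;",
                "ID : 'a'..'z'+ ;");
        System.out.println(rewrite(g));
    }
}
```

A real ANTLR grammar translator would avoid the column-0 and one-line-header assumptions, which is exactly where this regex version is fragile.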

Ken Domino