[antlr-interest] ANTLR v3 tree construction :)
Terence Parr
parrt at cs.usfca.edu
Thu Jun 23 16:39:05 PDT 2005
On Jun 23, 2005, at 3:43 PM, Akhilesh Mritunjai wrote:
> Hi Terence
Hi. I've started a new blog today and decided to add my reply to it
(dup of this email at the bottom of today's blog entry):
http://www.antlr.org/blog/antlr3/treegrammar.tml
> Yup, in tree parser, and I vote for default in-line.
> As for creating a new tree, thats what 'buildAST=true'
> is there for. Am I right ? Coupled together with a
Ok, so output=AST (in 3.0 parlance) would construct a tree from
whatever the input was: tokens or trees. A tree parser with rewrite
rules, however, would do a rewrite. Let's see an example:
parser grammar langParser;
options {output=AST;}
program : decl+ -> ^(DECL decl+) ;
decl : FLOAT_TYPE ID ';' -> ^(FLOAT_TYPE ID) ;
which creates trees like this:
DECL
/ \
"float" "float"
| |
var1 var2
(ain't that easy using the new notation?!!)
then we'd have a tree grammar that created a new tree from the old:
tree grammar langTreeParser;
options {output=AST;}
program : ^(DECL decl+) ;
decl : ^(FLOAT_TYPE ID) -> ^(FLOAT_TYPE ID FLOAT["0.0"]) ; // add "=
0.0" initialization
(note how the tree grammar is the collection of right-hand-sides from
the -> rewrites). heh, somebody ought to write this down as doc ;)
Now, to do rewrites, one would leave off the output option:
tree grammar langTreeParser;
program : ^(DECL decl+) ;
decl : ^(FLOAT_TYPE ID) -> ^(FLOAT_TYPE ID FLOAT["0.0"]) ; // add "=
0.0" initialization
Yes...I guess that makes sense. The method generated for rule
"program" would have to replace the children list of DECL with the
*new* subtrees created by rule decl. I guess like this:
program() {
match(DECL);
match(DOWN);
while ( input.LA(1)==FLOAT_TYPE ) {
result = decl();
// replace child i with result tree
}
match(UP);
}
I am not sure how to do that easily, though I of course see the need
(particularly given the tree sizes you refer to below...yikes!). The
TreeNodeStream that serializes the trees for two-dimensional parsing
could be asked to replace the node I suppose...sort of like asking an
iterator to do a delete. Hmm...that is an interesting approach. It
would only allow replacement of a node under the "cursor" at input.LT
(1) I guess. It would have to make sure not to try to parse the new
node. You might have a single node replaced with two nodes:
tree grammar t;
a : b+ ;
b : B -> X Y ; // replace B with X Y (2 nodes)
If the input were say B B then after the first call to b the "X Y"
would replace the first B, yielding X Y B, but the TreeNodeStream
would have to do a "replace and jump over new nodes" so the current
pointer was on the 2nd B not the Y.
Regardless, the approach would invisibly allow a completely new tree
to be created or to have a tree modified inline. Wow.
> 'dupTree=true' option (patch/standard), it will parser
> and either build a new tree of just the traversed
> nodes or 'peel-off' the whole traversed tree, if user
> doesn't want to write whole grammar for un-interesting
> sub-trees.
I'm not sure I follow. I normally use dot (wildcard) to jump over
uninteresting nodes/trees, though I'm not sure how well that will
work in 3.0 as the parsing algorithm is a little different. I might
have to add another element other than dot to mean ignore tree
instead of just ignore node.
> The reason for asking this is to remove an assumption
> that trees would be small enough to fit in memory! (My
> trees can have around 10M nodes... and I have to scale
> to ~half a billion nodes very shortly... and no I'm
> not making this up, our tool will process chip
> descriptions that will routinely have 10-100M gates)
Seems like you'll have a hard time fitting 500M nodes in memory. 1
32 bit word per node would already be 4G of RAM. ;) Are you going to
marshall these into/out of memory like virtual memory?
> Basically, what I'm looking for is a full blown ANTLR
> tree-parsing stuff that whould make writing native
> code (visitors etc) as un-necessary as can be... Tree
> parser should be a full blown declarative language to
> process and manipulate trees in a language agnostic
> way.
That is my goal. How much RAM does each node take for you?
>
> Next thing on wish list would be a XQuery/XPath like
> declarative query language to query and manipulate
> tree.
eeewwww! ;) Down with anything specified in XML that a human has to
touch. ;)
> Thanks for all this stuff Terence
My pleasure :)
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
More information about the antlr-interest
mailing list