[antlr-interest] ANTLR v3 tree construction :)

Terence Parr parrt at cs.usfca.edu
Thu Jun 23 16:39:05 PDT 2005


On Jun 23, 2005, at 3:43 PM, Akhilesh Mritunjai wrote:

> Hi Terence

Hi.  I've started a new blog today and decided to add my reply to it  
(dup of this email at the bottom of today's blog entry):

http://www.antlr.org/blog/antlr3/treegrammar.tml

> Yup, in tree parser, and I vote for default in-line.
> As for creating a new tree, thats what 'buildAST=true'
> is there for. Am I right ? Coupled together with a

Ok, so output=AST (in 3.0 parlance) would construct a tree from  
whatever the input was: tokens or trees.  A tree parser with rewrite  
rules, however, would do a rewrite.  Let's see an example:

parser grammar langParser;
options {output=AST;}
program : decl+ -> ^(DECL decl+) ;
decl : FLOAT_TYPE ID ';' -> ^(FLOAT_TYPE ID) ;

which creates trees like this:

        DECL
       /    \
"float"  "float"
    |        |
   var1     var2

(ain't that easy using the new notation?!!)

then we'd have a tree grammar that created a new tree from the old:

tree grammar langTreeParser;
options {output=AST;}
program : ^(DECL decl+) ;
decl : ^(FLOAT_TYPE ID) -> ^(FLOAT_TYPE ID FLOAT["0.0"]) ; // add "=  
0.0" initialization

(note how the tree grammar is the collection of right-hand-sides from  
the -> rewrites).  heh, somebody ought to write this down as doc ;)

Now, to do rewrites, one would leave off the output option:

tree grammar langTreeParser;
program : ^(DECL decl+) ;
decl : ^(FLOAT_TYPE ID) -> ^(FLOAT_TYPE ID FLOAT["0.0"]) ; // add "=  
0.0" initialization

Yes...I guess that makes sense.  The method generated for rule  
"program" would have to replace the children list of DECL with the  
*new* subtrees created by rule decl.  I guess like this:

program() {
   match(DECL);
   match(DOWN);
   while ( input.LA(1)==FLOAT_TYPE ) {
     result = decl();
     // replace child i with result tree
   }
   match(UP);
}

I am not sure how to do that easily, though I of course see the need  
(particularly given the tree sizes you refer to below...yikes!).  The  
TreeNodeStream that serializes the trees for two-dimensional parsing  
could be asked to replace the node I suppose...sort of like asking an  
iterator to do a delete.  Hmm...that is an interesting approach.  It  
would only allow replacement of a node under the "cursor" at input.LT 
(1) I guess.  It would have to make sure not to try to parse the new  
node.  You might have a  single node replaced with two nodes:

tree grammar t;
a : b+ ;
b : B -> X Y ; // replace B with X Y (2 nodes)

If the input were say B B then after the first call to b the "X Y"  
would replace the first B, yielding X Y B, but the TreeNodeStream  
would have to do a "replace and jump over new nodes" so the current  
pointer was on the 2nd B not the Y.

Regardless, the approach would invisibly allow a completely new tree  
to be created or to have a tree modified inline.  Wow.

> 'dupTree=true' option (patch/standard), it will parser
> and either build a new tree of just the traversed
> nodes or 'peel-off' the whole traversed tree, if user
> doesn't want to write whole grammar for un-interesting
> sub-trees.

I'm not sure I follow.  I normally use dot (wildcard) to jump over  
uninteresting nodes/trees, though I'm not sure how well that will  
work in 3.0 as the parsing algorithm is a little different.  I might  
have to add another element other than dot to mean ignore tree  
instead of just ignore node.

> The reason for asking this is to remove an assumption
> that trees would be small enough to fit in memory! (My
> trees can have around 10M nodes... and I have to scale
> to ~half a billion nodes very shortly... and no I'm
> not making this up, our tool will process chip
> descriptions that will routinely have 10-100M gates)

Seems like you'll have a hard time fitting 500M nodes in memory.  1  
32 bit word per node would already be 4G of RAM. ;)  Are you going to  
marshall these into/out of memory like virtual memory?

> Basically, what I'm looking for is a full blown ANTLR
> tree-parsing stuff that whould make writing native
> code (visitors etc) as un-necessary as can be... Tree
> parser should be a full blown declarative language to
> process and manipulate trees in a language agnostic
> way.

That is my goal.  How much RAM does each node take for you?

>
> Next thing on wish list would be a XQuery/XPath like
> declarative query language to query and manipulate
> tree.

eeewwww! ;)  Down with anything specified in XML that a human has to  
touch. ;)

> Thanks for all this stuff Terence

My pleasure :)

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com



More information about the antlr-interest mailing list