[antlr-interest] ANTLR v3 tree construction :)

Thu Jun 23 20:11:29 PDT 2005

Hi Terence

Replies inline-

--- Terence Parr <parrt at cs.usfca.edu> wrote:
> I am not sure how to do that easily, though I of
> course see the need  
> (particularly given the tree sizes you refer to
> below...yikes!).  The  
> TreeNodeStream that serializes the trees for
> two-dimensional parsing  
> could be asked to replace the node I suppose...sort
> of like asking an  
> iterator to do a delete.  Hmm...that is an

Its not so simple given singly linked list approach.
Thats why right now it takes a fair amount of kludgy
code (need to keep track of previous sibling)....
however, it also has an indispensable advantage that
trivial amount book-keeping needs to be done. Any more
pointers and it will lead to a mess.

I feel that given the way (at least of 2.x) how tree
parser works, it can sure handle this.  Basically it
would amount to a LA(1) kinda thing inserted at proper
place-

 if(_delete(LA(1))
 {
   _t.setNextSibling(LA(1).getNextSibling());
   _t = _t.getNextSibling(); // Advance
   consume(LA(1)); // discard the deleted node
 }

> interesting approach.  It  
> would only allow replacement of a node under the
> "cursor" at input.LT(1) I guess.

Umm, actually it can decide one step ahead... LA is
always >= 1. So the current element is always
'previous' and always available and 'cursor' is always
one step ahead. In LA(k) situation, it can do it as
far as k-1 (depending upon actual available tokens).

>  It would have to make sure not to try
> to parse the new  
> node.

For additions, yes. Deletions should skip deleted
nodes.

>  You might have a  single node replaced with
> two nodes:
> 
> tree grammar t;
> a : b+ ;
> b : B -> X Y ; // replace B with X Y (2 nodes)
> 
> If the input were say B B then after the first call
> to b the "X Y"  
> would replace the first B, yielding X Y B, but the
> TreeNodeStream  
> would have to do a "replace and jump over new nodes"
> so the current  
> pointer was on the 2nd B not the Y.

Fairly easy... this is what my current code does:
(Using Java5 syntax)

Vector<AST> additions = new Vector<AST>();
additions.add(X);
additions.add(Y);

for(AST node : additions)
{
  node.setNextSibling(_t.getNextSibling());
  _t.setNextSibling(node);
  _t = node; // Advance
}

> Regardless, the approach would invisibly allow a
> completely new tree  
> to be created or to have a tree modified inline. 
> Wow.

And potentially killing off some arguments advocating
writing native code visitors etc. :-) It will make
compiler writing a bit more easy as primary code
transforms can be done in *much* easier fashion, eg.
making SSA trees and removing common sub-expressions.

> > 'dupTree=true' option (patch/standard), it will
> parser
> > and either build a new tree of just the traversed
> > nodes or 'peel-off' the whole traversed tree, if
> user
> > doesn't want to write whole grammar for
> un-interesting
> > sub-trees.
> 
> I'm not sure I follow.  I normally use dot
> (wildcard) to jump over  
> uninteresting nodes/trees, though I'm not sure how
> well that will  
> work in 3.0 as the parsing algorithm is a little
> different.  I might  
> have to add another element other than dot to mean
> ignore tree  
> instead of just ignore node.

The difference is that "." will duplicate only the
subtree root node, not the whole subtree present at
"." so, if the user wants to rewrite the tree by
duplicating it (perhaps wanting to preserve the handed
tree, as in my case), he gets a 'chopped-off' tree
with subtrees missing here and there. The subtrees are
not interesting for transformations, but that might
not mean that he wanted them to be chopped off... (the
'!' (antlr 2.x) notation is there for that). 

What the 'dupTree' option does is it considers
terminals as potential root elements of uninteresting
subtrees and instead of calling 'create()' it calls
'dupTree' on them... cloning the whole
'processing-wise-uninteresting' subtree.

Practical application I'm dealing with is creating a
concrete tree from a code template that given values
of some constants at runtime, generates different code
(like an uber macro): eg

@VAR i; // Compile time loop variable
@CONST N; // User specifies value at compile time

@IF(N == 0)
int a;
a = 0;
@ELSE
int a[N];

@FOR(i = 0; i < N; i = i + 1)
   a[i] = 0;
@ENDFOR

@ENDIF

given value N=3, it will generate:

int a[3];
a[0] = 0;
a[1] = 0;
a[2] = 0;

and given N=0, it will generate:
int a;
a = 0;

So a number of nodes in parse AST get deleted (the
macro nodes) and a number of nodes get added.

> Seems like you'll have a hard time fitting 500M
> nodes in memory.  1  
> 32 bit word per node would already be 4G of RAM. ;) 
> Are you going to  
> marshall these into/out of memory like virtual
> memory?

One of our product is already hitting 3G roof (upper
1G is reserved for kernel, so unavailable on 32bit
systems). My nodes are light-weight (payload in text
field) and I plan to write/use a small objectDB for
it. However, the pain is that most DBs' performance is
a function of # of records and throwing 50M recodrs at
them is not pretty! However, after some sort of
packing and stuff, I expect to transparently implement
on-demand loading and storing of nodes thanks to
excellent architecture of ANTLR. (I have a small
prototype working)

> > Basically, what I'm looking for is a full blown
> ANTLR
> > tree-parsing stuff that whould make writing native
> > code (visitors etc) as un-necessary as can be...
> Tree
> > parser should be a full blown declarative language
> to
> > process and manipulate trees in a language
> agnostic
> > way.
> 
> That is my goal.  How much RAM does each node take
> for you?

~32 bytes, But I have kept min RAM requirements fairly
low.  Due to factories, I can change node types from
CommonAST to DBAST on the fly :-) And a nice node
creation algorithm makes caching/purging really
work... no issues on that front. 

> eeewwww! ;)  Down with anything specified in XML
> that a human has to  
> touch. ;)

Ouch. The goal is to provide a declarative query/path
language... on second thoughts, I guess existing ANTLR
syntax is powerful enough to get the job done... I'll
have to research more to come up with convinving
arguments for/against the idea.

Akhilesh

__________________________________ 
Discover Yahoo! 
Stay in touch with email, IM, photo sharing and more. Check it out! 
http://discover.yahoo.com/stayintouch.html