[antlr-interest] uh oh, thought of a good reason not to rewrite

Terence Parr parrt at cs.usfca.edu
Mon Jun 27 12:48:27 PDT 2005


The problem with rewriting a tree is that you are modifying the tree  
you are trying to parse.  This is ok, if you do only local rewrites  
but what if you use an action to alter nodes in the tree that will be  
parsed later?  You could easily create an infinite loop where a rule  
creates nodes that are parsed later via that same rule which creates  
more nodes in the future ad naseum.  Ugh.

One easy way to solve this is of course with memory.  The  
TreeNodeStream object could build up an entire list of the nodes a  
priori, before the parse begins.  In this way, the parse will proceed  
exactly as you expect even if you unlink / rewrite every node in the  
tree.  But the cost is 4 bytes ptr per node and with 500,000 nodes as  
the example somebody gave, that's already 2G of RAM. :(  This could  
be mapped to the disk like virtual memory, but...

Of course, creating a new tree of 500k nodes is even more expensive.

This "alter the future" is more common than you think.  What if  
you're generating code for a translator to C from Java and you create  
memory in some statement of a block.  As you do that you may want to  
generate a free() call at the end of the block.  When the parser  
finishes the last "real" statement, it will start to parse your free 
() calls!  The problem is that those are C statements not Java!  The  
parser will puke.

The choice appears:

1. don't build a tree and you can use the efficient TreeNodeStream  
that doesn't buffer
2. DUP: build a new tree from the old tree; you can use the efficient  
TreeNodeStream that doesn't buffer
3. REWRITE: TreeNodeStream buffers the entire sequence of nodes to  
visit in the tree before parsing begins and nodes are rewritten not  
dup'd.

Come to think of it, there could be some really strange bugs  
introduced if you rewrite a subtree that is later parsed using it's  
original structure (but it no longer has that stronger and some of  
your actions will fail!).  Argh!

Anybody wanna comment?  This sounds complicated enough that I should  
release the tree parser component and worry about tree->tree  
manipulation for the next early access release.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com



More information about the antlr-interest mailing list