[antlr-interest] uh oh, thought of a good reason not to rewrite
Terence Parr
parrt at cs.usfca.edu
Mon Jun 27 12:48:27 PDT 2005
The problem with rewriting a tree is that you are modifying the tree
you are trying to parse. This is ok, if you do only local rewrites
but what if you use an action to alter nodes in the tree that will be
parsed later? You could easily create an infinite loop where a rule
creates nodes that are parsed later via that same rule which creates
more nodes in the future ad naseum. Ugh.
One easy way to solve this is of course with memory. The
TreeNodeStream object could build up an entire list of the nodes a
priori, before the parse begins. In this way, the parse will proceed
exactly as you expect even if you unlink / rewrite every node in the
tree. But the cost is 4 bytes ptr per node and with 500,000 nodes as
the example somebody gave, that's already 2G of RAM. :( This could
be mapped to the disk like virtual memory, but...
Of course, creating a new tree of 500k nodes is even more expensive.
This "alter the future" is more common than you think. What if
you're generating code for a translator to C from Java and you create
memory in some statement of a block. As you do that you may want to
generate a free() call at the end of the block. When the parser
finishes the last "real" statement, it will start to parse your free
() calls! The problem is that those are C statements not Java! The
parser will puke.
The choice appears:
1. don't build a tree and you can use the efficient TreeNodeStream
that doesn't buffer
2. DUP: build a new tree from the old tree; you can use the efficient
TreeNodeStream that doesn't buffer
3. REWRITE: TreeNodeStream buffers the entire sequence of nodes to
visit in the tree before parsing begins and nodes are rewritten not
dup'd.
Come to think of it, there could be some really strange bugs
introduced if you rewrite a subtree that is later parsed using it's
original structure (but it no longer has that stronger and some of
your actions will fail!). Argh!
Anybody wanna comment? This sounds complicated enough that I should
release the tree parser component and worry about tree->tree
manipulation for the next early access release.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
More information about the antlr-interest
mailing list