[antlr-interest] Best place to modify AST?

Fri Jul 6 13:47:33 PDT 2007

On Friday 06 July 2007 13:16, Cameron Esfahani wrote:
> My JSON-derived grammar supports assigning JSON entities to
> variables.  These variables can then be used in other JSON entities
> to refer to the original entities:
>
> foo = { "tag" : "value" }
>
> { "child" : foo }

That's a very interesting idea. It makes me wonder if you've got some 
client-side (i.e., browser-based) way of exploiting these extended JSON 
structures.

> As far as any consumer of the final tree is concerned, they shouldn't
> see this.  When they iterate through the tree, I want it to appear,
> for all intents and purposes, that the above input looks like:
>
> { "child" : { "tag" : "value" }

Do you want to do that by replicating the right-hand-side of the 
variable at the point of reference, or by some kind of structure 
sharing pointer? See below for more on these alternatives.

> I'm trying to decide the best place to handle this.  My initial idea
> is to have two separate tree walker stages, the first would resolve
> any of these references and update the AST, and the second would
> process this updated tree.

A two-pass approach may not be necessary or even the best way.

> Is this an appropriate route to take?  Can a tree walker change the
> underlying as it's, uh, walking it?

Keep in mind that the "tree" you're parsing in an ANTLR tree grammar is 
a linear / sequential encoding of a tree _traversal_, not the tree as 
such. (A distinction without a difference you say? Perhaps.) The order 
of that sequence is that which would result from a depth-first, 
left-to-right traversal of the AST so encoded. What would ordinarily be 
parent -> child links are special "down" tokens inserted by the 
tree-building parser. Where the traversal would ascend from the last 
sibling to its parent, there's an "up" token inserted.

As I understand it, this encoding is what allows conventional parsing 
techniques to be readily applied to tree parsing. (And if I might say 
so, it's brilliant).

So what I'd probably do is build the JSON tree structure (a real tree 
based on your JSON object model) in the tree parser actions with the 
variable bits handled by:

a) Building a symbol table that accumulates the right-hand-side values 
(object references to the point in the JSON object tree where the 
variable is defined)

b) Putting special variable-reference nodes into the JSON object tree at 
the appropriate point.

When the variable reference you encounter is forward, you'll create a 
symbol table entry that awaits having its value filled in. A final pass 
over the symbol table would be needed to make sure no unresolved 
references were present.

If you want to avoid having to walk the JSON object tree, you could keep 
a list of all variable-reference nodes in the symbol table entries. 
Then you can drive the replacement of variable references with their 
actual content off a scan of the symbol table itself, which is perhaps 
easier than walking the JSON object tree looking for the 
variable-reference nodes to be replaced. Perhaps.

If you want to do structure sharing, you don't even need a second pass 
of any sort, though you'd need to verify no unfilled / unresolved 
variable references were left over. This would require a special 
delegating JSON node type (or set of types, corresponding to the basic 
JSON value types).

The structure sharing approach may also require a way to replicate the 
shared value when it's modified (copy-on-modify), either via a variable 
or at its original point of definition.

All in all, I'd probably not try the structure sharing approach unless 
there were some compelling reasons to do so.

You'll also need to guard against self-reference (i.e., a variable 
reference to a definition that dominates that reference in the JSON 
value tree).

> Cameron Esfahani
> dirty at apple.com

Apple, eh? What are you guys up to, now?? Do you know a Chris Volkert? 
If you do, tell him Randy says "hi."

Randall Schulz