[antlr-interest] CommonTree & Tree grammar versus DIY

Wed Aug 20 19:42:26 PDT 2008

At 12:30 PM 8/16/2008, Terence Parr wrote:

>On Aug 16, 2008, at 12:56 AM, Gerald Rosenberg wrote:
>>Even with so much existing, this is not a trivial extension to
>>Antlr.  Theoretically, the end result of any "doSomething()" AST
>>manipulation could be done with a well-planned set of tree-walkers.
>>It is just that a random access approach, supported by the
>>equivalent of findFirst, findNext, findPrev, findLast operations, is
>>distinctly better than a purely top-down (only getNext allowed)
>>approach for non-trivial bi-directionally context-dependent AST
>>rewrite problems.
>>
>>Unfortunately, does not sound like AntlrMorph addresses this
>>particular problem.
>
>Not sure I understand the specific problem; can you rephrase?

The problem is the complexity of performing ad-hoc structural 
modifications on an existing AST.  For example, where the AST is the 
model in a WYSIWYG editor.  Given a high-level command to make a 
modification, you need to examine the surrounding context/portions of 
the AST to determine if it can be made, how and precisely where to 
make it, and what other modifications are implicitly required in 
order to preserve the integrity of the model represented by the AST.

Hand writing AST traversal/modification code is tedious and 
fragile.  Tree grammars very nicely abstract traversal and rewrite, 
but are oriented exclusively to top-down, phased rewrites using 
out-of-band symbol tables to establish state.  And the node rewrite 
is essentially statically defined by the tree grammar.  Very 
appropriate for compiler implementation.

For ad-hoc AST changes, the better approach, at least conceptually, 
is to implement a low-level structural modification API  with methods 
to "find" a node based on parameter values, and to similarly create, 
copy, insert and delete nodes.  Eclipse, for example, has several 
abstraction layers to enable ad-hoc Java AST modification; the lowest 
though is basically find, create, copy, insert and delete and the 
higher levels implement procedural logic to evaluate context and 
enforce model integrity.  Perhaps the greatest impediment to 
supporting other languages in Eclipse is the required hand coding of 
these layers.

Antlr could directly generate at least the low-level API.  For 
example, consider an AST that is the underlying data structure for an 
HTML editor.  A grammar to generate the desired API  might be specified as:

         access grammar html;

         start_tag : open_tag ID ^( name ^( attr )* )*
                 => find (int start_node, boolean direction, String 
$ID.text ) returns [int node_index]
                 => find (int start_node, boolean direction, String 
$ID.text, String name, String attr ) returns [int node_index]
                 => create (String $ID.text, String name, List attr ) 
returns [$start_tag.tree]
                 => copy (int node_index) returns [$start_tag.tree]
                 => insert (int node_index, $start_tag.tree) returns 
[boolean status]
                 => delete (int node_index) returns [$start_tag.tree]
         ;

This is not far off from a tree grammar: tersely abstracted, but 
still providing sufficient information to unambiguously define 
implementation of the API.  The generated code will be no more 
fragile than that produced from a tree grammar.  Add in heterogeneous 
tree node support and it is a rather complete solution.  Non-trivial, 
but complete.  The devil is in figuring out the appropriate grammar 
syntax for defining the API productions -- what is shown is good for 
discussion, but probably not much more.