[antlr-interest] CommonTree & Tree grammar versus DIY
Gerald Rosenberg
gerald at certiv.net
Wed Aug 20 19:42:26 PDT 2008
At 12:30 PM 8/16/2008, Terence Parr wrote:
>On Aug 16, 2008, at 12:56 AM, Gerald Rosenberg wrote:
>>Even with so much existing, this is not a trivial extension to
>>Antlr. Theoretically, the end result of any "doSomething()" AST
>>manipulation could be done with a well-planned set of tree-walkers.
>>It is just that a random access approach, supported by the
>>equivalent of findFirst, findNext, findPrev, findLast operations, is
>>distinctly better than a purely top-down (only getNext allowed)
>>approach for non-trivial bi-directionally context-dependent AST
>>rewrite problems.
>>
>>Unfortunately, does not sound like AntlrMorph addresses this
>>particular problem.
>
>Not sure I understand the specific problem; can you rephrase?
The problem is the complexity of performing ad-hoc structural
modifications on an existing AST. For example, where the AST is the
model in a WYSIWYG editor. Given a high-level command to make a
modification, you need to examine the surrounding context/portions of
the AST to determine if it can be made, how and precisely where to
make it, and what other modifications are implicitly required in
order to preserve the integrity of the model represented by the AST.
Hand writing AST traversal/modification code is tedious and
fragile. Tree grammars very nicely abstract traversal and rewrite,
but are oriented exclusively to top-down, phased rewrites using
out-of-band symbol tables to establish state. And the node rewrite
is essentially statically defined by the tree grammar. Very
appropriate for compiler implementation.
For ad-hoc AST changes, the better approach, at least conceptually,
is to implement a low-level structural modification API with methods
to "find" a node based on parameter values, and to similarly create,
copy, insert and delete nodes. Eclipse, for example, has several
abstraction layers to enable ad-hoc Java AST modification; the lowest
though is basically find, create, copy, insert and delete and the
higher levels implement procedural logic to evaluate context and
enforce model integrity. Perhaps the greatest impediment to
supporting other languages in Eclipse is the required hand coding of
these layers.
Antlr could directly generate at least the low-level API. For
example, consider an AST that is the underlying data structure for an
HTML editor. A grammar to generate the desired API might be specified as:
access grammar html;
start_tag : open_tag ID ^( name ^( attr )* )*
=> find (int start_node, boolean direction, String
$ID.text ) returns [int node_index]
=> find (int start_node, boolean direction, String
$ID.text, String name, String attr ) returns [int node_index]
=> create (String $ID.text, String name, List attr )
returns [$start_tag.tree]
=> copy (int node_index) returns [$start_tag.tree]
=> insert (int node_index, $start_tag.tree) returns
[boolean status]
=> delete (int node_index) returns [$start_tag.tree]
;
This is not far off from a tree grammar: tersely abstracted, but
still providing sufficient information to unambiguously define
implementation of the API. The generated code will be no more
fragile than that produced from a tree grammar. Add in heterogeneous
tree node support and it is a rather complete solution. Non-trivial,
but complete. The devil is in figuring out the appropriate grammar
syntax for defining the API productions -- what is shown is good for
discussion, but probably not much more.
More information about the antlr-interest
mailing list