[antlr-interest] CommonTree & Tree grammar versus DIY

Thu Aug 21 14:04:56 PDT 2008

At 11:07 AM 8/21/2008, Terence Parr wrote:

>On Aug 20, 2008, at 7:42 PM, Gerald Rosenberg wrote:
>
>>ANTLR could directly generate at least the low-level API.  For
>>example, consider an AST that is the underlying data structure for
>>an HTML editor.  A grammar to generate the desired API might be
>>specified as:
>>
>>        access grammar html;
>>
>>        start_tag : open_tag ID ^( name ^( attr )* )*
>>                => find (int start_node, boolean direction,
>>                         String $ID.text) returns [int node_index]
>>                => find (int start_node, boolean direction,
>>                         String $ID.text, String name, String attr)
>>                         returns [int node_index]
>>                => create (String $ID.text, String name, List attr)
>>                         returns [$start_tag.tree]
>>                => copy (int node_index) returns [$start_tag.tree]
>>                => insert (int node_index, $start_tag.tree)
>>                         returns [boolean status]
>>                => delete (int node_index) returns [$start_tag.tree]
>>        ;
>>
>>This is not far off from a tree grammar: tersely abstracted, but
>>still providing sufficient information to unambiguously define
>>implementation of the API.  The generated code will be no more
>>fragile than that produced from a tree grammar.  Add in
>>heterogeneous tree node support and it is a rather complete
>>solution.  Non-trivial, but complete.  The devil is in figuring out
>>the appropriate grammar syntax for defining the API productions --
>>what is shown is good for discussion, but probably not much more.
>
>So, ANTLR's job would be to fill in those find/create/... methods?

Exactly.

>I'm not sure he could figure that out from the argument list.

The necessary information content is there. For example, consider the 
equivalence of:

>>        access grammar html;
>>
>>        start_tag : open_tag ID ^( name ^( attr )* )*
>>                => find (int start_node, boolean direction,
>>                         String $ID.text, String name, String attr)
>>                         returns [int node_index] ;

with:

         tree grammar html;

         start_tag : open_tag
                         { (direction && $open_tag.node_index > start_node)
                           || (!direction && $open_tag.node_index <= start_node) }?
                         ID { $ID.text.equals("someIDString") }?
                         ^( n=name ^( a=attr
                              { $n.text.equals("someNameString")
                                && $a.text.equals("someAttrString") }? )+ )+
                 -> { return $open_tag.node_index } ;
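
To make the comparison concrete, here is a rough Java sketch of the kind of find method such a production might expand to. Everything in it is hypothetical: HtmlAccess, StartTag, Attribute and add are names invented for illustration, the nodes are held in a plain list rather than an ANTLR tree, and I am assuming that a forward search scans upward from start_node, a backward search scans downward, and one matching name/attr pair is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for code an "access grammar" might generate. */
class HtmlAccess {
    /** One ^( name ^( attr )* ) subtree: a name plus its attr values. */
    static final class Attribute {
        final String name;
        final List<String> values;
        Attribute(String name, String... values) {
            this.name = name;
            this.values = Arrays.asList(values);
        }
    }

    /** One start_tag node: the ID text plus zero or more name subtrees. */
    static final class StartTag {
        final String id;
        final List<Attribute> attributes;
        StartTag(String id, Attribute... attributes) {
            this.id = id;
            this.attributes = Arrays.asList(attributes);
        }
    }

    private final List<StartTag> nodes = new ArrayList<>();

    /** Appends a node and returns its index (illustration only). */
    int add(StartTag tag) {
        nodes.add(tag);
        return nodes.size() - 1;
    }

    /**
     * direction == true: consider indices greater than startNode.
     * direction == false: consider indices less than or equal to startNode.
     * Returns the first matching index, or -1 if nothing matches.
     */
    int find(int startNode, boolean direction, String id, String name, String attr) {
        if (direction) {
            for (int i = Math.max(startNode + 1, 0); i < nodes.size(); i++) {
                if (matches(nodes.get(i), id, name, attr)) return i;
            }
        } else {
            for (int i = Math.min(startNode, nodes.size() - 1); i >= 0; i--) {
                if (matches(nodes.get(i), id, name, attr)) return i;
            }
        }
        return -1;
    }

    /** The ID test sits at the tag level; name and attr are tested one level down. */
    private static boolean matches(StartTag tag, String id, String name, String attr) {
        if (!tag.id.equals(id)) return false;
        for (Attribute a : tag.attributes) {
            if (a.name.equals(name) && a.values.contains(attr)) return true;
        }
        return false;
    }
}
```

The point is only that each argument is tested at the nesting level where the production places it; a real generator would presumably walk the actual tree nodes.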

Likewise, you could emulate the remaining functionality of the access 
grammar with a set of tree grammars; a separate grammar would be 
needed for each node type and API operation.  (Used in this manner, 
the tree grammar syntax is messy and noisy, and the resulting set of 
tree walkers would be clumsy to orchestrate -- better to have a 
clean, purpose-built grammar syntax that directly produces a 
conventional-looking API.)
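
For a sense of what "conventional-looking" could mean for the other operations, here is a minimal Java sketch of create, copy, insert and delete with roughly the signatures from the access grammar. The names StartTagStore, StartTag and size are made up for illustration, a flat list stands in for the tree, and each tag is simplified to a single name with its attrs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical API shape for the non-find productions; not ANTLR output. */
class StartTagStore {
    /** Simplified start_tag: ID text, one name, and that name's attrs. */
    static final class StartTag {
        final String id;
        final String name;
        final List<String> attrs;
        StartTag(String id, String name, List<String> attrs) {
            this.id = id;
            this.name = name;
            this.attrs = new ArrayList<>(attrs); // defensive copy
        }
    }

    private final List<StartTag> nodes = new ArrayList<>();

    /** create (String $ID.text, String name, List attr) returns the new subtree. */
    StartTag create(String id, String name, List<String> attrs) {
        return new StartTag(id, name, attrs);
    }

    /** copy (int node_index) returns a detached duplicate of the node. */
    StartTag copy(int nodeIndex) {
        StartTag t = nodes.get(nodeIndex);
        return new StartTag(t.id, t.name, t.attrs);
    }

    /** insert (int node_index, tree) returns false if the index is out of range. */
    boolean insert(int nodeIndex, StartTag tag) {
        if (nodeIndex < 0 || nodeIndex > nodes.size()) return false;
        nodes.add(nodeIndex, tag);
        return true;
    }

    /** delete (int node_index) removes the node and returns it. */
    StartTag delete(int nodeIndex) {
        return nodes.remove(nodeIndex);
    }

    int size() {
        return nodes.size();
    }
}
```

In this sketch copy and delete throw IndexOutOfBoundsException on a bad index; how a generated API should report that is one of the things the grammar syntax would need to settle.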

So, to answer your concern: the declared structure of the node is 
sufficient to determine the nesting level at which each element of 
the argument list needs to be tested.  Standard tree rewrites already 
do this implicitly -- it is basically the same as figuring out where 
to put the TYPE and DEF:

         tree grammar html2;

         start_tag : open_tag ID ^( name ^( attr )* )*
                 -> open_tag ID TYPE ^( name ^( DEF attr )* )* ;
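
Spelled out by hand, that rewrite amounts to something like the following Java sketch. The class and method names are invented, and the tree is rendered as a LISP-style string purely to show where the imaginary TYPE and DEF nodes land; ANTLR's real rewrite machinery builds tree nodes instead.

```java
import java.util.List;

/** Hypothetical illustration of the html2 rewrite; not generated code. */
class TypeDefRewrite {
    /**
     * Renders open_tag ID TYPE ^( name ^( DEF attr )* )* for one start tag.
     * attrsPerName.get(i) holds the attr values belonging to names.get(i).
     */
    static String rewrite(String id, List<String> names, List<List<String>> attrsPerName) {
        StringBuilder out = new StringBuilder("open_tag ").append(id).append(" TYPE");
        for (int i = 0; i < names.size(); i++) {
            out.append(" (").append(names.get(i));           // name becomes a subtree root
            for (String attr : attrsPerName.get(i)) {
                out.append(" (DEF ").append(attr).append(")"); // DEF wraps each attr
            }
            out.append(")");
        }
        return out.toString();
    }
}
```

TYPE is emitted once at the tag level and DEF once per attr, which is exactly the placement the rewrite rule expresses through its nesting.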

The production grammar syntax needs to be better designed to make the 
intent of the access grammar more explicit -- as previously noted, 
the syntax shown is good for discussion, but probably not much more.