[antlr-interest] Annotation tool best practices

Tue Apr 27 06:14:14 PDT 2004

Hi,

I have been struggling a little to understand the extent to which the
annotation tool is useful and whether a different idea is needed to
support grammar maintenance.

In a typical language processing project, I might have a lexer, parser
and about 6 core tree parsers. Ignoring the output language
perspective, there might be two flavours each of the lexer and the
parser. One of the flavours might recognize an expanded language. There
is no grammar inheritance between flavours.

Comparing the tree parsers to each other, the same rule often has a
different signature and return type from one grammar to the other. And
of course many of the tree parsers might define different flavours of
a particular rule.

To keep the size of this post down, I am going to restrict my
discussion to just these two types of variation:

a. Same rule different signature across grammars
------------------------------------------------

   Grammar 1:
      expression
          :    assignment_expression
               {...}
          ;

   Grammar 2:
      expression [Location loc, Context ctx, Widget wgt] returns Type
          :    retVal=assignment_expression[loc, ctx, wgt]
               {...}
          ;

b. Same rule different flavours (even in same Grammar)
------------------------------------------------------
   Grammar 1:
      dottedName
          :    IDENT
          |    #(DOT identifier IDENT)
          ;

   Grammar 1:
      dottedNameAsString returns String 
          :    IDENT  {...}
          |    #(DOT tmp=identifierAsString IDENT)
          	{...}
          ;

   Grammar 1:
      dottedNameAsElement returns ModelElement
          :    IDENT  {...}
          |    #(DOT tmp=identifierAsElement IDENT)
          	{...}
          ;

Using the tool
==============

Using the annotation tool to deal with (a) seems to require something
like the following:

  grammar.tg
  ----------
      expression <r_expr_sig>
      	<r_expr_init>
          :    <r_expr_rspec>assignment_expression<r_expr_args>
               <r_expr_action>
          ;

  grammar.ins
  -----------
      @r_expr_sig     : [Location loc, Context ctx, Widget wgt] returns Type
      @r_expr_init    : { //declare local var 'retVal'
                        }
      @r_expr_rspec   : retVal=
      @r_expr_args    : [loc, ctx, wgt]
      @r_expr_action  : {...}

Having to specify so much in an auxiliary file seems a little much,
but <r_expr_sig>, <r_expr_init> and <r_expr_action> would have to be
factored out anyway to reuse the grammar across output languages. So
perhaps the overhead is acceptable.
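
To make the text replacement in (a) concrete, here is a minimal Python
sketch of the expansion step. It is not the Annotation tool's actual
implementation: the function name `expand`, the tag regex, and the rule
that a tag with no insert expands to nothing are all assumptions made
for illustration.

```python
import re

# Assumption: an insert point is any <name> made of word characters.
TAG = re.compile(r"<(\w+)>")


def expand(template, inserts):
    """Replace each <tag> with its insert text; tags without an insert vanish."""
    return TAG.sub(lambda m: inserts.get(m.group(1), ""), template)


# The template rule from grammar.tg above.
TEMPLATE = """expression <r_expr_sig>
    <r_expr_init>
    :    <r_expr_rspec>assignment_expression<r_expr_args>
         <r_expr_action>
    ;"""

# The inserts from grammar.ins above, held in a dict for simplicity.
GRAMMAR2 = {
    "r_expr_sig": "[Location loc, Context ctx, Widget wgt] returns Type",
    "r_expr_init": "{ //declare local var 'retVal'\n    }",
    "r_expr_rspec": "retVal=",
    "r_expr_args": "[loc, ctx, wgt]",
    "r_expr_action": "{...}",
}

if __name__ == "__main__":
    print(expand(TEMPLATE, {}))        # bare rule, as in Grammar 1
    print(expand(TEMPLATE, GRAMMAR2))  # decorated rule, as in Grammar 2
```

With an empty insert set the template collapses to the bare rule (plus
some leftover whitespace), which is why one grammar.tg can serve both
grammars.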

How about (b)? Let's assume that the reason the grammar needs three
flavours of the basic 'dottedName' rule is because we replace some
occurrences of 'dottedName' with the 'dottedNameAsString' and
'dottedNameAsElement' flavours that do more work and need the context
of the call to do that work. Dealing with (b) seems to require
something like this:

  grammar.tg
  ----------
      class <r_class_sig>
      	<r_class_init>
          :    #( CLASS <r_class_rspec><r_class_rule><r_class_args> LPAREN
                   ...
                  RPAREN
                  <r_class_action>
               )
          ;

  walker1.ins - no signature, calls dottedName
  -----------
      @r_class_init    : { //init stuff
                        }
      @r_class_rule    : dottedName
      @r_class_action  : {...}

  walker2.ins - has signature, calls dottedNameAsElement
  -----------
      @r_class_sig    : [Location loc, Context ctx] returns ModelElement
      @r_class_init    : { //declare local var 'retVal'
                        }
      @r_class_rspec   : retVal=
      @r_class_rule    : dottedNameAsElement
      @r_class_args    : [loc, ctx]
      @r_class_action  : {...}

Here, we've had to move the rule name from the template grammar to the
insert file. Note that this might be required for many rules, and that
a single rule with multiple occurrences of "dottedName" might be
linked to multiple flavours via similar inserts. This seems a little
"wrong" to me somehow. I'd always imagined that as long as the *.tg
file contained the grammar and just additional tags for the insert
points, it wasn't too bad a trade-off. I get to view a cleaner, bare
bones grammar and with a good naming scheme I can even "read into the
insert file" just from the template grammar.
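
The trade-off in (b) can be sketched in Python as well. This is a
hypothetical reading of the insert-file layout shown above ("@name :
text", with continuation lines belonging to the previous entry), not the
tool's real parser; `parse_inserts` and `expand` are names invented
here, and the template is reduced to a single line. It shows that the
callee's name now lives in the insert file rather than in the template.

```python
import re

ENTRY = re.compile(r"^\s*@(\w+)\s*:\s?(.*)$")  # assumed "@name : text" layout
TAG = re.compile(r"<(\w+)>")                   # assumed <name> insert point


def parse_inserts(text):
    """Read '@name : text' entries; other lines continue the previous entry."""
    inserts, current = {}, None
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            current = m.group(1)
            inserts[current] = m.group(2).rstrip()
        elif current is not None:
            inserts[current] += "\n" + line.rstrip()
    return inserts


def expand(template, inserts):
    """Replace each <tag> with its insert text; missing inserts expand to nothing."""
    return TAG.sub(lambda m: inserts.get(m.group(1), ""), template)


TEMPLATE = "#( CLASS <r_class_rspec><r_class_rule><r_class_args> LPAREN ... RPAREN )"

WALKER1_INS = """\
@r_class_rule    : dottedName
"""

WALKER2_INS = """\
@r_class_rspec   : retVal=
@r_class_rule    : dottedNameAsElement
@r_class_args    : [loc, ctx]
"""

if __name__ == "__main__":
    for ins in (WALKER1_INS, WALKER2_INS):
        print(expand(TEMPLATE, parse_inserts(ins)))
```

Reading TEMPLATE alone no longer tells you which rule is invoked after
CLASS, which is exactly the loss of readability described above.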

Heck, maybe I'm missing some insight here. How do you solve such
issues in your projects? Do you use the annotation tool or another
tool? Perhaps you have a very different solution based on entirely
different concepts from the text replacement preprocessing at work
here. Enquiring minds would like to know....

Cheers,

Micheal
ANTLR/C#

PS	[I would be grateful for any references to design discussion or
implementations of editors that can handle virtual documents composed
from a number of other documents like the template and insert files
above. Looking at my solution for (b) above, I would like to be able
to view and edit directly the walker1.g and walker2.g grammars. The
editor should hide the fact that they are composites of grammar.tg and
walker1.ins and walker2.ins respectively. Perhaps multiple *.tg and
*.ins files can be involved in a single grammar...]

PPS	[For those unaware of its existence, the Annotation tool is
designed to help you maintain language- and task-specific action code
separately from a [hopefully clearer] ANTLR grammar. It was written
by Bogdan Mitu and is available in the Yahoo Files area. A newer
version is also available from the file sharing area on the web site.]
