[antlr-interest] brief analysis of java.g's tree building in 2.x vs proposed 3.0

Terence Parr parrt at cs.usfca.edu
Mon Jan 31 18:24:01 PST 2005


Howdy,

The real test of any proposal is how it looks in practice, so I took
another look at the Java grammar (java.g).  Here is what I found.

There are about 75 parser grammar rules.

There are 27 #(...) tree construction actions.  But 21 of the 27 exist
purely to add an imaginary node as the root of the rule's subtree, and
I expect the rewrite rules to handle that case well.  For example, in
2.x:

modifiers
     :   ( modifier )*
         {#modifiers = #([MODIFIERS, "MODIFIERS"], #modifiers);}
     ;

it becomes the following in 3.0:

modifiers
     :   ( modifier )* -> ^(MODIFIERS (modifier)*)
     ;

or more precisely

modifiers
     :   ( modifier )* -> ^(MODIFIERS["MODIFIERS"] (modifier)*)
     ;
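To make the shape concrete, here is a small Java sketch of the tree that
-> ^(MODIFIERS (modifier)*) asks for: an imaginary MODIFIERS root with
each matched modifier attached as a child.  The Node class and the
ModifiersRewrite.build method are hypothetical names for illustration,
not ANTLR's actual tree classes or generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node: token text plus a list of children.
// Not ANTLR's tree class; just enough to show the resulting shape.
class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text) { this.text = text; }

    Node add(Node child) { children.add(child); return this; }

    // LISP-style rendering, e.g. (MODIFIERS public static)
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class ModifiersRewrite {
    // Rough model of -> ^(MODIFIERS (modifier)*): an imaginary root
    // with every matched modifier, in order, as a child.
    static Node build(List<String> matchedModifiers) {
        Node root = new Node("MODIFIERS");
        for (String m : matchedModifiers) root.add(new Node(m));
        return root;
    }
}
```

When ( modifier )* matches nothing, the sketch still yields a bare
MODIFIERS root, which is what the 2.x action produces as well.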

though I hope the factory.create(int tokenType) method could look up
the token name and fill in "MODIFIERS" automatically; I'll assume for
now that it can.
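One way such a factory could work, sketched in Java under the
assumption that a table of token names indexed by token type is
available alongside the token-type constants.  SimpleToken,
JavaTokenTypes, TOKEN_NAMES, and ImaginaryTokenFactory are all
hypothetical names, not ANTLR API.

```java
// Hypothetical token: a type plus its text.  Not ANTLR's Token interface.
final class SimpleToken {
    final int type;
    final String text;
    SimpleToken(int type, String text) { this.type = type; this.text = text; }
}

// Hypothetical stand-in for generated token constants plus a
// token-name table indexed by token type.
final class JavaTokenTypes {
    static final int MODIFIERS = 2;
    static final int IMPLEMENTS_CLAUSE = 3;
    static final String[] TOKEN_NAMES = {
        "<invalid>", "<EOF>", "MODIFIERS", "IMPLEMENTS_CLAUSE"
    };
}

// Sketch of a factory whose create(int) derives an imaginary token's
// text from its type's name, so ^(MODIFIERS ...) needs no explicit
// ["MODIFIERS"] argument.
final class ImaginaryTokenFactory {
    private final String[] tokenNames;

    ImaginaryTokenFactory(String[] tokenNames) { this.tokenNames = tokenNames; }

    SimpleToken create(int tokenType) {
        return create(tokenType, tokenNames[tokenType]);
    }

    // Explicit text still wins when the grammar supplies it.
    SimpleToken create(int tokenType, String text) {
        return new SimpleToken(tokenType, text);
    }
}
```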

Here's another 2.x java.g example:

implementsClause
     :   (  i:"implements"! identifier ( COMMA! identifier )* )?
         {#implementsClause = #(#[IMPLEMENTS_CLAUSE,"IMPLEMENTS_CLAUSE"],
                                  #implementsClause);}
     ;

In 3.0 syntax it would perhaps be:

implementsClause
     :   ( "implements" identifier ( COMMA identifier )* )?
         -> ^(IMPLEMENTS_CLAUSE (identifier)+)
     ;

So, out of 75 rules, only 6 needed real tree construction beyond adding
an imaginary root, and all of them *appear* (though I could be wrong)
to be covered by the new -> syntax.
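As I read the implementsClause example, the 2.x '!' suffixes disappear
because a rewrite includes only the elements it names: "implements" and
each COMMA are dropped simply by not being mentioned, while every
identifier match is collected.  A small Java model of that idea follows;
the Tok class and ImplementsRewrite.rewrite are hypothetical, and each
identifier is simplified to a single piece of text.

```java
import java.util.List;

// Hypothetical record of one matched element: its grammar name and text.
final class Tok {
    final String type;
    final String text;
    Tok(String type, String text) { this.type = type; this.text = text; }
}

class ImplementsRewrite {
    // Rough model of -> ^(IMPLEMENTS_CLAUSE (identifier)+): keep only
    // the elements the rewrite references (each identifier), in order,
    // and render the result in LISP style.
    static String rewrite(List<Tok> matched) {
        StringBuilder sb = new StringBuilder("(IMPLEMENTS_CLAUSE");
        for (Tok t : matched) {
            if (t.type.equals("identifier")) sb.append(' ').append(t.text);
        }
        return sb.append(')').toString();
    }
}
```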

Also of note, there are about 30 "set the token type" actions such as:

packageDefinition
         :       p:"package"^ {#p.setType(PACKAGE_DEF);} identifier SEMI!
         ;

In the new 3.0 syntax, you could use a rewrite to say:

packageDefinition
         :       "package" identifier SEMI -> ^("package" identifier)
         ;

which seems way better.

Oh, I've updated the proposal page to use -> instead of => and to 
address some of the concerns mentioned on the list.

Ter
PS: here's a nasty rule that exposes a weakness in how my current
scheme handles alternatives, as Loring predicted, I believe:

field!
     :   mods:modifiers
         (   h:ctorHead s:constructorBody // constructor
             {#field = #(#[CTOR_DEF,"CTOR_DEF"], mods, h, s);}

         |   cd:classDefinition[#mods]       // inner class
             {#field = #cd;}

         |   id:interfaceDefinition[#mods]   // inner interface
             {#field = #id;}

         |   t:typeSpec[false]  // method or variable declaration(s)
             (   IDENT  // the name of the method

                 LPAREN! param:parameterDeclarationList RPAREN!

                 rt:declaratorBrackets[#t]

                 (tc:throwsClause)?

                 ( s2:compoundStatement | SEMI )
                 {#field = #(#[METHOD_DEF,"METHOD_DEF"],
                              mods,
                              #(#[TYPE,"TYPE"],rt),
                              IDENT,
                              param,
                              tc,
                              s2);}
             |   v:variableDefinitions[#mods,#t] SEMI
                 {#field = #v;}
             )
         )

     |   "static" s3:compoundStatement
         {#field = #(#[STATIC_INIT,"STATIC_INIT"], s3);}

     |   s4:compoundStatement
         {#field = #(#[INSTANCE_INIT,"INSTANCE_INIT"], s4);}
     ;

Let me sketch what I'd like instead.  Because the modifiers are
left-factored in front of that subrule, we need -> inside subrules (the
proposal has this, though I said we might not need it; it seems we do).
Let's see:

field
     :   mods=modifiers
         (   ctorHead constructorBody // constructor
             -> ^(CTOR_DEF modifiers ctorHead constructorBody)

         |   classDefinition[@mods.ast]       // inner class

         |   interfaceDefinition[@mods.ast]   // inner interface

         |   t=typeSpec[false]  // method or variable declaration(s)
             (   IDENT  // the name of the method
                 LPAREN parameterDeclarationList RPAREN
                 declaratorBrackets[@t.ast]
                 (throwsClause)?
                 ( compoundStatement | SEMI )
                 -> ^(METHOD_DEF
                            modifiers
                            ^(TYPE declaratorBrackets)
                            IDENT parameterDeclarationList throwsClause
                            compoundStatement
                        )

             |   variableDefinitions[@mods.ast, @t.ast] SEMI
             )
         )

     |   "static" compoundStatement -> ^(STATIC_INIT compoundStatement)

     |   compoundStatement -> ^(INSTANCE_INIT compoundStatement)
     ;

I think that is more satisfying.  We need -> in subrules, and
apparently a -> inside a subrule must be able to set the entire rule's
subtree.  OK, I'll add both to the proposal.
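For the METHOD_DEF alternative, here is a Java sketch of the tree the
rewrite describes.  It assumes that optional elements which did not
match (throwsClause, or compoundStatement when the method ends in SEMI)
are simply omitted from the result; that is my assumption about the
intended semantics, not something the proposal states.  Node and
MethodDefRewrite.build are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node with LISP-style printing.
class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text) { this.text = text; }

    Node add(Node child) { children.add(child); return this; }

    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class MethodDefRewrite {
    // Models ^(METHOD_DEF modifiers ^(TYPE declaratorBrackets) IDENT
    //          parameterDeclarationList throwsClause compoundStatement).
    // Elements that did not match are passed as null and left out.
    static Node build(Node modifiers, Node brackets, Node ident,
                      Node params, Node throwsClause, Node body) {
        Node root = new Node("METHOD_DEF");
        Node type = new Node("TYPE").add(brackets);
        Node[] parts = {modifiers, type, ident, params, throwsClause, body};
        for (Node child : parts) {
            if (child != null) root.add(child);
        }
        return root;
    }
}
```

For an abstract method such as "public int size();" there is no throws
clause and no body, so only four children remain under METHOD_DEF.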

Note that I still have to pass trees around as rule arguments because
of how the grammar is factored.  This does not affect tree construction
itself.  For example, I pass the modifiers tree to interfaceDefinition.
Currently, in 2.x, it is:

interfaceDefinition![AST modifiers]
     :   "interface" IDENT
         // it might extend some other interfaces
         ie:interfaceExtends
         // now parse the body of the interface (looks like a class...)
         cb:classBlock
         {#interfaceDefinition = #(#[INTERFACE_DEF,"INTERFACE_DEF"],
                                     modifiers,IDENT,ie,cb);}
     ;

In 3.0 it would be:

interfaceDefinition[AST modifiers]
     :   "interface" IDENT
         // it might extend some other interfaces
         interfaceExtends
         // now parse the body of the interface (looks like a class...)
         classBlock
         -> ^(INTERFACE_DEF @modifiers IDENT interfaceExtends classBlock)
     ;

Hmm...alright.  Everything seems cool.  The @modifiers tree would get
linked into the result tree...hmm, something makes me uncomfortable
about that.  I want rewrites to reference and insert only payloads, not
trees, to avoid infinite loops resulting from cyclic trees.  I'll have
to think about this.
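The cyclic-tree worry can be made concrete with a small Java sketch.
Node, dupTree, reaches, and SafeLink.wouldCycle are hypothetical names
for illustration, not ANTLR API.  Linking a passed-in tree such as
@modifiers by reference can place a node underneath itself, whereas
copying its payloads into fresh nodes shares no node objects and so
cannot close a loop.

```java
import java.util.ArrayList;
import java.util.List;

class Node {
    final String payload; // stands in for the token
    final List<Node> children = new ArrayList<>();

    Node(String payload) { this.payload = payload; }

    Node add(Node child) { children.add(child); return this; }

    // Copy payloads into fresh nodes; the copy shares no Node
    // objects with the original tree.
    Node dupTree() {
        Node copy = new Node(payload);
        for (Node c : children) copy.add(c.dupTree());
        return copy;
    }

    // True if 'target' is this node or any descendant (identity check).
    // Only call this on trees that are still acyclic.
    boolean reaches(Node target) {
        if (this == target) return true;
        for (Node c : children) {
            if (c.reaches(target)) return true;
        }
        return false;
    }
}

class SafeLink {
    // Attaching 'child' under 'parent' creates a cycle exactly when
    // 'child' already reaches 'parent'.
    static boolean wouldCycle(Node parent, Node child) {
        return child.reaches(parent);
    }
}
```

Referencing only payloads amounts to always inserting something like
dupTree()'s result, trading a copy for the guarantee that no cycle can
form.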
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com