[antlr-interest] brief analysis of java.g's tree building in 2.x vs
proposed 3.0
Terence Parr
parrt at cs.usfca.edu
Mon Jan 31 18:24:01 PST 2005
Howdy,
The real test of any proposal is to see what it looks like in practice.
I have looked again at the java grammar. Here is some useful info.
There are about 75 parser grammar rules.
There are 27 #(...) tree construction actions. BUT, 21/27 are purely
to add an imaginary node as the root of the rule's subtree. I'm
guessing the rewrite rules will work well for this. For example, in
2.x:
modifiers
    :   ( modifier )*
        {#modifiers = #([MODIFIERS, "MODIFIERS"], #modifiers);}
    ;
it becomes the following in 3.0:
modifiers
    :   ( modifier )* -> ^(MODIFIERS (modifier)*)
    ;
or more precisely
modifiers
    :   ( modifier )* -> ^(MODIFIERS["MODIFIERS"] (modifier)*)
    ;
though I hope the factory.create(int tokenType) method could ask for the
token name and figure out "MODIFIERS" automatically; I'll assume for
now it can.
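To make the name-lookup idea concrete, here is a minimal Java sketch of a factory whose create(int tokenType) pulls the node text from a token-name table, so the grammar would not have to repeat "MODIFIERS". The classes NameNode and NamingFactoryDemo, the token type values, and the TOKEN_NAMES table are all made up for illustration; this is not the ANTLR runtime API, just one way such a method could behave.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal AST node: a token type, its text, and an ordered child list. */
final class NameNode {
    final int type;
    final String text;
    final List<NameNode> children = new ArrayList<>();

    NameNode(int type, String text) {
        this.type = type;
        this.text = text;
    }

    /** LISP-style rendering: a leaf prints its text, a parent prints (text child ...). */
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (NameNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

/** Hypothetical factory whose create(int) derives the node text from a token-name table. */
final class NamingFactoryDemo {
    // Assumed token types and names; a generated parser would supply its own table.
    static final int MODIFIERS = 0, PUBLIC = 1, STATIC = 2;
    static final String[] TOKEN_NAMES = {"MODIFIERS", "public", "static"};

    /** Look the name up from the type so the caller passes only the type. */
    static NameNode create(int tokenType) {
        return new NameNode(tokenType, TOKEN_NAMES[tokenType]);
    }

    /** Rough equivalent of ^(MODIFIERS (modifier)*): an imaginary root over the matched modifiers. */
    static NameNode modifiers(List<NameNode> matched) {
        NameNode root = create(MODIFIERS);
        root.children.addAll(matched);
        return root;
    }

    public static void main(String[] args) {
        NameNode tree = modifiers(List.of(create(PUBLIC), create(STATIC)));
        System.out.println(tree.toStringTree()); // prints (MODIFIERS public static)
    }
}
```

With no modifiers matched, the sketch still returns a bare MODIFIERS node, which is the shape the 2.x action produces.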
Here's another 2.x java.g example:
implementsClause
    :   ( i:"implements"! identifier ( COMMA! identifier )* )?
        {#implementsClause = #(#[IMPLEMENTS_CLAUSE,"IMPLEMENTS_CLAUSE"],
                               #implementsClause);}
    ;
In 3.0 syntax it would be perhaps:
implementsClause
    :   ( "implements" identifier ( COMMA identifier )* )?
        -> ^(IMPLEMENTS_CLAUSE (identifier)+)
    ;
So, out of 75 rules, 6 real tree constructions were needed, all of
which *appear* (but I could be wrong) to be covered by the new ->
syntax.
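As a rough illustration of what such a rewrite has to do for implementsClause, the Java sketch below matches the clause by hand and then builds the tree only from the elements named on the right of ->, so "implements" and COMMA drop out without any ! suffixes. ClauseNode and ImplementsClauseDemo are hypothetical names, identifiers are simplified to single tokens, and there is no error handling. When the clause is absent the sketch emits a bare IMPLEMENTS_CLAUSE root, as the 2.x action does; whether (identifier)+ in the proposed syntax would permit that is an open question this sketch does not settle.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal AST node with text and children. */
final class ClauseNode {
    final String text;
    final List<ClauseNode> children = new ArrayList<>();

    ClauseNode(String text) {
        this.text = text;
    }

    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (ClauseNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

final class ImplementsClauseDemo {
    /**
     * Hand-written stand-in for
     *   ( "implements" identifier ( COMMA identifier )* )?  ->  ^(IMPLEMENTS_CLAUSE identifier ...)
     * Assumes well-formed input; each identifier is a single token here.
     */
    static ClauseNode implementsClause(List<String> tokens) {
        List<ClauseNode> identifiers = new ArrayList<>();
        int i = 0;
        if (i < tokens.size() && tokens.get(i).equals("implements")) {
            i++; // matched, but never referenced in the rewrite, so it leaves no node
            identifiers.add(new ClauseNode(tokens.get(i++)));
            while (i < tokens.size() && tokens.get(i).equals(",")) {
                i++; // COMMA is dropped for the same reason
                identifiers.add(new ClauseNode(tokens.get(i++)));
            }
        }
        // The rewrite: an imaginary root over every identifier that was matched.
        ClauseNode root = new ClauseNode("IMPLEMENTS_CLAUSE");
        root.children.addAll(identifiers);
        return root;
    }

    public static void main(String[] args) {
        List<String> input = List.of("implements", "Runnable", ",", "Cloneable");
        System.out.println(implementsClause(input).toStringTree());
        // prints (IMPLEMENTS_CLAUSE Runnable Cloneable)
    }
}
```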
Also of note, there are about 30 "set the token type" actions such as:
packageDefinition
    :   p:"package"^ {#p.setType(PACKAGE_DEF);} identifier SEMI!
    ;
In the new 3.0 syntax, you could use a rewrite to say:
packageDefinition
    :   "package" identifier SEMI -> ^("package" identifier)
    ;
which seems way better.
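The difference between the two styles can be sketched in Java as follows. PkgNode, PackageDefDemo, and the token type constants are hypothetical. One assumption deserves a loud label: the rewrite as written above does not mention PACKAGE_DEF, so this sketch assumes its root simply keeps the type of the "package" literal. Both versions produce the same tree shape; only the root's type differs.

```java
import java.util.ArrayList;
import java.util.List;

final class PkgNode {
    int type; // mutable so the 2.x-style action can change it after construction
    final String text;
    final List<PkgNode> children = new ArrayList<>();

    PkgNode(int type, String text) {
        this.type = type;
        this.text = text;
    }

    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (PkgNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

final class PackageDefDemo {
    // Assumed token types for illustration only.
    static final int LITERAL_PACKAGE = 1, PACKAGE_DEF = 2, IDENT = 3;

    /** 2.x style: "package"^ makes the token the root, then an action mutates its type. */
    static PkgNode viaSetType(String name) {
        PkgNode p = new PkgNode(LITERAL_PACKAGE, "package");
        p.type = PACKAGE_DEF;                     // {#p.setType(PACKAGE_DEF);}
        p.children.add(new PkgNode(IDENT, name)); // identifier
        return p;                                 // SEMI! leaves no node
    }

    /** Rewrite style: the shape is stated once; SEMI is simply not referenced. */
    static PkgNode viaRewrite(String name) {
        PkgNode root = new PkgNode(LITERAL_PACKAGE, "package"); // assumed: keeps the literal's type
        root.children.add(new PkgNode(IDENT, name));
        return root;
    }

    public static void main(String[] args) {
        System.out.println(viaSetType("foo").toStringTree()); // prints (package foo)
        System.out.println(viaRewrite("foo").toStringTree()); // prints (package foo)
    }
}
```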
Oh, I've updated the proposal page to use -> instead of => and to
address some of the concerns mentioned on the list.
Ter
PS Here's a nasty rule that shows a weakness in my current scheme for
dealing with alternatives, as Loring predicted, I believe:
field!
    :   mods:modifiers
        (   h:ctorHead s:constructorBody // constructor
            {#field = #(#[CTOR_DEF,"CTOR_DEF"], mods, h, s);}
        |   cd:classDefinition[#mods] // inner class
            {#field = #cd;}
        |   id:interfaceDefinition[#mods] // inner interface
            {#field = #id;}
        |   t:typeSpec[false] // method or variable declaration(s)
            (   IDENT // the name of the method
                LPAREN! param:parameterDeclarationList RPAREN!
                rt:declaratorBrackets[#t]
                (tc:throwsClause)?
                ( s2:compoundStatement | SEMI )
                {#field = #(#[METHOD_DEF,"METHOD_DEF"],
                            mods,
                            #(#[TYPE,"TYPE"],rt),
                            IDENT,
                            param,
                            tc,
                            s2);}
            |   v:variableDefinitions[#mods,#t] SEMI
                {#field = #v;}
            )
        )
    |   "static" s3:compoundStatement
        {#field = #(#[STATIC_INIT,"STATIC_INIT"], s3);}
    |   s4:compoundStatement
        {#field = #(#[INSTANCE_INIT,"INSTANCE_INIT"], s4);}
    ;
Let me see what I'd like to do. OK, with the modifiers left-factored
in front of that subrule, we need -> in subrules (which I have in the
proposal but said we might not need...seems we do). Let's see:
field
    :   mods=modifiers
        (   ctorHead constructorBody // constructor
            -> ^(CTOR_DEF modifiers ctorHead constructorBody)
        |   classDefinition[@mods.ast] // inner class
        |   interfaceDefinition[@mods.ast] // inner interface
        |   t=typeSpec[false] // method or variable declaration(s)
            (   IDENT // the name of the method
                LPAREN parameterDeclarationList RPAREN
                declaratorBrackets[@t.ast]
                (throwsClause)?
                ( compoundStatement | SEMI )
                -> ^(METHOD_DEF
                        modifiers
                        ^(TYPE declaratorBrackets)
                        IDENT parameterDeclarationList throwsClause
                        compoundStatement
                    )
            |   variableDefinitions[@mods.ast,@t.ast] SEMI
            )
        )
    |   "static" compoundStatement -> ^(STATIC_INIT compoundStatement)
    |   compoundStatement -> ^(INSTANCE_INIT compoundStatement)
    ;
I think that is more satisfying. We need -> in subrules, and apparently
we need it to set the entire rule's subtree. OK, I'll add that to the
proposal.
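A small Java sketch may help show what "set the entire rule's subtree" means for field: modifiers is matched outside the subrule, yet each alternative's rewrite decides the whole result of the rule and pulls the modifiers tree in. FieldNode, FieldRewriteDemo, and the helper names are hypothetical. The sketch also assumes that optional elements that were not matched (throwsClause, or the body of an abstract method) are simply skipped when the tree is built; the proposal does not spell that out, so treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldNode {
    final String text;
    final List<FieldNode> children = new ArrayList<>();

    FieldNode(String text) {
        this.text = text;
    }

    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (FieldNode c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

final class FieldRewriteDemo {
    static FieldNode leaf(String text) {
        return new FieldNode(text);
    }

    /** Build ^(root kids ...), skipping kids that were not matched (null). Assumed behavior. */
    static FieldNode make(String rootText, FieldNode... kids) {
        FieldNode root = new FieldNode(rootText);
        for (FieldNode k : kids) {
            if (k != null) root.children.add(k);
        }
        return root;
    }

    /** Constructor alternative: its rewrite yields the tree for the whole rule, mods included. */
    static FieldNode ctorAlt(FieldNode mods, FieldNode head, FieldNode body) {
        return make("CTOR_DEF", mods, head, body);
    }

    /** Method alternative: throwsClause and body may be null when they were not matched. */
    static FieldNode methodAlt(FieldNode mods, FieldNode brackets, FieldNode ident,
                               FieldNode params, FieldNode throwsClause, FieldNode body) {
        return make("METHOD_DEF", mods, make("TYPE", brackets), ident, params, throwsClause, body);
    }

    public static void main(String[] args) {
        FieldNode m = methodAlt(leaf("MODIFIERS"), leaf("int"), leaf("f"),
                                leaf("PARAMETERS"), null, null);
        System.out.println(m.toStringTree());
        // prints (METHOD_DEF MODIFIERS (TYPE int) f PARAMETERS)
    }
}
```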
Note that I still have to pass trees around to subrules and so on due
to factoring of the grammar. This will not affect tree construction.
For example, I pass the modifiers tree to interfaceDefinition.
Currently in 2.x it is:
interfaceDefinition![AST modifiers]
    :   "interface" IDENT
        // it might extend some other interfaces
        ie:interfaceExtends
        // now parse the body of the interface (looks like a class...)
        cb:classBlock
        {#interfaceDefinition = #(#[INTERFACE_DEF,"INTERFACE_DEF"],
                                  modifiers,IDENT,ie,cb);}
    ;
In 3.0 it would be:
interfaceDefinition[AST modifiers]
    :   "interface" IDENT
        // it might extend some other interfaces
        interfaceExtends
        // now parse the body of the interface (looks like a class...)
        classBlock
        -> ^(INTERFACE_DEF @modifiers IDENT interfaceExtends classBlock)
    ;
Hmm...alright. Everything seems cool. The @modifiers would get linked
into the tree...hmm...something makes me uncomfortable about that. I
want only payloads, not trees, referenced and inserted, to avoid
infinite loops resulting from cyclic trees. I'll have to think about this.
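To show how linking an existing tree can go wrong, here is a Java sketch using a first-child/next-sibling node layout, which is, as far as I recall, how the 2.x BaseAST stores children. SibNode and SharedSubtreeDemo are hypothetical names. Linking the same node into a child list twice makes its sibling pointer point at itself, so any walk over the children never ends; inserting a copy each time avoids that. Copying is only one possible remedy assumed here for illustration, not a decision about what 3.0 will do.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Child-sibling node: 'down' is the first child, 'right' is the next sibling. */
final class SibNode {
    final String text; // the payload
    SibNode down, right;

    SibNode(String text) {
        this.text = text;
    }

    /** Append n at the end of this node's child list by walking the sibling chain. */
    void addChild(SibNode n) {
        if (n == null) return;
        if (down == null) {
            down = n;
            return;
        }
        SibNode t = down;
        while (t.right != null) t = t.right;
        t.right = n; // if n is already the last child, this sets n.right = n
    }

    /** Deep copy of this node and its children (not its siblings). Call only on acyclic trees. */
    SibNode dupTree() {
        SibNode copy = new SibNode(text);
        for (SibNode c = down; c != null; c = c.right) copy.addChild(c.dupTree());
        return copy;
    }
}

final class SharedSubtreeDemo {
    /** True if walking parent's child list revisits a node, i.e. the walk would never end. */
    static boolean childListHasCycle(SibNode parent) {
        Set<SibNode> seen = Collections.newSetFromMap(new IdentityHashMap<SibNode, Boolean>());
        for (SibNode c = parent.down; c != null; c = c.right) {
            if (!seen.add(c)) return true;
        }
        return false;
    }

    /** Reference the same modifiers tree twice under one root, by link or by copy. */
    static SibNode referenceTwice(boolean copy) {
        SibNode mods = new SibNode("MODIFIERS");
        SibNode root = new SibNode("INTERFACE_DEF");
        root.addChild(copy ? mods.dupTree() : mods);
        root.addChild(copy ? mods.dupTree() : mods);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(childListHasCycle(referenceTwice(false))); // prints true
        System.out.println(childListHasCycle(referenceTwice(true)));  // prints false
    }
}
```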
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com