[antlr-interest] ANTLR 3.0 tree construction proposal

Mon Jan 31 16:13:53 PST 2005

> After a lot of typing (and I mean a lot), you'll see my ANTLR 
> 3.0 tree 
> construction proposal.  After the proposal, you'll see my long stream 
> of consciousness as I wander through the design process (you can 
> probably ignore that part).
> 
http://www.antlr.org/blog/antlr3/trees.tml

First thoughts that popped into my head:

1. Why mess with existing syntax at all?
---------------------------------------

By this I mean that, while I recognize the benefits of ^^, I'm also rather
concerned that existing grammars would need to be rewritten and that this
couldn't be done in any automatic fashion.

2. Rewrite rules are sexy!
---------------------------------------

I like the idea of rewrite rules. So much so in fact that I think they
should be orthogonal to the existing inline notation used in ANTLR 2.x
(mixing them [in a rule?] should be prevented). And they needn't be cryptic.
How about these rewrites (no pun intended <chuckle>) of grammar fragments in
your blog:

decl 
 : "var" (ID ':' type ';')+ 
   $rewrite_rules
   {
   	^("var" ^(':' ID type)+) 
   }
 ;

expr 
 : left=mul_expr PLUS right=mul_expr
   $rewrite_rules
   {
        $condition(@right.type==INT && Integer.parseInt(@right.text)==0 &&
                   @left.type==INT && Integer.parseInt(@left.text)==0)
        -> $empty

        $condition(@right.type==INT && Integer.parseInt(@right.text)==0)
        -> left

        $condition(@left.type==INT && Integer.parseInt(@left.text)==0)
        -> right

        $default -> ^(PLUS left right) // default case   

        // Alternatively, we can forgo the $default keyword (as in, not
        // support it at all), thus:
        // ^(PLUS left right) // default case
   }
 ;
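
To make the intended semantics of the 'expr' rewrites concrete, here is a
rough Java sketch of what they would do at tree-construction time. It assumes
the conditions are tried top to bottom and the first match wins, and it uses
null to stand in for $empty. The Node and ExprRewriter classes, the isZero()
helper and the token type constants are all made up for illustration; none of
them are ANTLR APIs.

```java
// Hypothetical mini-AST for illustration only; not an ANTLR class.
final class Node {
    static final int INT = 1, ID = 2, PLUS = 3;   // made-up token types

    final int type;
    final String text;
    final Node left, right;   // null for leaf nodes

    Node(int type, String text) {
        this(type, text, null, null);
    }

    Node(int type, String text, Node left, Node right) {
        this.type = type;
        this.text = text;
        this.left = left;
        this.right = right;
    }

    // The test each $condition repeats: an INT node whose text parses to 0.
    boolean isZero() {
        return type == INT && Integer.parseInt(text) == 0;
    }
}

final class ExprRewriter {
    // Mirrors the four rewrite cases in order; null plays the role of $empty.
    static Node rewritePlus(Node left, Node right) {
        if (left.isZero() && right.isZero()) return null;   // -> $empty
        if (right.isZero()) return left;                     // -> left
        if (left.isZero()) return right;                     // -> right
        return new Node(Node.PLUS, "+", left, right);        // $default
    }
}
```

With these definitions, rewritePlus(x, zero) hands back the x node itself,
while two non-zero operands fall through to the default ^(PLUS left right)
tree.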

3. Separate Token and AST node class declaration sections
---------------------------------------------------------

We've gone over this before but the TreeDL approach (it is meant as just a
label, I know it wasn't invented/pioneered by TreeDL) of declaring node
types should be adopted. Apart from the obvious benefits that we've
discussed previously, it allows one to be even more language-agnostic. Take
the 'expr' rule above. Occurrences of "Integer.parseInt()" are just plain
ugly from a language-neutral grammar perspective.

We could have something like this instead (based on ideas expressed in
http://www.antlr.org/pipermail/antlr-interest/2004-November/010027.html
):

grammar P;

AST
{
    abstract node Expression
    {
    }

    node BinaryExpression : Expression
    {
        child     left   : Expression;
        child     right  : Expression;
        attribute lexeme : String;
        accessor  value  : int;   // will generate:
                                  //   -- getValue() in C++/Java_1.4-
                                  //   -- readonly Value property in C#/Java_1.5+
                                  // User needs to provide the implementation.
                                  //
                                  // [Perhaps ANTLR generates uncompilable code in the
                                  //  body and a copy of the comment supplied with the
                                  //  declaration. This might read:
                                  //     "Returns the integer value of the lexeme."
                                  // ]
                                  // @see-also 'mutator'  - for setXX/XX
                                  // @see-also 'property' - for getXX/XX and setXX/XX
    }
}

tokens
{
   PLUS<AST=BinaryExpression>
   MINUS<AST=BinaryExpression>
}

expr 
 : left=mul_expr PLUS right=mul_expr
   $rewrite_rules
   {
        $condition(@right.type==INT && @right.value==0 &&
                   @left.type==INT && @left.value==0)
        -> $empty

        $condition(@right.type==INT && @right.value==0)
        -> left

        $condition(@left.type==INT && @left.value==0)
        -> right

        $default -> ^(PLUS left right) // default case   
   }
 ;
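
For the Java target, the AST section above might come out roughly like the
sketch below. This is only my guess at the shape of the generated code: the
constructor, the getter names other than getValue(), and the body of
getValue() are assumptions. In the proposal the accessor body is left for the
user to fill in; here I've filled it in with Integer.parseInt() so the sketch
compiles.

```java
// Hypothetical generator output for the AST section; the real shape may differ.
abstract class Expression {
}

class BinaryExpression extends Expression {
    private final Expression left;    // from: child left : Expression
    private final Expression right;   // from: child right : Expression
    private final String lexeme;      // from: attribute lexeme : String

    // Assumed constructor; the proposal does not say how nodes get built.
    BinaryExpression(Expression left, Expression right, String lexeme) {
        this.left = left;
        this.right = right;
        this.lexeme = lexeme;
    }

    Expression getLeft()  { return left; }
    Expression getRight() { return right; }
    String getLexeme()    { return lexeme; }

    // From: accessor value : int
    // Returns the integer value of the lexeme.
    // The body is user-supplied in the proposal; this one is an assumption.
    int getValue() {
        return Integer.parseInt(lexeme);
    }
}
```

The point is that the grammar only ever says @right.value, and the
target-specific parsing call lives in one place in the node class rather than
in every rewrite condition.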

Cheers,

Micheal
ANTLR/C#