[antlr-interest] ANTLR 3.0 tree construction proposal

Micheal J open.zone at virgin.net
Mon Jan 31 16:13:53 PST 2005

> After a lot of typing (and I mean a lot), you'll see my ANTLR 3.0 tree
> construction proposal.  After the proposal, you'll see my long stream
> of consciousness as I wander through the design process (you can
> probably ignore that part).

First thoughts that popped into my head:

1. Why mess with existing syntax at all?

By this I mean that, while I recognize the benefits of ^^, I'm also rather
concerned that existing grammars would need to be rewritten, and that this
couldn't be done automatically.

2. Rewrite rules are sexy!

I like the idea of rewrite rules, so much so, in fact, that I think they
should be orthogonal to the existing inline notation used in ANTLR 2.x
(mixing the two [within a rule?] should be prevented). And they needn't be
cryptic. How about these rewrites (no pun intended <chuckle>) of the grammar
fragments in your blog:

 : "var" (ID ':' type ';')+
        ^("var" ^(':' ID type)+)

 : left=mul_expr PLUS right=mul_expr
        $condition(@right.type==INT && Integer.parseInt(@right.text)==0 &&
                   @left.type==INT && Integer.parseInt(@left.text)==0)
        -> $empty
        $condition(@right.type==INT && Integer.parseInt(@right.text)==0)
        -> left
        $condition(@left.type==INT && Integer.parseInt(@left.text)==0)
        -> right
        $default -> ^(PLUS left right) // default case
        // Alternatively, we could forgo the $default keyword altogether
        // (i.e. not support it at all), thus:
        // ^(PLUS left right) // default case
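To pin down the intended semantics of the `$condition` alternatives, here is a minimal Java sketch that applies the same rewrite by hand to subtrees that have already been built. Everything in it is hypothetical: the `Node` class, the `INT`/`PLUS` token-type values and `rewritePlus()` are stand-ins rather than ANTLR API, and `$empty` is modelled as `null`. As in the rule, the conditions are tried in order, the first match wins, and `^(PLUS left right)` is the fallback.

```java
import java.util.ArrayList;
import java.util.List;

public class PlusRewrite {
    // Hypothetical token types; real values would come from the generated parser.
    static final int INT = 4;
    static final int PLUS = 5;

    // Hypothetical minimal AST node: a token type, its text, and child nodes.
    static final class Node {
        final int type;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(int type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        // LISP-style rendering, e.g. (+ 7 2), handy for checking results.
        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // True only for an INT token whose text parses to 0.
    static boolean isZeroLiteral(Node n) {
        return n.type == INT && Integer.parseInt(n.text) == 0;
    }

    // Mirrors the rule's alternatives: the first matching condition wins.
    static Node rewritePlus(Node left, Node right) {
        if (isZeroLiteral(right) && isZeroLiteral(left)) return null; // -> $empty
        if (isZeroLiteral(right)) return left;                        // -> left
        if (isZeroLiteral(left)) return right;                        // -> right
        return new Node(PLUS, "+", left, right);                      // $default -> ^(PLUS left right)
    }

    public static void main(String[] args) {
        Node seven = new Node(INT, "7");
        Node zero = new Node(INT, "0");
        System.out.println(rewritePlus(seven, zero));               // 7
        System.out.println(rewritePlus(zero, seven));               // 7
        System.out.println(rewritePlus(seven, new Node(INT, "2"))); // (+ 7 2)
        System.out.println(rewritePlus(zero, zero));                // null
    }
}
```

Because the alternatives are ordered, the combined both-zero check has to come first; otherwise 0+0 would already be caught by the `right` case and rewritten to `left`.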

3. Separate Token and AST node class declaration sections

We've gone over this before, but the TreeDL approach of declaring node
types should be adopted (I use "TreeDL" just as a label; I know the approach
wasn't invented or pioneered by TreeDL). Apart from the obvious benefits
we've discussed previously, it allows one to be even more language-agnostic.
Take the 'expr' rule above: occurrences of "Integer.parseInt()" are just
plain ugly from a language-neutral grammar perspective.

We could have something like this instead (based on ideas expressed in

grammar P;

    abstract node Expression

    node BinaryExpression : Expression
        child     left   : Expression;
        child     right  : Expression;
        attribute lexeme : String;
        accessor  value  : int;    // will generate:
                                   //   -- getValue() in
                                   //   -- readonly Value property in C#/Java_1.5+
                                   // User needs to provide the
                                   // [Perhaps ANTLR generates uncompilable code in the
                                   //  body and a copy of the comment supplied with the
                                   //  declaration. This might
                                   //     "Returns the integer value of the lexeme."
                                   // ]
                                   // @see-also 'mutator'  - for setXX/XX
                                   // @see-also 'property' - for getXX/XX and setXX/XX


 : left=mul_expr PLUS right=mul_expr
        $condition(@right.type==INT && @right.value==0 &&
                   @left.type==INT && @left.value==0)
        -> $empty
        $condition(@right.type==INT && @right.value==0)
        -> left
        $condition(@left.type==INT && @left.value==0)
        -> right
        $default -> ^(PLUS left right) // default case
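For a Java target, the `node` declarations above might translate into something like the following hand-written sketch. The class layout, the constructor and the class name `NodeTypesSketch` are assumptions, and so is the `getValue()` body: the proposal leaves that body to the user, and parsing `lexeme` as a decimal `int` is just one plausible choice matching the comment "Returns the integer value of the lexeme." With such an accessor in place, the grammar's conditions can say `@right.value==0` instead of embedding `Integer.parseInt()`.

```java
// Hypothetical Java classes generated from the node declarations in grammar P.
public class NodeTypesSketch {

    // abstract node Expression
    static abstract class Expression {
    }

    // node BinaryExpression : Expression
    static class BinaryExpression extends Expression {
        private final Expression left;   // child left : Expression
        private final Expression right;  // child right : Expression
        private final String lexeme;     // attribute lexeme : String

        BinaryExpression(Expression left, Expression right, String lexeme) {
            this.left = left;
            this.right = right;
            this.lexeme = lexeme;
        }

        Expression getLeft() { return left; }

        Expression getRight() { return right; }

        String getLexeme() { return lexeme; }

        // accessor value : int
        // "Returns the integer value of the lexeme."
        // Assumed user-supplied body: parse the lexeme as a decimal int.
        int getValue() {
            return Integer.parseInt(lexeme);
        }
    }

    public static void main(String[] args) {
        // Anonymous subclasses stand in for concrete child nodes.
        Expression a = new Expression() {};
        Expression b = new Expression() {};
        BinaryExpression e = new BinaryExpression(a, b, "42");
        System.out.println(e.getValue()); // 42
    }
}
```

A C# target would presumably expose the same accessor as a read-only `Value` property rather than a `getValue()` method, as the comment in the declaration suggests.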


