[antlr-interest] ANTLR 3.0 tree construction proposal
Micheal J
open.zone at virgin.net
Mon Jan 31 16:13:53 PST 2005
> After a lot of typing (and I mean a lot), you'll see my ANTLR 3.0 tree
> construction proposal. After the proposal, you'll see my long stream
> of consciousness as I wander through the design process (you can
> probably ignore that part).
>
http://www.antlr.org/blog/antlr3/trees.tml
First thoughts that popped into my head:
1. Why mess with existing syntax at all?
---------------------------------------
By this I mean that, while I recognize the benefits of ^^, I'm also rather
concerned that existing grammars would need to be rewritten and that the
rewrite couldn't be done in any automatic fashion.
2. Rewrite rules are sexy!
---------------------------------------
I like the idea of rewrite rules. So much so in fact that I think they
should be orthogonal to the existing inline notation used in ANTLR 2.x
(mixing them [in a rule?] should be prevented). And they needn't be cryptic.
How about these rewrites (no pun intended <chuckle>) of grammar fragments in
your blog:
decl
    : "var" (ID ':' type ';')+
      $rewrite_rules
      {
          ^("var" ^(':' ID type)+)
      }
    ;
expr
    : left=mul_expr PLUS right=mul_expr
      $rewrite_rules
      {
          $condition(@right.type==INT && Integer.parseInt(@right.text)==0 &&
                     @left.type==INT && Integer.parseInt(@left.text)==0)
              -> $empty
          $condition(@right.type==INT && Integer.parseInt(@right.text)==0)
              -> left
          $condition(@left.type==INT && Integer.parseInt(@left.text)==0)
              -> right
          $default -> ^(PLUS left right) // default case
          // Alternatively, we can forgo the $default keyword (as in not
          // support it at all) thus:
          // ^(PLUS left right) // default case
      }
    ;
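
To make the intended behaviour concrete, here is a minimal Java sketch of how
the conditional rewrites for 'expr' might be evaluated. It assumes the
conditions are tried top to bottom and the first match wins, which the
fragment above does not spell out. The Node class, the token type constants
and the rewriteExpr helper are hypothetical names made up for illustration,
not anything ANTLR generates, and null stands in for $empty.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal AST node; not ANTLR's actual AST class. */
final class Node {
    // Assumed token type numbers, chosen only for this sketch.
    static final int INT = 1;
    static final int PLUS = 2;
    static final int ID = 3;

    final int type;
    final String text;
    final List<Node> children;

    Node(int type, String text, Node... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }

    /** LISP-style rendering: a leaf prints its text, a parent prints (text child...). */
    @Override
    public String toString() {
        if (children.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (Node c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

final class RewriteSketch {
    /** Mirrors the test "@x.type==INT && Integer.parseInt(@x.text)==0". */
    static boolean isZero(Node n) {
        return n.type == Node.INT && Integer.parseInt(n.text) == 0;
    }

    /** Tries each $condition in order; null plays the role of $empty. */
    static Node rewriteExpr(Node left, Node plus, Node right) {
        if (isZero(right) && isZero(left)) {
            return null;                                    // -> $empty
        }
        if (isZero(right)) {
            return left;                                    // -> left
        }
        if (isZero(left)) {
            return right;                                   // -> right
        }
        return new Node(plus.type, plus.text, left, right); // $default
    }

    public static void main(String[] args) {
        Node seven = new Node(Node.INT, "7");
        Node zero = new Node(Node.INT, "0");
        Node y = new Node(Node.ID, "y");
        Node plus = new Node(Node.PLUS, "+");

        System.out.println(rewriteExpr(seven, plus, zero)); // prints 7
        System.out.println(rewriteExpr(zero, plus, seven)); // prints 7
        System.out.println(rewriteExpr(seven, plus, y));    // prints (+ 7 y)
        System.out.println(rewriteExpr(zero, plus, zero));  // prints null
    }
}
```

Written this way, dropping $default simply means the last, unconditional
alternative is the fall-through case.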
3. Separate Token and AST node class declaration sections
---------------------------------------------------------
We've gone over this before but the TreeDL approach (it is meant as just a
label, I know it wasn't invented/pioneered by TreeDL) of declaring node
types should be adopted. Apart from the obvious benefits that we've
discussed previously, it allows one to be even more language-agnostic. Take
the 'expr' rule above. Occurrences of "Integer.parseInt()" are just plain
ugly from a language-neutral grammar perspective.
We could have something like this instead (based on ideas expressed in
http://www.antlr.org/pipermail/antlr-interest/2004-November/010027.html):
grammar P;
AST
{
    abstract node Expression
    {
    }
    node BinaryExpression : Expression
    {
        child left : Expression;
        child right : Expression;
        attribute lexeme : String;
        accessor value : int; // will generate:
                              // -- getValue() in C++/Java_1.4-
                              // -- readonly Value property in C#/Java_1.5+
                              // User needs to provide the implementation.
                              //
                              // [Perhaps ANTLR generates uncompilable code in the
                              // body and a copy of the comment supplied with the
                              // declaration. This might read:
                              // "Returns the integer value of the lexeme."
                              // ]
                              // @see-also 'mutator' - for setXX/XX
                              // @see-also 'property' - for getXX/XX and setXX/XX
    }
}
tokens
{
    PLUS<AST=BinaryExpression>
    MINUS<AST=BinaryExpression>
}
expr
    : left=mul_expr PLUS right=mul_expr
      $rewrite_rules
      {
          $condition(@right.type==INT && @right.value==0 &&
                     @left.type==INT && @left.value==0)
              -> $empty
          $condition(@right.type==INT && @right.value==0)
              -> left
          $condition(@left.type==INT && @left.value==0)
              -> right
          $default -> ^(PLUS left right) // default case
      }
    ;
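
For what it's worth, here is a rough Java sketch of what the node
declarations might turn into once the user has filled in the accessor body.
It is only a guess at the shape of the output, not a statement of what ANTLR
would generate. The IntLiteral class is hypothetical and does not appear in
the AST section above; I added it because the rewrite conditions read
'value' from an INT operand. The getter names and the AccessorSketch driver
are likewise assumptions.

```java
/** Mirrors "abstract node Expression". */
abstract class Expression {
}

/** Hypothetical leaf node for INT tokens; not part of the declaration above. */
final class IntLiteral extends Expression {
    private final String lexeme;   // from "attribute lexeme : String"

    IntLiteral(String lexeme) {
        this.lexeme = lexeme;
    }

    String getLexeme() {
        return lexeme;
    }

    /** Returns the integer value of the lexeme (the user-supplied accessor body). */
    int getValue() {
        return Integer.parseInt(lexeme);
    }
}

/** Mirrors "node BinaryExpression : Expression" with its two children. */
final class BinaryExpression extends Expression {
    private final Expression left;    // from "child left : Expression"
    private final Expression right;   // from "child right : Expression"
    private final String lexeme;      // from "attribute lexeme : String"

    BinaryExpression(Expression left, Expression right, String lexeme) {
        this.left = left;
        this.right = right;
        this.lexeme = lexeme;
    }

    Expression getLeft() {
        return left;
    }

    Expression getRight() {
        return right;
    }

    String getLexeme() {
        return lexeme;
    }
}

final class AccessorSketch {
    /** What "@x.type==INT && @x.value==0" could boil down to in Java. */
    static boolean isZero(Expression e) {
        return e instanceof IntLiteral && ((IntLiteral) e).getValue() == 0;
    }

    public static void main(String[] args) {
        BinaryExpression sum =
            new BinaryExpression(new IntLiteral("42"), new IntLiteral("0"), "+");

        System.out.println(((IntLiteral) sum.getLeft()).getValue()); // prints 42
        System.out.println(isZero(sum.getRight()));                  // prints true
        System.out.println(isZero(sum.getLeft()));                   // prints false
    }
}
```

The point is that the grammar only ever says '@right.value'; the parseInt
call lives in one place, in the target language, outside the grammar.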
Cheers,
Micheal
ANTLR/C#