[antlr-interest] mild simplification and tree grammars

Tue Apr 20 04:51:36 PDT 2010

Well, after your good advices I could get a bit further in my grammar.

Now I have a general question and a specific one about tree grammars.

I would like to attempt a kind of mild simplification of an arithmetic and
boolean expression.

I see that a lot of interesting work can be done with a replace grammar like
the following:

s   :	^(UMINUS r=NUMBER) { shc = true; }
		-> NUMBER[num($r).negate().toString()]
    |	^(NOT FALSE) { shc = true; }
		-> TRUE		
    |	^(NOT TRUE) { shc = true; }
		-> FALSE	
    |	^(PLUS l=NUMBER r=NUMBER) { shc = true; } 
		-> NUMBER[num($l).add(num($r)).toString()]
    |	^(PLUS l=NUMBER {num($l) == BigDecimal.ZERO}? r1=.) { shc = true; }
		-> $r1
    |	^(PLUS l1=. r=NUMBER {num($r) == BigDecimal.ZERO}?) { shc = true; }
		-> $l1
...

Where 'shc' stands for "Something Has Changed" and is going to be used to
decide if a further execution of the simplifier grammar is or isn't
required. num(CommonTree) is a shorthand for "new
BigDecimal(node.getText())".

The general question is: is there a simply way to simplify things like:

	1 + a + 2

?

Of course I get something like (PLUS (PLUS NUMBER IDENT) NUMBER) and I see
it could be simplified as (PLUS (PLUS NUMBER NUMBER) IDENT). I may guess
some that constructs to obtain this are possible, but they seems to me to
increase the complexity of the grammar a lot and I don't know if I'm going
to swim in too deep waters. I mean, is this the way compilers do simplifies
expression, or instead there is something well-unknown to my knowledge? Any
acronym to spare?

The specific question is: the above (piece of) tree grammar needs to compile
with backtrack=true, which I don't like. I'm going to re-create a LLR
grammar doing the same. I see it mimes a lot the parsing grammar I used to
generate the source tree, but with a lot more cases in rules. This sounds
fine to me, but then I'm still getting some problem:

protected
conditionalOrExpression
    :	OR FALSE r=conditionalOrExpression	{shc=true;}	-> $r
    |	OR TRUE conditionalOrExpression		{shc=true;}	-> TRUE
    |	OR l=conditionalAndExpression r=conditionalOrRightSide[$l.tree] ->
$r
    |	conditionalAndExpression
    ;

protected
conditionalOrRightSide [CommonTree l]
    :	FALSE			{shc=true;}	-> $l
    |	TRUE			{shc=true;}	-> TRUE
    |	r=conditionalOrExpression		-> OR $l $r
    ;

conditionalOrExpression2 rises the error "(137): reference to undefined
label in rewrite rule: $l", which is the same I got in the parsing grammar
and which was circumvented by adopting a different notation (thanks to
John).

I have ANTLR 3.2. Is it a bug or am I the bug?

Regards,

Giampaolo