[antlr-interest] Exploit ambiguity in tree rewriter

Stefan Mätje Stefan.Maetje at esd-electronics.com
Thu Jun 14 03:00:09 PDT 2012


Hi nafur,

from what I've read in this thread till now I got the impression that 
you've structured your AST in a way that counteracts your goals.

I'll try to give a simple example to get us on common ground. I will 
refer here only to simple scalar multiplication with an expression like 
this:

"5 * 3 * 1 * 2 * 4"

As far as I understand from your writings you would build an AST like 
the following from this input

"^(MULT 5 3 1 2 4)

and try to match it with a rule like this

topdown :
     ^( MULT (before += .)* x=. (after += .)* )
         { $x.has("IDENTITY") }?
     ->  ^( MULT $before* $after* )
;

How should ANTLR generated code distribute the tokens from the AST to 
your variables "before", "x" and "after"? You're not giving ANTLR any 
hint about it. I guess you would end up with:
before: "5 3 1 2"
x:      "4"
after:  ""	// empty

And you must to have "x" assigned the "1" to make your code work.

Doing expression evaluation and simplification is done differently with 
ANTLR and leads to ASTs that combine two operands with one operator that 
can then be simplified in a straightforward manner.

A more appropriate AST structure for the before mentioned input would be 
this recursive structure:

^(* ^(* ^(* ^(* 5 3) 1) 2) 4)

In a "pretty printed" style it would appear like this.

^(*
     ^(*
         ^(*
             ^(*
                 5
                 3
             )
             1
         )
         2
     )
     4
)

This way you have only to deal with an operator and two operands in one 
rule. A bare bones example (not working) looks like this:

bottomup:
	expr
	;

expr:
	^(* a=expr b=INT {$b.text()== "1"}?)	->	$a
	^(* a=INT {$a.text()== "1"}? b=expr)	->	$b
	;

This way you can now match special things in the AST and act on them 
testing some properties of the trees. You can see the simplification 
that the result of "expr" is one of the operands if one of the others is 
a literal "1".

For a better explanation and an example look at
http://www.antlr.org/wiki/display/~admin/2008/11/30/Example+tree+rewriting+with+patterns


I hope I did not misunderstand what you want to do.

Stefan


Am 13.06.2012 10:18, schrieb nafur:
> Hi,
>
> Ok, I guess I have to somehow redefine the nature of the rewrite rules I
> want to use:
>
> My ASTs represent mathematical formulas and .has() checks, if this
> symbol has some special mathematical property (like "is identity", "is
> symmetric", "is invertible", "is orthogonal", ... I'm working with
> matrices).
>
> Thus, this example is really the simplest one. Another example is:
>
> ^( MULT (before += .)* x=. ^( POW y=. TRANSPOSE) (after += .)* )
> { $x.equals($y)&&  $x.has("ORTHOGONAL") }?
> ->  ^( MULT $before* IDENTITY $after* )
>
> i.e.: replace any Q Q^T by the identity, if Q is orthogonal. Note that
> this rule requires a comparison of whole subtrees ($x.equals($y)).
>
> Therefore, filtering single symbols would work for my first example (I
> somehow selected a quite bad example). For more complex examples, I
> really want to rely on the pattern matching offered by ANTLR.
>
> What I would like: Instead of resolving ambiguity by just choosing a
> single possibility, try all of them (what makes more sense for rewrite
> rules anyway).
>
> Gereon
>
> On 06/13/2012 07:35 AM, Benjamin S Wolf wrote:
>> If I were programming this for a Python target, I'd guess it would
>> look something like:
>>
>> topdown : ^( MULT (arg+=.)* )
>>      ->  { filter_out_identities(MULT, $arg) };
>>
>> with a function filter_out_identities defined in a header somewhere:
>>
>> def filter_out_identities(node, nodes):
>>      tree = CommonTree(node)
>>      tree.addChildren(n for n in nodes if not n.has("IDENTITY"))
>>      return tree
>>
>> I'm not sure what you really want to do. You were asking how to remove
>> all of a token from a list, and you gave an example that indicated you
>> had a function "has" that would let you check whether a token was an
>> identity, and you're matching against every token. Sorry if I
>> misunderstood that.
>>
>> If you only have one token that could be an identity, then you could
>> do something like:
>>
>> topdown : ^( MULT (a+=~IDENTITY)* (IDENTITY* b+=~IDENTITY*)* IDENTITY*
>>      ->  ^( MULT a* b* );
>>
>> and no actions are required.
>>
>> You might even be able to do
>>
>> ^( MULT ( IDENTITY | a+=~IDENTITY )* ) ->  ^( MULT a* )
>>
>> This is based on my understanding of parser rules; I'm not sure how
>> different it would be for tree parser rules.
>>
>> On Tue, Jun 12, 2012 at 10:07 PM, nafur<nafur42 at gmail.com>  wrote:
>>> Hi,
>>>
>>> While I'm not completely sure how this would look like, this would
>>> somehow result in implementing the pattern matching myself, right?
>>> As this example is a very easy one, this is no option for me. Actually,
>>> I'm only using ANTLR because of this pattern matching...
>>>
>>> Gereon
>>>
>>> On 06/13/2012 05:34 AM, Benjamin S Wolf wrote:
>>>> The best way to accomplish this is likely to be an arbitrary action, eg.:
>>>>
>>>> a : INT ->  { new CommonTree(new CommonToken(FLOAT, $INT.text + ".0")) } ;
>>>>
>>>> This takes an INT and returns a Tree with one node, a FLOAT token with
>>>> text that is the int plus ".0".
>>>>
>>>> You'll want to make a tree via CommonTree() and add a list of children
>>>> that excludes all tokens that have IDENTITY (or add each child if it
>>>> doesn't have the identity). This means you'll be looping over all the
>>>> nodes in the MULT subtree within the action.
>>>>
>>>> On Mon, Jun 11, 2012 at 2:01 AM, nafur<nafur42 at gmail.com>  wrote:
>>>>> Hi,
>>>>>
>>>>> I asked about this ambiguity quite a while ago. Does nobody has any idea
>>>>> how to do this?
>>>>>
>>>>> Gereon
>>>>>
>>>>> On 05/29/2012 12:48 PM, nafur wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm using tree rewriters to manipulate formulas that are represented as
>>>>>> ASTs. One operation I'd like to do is: "remove any identity from a
>>>>>> multiplication".
>>>>>>
>>>>>> The replacement rule I'm currently working with is the following:
>>>>>>
>>>>>> topdown :
>>>>>>        ^( MULT (before += .)* x=. (after += .)* )
>>>>>>        { $x.has("IDENTITY") }?
>>>>>>        ->  ^( MULT $before* $after* )
>>>>>> ;
>>>>>>
>>>>>> As you probably see, we have some ambiguity here: before and after can
>>>>>> both take arbitrarily many elements. If I take an AST like
>>>>>> ^( MULT A I x ), the replacement will not match. printing some debug
>>>>>> output reveals, that ANTLR will only check $x.has("IDENTITY"), i.e. will
>>>>>> only try to match the last element in a multiplication.
>>>>>> (That is actually not a surprise, as ANTLR issues a warning...)
>>>>>>
>>>>>> Is there any way to resolve this ambiguity in a way, such that I can
>>>>>> match all occurrences?
>>>>>>
>>>>>> Thanks,
>>>>>> Gereon
>>>>>
>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list