[antlr-interest] Exploit ambiguity in tree rewriter

nafur nafur42 at gmail.com
Thu Jun 14 04:59:43 PDT 2012


Hi,

I already thought about this, but this yields new problems. If I build
my ASTs like this, I have to take care of all possible multiplications.

To match A B I have to match ^(* A B), ^(* A ^(* B ...)), ^(* ^(* ... A)
B), ^(* ^(* ... A) ^(* B ... )).

As I will at some point go for rules that match more than two factors,
this does not seem like a real option to me...

Gereon

On 06/14/2012 12:00 PM, Stefan Mätje wrote:
> Hi nafur,
> 
> from what I've read in this thread till now I got the impression that
> you've structured your AST in a way that counteracts your goals.
> 
> I'll try to give a simple example to get us on common ground. I will
> refer here only to simple scalar multiplication with an expression like
> this:
> 
> "5 * 3 * 1 * 2 * 4"
> 
> As far as I understand from your writings you would build an AST like
> the following from this input
> 
> "^(MULT 5 3 1 2 4)
> 
> and try to match it with a rule like this
> 
> topdown :
>     ^( MULT (before += .)* x=. (after += .)* )
>         { $x.has("IDENTITY") }?
>     ->  ^( MULT $before* $after* )
> ;
> 
> How should ANTLR generated code distribute the tokens from the AST to
> your variables "before", "x" and "after"? You're not giving ANTLR any
> hint about it. I guess you would end up with:
> before: "5 3 1 2"
> x:      "4"
> after:  ""    // empty
> 
> And you must to have "x" assigned the "1" to make your code work.
> 
> Doing expression evaluation and simplification is done differently with
> ANTLR and leads to ASTs that combine two operands with one operator that
> can then be simplified in a straightforward manner.
> 
> A more appropriate AST structure for the before mentioned input would be
> this recursive structure:
> 
> ^(* ^(* ^(* ^(* 5 3) 1) 2) 4)
> 
> In a "pretty printed" style it would appear like this.
> 
> ^(*
>     ^(*
>         ^(*
>             ^(*
>                 5
>                 3
>             )
>             1
>         )
>         2
>     )
>     4
> )
> 
> This way you have only to deal with an operator and two operands in one
> rule. A bare bones example (not working) looks like this:
> 
> bottomup:
>     expr
>     ;
> 
> expr:
>     ^(* a=expr b=INT {$b.text()== "1"}?)    ->    $a
>     ^(* a=INT {$a.text()== "1"}? b=expr)    ->    $b
>     ;
> 
> This way you can now match special things in the AST and act on them
> testing some properties of the trees. You can see the simplification
> that the result of "expr" is one of the operands if one of the others is
> a literal "1".
> 
> For a better explanation and an example look at
> http://www.antlr.org/wiki/display/~admin/2008/11/30/Example+tree+rewriting+with+patterns
> 
> 
> 
> I hope I did not misunderstand what you want to do.
> 
> Stefan
> 
> 
> Am 13.06.2012 10:18, schrieb nafur:
>> Hi,
>>
>> Ok, I guess I have to somehow redefine the nature of the rewrite rules I
>> want to use:
>>
>> My ASTs represent mathematical formulas and .has() checks, if this
>> symbol has some special mathematical property (like "is identity", "is
>> symmetric", "is invertible", "is orthogonal", ... I'm working with
>> matrices).
>>
>> Thus, this example is really the simplest one. Another example is:
>>
>> ^( MULT (before += .)* x=. ^( POW y=. TRANSPOSE) (after += .)* )
>> { $x.equals($y)&&  $x.has("ORTHOGONAL") }?
>> ->  ^( MULT $before* IDENTITY $after* )
>>
>> i.e.: replace any Q Q^T by the identity, if Q is orthogonal. Note that
>> this rule requires a comparison of whole subtrees ($x.equals($y)).
>>
>> Therefore, filtering single symbols would work for my first example (I
>> somehow selected a quite bad example). For more complex examples, I
>> really want to rely on the pattern matching offered by ANTLR.
>>
>> What I would like: Instead of resolving ambiguity by just choosing a
>> single possibility, try all of them (what makes more sense for rewrite
>> rules anyway).
>>
>> Gereon
>>
>> On 06/13/2012 07:35 AM, Benjamin S Wolf wrote:
>>> If I were programming this for a Python target, I'd guess it would
>>> look something like:
>>>
>>> topdown : ^( MULT (arg+=.)* )
>>>      ->  { filter_out_identities(MULT, $arg) };
>>>
>>> with a function filter_out_identities defined in a header somewhere:
>>>
>>> def filter_out_identities(node, nodes):
>>>      tree = CommonTree(node)
>>>      tree.addChildren(n for n in nodes if not n.has("IDENTITY"))
>>>      return tree
>>>
>>> I'm not sure what you really want to do. You were asking how to remove
>>> all of a token from a list, and you gave an example that indicated you
>>> had a function "has" that would let you check whether a token was an
>>> identity, and you're matching against every token. Sorry if I
>>> misunderstood that.
>>>
>>> If you only have one token that could be an identity, then you could
>>> do something like:
>>>
>>> topdown : ^( MULT (a+=~IDENTITY)* (IDENTITY* b+=~IDENTITY*)* IDENTITY*
>>>      ->  ^( MULT a* b* );
>>>
>>> and no actions are required.
>>>
>>> You might even be able to do
>>>
>>> ^( MULT ( IDENTITY | a+=~IDENTITY )* ) ->  ^( MULT a* )
>>>
>>> This is based on my understanding of parser rules; I'm not sure how
>>> different it would be for tree parser rules.
>>>
>>> On Tue, Jun 12, 2012 at 10:07 PM, nafur<nafur42 at gmail.com>  wrote:
>>>> Hi,
>>>>
>>>> While I'm not completely sure how this would look like, this would
>>>> somehow result in implementing the pattern matching myself, right?
>>>> As this example is a very easy one, this is no option for me. Actually,
>>>> I'm only using ANTLR because of this pattern matching...
>>>>
>>>> Gereon
>>>>
>>>> On 06/13/2012 05:34 AM, Benjamin S Wolf wrote:
>>>>> The best way to accomplish this is likely to be an arbitrary
>>>>> action, eg.:
>>>>>
>>>>> a : INT ->  { new CommonTree(new CommonToken(FLOAT, $INT.text +
>>>>> ".0")) } ;
>>>>>
>>>>> This takes an INT and returns a Tree with one node, a FLOAT token with
>>>>> text that is the int plus ".0".
>>>>>
>>>>> You'll want to make a tree via CommonTree() and add a list of children
>>>>> that excludes all tokens that have IDENTITY (or add each child if it
>>>>> doesn't have the identity). This means you'll be looping over all the
>>>>> nodes in the MULT subtree within the action.
>>>>>
>>>>> On Mon, Jun 11, 2012 at 2:01 AM, nafur<nafur42 at gmail.com>  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I asked about this ambiguity quite a while ago. Does nobody has
>>>>>> any idea
>>>>>> how to do this?
>>>>>>
>>>>>> Gereon
>>>>>>
>>>>>> On 05/29/2012 12:48 PM, nafur wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm using tree rewriters to manipulate formulas that are
>>>>>>> represented as
>>>>>>> ASTs. One operation I'd like to do is: "remove any identity from a
>>>>>>> multiplication".
>>>>>>>
>>>>>>> The replacement rule I'm currently working with is the following:
>>>>>>>
>>>>>>> topdown :
>>>>>>>        ^( MULT (before += .)* x=. (after += .)* )
>>>>>>>        { $x.has("IDENTITY") }?
>>>>>>>        ->  ^( MULT $before* $after* )
>>>>>>> ;
>>>>>>>
>>>>>>> As you probably see, we have some ambiguity here: before and
>>>>>>> after can
>>>>>>> both take arbitrarily many elements. If I take an AST like
>>>>>>> ^( MULT A I x ), the replacement will not match. printing some debug
>>>>>>> output reveals, that ANTLR will only check $x.has("IDENTITY"),
>>>>>>> i.e. will
>>>>>>> only try to match the last element in a multiplication.
>>>>>>> (That is actually not a surprise, as ANTLR issues a warning...)
>>>>>>>
>>>>>>> Is there any way to resolve this ambiguity in a way, such that I can
>>>>>>> match all occurrences?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gereon
>>>>>>
>>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>>> Unsubscribe:
>>>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>>>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>


More information about the antlr-interest mailing list