[antlr-interest] Exploit ambiguity in tree rewriter

Thu Jun 14 10:10:30 PDT 2012

Am 14.06.2012 13:59:43 schrieb(en) nafur:
> Hi,
> 
> I already thought about this, but this yields new problems. If I build
> my ASTs like this, I have to take care of all possible multiplications.

I don't think so. Because the expr rule in my bare bones example is applied 
recursively and from the bottom of the tree up multiple(!) times by the 
bottomup rule all it needs to take care for is one operator and two operands 
(either of them could be an "expr" expression).

> To match A B I have to match ^(* A B), ^(* A ^(* B ...)), ^(* ^(* ... A) B), 
> ^(* ^(* ... A) ^(* B ... )).

Yes, at first they look as a lot of variants but it always boils down to a
^(* aExpr bExpr). Please really have a look at the examples for expression 
evaluation on the ANTLR website. And use the link to the Wiki I provided.

Have a look at the example AST I suggested below:

^(* ^(* ^(* ^(* 5 3) 1) 2) 4)

ANTLR will match and apply the expr rule 4 times to this AST in this manner:

1. match: ^(* 5 3)        ->  ^(* 5 3)
2. match: ^(* ^(* 5 3) 1) ->  ^(* 5 3) // real rewrite propagetes only this up
3. match  ^(* ^(* 5 3) 2) ->  ^(* ^(* 5 3) 2)  // no change
4. match  ^(* ^(* ^(* 5 3) 2) 4)  -> ^(* ^(* ^(* 5 3) 2) 4)

The 1 has vanished as expected and only 3 multiplications are left.

> As I will at some point go for rules that match more than two factors,
> this does not seem like a real option to me...

What kind of operators do you have that take more than two operands? I
can't imagine a use case at the moment.

Stefan

> Gereon
> 
> On 06/14/2012 12:00 PM, Stefan Mätje wrote:
> > Hi nafur,
> > 
> > from what I've read in this thread till now I got the impression that
> > you've structured your AST in a way that counteracts your goals.
> > 
> > I'll try to give a simple example to get us on common ground. I will
> > refer here only to simple scalar multiplication with an expression like
> > this:
> > 
> > "5 * 3 * 1 * 2 * 4"
> > 
> > As far as I understand from your writings you would build an AST like
> > the following from this input
> > 
> > "^(MULT 5 3 1 2 4)
> > 
> > and try to match it with a rule like this
> > 
> > topdown :
> >     ^( MULT (before += .)* x=. (after += .)* )
> >         { $x.has("IDENTITY") }?
> >     ->  ^( MULT $before* $after* )
> > ;
> > 
> > How should ANTLR generated code distribute the tokens from the AST to
> > your variables "before", "x" and "after"? You're not giving ANTLR any
> > hint about it. I guess you would end up with:
> > before: "5 3 1 2"
> > x:      "4"
> > after:  ""    // empty
> > 
> > And you must to have "x" assigned the "1" to make your code work.
> > 
> > Doing expression evaluation and simplification is done differently with
> > ANTLR and leads to ASTs that combine two operands with one operator that
> > can then be simplified in a straightforward manner.
> > 
> > A more appropriate AST structure for the before mentioned input would be
> > this recursive structure:
> > 
> > ^(* ^(* ^(* ^(* 5 3) 1) 2) 4)
> > 
> > In a "pretty printed" style it would appear like this.
> > 
> > ^(*
> >     ^(*
> >         ^(*
> >             ^(*
> >                 5
> >                 3
> >             )
> >             1
> >         )
> >         2
> >     )
> >     4
> > )
> > 
> > This way you have only to deal with an operator and two operands in one
> > rule. A bare bones example (not working) looks like this:
> > 
> > bottomup:
> >     expr
> >     ;
> > 
> > expr:
> >     ^(* a=expr b=INT {$b.text()== "1"}?)    ->    $a
> >     ^(* a=INT {$a.text()== "1"}? b=expr)    ->    $b
> >     ;
> > 
> > This way you can now match special things in the AST and act on them
> > testing some properties of the trees. You can see the simplification
> > that the result of "expr" is one of the operands if one of the others is
> > a literal "1".
> > 
> > For a better explanation and an example look at
> > http://www.antlr.org/wiki/display/~admin/2008/11/30/Example+tree+rewriting
> +with+patterns
> > 
> > 
> > 
> > I hope I did not misunderstand what you want to do.
> > 
> > Stefan
> > 
> > 
> > Am 13.06.2012 10:18, schrieb nafur:
> >> Hi,
> >>
> >> Ok, I guess I have to somehow redefine the nature of the rewrite rules I
> >> want to use:
> >>
> >> My ASTs represent mathematical formulas and .has() checks, if this
> >> symbol has some special mathematical property (like "is identity", "is
> >> symmetric", "is invertible", "is orthogonal", ... I'm working with
> >> matrices).
> >>
> >> Thus, this example is really the simplest one. Another example is:
> >>
> >> ^( MULT (before += .)* x=. ^( POW y=. TRANSPOSE) (after += .)* )
> >> { $x.equals($y)&&  $x.has("ORTHOGONAL") }?
> >> ->  ^( MULT $before* IDENTITY $after* )
> >>
> >> i.e.: replace any Q Q^T by the identity, if Q is orthogonal. Note that
> >> this rule requires a comparison of whole subtrees ($x.equals($y)).
> >>
> >> Therefore, filtering single symbols would work for my first example (I
> >> somehow selected a quite bad example). For more complex examples, I
> >> really want to rely on the pattern matching offered by ANTLR.
> >>
> >> What I would like: Instead of resolving ambiguity by just choosing a
> >> single possibility, try all of them (what makes more sense for rewrite
> >> rules anyway).
> >>
> >> Gereon
> >>
> >> On 06/13/2012 07:35 AM, Benjamin S Wolf wrote:
> >>> If I were programming this for a Python target, I'd guess it would
> >>> look something like:
> >>>
> >>> topdown : ^( MULT (arg+=.)* )
> >>>      ->  { filter_out_identities(MULT, $arg) };
> >>>
> >>> with a function filter_out_identities defined in a header somewhere:
> >>>
> >>> def filter_out_identities(node, nodes):
> >>>      tree = CommonTree(node)
> >>>      tree.addChildren(n for n in nodes if not n.has("IDENTITY"))
> >>>      return tree
> >>>
> >>> I'm not sure what you really want to do. You were asking how to remove
> >>> all of a token from a list, and you gave an example that indicated you
> >>> had a function "has" that would let you check whether a token was an
> >>> identity, and you're matching against every token. Sorry if I
> >>> misunderstood that.
> >>>
> >>> If you only have one token that could be an identity, then you could
> >>> do something like:
> >>>
> >>> topdown : ^( MULT (a+=~IDENTITY)* (IDENTITY* b+=~IDENTITY*)* IDENTITY*
> >>>      ->  ^( MULT a* b* );
> >>>
> >>> and no actions are required.
> >>>
> >>> You might even be able to do
> >>>
> >>> ^( MULT ( IDENTITY | a+=~IDENTITY )* ) ->  ^( MULT a* )
> >>>
> >>> This is based on my understanding of parser rules; I'm not sure how
> >>> different it would be for tree parser rules.
> >>>
> >>> On Tue, Jun 12, 2012 at 10:07 PM, nafur<nafur42 at gmail.com>  wrote:
> >>>> Hi,
> >>>>
> >>>> While I'm not completely sure how this would look like, this would
> >>>> somehow result in implementing the pattern matching myself, right?
> >>>> As this example is a very easy one, this is no option for me. Actually,
> >>>> I'm only using ANTLR because of this pattern matching...
> >>>>
> >>>> Gereon
> >>>>
> >>>> On 06/13/2012 05:34 AM, Benjamin S Wolf wrote:
> >>>>> The best way to accomplish this is likely to be an arbitrary
> >>>>> action, eg.:
> >>>>>
> >>>>> a : INT ->  { new CommonTree(new CommonToken(FLOAT, $INT.text +
> >>>>> ".0")) } ;
> >>>>>
> >>>>> This takes an INT and returns a Tree with one node, a FLOAT token with
> >>>>> text that is the int plus ".0".
> >>>>>
> >>>>> You'll want to make a tree via CommonTree() and add a list of children
> >>>>> that excludes all tokens that have IDENTITY (or add each child if it
> >>>>> doesn't have the identity). This means you'll be looping over all the
> >>>>> nodes in the MULT subtree within the action.
> >>>>>
> >>>>> On Mon, Jun 11, 2012 at 2:01 AM, nafur<nafur42 at gmail.com>  wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I asked about this ambiguity quite a while ago. Does nobody has
> >>>>>> any idea
> >>>>>> how to do this?
> >>>>>>
> >>>>>> Gereon
> >>>>>>
> >>>>>> On 05/29/2012 12:48 PM, nafur wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm using tree rewriters to manipulate formulas that are
> >>>>>>> represented as
> >>>>>>> ASTs. One operation I'd like to do is: "remove any identity from a
> >>>>>>> multiplication".
> >>>>>>>
> >>>>>>> The replacement rule I'm currently working with is the following:
> >>>>>>>
> >>>>>>> topdown :
> >>>>>>>        ^( MULT (before += .)* x=. (after += .)* )
> >>>>>>>        { $x.has("IDENTITY") }?
> >>>>>>>        ->  ^( MULT $before* $after* )
> >>>>>>> ;
> >>>>>>>
> >>>>>>> As you probably see, we have some ambiguity here: before and
> >>>>>>> after can
> >>>>>>> both take arbitrarily many elements. If I take an AST like
> >>>>>>> ^( MULT A I x ), the replacement will not match. printing some debug
> >>>>>>> output reveals, that ANTLR will only check $x.has("IDENTITY"),
> >>>>>>> i.e. will
> >>>>>>> only try to match the last element in a multiplication.
> >>>>>>> (That is actually not a surprise, as ANTLR issues a warning...)
> >>>>>>>
> >>>>>>> Is there any way to resolve this ambiguity in a way, such that I can
> >>>>>>> match all occurrences?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Gereon
> >>>>>>
> >>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>>>> Unsubscribe:
> >>>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-
> address
> >>>>>>
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>
> 
>