[antlr-interest] wildcard in tree grammar

Mon Dec 1 09:57:29 PST 2008

Here's a fictional example of why you might want to distinguish "any
node or tree" from "any node" in a tree parser.

Input: (3)+(2)
Resulting Tree: ^(+ ^(expr 3) ^(expr 2))
Goal: pattern rewrite in a filter tree parser to remove some nodes
resulting from unnecessary parentheses.

Pattern:

remove_parens_atomicblock
  : ^(binary_op ^(expr left=.) ^(expr right=.))
    -> ^(binary_op $left $right)
  ;

Result: ^(+ 3 2)

This works as long as '.' means "any node", but may fail due to order of
operations if the parenthesized expression is not a primary expression.

On a sidenote, the reason I said ^(.) is not currently allowed in a tree
parser earlier is this:

ANTLR.g3 line 534:

tree_
	:	TREE_BEGIN^
		treeRoot ( element )+
		RPAREN!
	;

One possible resolution add a 'WILDCARD_TREE' rule in ANTLR.g3 as
follows, and add it as an alt to 'terminal'. This clears up both the
wildcard-as-root issue and the "any node"/"any node or tree" ambiguity.
It would be allowed anywhere in a tree parser except as a tree root.

WILDCARD_TREE : '^.';

This has the added advantage in the Tool's source that WILDCARD always
refers to a single element of the source stream.

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr
Sent: Monday, December 01, 2008 10:44 AM
To: Gavin Lambert
Cc: antlr-interest Interest
Subject: Re: [antlr-interest] wildcard in tree grammar

On Nov 30, 2008, at 11:15 PM, Gavin Lambert wrote:

> At 09:29 1/12/2008, Terence Parr wrote:
> >not sure.  I guess i left out as it's weird.  When would you write:
> >
> >^(. ID).  It's always the root that says what kind of thing it is,
> >
> >right?
>
> This is a completely fabricated example, but your tree structure  
> might dictate that at that position you know it's either a  
> ^(ORDER1FN ID) or a ^(ORDER2FN ID) or a ^(ORDER3FN ID).  In one tree  
> grammar you care about the distinction, while in another you don't.

'.' is great to avoid a subtree or any random operand node you don't  
care about.  As a root, it can only be a node so it's pretty easy to  
give the set of valid roots if necessary.  Operands can be subtrees  
and those can be whole grammars; harder to ignore.  Remember that AST  
construction is all about identifying the ops and pseudo-ops in your  
program, then making them roots.  Why would you ever want to ingore  
what operation your looking at?

Oh, I just found one.  You want to find all ID refs...well, actually  
that is just:

id : ID ;

If you don't care about the root, just don't give it, right?

x : ^(. ID) ;

is identical to ID except it won't match ID as a root but plain ID  
would.

See my drift here?  Random root with known children == children.

> Of course you could just write ^((ORDER1FN | ORDER2FN | ORDER3FN)  
> ID), but if you know that there will never be any other  
> possibilities then it's simpler to just write ^(. ID), especially as  
> the number of variations increases.
>
>
> And then of course there's Oliver's example, where he just wanted to  
> traverse the (sub)tree regardless of structure (presumably to pretty- 
> print it or something).

If you don't care about structure, one could argue why you're using a  
grammar to do that ;)

> They might be unusual cases, admittedly, but they seem reasonable.

I'm still unconvinced I'm afraid. ;)

Ter

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess