[antlr-interest] wildcard in tree grammar
Sam Harwell
sharwell at pixelminegames.com
Mon Dec 1 09:57:29 PST 2008
Here's a fictional example of why you might want to distinguish "any
node or tree" from "any node" in a tree parser.
Input: (3)+(2)
Resulting Tree: ^(+ ^(expr 3) ^(expr 2))
Goal: pattern rewrite in a filter tree parser to remove some nodes
resulting from unnecessary parentheses.
Pattern:
remove_parens_atomicblock
: ^(binary_op ^(expr left=.) ^(expr right=.))
-> ^(binary_op $left $right)
;
Result: ^(+ 3 2)
This works as long as '.' means "any node", but may fail due to order of
operations if the parenthesized expression is not a primary expression.
On a sidenote, the reason I said ^(.) is not currently allowed in a tree
parser earlier is this:
ANTLR.g3 line 534:
tree_
: TREE_BEGIN^
treeRoot ( element )+
RPAREN!
;
One possible resolution add a 'WILDCARD_TREE' rule in ANTLR.g3 as
follows, and add it as an alt to 'terminal'. This clears up both the
wildcard-as-root issue and the "any node"/"any node or tree" ambiguity.
It would be allowed anywhere in a tree parser except as a tree root.
WILDCARD_TREE : '^.';
This has the added advantage in the Tool's source that WILDCARD always
refers to a single element of the source stream.
Sam
-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr
Sent: Monday, December 01, 2008 10:44 AM
To: Gavin Lambert
Cc: antlr-interest Interest
Subject: Re: [antlr-interest] wildcard in tree grammar
On Nov 30, 2008, at 11:15 PM, Gavin Lambert wrote:
> At 09:29 1/12/2008, Terence Parr wrote:
> >not sure. I guess i left out as it's weird. When would you write:
> >
> >^(. ID). It's always the root that says what kind of thing it is,
> >
> >right?
>
> This is a completely fabricated example, but your tree structure
> might dictate that at that position you know it's either a
> ^(ORDER1FN ID) or a ^(ORDER2FN ID) or a ^(ORDER3FN ID). In one tree
> grammar you care about the distinction, while in another you don't.
'.' is great to avoid a subtree or any random operand node you don't
care about. As a root, it can only be a node so it's pretty easy to
give the set of valid roots if necessary. Operands can be subtrees
and those can be whole grammars; harder to ignore. Remember that AST
construction is all about identifying the ops and pseudo-ops in your
program, then making them roots. Why would you ever want to ingore
what operation your looking at?
Oh, I just found one. You want to find all ID refs...well, actually
that is just:
id : ID ;
If you don't care about the root, just don't give it, right?
x : ^(. ID) ;
is identical to ID except it won't match ID as a root but plain ID
would.
See my drift here? Random root with known children == children.
> Of course you could just write ^((ORDER1FN | ORDER2FN | ORDER3FN)
> ID), but if you know that there will never be any other
> possibilities then it's simpler to just write ^(. ID), especially as
> the number of variations increases.
>
>
> And then of course there's Oliver's example, where he just wanted to
> traverse the (sub)tree regardless of structure (presumably to pretty-
> print it or something).
If you don't care about structure, one could argue why you're using a
grammar to do that ;)
> They might be unusual cases, admittedly, but they seem reasonable.
I'm still unconvinced I'm afraid. ;)
Ter
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess
More information about the antlr-interest
mailing list