[antlr-interest] wildcard in tree grammar

Tue Oct 21 10:15:17 PDT 2008

On Oct 21, 2008, at 12:02 AM, Gavin Lambert wrote:
> For analysis purposes, shouldn't ^(anything at all) be considered
> equivalent to a single node anyway?  In much the same way that in
> the expression "x + (y + z)", "x" and "(y + z)" are both atoms (in
> terms of precedence).
>
> I'm a bit rusty on ANTLR's internal tree representation, but
> certainly in a "normal" tree this is the case -- any given node
> can have a subtree (or not), and you can uniquely refer to any
> subtree by pointing at its root node.  I don't see why ANTLR would
> need to behave any differently (and I can see quite a few cases
> where it'd be beneficial if it could handle both cases at runtime,
> not compile time).

Hi. It turns out that parsing in two dimensions is a bit tricky ;)
ANTLR serializes trees to one-dimensional strings, injecting
imaginary DOWN and UP nodes to represent structure. So we need to be
able to distinguish between

^(X Y Z)

and

^(X ^(Y Z))

A subtree is very different in terms of lookahead from a linear list.
A B is different from ^(A B). Lookahead is A B on the first one (LL(2))
and A DOWN on the second one.
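
To make the serialization concrete, here is a rough Python sketch of the idea. It is not ANTLR's actual code: the tuple representation of a tree and the serialize() helper are made up for illustration, and the strings "DOWN" and "UP" stand in for the imaginary navigation nodes.

```python
DOWN, UP = "DOWN", "UP"  # stand-ins for the imaginary navigation nodes


def serialize(tree):
    """Flatten a tree into a one-dimensional list of tokens.

    A tree is either a string (a single node) or a tuple
    (root, child, child, ...), which plays the role of ^(root child ...).
    """
    if isinstance(tree, str):
        return [tree]
    root, *children = tree
    out = [root]
    if children:
        out.append(DOWN)              # descend into the child list
        for child in children:
            out.extend(serialize(child))
        out.append(UP)                # climb back out
    return out


flat = serialize(("X", "Y", "Z"))       # ^(X Y Z)
nested = serialize(("X", ("Y", "Z")))   # ^(X ^(Y Z))
pair = serialize("A") + serialize("B")  # A B as a linear list
sub = serialize(("A", "B"))             # ^(A B)
```

With this encoding, ^(X Y Z) becomes X DOWN Y Z UP while ^(X ^(Y Z)) becomes X DOWN Y DOWN Z UP UP, so the two shapes differ in the flat stream. Likewise the first two tokens of A B are A B, while those of ^(A B) are A DOWN.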

>
> Given the original problem mentioned in the issue:
>   input: ^(not ^(and ^(= a b) ^(= c d)))
>   rule: ^('not' ^('and' c51=. c52=.)) -> ...
>
> I don't see how this can be misinterpreted.  While processing the
> 'and' subtree, it reads the first child node, discovers that it's
> a subtree, reads the whole thing in and assigns the root node
> (with dangling subtree) to c51.  Then it does the same for the
> next subtree and c52.

Agreed. After playing around all day yesterday, I came to the
conclusion that the wildcard should in fact mean single node or
subtree, which is normally what you want. I have simply altered the
analysis to treat the wildcard as if it were really ^(. .*) :)
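
As an illustration only, here is a small Python sketch of what "single node or subtree" means when matching against the flattened stream. The match_wildcard() helper and the token list are hypothetical and are not how ANTLR implements it; "DOWN" and "UP" again stand in for the imaginary nodes.

```python
DOWN, UP = "DOWN", "UP"  # stand-ins for the imaginary navigation nodes


def match_wildcard(tokens, i):
    """Consume one node, plus its whole subtree if it has one.

    Returns the index of the first token after the matched node or subtree.
    """
    if tokens[i] in (DOWN, UP):
        raise ValueError("wildcard cannot start on a navigation token")
    i += 1                            # the node itself
    if i < len(tokens) and tokens[i] == DOWN:
        depth = 0
        while True:                   # skip to the matching UP
            if tokens[i] == DOWN:
                depth += 1
            elif tokens[i] == UP:
                depth -= 1
            i += 1
            if depth == 0:
                break
    return i


# ^(not ^(and ^(= a b) ^(= c d))) flattened by hand
tokens = "not DOWN and DOWN = DOWN a b UP = DOWN c d UP UP UP".split()

start_c51 = 4                                   # first child of 'and'
end_c51 = match_wildcard(tokens, start_c51)     # past ^(= a b)
end_c52 = match_wildcard(tokens, end_c51)       # past ^(= c d)
```

Here c51 would cover tokens[4:9] and c52 would cover tokens[9:14], leaving the stream positioned on the UP that closes the 'and' subtree. On a plain leaf the same helper consumes just the one node.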

Ter