[antlr-interest] wildcard in tree grammar

Oliver Zeigermann oliver.zeigermann at gmail.com
Wed Nov 26 12:14:37 PST 2008


2008/10/21 Terence Parr <parrt at cs.usfca.edu>:
>
> On Oct 21, 2008, at 12:02 AM, Gavin Lambert wrote:
>> For analysis purposes, shouldn't ^(anything at all) be considered
>> equivalent to a single node anyway?  In much the same way that in
>> the expression "x + (y + z)", "x" and "(y + z)" are both atoms (in
>> terms of precedence).
>>
>> I'm a bit rusty on ANTLR's internal tree representation, but
>> certainly in a "normal" tree this is the case -- any given node
>> can have a subtree (or not), and you can uniquely refer to any
>> subtree by pointing at its root node.  I don't see why ANTLR would
>> need to behave any differently (and I can see quite a few cases
>> where it'd be beneficial if it could handle both cases at runtime,
>> not compile time).
>
> Hi. It turns out that parsing in two dimensions is a bit tricky ;)
> ANTLR serializes trees to one-dimensional strings, injecting
> imaginary DOWN and UP nodes to represent structure. So, we need to be
> able to distinguish between
>
> ^(X Y Z)
>
> and
>
> ^(X ^(Y Z))
>
> A subtree is very different in terms of lookahead from a linear list.
> A B is different than ^(A B). Lookahead is A B on the first one (LL(2))
> and A DOWN on the second one.
>
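
To make the serialization point concrete, here is a minimal sketch in
Python of flattening a tree into a one-dimensional stream with imaginary
DOWN and UP markers. It is only an illustration of the idea described
above, not ANTLR's actual code; Node and serialize are names made up for
this example.

```python
from dataclasses import dataclass, field

# Imaginary structure markers (hypothetical stand-ins for ANTLR's DOWN/UP nodes).
DOWN, UP = "DOWN", "UP"


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def serialize(node):
    """Flatten a tree into a flat list, marking child lists with DOWN/UP."""
    out = [node.text]
    if node.children:
        out.append(DOWN)
        for child in node.children:
            out.extend(serialize(child))
        out.append(UP)
    return out


# ^(X Y Z): X has two leaf children.
flat = serialize(Node("X", [Node("Y"), Node("Z")]))
# ^(X ^(Y Z)): X has one child Y, which itself has child Z.
nested = serialize(Node("X", [Node("Y", [Node("Z")])]))

print(flat)    # ['X', 'DOWN', 'Y', 'Z', 'UP']
print(nested)  # ['X', 'DOWN', 'Y', 'DOWN', 'Z', 'UP', 'UP']
```

The two streams first differ at the fourth element (Z versus DOWN),
which is the kind of difference the lookahead has to pick up.
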
>>
>> Given the original problem mentioned in the issue:
>>   input: ^(not ^(and ^(= a b) ^(= c d)))
>>   rule: ^('not' ^('and' c51=. c52=.)) -> ...
>>
>> I don't see how this can be misinterpreted.  While processing the
>> 'and' subtree, it reads the first child node, discovers that it's
>> a subtree, reads the whole thing in and assigns the root node
>> (with dangling subtree) to c51.  Then it does the same for the
>> next subtree and c52.
>
> Agreed. After playing around all day yesterday, I came to the
> conclusion that the wild-card should in fact mean single node or
> subtree, which is normally what you want. I have simply altered the
> analysis to consider wild-card as really ^(. .*) :)

Too bad for me, as I have this in a tree parser (the rule name is
spelled with three e's on purpose):

treee
       : ^(. treee* )
       ;

which will never get to the "treee*" part, because the first "."
matches the complete subtree, making my analysis (code is left out for
clarity) defunct.
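
A toy model of what I mean, in Python. This is not what ANTLR generates;
match_wildcard and its subtree flag are hypothetical names, and the
stream is a hand-written DOWN/UP serialization of ^(not ^(and a b)). It
assumes a well-formed stream and only compares "." as a single node with
"." as ^(. .*):

```python
DOWN, UP = "DOWN", "UP"


def match_wildcard(stream, i, subtree=True):
    """Return the index just past what '.' consumes starting at stream[i].

    subtree=False: '.' is a single node.
    subtree=True:  '.' is a node plus its whole child list, like ^(. .*).
    """
    i += 1  # consume the node itself
    if subtree and i < len(stream) and stream[i] == DOWN:
        depth = 0
        while True:
            if stream[i] == DOWN:
                depth += 1
            elif stream[i] == UP:
                depth -= 1
            i += 1
            if depth == 0:  # matching UP consumed, subtree is complete
                break
    return i


# Serialized form of ^(not ^(and a b))
stream = ["not", DOWN, "and", DOWN, "a", "b", UP, UP]

as_subtree = match_wildcard(stream, 0, subtree=True)
as_node = match_wildcard(stream, 0, subtree=False)

print(stream[as_subtree:])  # [] -> nothing left for treee*
print(stream[as_node:])     # starts with DOWN, so the children are still there
```

With the subtree reading, the root "." swallows everything and there is
no input left for the nested rule; with the single-node reading, the
child list is still in the stream.
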

Or am I missing something here (as usual)?

Oliver

