tree v text parsing (was Re: [antlr-interest] Article against TreeWalkers)

Fri Mar 10 09:46:11 PST 2006

Andrew Bell wrote:
> On 3/8/06, Terence Parr <parrt at cs.usfca.edu> wrote:
>> On Mar 8, 2006, at 8:54 AM, Jeff Barnes wrote:
>>> Andy sez:
>>> I realized that tree parsing is identical to text parsing, albeit
>>> in two dimensions instead of one.
>> I've been trying to scream that as loud as I can...not loud enough I
>> guess. ;)  All v3 recognizers derive from same base class, so even in
>> practice recog. is all the same thing: 1D or 2D :)
>>

I notice from some other posts that there seems to be some confusion or
controversy about this idea of text v. tree parsing, so I'll take a stab
at it 'cause it's kinda fun.  I suspect people run into trouble with it
because they're thinking in concrete physical metaphors - an actual tree
structure, in 2d or 3d space.  But computational "trees" are not trees
at all - that's just a metaphor.  Nothing says we must construe trees in
2d space.  In fact, I'll wager most if not all mathematical structures
can be modeled as flat structures - what is a set, after all?  It's easy
to model a tree as a set, and therefore as a sequence, just like a
string of text.  (For that matter, even the "dimensionality" of a string
or sequence is metaphorical, a mapping between our experience of living
in physical space to the purely mathematical relations of sequences.)

It's easiest to see the fundamental unity of *all* parsing if one
focuses on the concepts of language and representation instead of the
metaphysics of things like "data structures" or the like.  Language is
inherently sequential, since time is at its core.  It also involves
"seeing" structure which only exists linguistically, but it remains
sequential.  (I'm not even sure what "2d language" would mean.)

Going from "text parsing" to "tree parsing" is no different than going
from lexing to parsing.  It's all just parsing, and one only ever parses
language utterances (texts, strings, whatever you want to call them).
The only way it even makes sense to talk about tree parsing is if one
construes a concrete tree as an expression in a language.

Consider:

a.	Bill is the father of Mary and John.

Text?  String?  Tree?  All of the above?  What about

b.	^(Bill Mary John)
c.	(father Bill (Mary John))
d.	Mary to John wa Biru ga papa desu
e.	Mary, John <-^ Bill
d.	...whatever syntax you please...

I see different strings that can be construed to mean the same thing.
They could all be either strings in a text or in a grammar. What those
strings represent ("mean") depends on the languages used to *interpret*
them.  The grammar provides structure; how one interprets the
metaphysics of that structure is a different thing.  Each of the above
strings could be interpreted as a representation of a tree or of the
parse structure of a  set of trees (a grammar).

So I guess I must quibble with Terence on one point:  I don't see the
need for 1d or 2d etc. to enter into it.  All parsing is the same.  Or
more accurately, all grammars map source sequences to target sequences,
  regardless of what the elements of those sequences represent.  Usually
the target sequences represent the interpretation of a hierarchical (2d)
structural grouping of source elements.  You could have an n-dimensional
tree, and still write a grammar for it that doesn't differ fundamentally
from a programming language grammar.

-gregg