[antlr-interest] Deep Disambiguation

Wed Nov 22 11:40:52 PST 2006

Hi,

I have a thorny disambiguation problem on which I'd like some 
suggestions or pointers.

The PRELIMINARIES can be skipped. Look below for THE SETUP and THE 
PROBLEM to cut to the chase.

-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
PRELIMINARIES:

I'm developing a parser for a format call TSTP (Thousands of Soluations 
for Theorem Provers, part of the TPTP project: 
<http://www.cs.miami.edu/~tptp/>).

Among the constructs that can be encoded in this format is plain old 
First-Order Logic (FOL).

FOL has things called Formulas, which can be thought of as the level at 
which objects have truth values--they're either true or false. Formulas 
are either the application of a named predicate to a list of arguments 
or boolean combinations thereof (there are also quantifiers, for-all 
and there-exists).

At a lower (syntactic) level there are things called Terms. Terms are 
ways of specifying the entities about which logical statements are 
being made. They are individual constants or the application of named 
functions to lists of arguments, which are themselves Terms.

The ambiguity I'm facing stems from the fact that functional terms look 
syntactically just like predicate formulas:

A functional term:

  functionName(termArg1, termArg2, fun2(f2Arg1, f2Arg2))

A predicate formula:

  predicateName(termArg1, termArg2, fun2(f2Arg1, f2Arg2))

The only structural difference is that terms nest arbitrarily but 
predicates do not. That is, both functions and predicates take terms as 
arguments. Predicates cannot be applied to predicate (or boolean or 
quantified) formulas.

In TSTP there are also two allowed infix predicate formulas, for the 
special predicates identity (=) and distinctness (!=).

-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
THE SETUP:

So when I parse something like this:

  fun1(arg1, arg2, fun2(fun2Arg)) = fun3(fun3Arg)

I have to build two (top-level) terms and combine them into a predicate 
formula that applies the identity predicate to those two terms.

By comparison, parsing this:

  pred1(arg1, arg2, fun2(fun2Arg)) & pred2(pred2Arg)

yields a boolean AND formula combining the two top-level predicate 
formulas.

-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
THE PROBLEM

As you can see from comparing the two immediately preceding examples, 
until either the ampersand or the equal sign is encountered, the 
interpretation of preceding content is unknown.

I should note, too, that there are no declarations that state whether a 
given name refers to a predicate or a function.

So how do I resolve this? I'd rather not build a generic tree structure 
and convert it to a Predicate Formula or a Function Term only after the 
parse completes. Everything else in the grammar admits direct local 
construction of the formula and / or term constituents, building the 
overall tree structure incrementally as the parse proceeds.

Thanks.

Randall Schulz