[antlr-interest] Deep Disambiguation
Randall R Schulz
rschulz at sonic.net
Wed Nov 22 11:40:52 PST 2006
Hi,
I have a thorny disambiguation problem on which I'd like some
suggestions or pointers.
The PRELIMINARIES can be skipped. Look below for THE SETUP and THE
PROBLEM to cut to the chase.
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
PRELIMINARIES:
I'm developing a parser for a format call TSTP (Thousands of Soluations
for Theorem Provers, part of the TPTP project:
<http://www.cs.miami.edu/~tptp/>).
Among the constructs that can be encoded in this format is plain old
First-Order Logic (FOL).
FOL has things called Formulas, which can be thought of as the level at
which objects have truth values--they're either true or false. Formulas
are either the application of a named predicate to a list of arguments
or boolean combinations thereof (there are also quantifiers, for-all
and there-exists).
At a lower (syntactic) level there are things called Terms. Terms are
ways of specifying the entities about which logical statements are
being made. They are individual constants or the application of named
functions to lists of arguments, which are themselves Terms.
The ambiguity I'm facing stems from the fact that functional terms look
syntactically just like predicate formulas:
A functional term:
functionName(termArg1, termArg2, fun2(f2Arg1, f2Arg2))
A predicate formula:
predicateName(termArg1, termArg2, fun2(f2Arg1, f2Arg2))
The only structural difference is that terms nest arbitrarily but
predicates do not. That is, both functions and predicates take terms as
arguments. Predicates cannot be applied to predicate (or boolean or
quantified) formulas.
In TSTP there are also two allowed infix predicate formulas, for the
special predicates identity (=) and distinctness (!=).
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
THE SETUP:
So when I parse something like this:
fun1(arg1, arg2, fun2(fun2Arg)) = fun3(fun3Arg)
I have to build two (top-level) terms and combine them into a predicate
formula that applies the identity predicate to those two terms.
By comparison, parsing this:
pred1(arg1, arg2, fun2(fun2Arg)) & pred2(pred2Arg)
yields a boolean AND formula combining the two top-level predicate
formulas.
-==--==--==--==--==--==--==--==--==--==--==--==--==--==--==-
THE PROBLEM
As you can see from comparing the two immediately preceding examples,
until either the ampersand or the equal sign is encountered, the
interpretation of preceding content is unknown.
I should note, too, that there are no declarations that state whether a
given name refers to a predicate or a function.
So how do I resolve this? I'd rather not build a generic tree structure
and convert it to a Predicate Formula or a Function Term only after the
parse completes. Everything else in the grammar admits direct local
construction of the formula and / or term constituents, building the
overall tree structure incrementally as the parse proceeds.
Thanks.
Randall Schulz
More information about the antlr-interest
mailing list