[antlr-interest] Ambiguity caused by recursion (I think)
John B. Brodie
jbb at acm.org
Thu Apr 8 18:49:35 PDT 2010
Greetings!
On Thu, 2010-04-08 at 20:29 -0400, Cameron Ross wrote:
> Hi,
>
> I have a grammar that calls a rule from two different "levels". This leads
> to an ambiguity being reported from the two different calls to the rule.
> Oddly, the ambiguity points to the same place, so I'm not sure why its
> "ambiguous". The pertinent parts of the grammar are shown below and I've
> attached a screen grab of the syntax diagram from AntlrWorks.
>
>
> start_rule
> : clif_file
> ;
>
> clif_file
> : cltext+
> ;
>
> cltext
> : (OPEN CL_MODULE) => module -> ^(CLTEXT module)
> | text -> ^(CLTEXT text)
> ;
> module
> : OPEN CL_MODULE interpretablename exclusion_set? cltext CLOSE -> ^(MODULE
> ^(MODULE_IDENTIFIER interpretablename) exclusion_set? cltext)
> ;
> text
> : phrase+ -> ^(TEXT phrase+)
> ;
>
> phrase
> : (OPEN CL_MODULE) => module -> ^(PHRASE module)
> | (OPEN CL_IMPORTS) => importation -> ^(PHRASE importation)
> | (OPEN CL_COMMENT) => commented_text -> ^(PHRASE commented_text)
> | sentence -> ^(PHRASE sentence)
> ;
> ...
Well I found 4 issues with the above (the first 2 may be trivial, but
they really bug me.... sorry)
1) the above is INCOMPLETE! e.g. what is `importation`, what is
`exclusion_set` and so on... (i will try to blunder forward anyway)
2) the syntactic predicates are useless! AFAIK ANTLR's LL(*) algorithm
can deal with disambiguating left prefixes that consist of a finite set
of tokens (like what we have here in your example). ANTLR has issues
when a left prefix is infinite (either via left-recursion or via a
looping construct *,+).
and now to the "meat" of your problem (i hope)
3) a `cltext` is permitted to be either a `module` or a `text`. further
a `text` may be a list of `phrase`s and a `phrase` may be a `module` or
other things. so `cltext` is ambiguous. given an input consisting of the
token sequence OPEN CL_MODULE, we do know how to parse. it is either
cltext -> module
or it is
cltext -> text -> phrase -> module
you need to make a choice here. probably just remove `module` from the
`cltext` rule.
4) and now a deeper ambiguity: a `clfile` is a list of `cltext` and
`cltext` is just a `text` (since, i assume, we have removed the `module`
alternative) and `text` is a list of `phrase`.
now suppose our input consists of three `sentence`s --- let me refer to
a sentence as the string "s_i". so we have the input (3 sentences):
"s_1" "s_2" "s_3"
now how is this to be parsed?
is it as
clfile -> cltext -> text -> (phrase[s_1],
phrase[s_2],
phrase[s_3])
or is it
clfile -> (cltext -> text -> phrase[s_1]),
(cltext -> text -> phrase[s_2]),
(cltext -> text -> phrase[s_3])
or is it
clfile -> (cltext -> text -> phrase[s_1]),
(cltext -> text -> (phrase[s_2],phrase[s_3]))
or what is it supposed to be?
hope this helps...
-jbb
More information about the antlr-interest
mailing list