[antlr-interest] Ambiguity caused by recursion (I think)

Thu Apr 8 18:49:35 PDT 2010

Greetings!

On Thu, 2010-04-08 at 20:29 -0400, Cameron Ross wrote:
> Hi,
> 
> I have a grammar that calls a rule from two different "levels".  This leads
> to an ambiguity being reported from the two different calls to the rule.
>  Oddly, the ambiguity points to the same place, so I'm not sure why its
> "ambiguous".  The pertinent parts of the grammar are shown below and I've
> attached a screen grab of the syntax diagram from AntlrWorks.
> 
> 
> start_rule
> : clif_file
> ;
> 
> clif_file
> : cltext+
> ;
> 
> cltext
> : (OPEN CL_MODULE) => module -> ^(CLTEXT module)
> | text -> ^(CLTEXT text)
> ;
> module
> : OPEN CL_MODULE interpretablename exclusion_set? cltext CLOSE -> ^(MODULE
> ^(MODULE_IDENTIFIER interpretablename) exclusion_set? cltext)
> ;
> text
> : phrase+ -> ^(TEXT phrase+)
> ;
> 
> phrase
> : (OPEN CL_MODULE) => module -> ^(PHRASE module)
> | (OPEN CL_IMPORTS) => importation -> ^(PHRASE importation)
> | (OPEN CL_COMMENT) => commented_text -> ^(PHRASE commented_text)
> | sentence -> ^(PHRASE sentence)
> ;
> ...

Well I found 4 issues with the above (the first 2 may be trivial, but
they really bug me.... sorry)

1) the above is INCOMPLETE! e.g. what is `importation`, what is
`exclusion_set` and so on... (i will try to blunder forward anyway)

2) the syntactic predicates are useless! AFAIK ANTLR's LL(*) algorithm
can deal with disambiguating left prefixes that consist of a finite set
of tokens (like what we have here in your example). ANTLR has issues
when a left prefix is infinite (either via left-recursion or via a
looping construct *,+).

and now to the "meat" of your problem (i hope)

3) a `cltext` is permitted to be either a `module` or a `text`. further
a `text` may be a list of `phrase`s and a `phrase` may be a `module` or
other things. so `cltext` is ambiguous. given an input consisting of the
token sequence OPEN CL_MODULE, we do know how to parse. it is either

cltext -> module

or it is 

cltext -> text -> phrase -> module

you need to make a choice here. probably just remove `module` from the
`cltext` rule.

4) and now a deeper ambiguity: a `clfile` is a list of `cltext` and
`cltext` is just a `text` (since, i assume, we have removed the `module`
alternative) and `text` is a list of `phrase`.

now suppose our input consists of three `sentence`s --- let me refer to
a sentence as the string "s_i". so we have the input (3 sentences):

"s_1" "s_2" "s_3" 

now how is this to be parsed?

is it as

clfile -> cltext -> text -> (phrase[s_1], 
                             phrase[s_2], 
                             phrase[s_3])

or is it

clfile -> (cltext -> text -> phrase[s_1]), 
          (cltext -> text -> phrase[s_2]), 
          (cltext -> text -> phrase[s_3])

or is it

clfile -> (cltext -> text -> phrase[s_1]), 
          (cltext -> text -> (phrase[s_2],phrase[s_3]))

or what is it supposed to be?

hope this helps...
   -jbb