[antlr-interest] Question on ambiguity

Wed Dec 27 01:22:08 PST 2006

On 12/27/06, James Mello <james.mello at intelligentdiscovery.com> wrote:
> I'm relatively new to this whole parser grammar thing and I'm trying to get
> up to speed as quickly as possible. This is a particularly troubling thing
> as I can't quite grok why this doesn't work.... So given this simple set of
> rules and productions....
>
> class MyParser extends Parser;
>
> options
> {
>         buildAST = true;
>         k = 2;
> }
>
> multipleIDs :
>         (ID)+
>         ;
>
> class MyLexer extends Lexer;
>
> options
> {
>         k = 2;
>         charVocabulary = '\u0000'..'\ufffe';
>         caseSensitive = false;
> }
>
> ID :
>         'a'..'z'
>         ;
>
> The expression for multipleIDs compiles cleanly...
>
> When you change the multipleIDs rule to the following
>
> multipleIDs :
>         ID (multipleIDs)*
>         ;
>
> You end up with a warning that says:
>
> Nondeterminism upon K==1:ID K==2:ID between alt 1 and exit branch of block
>
> I'm trying to figure out why this is the case as it seems to me that a
> recursive rule like this should work. I've looked a bit for some info in the
> FAQ and didn't find much on this.
>
> Finally, since this is NOT the way to write recursive rules, how does one go
> about doing this correctly?
>
>
> James Mello : Software Engineer - ATS, Inc.
> web: www.intelligentdiscovery.com | (p) 360.698.7100x236 | (f) 360.698.7200

Assuming you just want to say that 'multipleIDs' consists of one or
more 'ID's, then your original rule already does the job:

multipleIDs : (ID)+ ;
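
As for why the recursive version draws the warning: the rule
ID (multipleIDs)* is ambiguous, because a run of IDs can be grouped in
more than one way. Here is a rough Python sketch (not ANTLR code; the
function names count_rule, count_loop and count_plus are made up for
illustration) that counts how many distinct parse trees each form of the
rule allows for n ID tokens:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_rule(n):
    """Parse trees for 'multipleIDs : ID (multipleIDs)*' over n ID tokens."""
    if n < 1:
        return 0
    # The leading ID consumes one token; the loop must cover the rest.
    return count_loop(n - 1)


@lru_cache(maxsize=None)
def count_loop(n):
    """Ways the '(multipleIDs)*' loop can cover exactly n ID tokens."""
    if n == 0:
        return 1  # zero iterations: take the exit branch immediately
    # The first iteration covers k >= 1 tokens, later iterations cover the rest.
    return sum(count_rule(k) * count_loop(n - k) for k in range(1, n + 1))


def count_plus(n):
    """Parse trees for 'multipleIDs : (ID)+' over n ID tokens."""
    return 1 if n >= 1 else 0


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_rule(n), count_plus(n))
```

With three IDs the recursive rule already has two readings,
ID (ID (ID)) and ID (ID) (ID), while (ID)+ always has exactly one. As
far as I can tell that is what the warning is reporting: with ID as the
next token, the parser cannot decide between another pass through the
inner loop (alt 1) and leaving it (the exit branch) so that the
caller's loop matches the ID instead.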

You'll also need a lexer rule that defines ID separators - for example
whitespace - with something like this:

WS     :
    (' '
    | '\t'
    | '\r' '\n' { newline(); }
    | '\n'      { newline(); }
    )
    { $setType(Token.SKIP); }
  ;

This tells the lexer to use whitespace as token delimiters, but not to
pass it on to the parser.
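
If it helps to see the effect outside ANTLR, here is a small Python
sketch of what the two lexer rules do together. It is only an
illustration under my own assumptions: the tokenize function and the
(type, text, line) tuples are invented here, and it is not what ANTLR
generates.

```python
import string


def tokenize(text):
    """Return (type, text, line) tuples; whitespace is matched but not emitted."""
    tokens = []
    line = 1
    i = 0
    while i < len(text):
        ch = text[i]
        if text[i:i + 2] == "\r\n":
            line += 1          # like newline() in the WS rule
            i += 2
        elif ch == "\n":
            line += 1
            i += 1
        elif ch in " \t":
            i += 1             # skipped, like $setType(Token.SKIP)
        elif ch in string.ascii_letters:
            # ID is a single letter; upper case is accepted because the
            # lexer in the question sets caseSensitive = false.
            tokens.append(("ID", ch, line))
            i += 1
        else:
            raise ValueError("unexpected character %r on line %d" % (ch, line))
    return tokens


if __name__ == "__main__":
    print(tokenize("a B\r\nc"))
```

The parser then only ever sees the ID tokens, so a rule such as
multipleIDs needs no mention of whitespace at all.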

HTH

Stuart Dootson