[antlr-interest] Question on ambiguity

Wed Dec 27 02:14:32 PST 2006

Hi,

On 12/27/06, James Mello <james.mello at intelligentdiscovery.com> wrote:
> class MyParser extends Parser;
>
> options
> {
>         buildAST = true;
>         k = 2;
> }
>
> multipleIDs :
>         (ID)+
>         ;

vs.

> multipleIDs :
>         ID (multipleIDs)*

In principle this is equivalent to the ()+ notation, although it's a
weird way of writing it down. It's probably better to use: ID (ID)*.

In a parser you should end the start rule with an EOF token (so antlr
knows where to stop). With the first notation the choice for antlr is
evident: just keep eating IDs, or give an error if something else is
encountered. With the second, antlr cannot tell whether a
(multipleIDs)* follows the first ID; adding the EOF after the start
rule should fix that.

I guess adding the EOF and optionally reducing k to one should fix the warning.
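
To make that concrete, here is a minimal sketch in ANTLR 2 syntax that
combines the ID (ID)* form with an EOF-terminated start rule. The rule
name startRule is made up for illustration (it is not in your grammar),
and I have not run this through the tool, so treat it as a starting
point rather than a tested fix:

```
class MyParser extends Parser;

options
{
        buildAST = true;
        k = 1;          // one token of lookahead suffices for this rule
}

// startRule is a hypothetical name; the point is only that the
// rule you invoke first ends with EOF.
startRule :
        multipleIDs EOF
        ;

multipleIDs :
        ID (ID)*
        ;
```

You would then call startRule instead of multipleIDs from your code, so
the parser has to see the end of input after the last ID.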

Also note that a warning doesn't have to be bad. Oftentimes you have
grammars where warnings are present but the right thing is done
anyway (antlr prefers the first alternative).

In some cases you can turn off warnings.
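
For example, ANTLR 2 has a subrule option, warnWhenFollowAmbig, for the
case where a loop's exit branch conflicts with an alternative. A sketch,
assuming you keep the recursive version of your rule and are happy with
what antlr does by default; I have not tested it on your grammar:

```
// Suppress the warning about ID being valid both as the start of
// another multipleIDs and as what follows the loop.
multipleIDs :
        ID
        (       options { warnWhenFollowAmbig = false; }
        :       multipleIDs
        )*
        ;
```

This only hides the warning; the generated parser behaves the same as
before.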

> Finally, since this is NOT the way to write recursive rules, how does one go
> about doing this correctly?

I guess it's good to look at some of the bigger grammars to get a feel
for it. There's no one way since it depends on your language.

This may sound weird, but try reading the code generated for the
parser and lexer for small grammars. This gives a lot of insight into
how things work and adds to your intuition for the tool. There's also
a command-line option (-diagnostic in ANTLR 2) that generates a textual
description of the grammar, including how the lexer/parser will work;
this is sometimes instructive as well.

Cheers,

Ric