[antlr-interest] Difference in the following rules
ric.klaren at gmail.com
Tue Jul 12 00:59:17 PDT 2005
On 7/11/05, Tarun Khanna <tarunkhanna at gmail.com> wrote:
> The attached grammar produces a non-determinism warning in the following
> factor :
> ( ( LPAREN exp RPAREN ) | IDENT ) (DOT IDENT)* (DOT TAB)?
Notice that the (DOT IDENT)* may match the empty word e.g. nothing.
Add to that that (DOT TAB)? may also match nothing. Add to that that
the start of both may be a DOT. So antlr has a hard time choosing
between the closure (DOT IDENT)* and the optional part (DOT TAB)?. Try
reading the code that gets generated for it, it might help you get a
feel for it.
> However if I change it to the following (as suggested by John Brodie), the
> warning goes away.
> factor :
> ( ( LPAREN exp RPAREN ) | IDENT ) ( qualification)?
Notice that here you have only one optional thing.
> qualification :
> ( ( IDENT ( qualification)? ) | TAB )
And this is a left factoring of the first rule. E.g. the start of the
rule is now unambiguous one DOT. and what follows is also easy to
decide the choice can be made by looking at the next token and going
into the IDENT (qualification)? part if it's an IDENT and into TAB if
it's a TAB. E.g. by restructuring the rules you made it real easy for
antlr to make the choice between the two alternatives. (e.g. no
> The two rules are similar and should behave similarly. However the latter
> does not generate any warnings. Can anyone please tell my, why does the
> first one generate those warnings and what is it that changes in the latter
I hope the above helps a bit. In recursive descent parsing (that's
what antlr's parsers/lexers/treewalkers do) the parser decides which
alternative to work on based upon one or more tokens lookahead from
the input stream. To make the decision ANTLR uses lookahead sets for
alternatives. If the lookahead sets overlap the choice is ambiguous
and ANTLR gives a warning. Often this warning can be ignored. Also
check out the faq's about this and left factoring at jguru.com. It
really helps to read the generated code for some simple grammars to
get a feel for the beast.
More information about the antlr-interest