[antlr-interest] Ambiguity error in lexer generation

Thu Sep 20 10:21:48 PDT 2007

On 9/20/07, Alex Kinneer <kinneera at hotmail.com> wrote:
>
> I'm hoping somebody can offer some insight as to why antlr would *nondeterministically* report lexer ambiguity warnings. That is to say, when I run the following commands:
>
> rm TestLang__.g
> rm *.class;
> java org.antlr.Tool TestLang.g
>
> It sometimes, but not always, generates warnings of this sort:
>
> warning(205): TestLang.g:1:8: ANTLR could not analyze this decision in rule Tokens; often this is because of recursive rule references visible from the left edge of alternatives. ANTLR will re-analyze the decision with a fixed lookahead of k=1. Consider using "options {k=1;}" for that decision and possibly adding a syntactic predicate.
> warning(209): TestLang.g:20:1: Multiple token rules can match input such as "'v'": T22, T24, T25, UNQUOTED_STRING, JAVA_ID
> As a result, tokens(s) JAVA_ID,UNQUOTED_STRING,T24,T25 were disabled for that input
> warning(209): TestLang.g:13:1: Multiple token rules can match input such as "'g'": T16, T18, UNQUOTED_STRING, JAVA_ID
> ...
>
>
> It is primarily the inconsistent reporting of these warnings that is very perplexing to me. There's also nothing obvious to me in the grammar that should be causing the warnings. There is a generic Java identifier lexer rule that could cause an ambiguity with a set of keywords, but it is my understanding that antlr should be able to resolve this ambiguity (by giving preference to the keywords). The unquoted string rule has a semantic predicate that should prevent ambiguity.
>
ANTLR has a timeout on DFA->NFA conversions, based on code comments it
looks like this may no longer be needed as such problems are now
detected elsewhere. But it looks like this timeout is still active and
could cause variable error messages based on general system load. Not
quite sure but it looks like a timeout may generate output similar to
that reported. Try adding the -report option when calling ANTLR, if
this is what's going on then there should be details under the NFA
conversion early termination report. The -Xconversiontimeout option
controls the timeout.
If this is the case you may still want to look at modifying your
grammar as this would mean ANTLR is having problems with your grammar
and may produce some rather nasty code.
Looking at your grammar it may be an issue of the complexity of the
predictor needed to distinguish your keywords from your JAVA_ID rule
given the large number of keywords and the complicated character
ranges in the LETTER and JAVA_ID_DIGIT rules. Either reducing the
complexity of your ranges (e.g. by replacing the checking done there
with semantic predicates) or replacing your keywords with an action in
JAVA_ID checking a keywords table may help.

Tom.

> Thanks,
> Alex
>
> _________________________________________________________________
> Gear up for Halo(r) 3 with free downloads and an exclusive offer. It's our way of saying thanks for using Windows Live™.
> http://gethalo3gear.com?ocid=SeptemberWLHalo3_WLHMTxt_2