[antlr-interest] Re: Please enlighten a new user..

Fri Dec 23 10:51:28 PST 2005

<resend, to list...>

On 12/23/05, Mike Feldmeier <mfeldmeier at metadata.com> wrote:
> I have been using ANTLR for about three days now.  I have read the reference
> guide and two tutorials, but I still don't understand this problem.
>
>   A: "data";
>   B: "database" | "by";
>
> I get the following warning (it seems to work, I just despise warnings):
>
> test.g: warning:lexical nondeterminism between rules A and B upon
> test.g:     k==1:'d'
> test.g:     k==2:'a'
> test.g:     k==3:'t'
> test.g:     k==4:'a'
> test.g:     k==5:<end-of-token>
>
> Interestingly, if I take out the ''| "by"'' from rule B, the warning
> disappears.

I think the cause is linear approximate lookahead, coupled with the
fact that end-of-token is used as a placeholder past the end of the
input.  With a lookahead depth of k=5, "by" at the end of the stream
is represented as:

    k==1:'b'
    k==2:'y'
    k==3:<end-of-token>
    k==4:<end-of-token>
    k==5:<end-of-token>

Merging that, depth by depth, with the first five characters of
"database" means rule B matches for:

    k==1:'d' or 'b'
    k==2:'a' or 'y'
    k==3:'t' or <end-of-token>
    k==4:'a' or <end-of-token>
    k==5:'b' or <end-of-token>

Look at that last entry again, and remember these are sets, tested
one depth at a time with no memory of which alternative each symbol
came from.  That means that if I see "data<EOF>", rule A can obviously
match, but rule B is also a candidate ("data" from "database",
followed by <end-of-token> from "by<EOF>").
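
The set merging described above can be sketched in a few lines of
Python.  This is only a simplified model of the idea, not ANTLR's
actual analysis code; the names approx_lookahead and ambiguous are
made up for illustration, and each rule is assumed to be a plain list
of literal strings.

```python
EOT = "<end-of-token>"

def approx_lookahead(literals, k):
    """Build k sets; set i holds every symbol any literal shows at depth i+1.

    Literals shorter than k are padded with EOT, and all alternatives are
    merged into the same per-depth sets (the "linear approximation").
    """
    sets = [set() for _ in range(k)]
    for lit in literals:
        for i in range(k):
            sets[i].add(lit[i] if i < len(lit) else EOT)
    return sets

def ambiguous(rule_a, rule_b, k):
    """Report a conflict when the two rules' sets overlap at every depth."""
    a = approx_lookahead(rule_a, k)
    b = approx_lookahead(rule_b, k)
    return all(a[i] & b[i] for i in range(k))

rule_a = ["data"]
print(ambiguous(rule_a, ["database", "by"], 5))  # with the "by" alternative
print(ambiguous(rule_a, ["database"], 5))        # without it
```

With "by" present, the padding puts <end-of-token> into rule B's
depth-5 set, so the sets overlap at every depth; without it, depth 5
holds only 'b' and the overlap disappears, which is consistent with
the warning going away.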

Rather than giving each keyword its own lexer rule, take a look at
the discussion of keywords and identifiers in the documentation --
match a generic identifier, then change the token type for those
identifiers that happen to be keywords.
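
As a rough, language-neutral illustration of that approach (not ANTLR
grammar syntax), the Python sketch below scans one alphabetic word and
then looks it up in a keyword table.  The token type names "A", "B",
and "ID" and the function scan_word are assumptions made for this
example only.

```python
# Hypothetical keyword table: word text -> token type.
KEYWORDS = {"data": "A", "database": "B", "by": "B"}

def scan_word(text, pos=0):
    """Match the longest run of letters at pos, then retype it if it is a keyword."""
    end = pos
    while end < len(text) and text[end].isalpha():
        end += 1
    word = text[pos:end]
    # Anything not in the table stays a plain identifier.
    return KEYWORDS.get(word, "ID"), word

print(scan_word("data"))
print(scan_word("database"))
print(scan_word("datum"))
```

Because the whole word is consumed before any decision is made, the
lexer never has to choose between "data" and "database" by lookahead.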

Hope this helps,
- Bryan