[antlr-interest] Re: Please enlighten a new user..
Bryan Ewbank
ewbank at gmail.com
Fri Dec 23 10:51:28 PST 2005
<resend, to list...>
On 12/23/05, Mike Feldmeier <mfeldmeier at metadata.com> wrote:
> I have been using ANTLR for about three days now. I have read the reference
> guide and two tutorials, but I still don't understand this problem.
>
> A: "data";
> B: "database" | "by";
>
> I get the following warning (it seems to work, I just despise warnings):
>
> test.g: warning:lexical nondeterminism between rules A and B upon
> test.g: k==1:'d'
> test.g: k==2:'a'
> test.g: k==3:'t'
> test.g: k==4:'a'
> test.g: k==5:<end-of-token>
>
> Interestingly, if I take out the ''| "by"'' from rule B, the warning
> disappears.
I think the cause is linear approximate lookahead, coupled with the
fact that <end-of-token> is used as a placeholder past the end of the
input. With k=5, "by" at the end of the stream is represented as:
k==1:'b'
k==2:'y'
k==3:<end-of-token>
k==4:<end-of-token>
k==5:<end-of-token>
So that means rule B matches for:
k==1:'d' or 'b'
k==2:'a' or 'y'
k==3:'t' or <end-of-token>
k==4:'a' or <end-of-token>
k==5:'b' or <end-of-token>
Look at that last entry again, and remember these are sets: each depth
is checked on its own, with no record of which alternative contributed
which character. That means that if I see "data<EOF>", rule A can
obviously match, but rule B is also a candidate ("data" from
"database", followed by <end-of-token> from "by<EOF>").
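To make the set reasoning concrete, here is a rough Python sketch of it.
This is only a simplified model of the per-depth sets described above,
not ANTLR's actual analysis code; the names depth_sets and
ambiguous_sequence are made up for illustration, and I am assuming short
alternatives are simply padded with <end-of-token>.

```python
EOT = "<end-of-token>"  # placeholder used past the end of a short alternative


def depth_sets(alternatives, k):
    """Build one character set per depth 1..k for a rule.

    All alternatives of the rule are merged into the same sets, which is
    the "approximate" part: the sets forget which alternative supplied
    which character.
    """
    sets = [set() for _ in range(k)]
    for alt in alternatives:
        for i in range(k):
            sets[i].add(alt[i] if i < len(alt) else EOT)
    return sets


def ambiguous_sequence(rule_a, rule_b, k):
    """Return one shared symbol per depth, or None if some depth has no overlap."""
    shared = [a & b for a, b in zip(depth_sets(rule_a, k), depth_sets(rule_b, k))]
    if all(shared):
        return [sorted(s)[0] for s in shared]
    return None


if __name__ == "__main__":
    # Rules A and B from the original question.
    print(ambiguous_sequence(["data"], ["database", "by"], 5))
    # Same rules with the "by" alternative removed from B.
    print(ambiguous_sequence(["data"], ["database"], 5))
```

With "by" in rule B, every depth up to 5 has an overlap, and the shared
sequence is the one from the warning. Without "by", depth 5 of rule B
holds only 'b', so nothing overlaps with rule A's <end-of-token> there
and the conflict goes away.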
Rather than trying to match these in the lexer, take a look at the
discussion of keywords and identifiers in the documentation -- match
an identifier, then change the match for those identifiers that happen
to be keywords.
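The idea, sketched in Python rather than in ANTLR grammar syntax: match
one general identifier, then look the text up in a keyword table and
change the token type on a hit. The KEYWORDS table, the token type names
and the tokenize function are all hypothetical, chosen only to mirror
rules A and B above; this is not how ANTLR is implemented.

```python
import re

# Hypothetical keyword table: text of the keyword -> token type.
KEYWORDS = {"data": "A", "database": "B", "by": "B"}

# A single general identifier pattern replaces the separate keyword rules.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    """Split text into (token_type, lexeme) pairs.

    Every word is first matched as an identifier; its type is then
    changed if the lexeme happens to be a keyword.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = IDENT.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        word = match.group()
        tokens.append((KEYWORDS.get(word, "ID"), word))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("data database by datum"))
```

Because the lexer only ever decides "is this an identifier?", there is
nothing left for "data" and "database" to conflict over; the longest
identifier is matched first and the table lookup settles the type.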
Hope this helps,
- Bryan