[antlr-interest] Understanding priorities in lexing (newbie)

Thu Jul 12 13:11:42 PDT 2007

At 07:46 13/07/2007, Terence Parr wrote:
 >Hi Tom.  Actually even if I did, OTHER OTHER matches 'ab' as
 >does KEYWORD and so it has to resolve the ambiguity, which it
 >does in favor of first rule specified.

The point is that 'ab' *doesn't* match KEYWORD -- except in the 
mind of the predictor, since it isn't checking the whole rule.  So 
an input of 'ab' should unambiguously result in OTHER OTHER; an 
input of 'abc' *could* result in either OTHER OTHER OTHER or 
KEYWORD, but the normal "pick the longest match and/or the first 
listed" rules sort out that ambiguity.
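
To make the expected behaviour concrete, here is a minimal Python 
sketch of the "longest match, then first listed" rule.  It assumes 
the grammar under discussion is roughly KEYWORD : 'abc' and 
OTHER : any single character; the grammar isn't quoted in this 
message, so that is a reconstruction.  RULES and tokenize are 
names made up for illustration, and this is not how ANTLR 
implements it.

```python
import re

# Assumed rules, in grammar order (not quoted from the real grammar):
# KEYWORD is the literal 'abc', OTHER is any single character.
RULES = [
    ("KEYWORD", re.compile(r"abc")),
    ("OTHER", re.compile(r".")),
]

def tokenize(text):
    """Longest match wins; a tie goes to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best_name)
        pos += best_len
    return tokens

print(tokenize("ab"))   # ['OTHER', 'OTHER']
print(tokenize("abc"))  # ['KEYWORD']
print(tokenize("aba"))  # ['OTHER', 'OTHER', 'OTHER']
```

Every rule is tried in full before a token is chosen, so a partial 
match of 'abc' never beats a complete match of OTHER.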

In the current implementation, though, the predictor sees 'ab' and 
immediately declares "That must be a KEYWORD!" -- even when the 
input is actually 'aba', whose only "correct" output would be 
OTHER OTHER OTHER.  So this results in an exception rather than 
producing the right output.
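
As a rough model of that failure, the sketch below uses a toy 
predictor that commits to KEYWORD after seeing only 'ab'.  It 
again assumes KEYWORD is the literal 'abc' and OTHER is any single 
character; predict, next_token, MismatchError and 
tokenize_with_early_commit are hypothetical names, and this only 
imitates the behaviour described above, not ANTLR's generated 
code.

```python
KEYWORD = "abc"  # assumed literal for the KEYWORD rule

class MismatchError(Exception):
    """Raised when a rule chosen by the predictor fails to match."""

def predict(text, pos):
    # Toy predictor: decides on KEYWORD as soon as it sees 'ab',
    # without checking the rest of the rule.
    return "KEYWORD" if text.startswith("ab", pos) else "OTHER"

def next_token(text, pos):
    rule = predict(text, pos)
    if rule == "KEYWORD":
        # Once committed, the whole rule must match; there is no
        # fallback to OTHER in this model.
        if not text.startswith(KEYWORD, pos):
            raise MismatchError(f"expected {KEYWORD!r} at position {pos}")
        return rule, pos + len(KEYWORD)
    return rule, pos + 1

def tokenize_with_early_commit(text):
    tokens, pos = [], 0
    while pos < len(text):
        name, pos = next_token(text, pos)
        tokens.append(name)
    return tokens

print(tokenize_with_early_commit("abc"))  # ['KEYWORD']
try:
    tokenize_with_early_commit("aba")
except MismatchError as err:
    print("error:", err)  # error: expected 'abc' at position 0
```

In this model 'ab' on its own fails the same way, even though 
OTHER OTHER is the only sensible tokenisation.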

 >It uses PROGRAM rule w/o the + because what if you had an error
 >char?

I'm not sure what you meant by this.

 >There is an implied loop to PROGRAM in nextToken() method.

But the predictor doesn't know anything about it -- hence the 
problem.

This whole thing makes it really hard to write correct lexers -- 
especially since ANTLR also seems to ignore predicates if it 
thinks it knows better.  If this one thing were fixed then it'd 
make ANTLR significantly easier to use.  And I've been saying that 
for ages now :)