[antlr-interest] Understanding priorities in lexing (newbie)

Terence Parr parrt at cs.usfca.edu
Thu Jul 12 13:20:27 PDT 2007


On Jul 12, 2007, at 1:11 PM, Gavin Lambert wrote:

> At 07:46 13/07/2007, Terence Parr wrote:
> >Hi Tom.  Actually even if I did, OTHER OTHER matches 'ab' as
> >does KEYWORD and so it has to resolve the ambiguity, which it does in
> >favor of first rule specified.
>
> The point is that 'ab' *doesn't* match KEYWORD -- except in the  
> mind of the predictor, since it isn't checking the whole rule.  So  
> an input of 'ab' should unambiguously result in OTHER OTHER; an  
> input of 'abc' *could* result in either OTHER OTHER OTHER or  
> KEYWORD, but the normal "pick the longest match and/or the first  
> listed" rules sort out that ambiguity.

Yes.  ANTLR doesn't do the natural thing here.  For normal cases,
it's not an issue.  Few tokens are prefixes of one another like that.
Normally it's a keyword against 'a'..'z'+, not against a single 'a'..'z'.

> In the current implementation, though, the predictor sees 'ab' and  
> immediately declares "That must be a KEYWORD!" -- even when the  
> input is actually 'aba', whose only "correct" output would be OTHER  
> OTHER OTHER.  So this results in an exception rather than producing  
> the right output.

Well, it does what I expected so it's "correct", just not what you  
want ;)
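
A toy model of the behavior being debated, as a rough sketch only. It is not ANTLR's generated code. It assumes the grammar under discussion (not shown in this message) is roughly KEYWORD : 'abc' ; and OTHER : a single character, and the function names are made up for illustration. The first function commits to KEYWORD after seeing the prefix 'ab', as described above; the second picks KEYWORD only when the whole literal is present.

```python
# Toy model only: this is NOT ANTLR's generated code.
# Assumed grammar (not shown in this message):
#   KEYWORD : 'abc' ;   OTHER : any single character ;

KEYWORD = "abc"


def predict_then_match(text):
    """Commit to KEYWORD on the prefix 'ab', then require the whole literal."""
    tokens, i = [], 0
    while i < len(text):
        if text.startswith(KEYWORD[:2], i):
            # Prediction has already chosen KEYWORD from two chars of lookahead.
            if not text.startswith(KEYWORD, i):
                raise ValueError(f"mismatched input at index {i + 2}")
            tokens.append("KEYWORD")
            i += len(KEYWORD)
        else:
            tokens.append("OTHER")
            i += 1
    return tokens


def match_whole_rule(text):
    """Choose KEYWORD only if the full literal is present; otherwise OTHER."""
    tokens, i = [], 0
    while i < len(text):
        if text.startswith(KEYWORD, i):
            tokens.append("KEYWORD")
            i += len(KEYWORD)
        else:
            tokens.append("OTHER")
            i += 1
    return tokens
```

Under these assumptions, predict_then_match("aba") raises an error while match_whole_rule("aba") yields three OTHER tokens, which is the difference the thread is about.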

> >It uses PROGRAM rule w/o the + because what if you had an error
> >char?
>
> I'm not sure what you meant by this.

I create a rule

Tokens : T1 | T2 | T3 ... ;

which is what does the matching of tokens.

> >There is an implied loop to PROGRAM in nextToken() method.
>
> But the predictor doesn't know anything about it -- hence the problem.

It assumes any char can follow because that is correct.  You could put
any char after a token, yes?
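
To picture the structure only: a rough sketch of a token rule with no loop of its own, driven by an outer loop that calls next_token() repeatedly. Every name here (TOKEN_RULES, ToyLexer, all_tokens) is hypothetical, the two rules are the assumed KEYWORD and OTHER from this thread, and the matching is plain first-listed-rule-wins rather than ANTLR's prediction.

```python
# Toy model only; names are hypothetical and this is not ANTLR's runtime.
import re

# Stand-in for the synthetic rule  Tokens : T1 | T2 | T3 ... ;
# Only two assumed alternatives are listed here.
TOKEN_RULES = [
    ("KEYWORD", re.compile(r"abc")),
    ("OTHER", re.compile(r"[a-z]")),
]


class ToyLexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        """Match exactly one alternative; return None at end of input."""
        if self.pos >= len(self.text):
            return None
        for name, pattern in TOKEN_RULES:  # first listed rule wins
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                return (name, m.group())
        raise ValueError(f"no token matches at index {self.pos}")


def all_tokens(text):
    """The loop lives here, outside the token rule itself."""
    lexer = ToyLexer(text)
    out = []
    while True:
        tok = lexer.next_token()
        if tok is None:
            return out
        out.append(tok)
```

Each call to next_token() knows nothing about what the next call will match, so from inside the token rule any character may legitimately come next.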

> This whole thing makes it really hard to write correct lexers --  
> especially since ANTLR also seems to ignore predicates if it thinks  
> it knows better.  If this one thing was fixed then it'd make ANTLR  
> significantly easier to use.

(...)=> and {...}?=> will always be executed.

> And I've been saying that for ages now :)

And not reading about {...}?=> ? ;)  They should always be executed.

Ter


