[antlr-interest] Lexer lookahead overoptimizes

Fri Apr 13 06:30:34 PDT 2007

Jim Idle wrote:
> I think that what Ter is trying to tell you is that you are not really
> supplying quite enough information for the lexer analyser to work things
> out without making a 'mistake', so the behavior, without any further
> information, is as you see it.
>
> I think that you need a predicate on your rule, such as this:
>
> SHIN : '\u00d7' '\u00a9' ( ('\u00d7' '\u0081')=> ('\u00d7' '\u0081'))? ;
>
> You might need the very latest snapshot for this predicate, but probably
> not. 
>
> Jim
>
>   
I understand what Ter is saying; that is why I referred to it as a 
feature that I disagree with rather than a bug. I think that Ter is 
making the mistake of having implementation issues drive functional 
specifications. To my mind, EBNF '?' means optional, and optional 
clauses can't fire recognition exceptions. In the notation that you have 
used, Ter has essentially defined

('\u00d7' '\u0081')?

as

( ('\u00d7')=> ('\u00d7' '\u0081'))?

I don't think that matches anybody's expectation. The way I look at 
it, Ter has restricted my usage of '?' to single elements; with anything 
longer, its behavior is unpredictable.
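
To make the difference between the two readings concrete, here is a 
small Python model of the optional clause. It is only a sketch of the 
decision logic described above, not ANTLR's generated code; the names 
match_shin, full_lookahead and RecognitionError are made up for 
illustration.

```python
class RecognitionError(Exception):
    """Raised when an alternative was entered but failed to match."""


SHIN = "\u00d7\u00a9"   # mandatory part of the SHIN rule
MARK = "\u00d7\u0081"   # optional two-character clause


def match_shin(text, full_lookahead):
    """Return how many characters SHIN consumes at the start of text.

    full_lookahead=True  models ( ('\u00d7' '\u0081')=> ... )?
    full_lookahead=False models ( ('\u00d7')=> ... )?
    """
    if not text.startswith(SHIN):
        raise RecognitionError("expected SHIN prefix")
    pos = len(SHIN)
    rest = text[pos:]

    if full_lookahead:
        # Enter the optional clause only if the whole clause is present.
        enter = rest.startswith(MARK)
    else:
        # Enter the optional clause as soon as its first character is seen.
        enter = rest.startswith(MARK[0])

    if enter:
        if not rest.startswith(MARK):
            # The clause was "optional", yet we are already committed to it.
            raise RecognitionError("optional clause entered but did not match")
        pos += len(MARK)
    return pos
```

For input such as SHIN followed by '\u00d7' '\u0090', the call with 
full_lookahead=True skips the clause and returns 2, while the call with 
full_lookahead=False raises RecognitionError. That second outcome is the 
behavior I am objecting to.
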
From a practical point of view, I will get around the problem by 
promoting my optional term to the status of a full token and letting the 
parser handle its optional nature.
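
Roughly, the workaround looks like the following Python model. Again, 
this is a sketch and not ANTLR output: the token names SHIN, MARK and 
OTHER and the functions tokenize and parse_letters are hypothetical, and 
it assumes the lexer can tell the two tokens apart on their own.

```python
SHIN = "\u00d7\u00a9"   # the letter itself, now a token of its own
MARK = "\u00d7\u0081"   # the formerly optional clause, promoted to a token


def tokenize(text):
    """Split text into SHIN, MARK and single-character OTHER tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text.startswith(SHIN, pos):
            tokens.append("SHIN")
            pos += len(SHIN)
        elif text.startswith(MARK, pos):
            tokens.append("MARK")
            pos += len(MARK)
        else:
            tokens.append("OTHER")
            pos += 1
    return tokens


def parse_letters(tokens):
    """Pair each SHIN with a flag saying whether a MARK token follows it."""
    letters = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "SHIN":
            # One token of lookahead is enough to decide the optional part.
            has_mark = i + 1 < len(tokens) and tokens[i + 1] == "MARK"
            letters.append(("SHIN", has_mark))
            i += 2 if has_mark else 1
        else:
            letters.append((tokens[i], False))
            i += 1
    return letters
```

The optional decision now happens in the parser on whole tokens, so a 
single token of lookahead settles it.
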
The bottom line is that I think Ter needs to document '?' very 
carefully, both in his book and in the Wiki, if he expects to avoid 
a lot of problems. This will be just as bad as ANTLR2's linear 
approximate lookahead! Of course, by definition, Ter wins this debate.

Shmuel