[antlr-interest] Lexer lookahead overoptimizes
shmuel siegel
antlr at shmuelhome.mine.nu
Fri Apr 13 06:30:34 PDT 2007
Jim Idle wrote:
> I think that what Ter is trying to tell you is that you are not really
> supplying quite enough information for the lexer analyser to work things
> out without making a 'mistake', so the behavior, without any further
> information, is as you see it.
>
> I think that you need a predicate on your rule, such as this:
>
> SHIN : '\u00d7' '\u00a9' ( ('\u00d7' '\u0081')=> ('\u00d7' '\u0081'))? '
>
> You might need the very latest snapshot for this predicate, but probably
> not.
>
> Jim
>
>
I understand what Ter is saying; that is why I referred to it as a
feature that I disagree with rather than a bug. I think that Ter is
making the mistake of having implementation issues drive functional
specifications. To my mind, EBNF '?' means optional, and optional
clauses can't fire recognition exceptions. In the notation that you have
used, Ter has essentially defined
('\u00d7' '\u0081')?
as
( ('\u00d7')=> ('\u00d7' '\u0081'))?
I don't think that matches anybody's expectation. The way I look at
it, Ter has restricted my usage of '?' to single elements; for anything
longer, its behavior is unpredictable.
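
To make the difference concrete, here is a small Python model of the two readings of the optional clause. It is only a sketch of the behavior described in this thread, not ANTLR's generated lexer code; the names RecognitionError, match_shin_k1 and match_shin_predicated are made up for illustration.

```python
class RecognitionError(Exception):
    """Stands in for a lexer recognition exception (hypothetical name)."""


BASE = "\u00d7\u00a9"    # the two mandatory characters, as written in the thread
SUFFIX = "\u00d7\u0081"  # the optional two-character clause


def match_shin_k1(text, pos=0):
    """Enter the optional clause after checking only its first character."""
    if not text.startswith(BASE, pos):
        raise RecognitionError(f"expected SHIN at offset {pos}")
    pos += len(BASE)
    if text.startswith(SUFFIX[0], pos):       # one-character lookahead
        if not text.startswith(SUFFIX, pos):  # already committed to the clause
            raise RecognitionError(f"mismatched character at offset {pos + 1}")
        pos += len(SUFFIX)
    return pos


def match_shin_predicated(text, pos=0):
    """Enter the optional clause only when the whole clause is present."""
    if not text.startswith(BASE, pos):
        raise RecognitionError(f"expected SHIN at offset {pos}")
    pos += len(BASE)
    if text.startswith(SUFFIX, pos):          # full-clause check, as a predicate would do
        pos += len(SUFFIX)
    return pos
```

With the input BASE followed by '\u00d7' '\u0090', the first function raises, because it commits to the clause on '\u00d7' alone, while the second simply skips the clause and returns after the two base characters. That second behavior is what I mean by "optional".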
From a practical point of view, I will get around the problem by
promoting my optional term to the status of a full token and letting the
parser deal with the optional nature.
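
Roughly, the workaround looks like the following Python sketch. It assumes the input is a sequence of two-character units, as in the rule quoted above; the token names SHIN, SHIN_DOT and OTHER and both function names are hypothetical, and this is a model of the idea rather than real ANTLR output.

```python
SHIN_CHARS = "\u00d7\u00a9"  # the mandatory pair from the quoted rule
DOT_CHARS = "\u00d7\u0081"   # the formerly optional pair, now its own token


def tokenize(text):
    """Split the text into two-character units and name each one."""
    tokens = []
    for pos in range(0, len(text), 2):
        pair = text[pos:pos + 2]
        if pair == SHIN_CHARS:
            tokens.append("SHIN")
        elif pair == DOT_CHARS:
            tokens.append("SHIN_DOT")
        else:
            tokens.append("OTHER")  # anything else, for this sketch only
    return tokens


def parse_letters(tokens):
    """Parser-side rule: a SHIN optionally followed by a SHIN_DOT."""
    letters = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "SHIN":
            dotted = i + 1 < len(tokens) and tokens[i + 1] == "SHIN_DOT"
            letters.append(("SHIN", dotted))
            i += 2 if dotted else 1
        else:
            letters.append((tokens[i], False))
            i += 1
    return letters
```

The lexer now only ever chooses between complete tokens, so the optional decision is made on a whole token in the parser instead of on a single character in the lexer.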
The bottom line is that I think Ter needs to document '?' very
carefully, both in his book and in the Wiki, if he expects not to run
into a lot of problems. This will be just as bad as ANTLR2's linear
approximate lookahead! Of course, by definition, Ter wins this debate.
Shmuel