[antlr-interest] Lexer lookahead overoptimizes

Thu Apr 12 15:28:42 PDT 2007

On Apr 12, 2007, at 3:26 PM, shmuel siegel wrote:

> Gavin Lambert wrote:
>>
>> (I don't know whether ANTLR is actually working this way or not,  
>> of course -- if it still doesn't work when you haven't specified  
>> any k= options then I would consider it a bug.)
>>
>
> Thanks for the confirmation of my sanity. But I wouldn't call it a  
> bug when it is doing exactly what Terence expects. It is more of a  
> "feature" that you and I disagree with. Anyway, try the following  
> grammar. It demonstrates that it is not honoring my desire that  
> ('\u00d7' '\u0081')? is optional when '\u00d7' matches.
>
> Terence, this has nothing to do with greedy/not greedy. I would  
> not expect SHIN to throw an exception on the sequence '\u00d7'  
> '\u00a9' '\u00d7' '\u0035', although mTokens should throw a  
> noViableAltException since it doesn't know what to do with '\u00d7'  
> '\u0035'.

Well, ANTLR cannot know how much lookahead to use when it cannot see
past the end of the token. Consequently, it uses only the first
symbol as the predictor. Without information to the contrary, this
is the only reasonable decision ANTLR can make. It is doing exactly
what I intended it to do, even if that is not what you want. ;) Did you
try k=2 on that subrule? Actually, that probably won't work:
ANTLR will optimize it down. I would say you will have to use a
syntactic predicate on that last alternative to indicate the context
in which ANTLR should match the final optional subrule.
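
To make the difference concrete, here is a rough sketch in Python (not
generated lexer code) of the two decisions. It assumes the rule looks
something like SHIN : '\u00d7' '\u00a9' ('\u00d7' '\u0081')? ; which is
a guess, since the grammar itself is not quoted here. The function
names and the MismatchError class are made up for illustration. The
first function enters the optional subrule on the first symbol alone;
the second behaves roughly as a syntactic predicate of the form
('\u00d7' '\u0081')=> in front of the subrule would, entering only when
the whole pair is present.

```python
# Hypothetical simulation; the rule shape below is an assumption:
#   SHIN : '\u00d7' '\u00a9' ('\u00d7' '\u0081')? ;
SHIN_BASE = "\u00d7\u00a9"   # mandatory part of the token
SHIN_TAIL = "\u00d7\u0081"   # optional trailing pair


class MismatchError(Exception):
    """Raised when the matcher has committed to a path that then fails."""


def match_shin_one_symbol(text, pos=0):
    """Predict the optional subrule from its first symbol only."""
    if not text.startswith(SHIN_BASE, pos):
        raise MismatchError("expected the mandatory SHIN prefix")
    pos += len(SHIN_BASE)
    if text.startswith(SHIN_TAIL[0], pos):
        # Committed to the subrule after seeing one symbol; no way back.
        if not text.startswith(SHIN_TAIL, pos):
            raise MismatchError("second symbol of optional subrule missing")
        pos += len(SHIN_TAIL)
    return pos  # index just past the matched token


def match_shin_with_predicate(text, pos=0):
    """Enter the optional subrule only if the whole pair is ahead."""
    if not text.startswith(SHIN_BASE, pos):
        raise MismatchError("expected the mandatory SHIN prefix")
    pos += len(SHIN_BASE)
    if text.startswith(SHIN_TAIL, pos):  # speculative check of both symbols
        pos += len(SHIN_TAIL)
    return pos
```

On the input '\u00d7' '\u00a9' '\u00d7' '\u0035' the first function
raises, while the second stops after two characters and leaves
'\u00d7' '\u0035' for whatever rule comes next.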

Ter