[antlr-interest] Lexer lookahead overoptimizes

Jim Idle jimi at temporal-wave.com
Thu Apr 12 16:47:20 PDT 2007


I think that what Ter is trying to tell you is that you are not really
supplying quite enough information for the lexer analyser to work things
out without making a 'mistake', so the behavior, without any further
information, is as you see it.

I think that you need a predicate on your rule, such as this:

SHIN : '\u00d7' '\u00a9' ( ('\u00d7' '\u0081')=> ('\u00d7' '\u0081'))? ;

You might need the very latest snapshot for this predicate, but probably
not. 
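
To make the difference concrete, here is a small hand-written Java sketch.
It is not ANTLR output, and the class and method names are made up for
illustration. It assumes the optional subrule is decided on one character
of lookahead, as in the generated code quoted below, and contrasts that
with checking both characters before committing, which is roughly what the
predicate is meant to achieve.

```java
// Hand-written sketch, not ANTLR-generated code. All names are hypothetical.
final class ShinLookaheadSketch {

    // Mirrors the quoted generated code: the decision looks at one character only.
    static int matchWithOneCharLookahead(String input) {
        int p = expect(input, 0, '\u00d7');
        p = expect(input, p, '\u00a9');
        if (peek(input, p) == '\u00d7') {
            // Committed to the optional part after seeing a single character.
            p = expect(input, p, '\u00d7');
            p = expect(input, p, '\u0081'); // fails if the second character differs
        }
        return p; // number of characters consumed by SHIN
    }

    // Models the intended effect of the predicate: test both characters first.
    static int matchWithPredicate(String input) {
        int p = expect(input, 0, '\u00d7');
        p = expect(input, p, '\u00a9');
        if (peek(input, p) == '\u00d7' && peek(input, p + 1) == '\u0081') {
            p = expect(input, p, '\u00d7');
            p = expect(input, p, '\u0081');
        }
        return p;
    }

    // Returns the character at index i, or -1 at end of input.
    static int peek(String s, int i) {
        return i < s.length() ? s.charAt(i) : -1;
    }

    // Consumes one expected character or throws, like match() in a lexer.
    static int expect(String s, int i, char c) {
        if (peek(s, i) != c) {
            throw new IllegalStateException("mismatch at index " + i);
        }
        return i + 1;
    }

    public static void main(String[] args) {
        String input = "\u00d7\u00a9\u00d7\u0035";
        System.out.println("with predicate, consumed: " + matchWithPredicate(input));
        try {
            matchWithOneCharLookahead(input);
        } catch (IllegalStateException e) {
            System.out.println("one-char lookahead: " + e.getMessage());
        }
    }
}
```

With the two-character check, SHIN stops after the first two characters and
leaves the rest of the input for the next token, which is the behavior
shmuel is asking for.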

Jim

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of shmuel siegel
Sent: Thursday, April 12, 2007 3:26 PM
To: ANTLR Interest
Subject: Re: [antlr-interest] Lexer lookahead overoptimizes

Gavin Lambert wrote:
>
> (I don't know whether ANTLR is actually working this way or not, of 
> course -- if it still doesn't work when you haven't specified any k= 
> options then I would consider it a bug.)
>

Thanks for the confirmation of my sanity. But I wouldn't call it a bug
when it is doing exactly what Terence expects. It is more of a "feature"
that you and I disagree with. Anyway, try the following grammar. It
demonstrates that it is not honoring my desire that ('\u00d7' '\u0081')?
is optional when '\u00d7' matches.

Terence, this has nothing to do with greedy vs. non-greedy. I would not
expect SHIN to throw an exception on the sequence '\u00d7' '\u00a9'
'\u00d7' '\u0035', although mTokens should throw a NoViableAltException
since it doesn't know what to do with '\u00d7' '\u0035'.

grammar miqroh;
letter:    SHIN | BOO;

 SHIN:
    '\u00d7' '\u00a9' ('\u00d7' '\u0081')?;
 TUF:
    '\u00d7' '\u00aa';


I get the following for the SHIN rule
            match('\u00D7');
            match('\u00A9');
            // C:\\Documents and Settings\\shmuels\\My Documents\\miqroh.g:8:20: ( '\\u00d7' '\\u0081' )?
            int alt1=2;
            int LA1_0 = input.LA(1);
            if ( (LA1_0=='\u00D7') ) {
                alt1=1;
            }
            switch (alt1) {
                case 1 :
                    // C:\\Documents and Settings\\shmuels\\My Documents\\miqroh.g:8:21: '\\u00d7' '\\u0081'
                    {
                    match('\u00D7');
                    match('\u0081');

                    }
                    break;

            }



