[antlr-interest] Problems with lexing tokens containing blanks

Wed Nov 29 13:14:33 PST 2006

Terence,
Putting the INDEX_OF rule first doesn't seem to do the trick for me.  For 
instance, the full lexer grammar:

lexer grammar testgrammarlexer;

INDEX_OF :      'index of' ;
INDEX   :       'index' ;

NEWLINE :   (('\r')? '\n' )+ ;
ID      : ( 'A' .. 'Z' | '0' .. '9') ( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9')* ;
WS      :       (' '|'\t')+ {$channel=HIDDEN;};

still generates an mTokens() method that checks for 'i' 'n' 'd' 'e' 'x' ' '
and, as soon as it sees the blank, assumes the token is 'index of'.  In
detail, it generates this:
    public void mTokens() throws RecognitionException {
        int alt5=5;
        switch ( input.LA(1) ) {
        case 'i':
            int LA5_1 = input.LA(2);
            if ( (LA5_1=='n') ) {
                int LA5_5 = input.LA(3);
                if ( (LA5_5=='d') ) {
                    int LA5_6 = input.LA(4);
                    if ( (LA5_6=='e') ) {
                        int LA5_7 = input.LA(5);
                        if ( (LA5_7=='x') ) {
                            int LA5_8 = input.LA(6);
                            if ( (LA5_8==' ') ) {
                                alt5=1; // INDEX_OF
                            }
                            else {
                                alt5=2;} // INDEX
                        }
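
To make the failure mode concrete, here is a small hand-written Java sketch
that mimics the decision quoted above. It is not ANTLR output; the class and
method names (PredictionSketch, predict, ruleMatches) are made up for
illustration, and the other alternatives (NEWLINE, ID, WS) are left out.

```java
// Hand-written illustration, not ANTLR-generated code.
final class PredictionSketch {
    /** Returns 1 (INDEX_OF) or 2 (INDEX) for input starting with "index", else 0. */
    static int predict(String input) {
        if (!input.startsWith("index")) {
            return 0; // other alternatives (NEWLINE, ID, WS) omitted
        }
        // Mirrors the `LA5_8==' '` test: only the sixth character is inspected.
        boolean spaceNext = input.length() > 5 && input.charAt(5) == ' ';
        return spaceNext ? 1 : 2;
    }

    /** True when the literal of the predicted alternative really matches the input. */
    static boolean ruleMatches(int alt, String input) {
        switch (alt) {
            case 1: return input.startsWith("index of");
            case 2: return input.startsWith("index");
            default: return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"index of", "index", "index foo"}) {
            int alt = predict(s);
            System.out.println("\"" + s + "\" -> alt " + alt
                    + ", rule matches: " + ruleMatches(alt, s));
        }
    }
}
```

For the input "index foo" the sketch picks alternative 1 on the strength of
the blank alone, and the INDEX_OF literal then fails to match, where INDEX
followed by WS would have been the expected result.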

I've run into this issue in other ways in my grammar.  Even if putting 
INDEX_OF first did work, what if you're not writing a lexer rule for each 
multi-word keyword directly, but just referencing the keywords as literals 
in the parser rules (like 'index of' and 'index')?  Do all of the parser 
rules then need to be in the proper order to generate the correct lexer?  
Sometimes that is not possible, and it is likely not desirable.

Do lexer predicates need to be used, or perhaps a fixed lookahead (of at 
least 7 in this case)?
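
As a rough picture of the fixed-lookahead idea, the following hand-written
Java sketch tests the whole 'index of' literal before committing to it and
falls back to 'index' otherwise. This is only an illustration of the desired
decision, not something ANTLR generates and not an ANTLR API; the class name,
method name and token type numbers are all hypothetical.

```java
// Hand-written sketch, not ANTLR-generated code or an ANTLR API.
final class FullLookaheadSketch {
    static final int NONE = 0;     // hypothetical token type numbers
    static final int INDEX_OF = 1;
    static final int INDEX = 2;

    /** Chooses a token type for the keyword starting at pos, or NONE. */
    static int choose(String input, int pos) {
        // Check the longer literal in full before committing to it.
        if (input.startsWith("index of", pos)) {
            return INDEX_OF;
        }
        // Otherwise fall back to the shorter keyword.
        if (input.startsWith("index", pos)) {
            return INDEX;
        }
        return NONE;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"index of", "index", "index foo"}) {
            System.out.println("\"" + s + "\" -> token type " + choose(s, 0));
        }
    }
}
```

With this decision, "index foo" is classified as INDEX, leaving the blank to
be consumed by WS on the next call.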

Thanks,
-Ryan

From: Terence Parr <parrt at cs.usfca.edu>
Sent by: antlr-interest-bounces at antlr.org
Date: 11/29/2006 02:22 PM
To: ANTLR Interest <antlr-interest at antlr.org>
Subject: Re: [antlr-interest] Problems with lexing tokens containing blanks

On Nov 29, 2006, at 8:44 AM, Bernd Vogt wrote:

> Hi all,
>
> in my current project I have the requirement to lex some tokens 
> like this:
>
> lexer grammar ExampleLexer;
>
> INDEX : 'index' ;
> INDEX_OF : 'index of' ;
> INT : '0' | '1'..'9' '0'..'9'* ;

Hi, try putting

INDEX_OF : 'index of' ;

before INDEX.

Ter
