[antlr-interest] v3 lexer cannot tell keyword from identifier (very strange)

Martin d'Anjou martin.danjou at neterion.com
Thu Feb 22 13:30:35 PST 2007


Hi,

I have a very strange problem with ANTLR 3.0b6. Given the input text:

     int id;
     int int_id;

I get this error:

    line 2:4 mismatched input 'int' expecting IDENTIFIER

It is mistaking "int_id" for "int", treating the underscore as a token
separator. The (ridiculous-looking) lexer is:

    lexer grammar DUMMY_Lexer;
    options { filter=true; }

    MOD          : 'mod' ;
    END          : 'end' ;
    DEF          : 'def' ;
    INC          : 'inc' ;
    PAR          : 'par' ;
    INP          : 'inp' ;
    OUT          : 'out' ;
    INO          : 'ino' ;
    INT          : 'int' ;
    WER          : 'wer' ;
    COMMA        : ',' ;
    SEMI         : ';' ;
    L_PAREN      : '(' ;
    R_PAREN      : ')' ;
    ASSIGN       : '=' ;
    SHARP        : '#' ;
    LSHIFT       : '<<' ;
    MULT         : '*' ;
    MINUS        : '-' ;
    PLUS         : '+' ;
    COLON        : ':' ;
    LTEQ         : '<=' ;
    L_CURLY      : '{' ;
    R_CURLY      : '}' ;
    OR           : '|' ;
    SQUARE       : '[]' ;
    QUOTE        : '"' ;
    DIGIT        : '0' ;
    WS           : ( ' ' | EOL )+ {$channel=HIDDEN;} ;
    EOL          : ('\r\n'|'\r'|'\n') ;
    LetterC      : 'c' | Nothing ;
    Nothing      : 't' ;
    SL_COMMENT   : 'a' ;
    ML_COMMENT   : '/' ;
    BASE         : 'b' ;
    BASE_NUM     : DIGIT+ (BASE DIGIT+)? ;

    IDENTIFIER   : ('a'..'z'|UNDERSCORE)+ ;

    fragment
    UNDERSCORE  :  '_' ;
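
One possible reading of the symptom, offered as an assumption and not a
confirmed diagnosis: with filter=true the lexer may try the rules in the
order they are listed and accept the first one that matches, instead of
picking the longest match. The plain Java sketch below is a toy model of
those two policies. It is not ANTLR code, the class LexerOrderDemo and its
methods are hypothetical names, and it only knows the keyword literals and
IDENTIFIER. It shows how "int_id" would come out as "int" followed by "_id"
under the first policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of two lexer matching policies. Hypothetical, not ANTLR code. */
final class LexerOrderDemo {
    // Keyword literals in the order they are listed in the lexer grammar.
    static final String[] KEYWORDS = {
        "mod", "end", "def", "inc", "par", "inp", "out", "ino", "int", "wer"
    };

    /** Length of the first listed keyword that matches at pos, or 0. */
    static int keywordLength(String s, int pos) {
        for (String k : KEYWORDS) {
            if (s.startsWith(k, pos)) return k.length();
        }
        return 0;
    }

    /** Length of an ('a'..'z' | '_')+ match starting at pos, or 0. */
    static int identifierLength(String s, int pos) {
        int i = pos;
        while (i < s.length()
                && (s.charAt(i) == '_' || (s.charAt(i) >= 'a' && s.charAt(i) <= 'z'))) {
            i++;
        }
        return i - pos;
    }

    /** Assumed filter-style policy: rules tried in listed order, first match wins. */
    static List<String> firstMatch(String s) { return tokenize(s, true); }

    /** Conventional policy: the longest match wins. */
    static List<String> longestMatch(String s) { return tokenize(s, false); }

    private static List<String> tokenize(String s, boolean ruleOrderWins) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int kw = keywordLength(s, pos);
            int id = identifierLength(s, pos);
            // Keywords are listed before IDENTIFIER, so they win under rule order.
            int len = ruleOrderWins ? (kw > 0 ? kw : id) : Math.max(kw, id);
            if (len == 0) {
                pos++; // neither rule matches here: skip one character
                continue;
            }
            tokens.add(s.substring(pos, pos + len));
            pos += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("int_id"));   // prints [int, _id]
        System.out.println(longestMatch("int_id")); // prints [int_id]
    }
}
```

If this model matches what 3.0b6 does, the second "int" in the error message
would be the front half of "int_id", and dropping filter=true would be one
thing to test.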

The only token I was able to remove without changing the error was the
QUESTION : '?'; token (not shown above). When I remove any other token
(such as MOD), the error changes to:

     line 1:0 required (...)+ loop did not match anything at input 'int'

Which makes it even weirder...

Now the parser is fairly minimal:

    parser grammar DUMMY_Parser;
    options {
      tokenVocab=DUMMY_Lexer;
    }

    source_text :
      int_defs+
      ;

    int_defs :
      INT            { System.out.print("int "); }
      id=IDENTIFIER  { System.out.print($id.text); }
      SEMI           { System.out.println(";"); }
    ;

Help!!! (and thanks!)
Martin

