[antlr-interest] bug in 3.0b6: identifier/keyword or underscore problem?

Mon Feb 26 05:03:12 PST 2007

Hello again,

I have found a way to circumvent the lexer problem in ANTLR 3.0b6 (see my 
previous post). Essentially, I found that testing for literals by hand 
in the lexer solves the problem of distinguishing between "int" (a 
keyword) and "int_something" (an identifier). Here is the lexer with 
literal recognition inside IDENTIFIER:

lexer grammar DUMMY_Lexer;
options { filter=true; }
tokens {
   INT;
}
SEMI         : ';' ;
WS           :  (  ' '| '\t'| '\r' | '\n' )+ {$channel=HIDDEN;} ;

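// Hand-written literal test: "int" followed by whitespace is retyped as INT;
// anything else falls through to the general identifier alternative.
IDENTIFIER   : ('int' WS) => 'int' { $type=INT; }
                | ('a'..'z'|'A'..'Z'|'_')+ ;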

Using the same parser and input text as in my previous post, this
works as expected. Now the question is: does this scale well?
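
On the scaling question, one common alternative to a predicate per keyword is to match the longest identifier first and then look the text up in a keyword table. The sketch below is plain Python for illustration only. It is not ANTLR and not the code ANTLR generates, and the names KEYWORDS, TOKEN_RE and tokenize are hypothetical. Note one difference from the grammar above: it retypes "int" whatever follows it, whereas the predicate only fires when whitespace follows.

```python
import re

# Hypothetical keyword table; the grammar in this post only has "int".
# Adding a keyword means adding one entry here, not one more predicate.
KEYWORDS = {"int": "INT"}

# Same character classes as the WS, SEMI and IDENTIFIER rules above.
TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t\r\n]+)
  | (?P<SEMI>;)
  | (?P<IDENTIFIER>[A-Za-z_]+)
""", re.VERBOSE)

def tokenize(text):
    """Return (type, text) pairs, retyping identifiers that are keywords."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            pos += 1  # roughly like filter=true: skip unmatched characters
            continue
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # stands in for the hidden channel; simply dropped here
        if kind == "IDENTIFIER":
            # The whole identifier has already been consumed, so
            # "int_something" can never be mistaken for "int".
            kind = KEYWORDS.get(lexeme, kind)
        tokens.append((kind, lexeme))
    return tokens
```

Calling tokenize("int int_something;") yields an INT token, an IDENTIFIER token and a SEMI token, in that order.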

I am sure there is an easier way that eludes me. I still think there is a 
bug though.

Martin