[antlr-interest] Lexer makes 2 valid tokens when there is only 1 invalid one

Mon Apr 14 20:07:35 PDT 2003

I believe I have a reasonably standard lexer for the SQL language, a 
language in which all identifiers have to begin with a letter. It 
therefore correctly identifies "W123" as an identifier. However, if I 
give it "123W", the lexer decides there are two tokens: "123" (a 
NUMBER) and "W" (an IDENTIFIER). This is wrong; it should reject the 
input (and because, by chance, this sequence can be valid at the 
syntactic level, the parser cannot do anything about it). So what am 
I doing wrong? A fragment of my lexer follows:

Many thanks
Martin Braid

protected
DIGIT    : ('0'..'9');

protected
LETTER   : ('a'..'z');

protected
SPECIAL  : "_" ;

protected
EXPONENT : "e" ( PLUS | MINUS )? (DIGIT)+ ;

protected
INTEGER : (DIGIT)+;

protected
FLOAT  : (INTEGER '.' INTEGER) => INTEGER '.' INTEGER (EXPONENT)?
       | (INTEGER '.'        ) => INTEGER '.'         (EXPONENT)?
       | (        '.' INTEGER) =>         '.' INTEGER (EXPONENT)?
       ;

NUMBER :  (FLOAT) => FLOAT   {$setType(FLOAT);}
       |  INTEGER {$setType(INTEGER);}
       |  '.'     {$setType(DOT);}
       ;

IDENT   options {testLiterals = true;}
       : (LETTER) ( SPECIAL | LETTER | DIGIT )*;
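
To make the behaviour concrete, here is a minimal Python sketch (not 
ANTLR) of a lexer that, like the fragment above, takes the longest 
match at the current position and then starts afresh. The regular 
expressions are only my rough approximation of the rules, the names 
TOKEN_RULES and tokenize are made up for the illustration, and the 
input is lowercase because LETTER only covers 'a'..'z'.

```python
import re

# Rough regex approximations of the grammar rules above (an assumption,
# not something generated by ANTLR). PLUS and MINUS are assumed to be
# '+' and '-'.
TOKEN_RULES = [
    ("FLOAT", r"\d+\.\d*(?:e[+-]?\d+)?|\.\d+(?:e[+-]?\d+)?"),
    ("INTEGER", r"\d+"),
    ("DOT", r"\."),
    ("IDENT", r"[a-z][a-z_0-9]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


def tokenize(text):
    """Split text into (type, lexeme) pairs, one match at a time."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()  # nothing checks what follows the token
    return tokens


print(tokenize("w123"))
print(tokenize("123w"))
```

With these rules "w123" comes out as a single IDENT, while "123w" is 
split into an INTEGER followed by an IDENT, because nothing looks at 
the character that follows the digits.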
