[antlr-interest] Re: Lexer makes 2 valid tokens when there is only 1 invalid one
bchagenbuch
bhagenbuch at didera.com
Wed Apr 16 14:03:29 PDT 2003
I agree with you that this isn't easy to fix, especially once you consider inputs such as 123E3. My
lexer has the same problem.
Perhaps we can both take consolation in the fact that Oracle9i sees both
SELECT 123 W ...
and
SELECT 123W ...
as if they were
SELECT 123 AS W ...
while PostgreSQL rejects them both with 'parse error at or near "w"'.
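It appears to me that the SQL99 standard agrees with you: 123 and W are both
<nondelimiter token>s, and a <nondelimiter token> must be followed by a
<delimiter token> or a <separator> (whitespace or a comment), so the two must
be separated by whitespace or a comment.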
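One way to enforce that rule without restructuring the lexer is a check after tokenizing: if two nondelimiter tokens (numbers, identifiers, keywords) touch with nothing between them, reject the input. Below is a minimal Python sketch, not ANTLR code. The regex, the token kinds and the names `tokenize` and `check_separators` are hypothetical, the token set is heavily simplified, and only whitespace is treated as a separator (SQL comments, which also count, are not handled).

```python
import re

# Hypothetical, heavily simplified token set for illustration only.
# NUMBER accepts 123, 123., .5, 1.5e-3 and 123E3 (exponent only if digits follow).
TOKEN_RE = re.compile(
    r"""
      (?P<WS>\s+)
    | (?P<NUMBER>(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)
    | (?P<IDENT>[A-Za-z][A-Za-z0-9_]*)
    | (?P<DELIM>[(),.;*=<>+-])
    """,
    re.VERBOSE,
)

# Token kinds treated as SQL99 <nondelimiter token>s in this sketch.
NONDELIMITERS = {"NUMBER", "IDENT"}


def tokenize(text):
    """Split text into (kind, lexeme, start, end) tuples, skipping whitespace.

    Like the quoted lexer, this returns NUMBER '123' followed by IDENT 'W'
    for the input '123W'.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(), m.start(), m.end()))
        pos = m.end()
    return tokens


def check_separators(tokens):
    """Reject two nondelimiter tokens that touch with no separator between them."""
    for (kind1, text1, _, end1), (kind2, text2, start2, _) in zip(tokens, tokens[1:]):
        if kind1 in NONDELIMITERS and kind2 in NONDELIMITERS and end1 == start2:
            raise SyntaxError(f"{text1!r} and {text2!r} need a separator at {start2}")
    return tokens


check_separators(tokenize("SELECT 123 W"))   # accepted: three tokens
check_separators(tokenize("SELECT W123"))    # accepted: one identifier
# check_separators(tokenize("SELECT 123W"))  # raises SyntaxError
```

The appeal of a separate pass is that the lexer rules stay as they are; the cost is that the error is reported one token late, after the split has already happened.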
--- In antlr-interest at yahoogroups.com, "martinkbraid" <mbraid at s...> wrote:
> I believe I have a reasonably standard lexer for the SQL language, a
> language in which all identifiers have to begin with an alpha. It
> therefore correctly identifies "W123" as an identifier, however, if I
> give it "123W" the lexer figures there are two tokens: "123" (a
> NUMBER) and "W" (an IDENTIFIER). This is wrong, it should reject this
> (and because by chance this can be valid at the syntactic level, the
> parser cannot do anything about it). So what am I doing wrong? A
> fragment of my lexer follows:
>
> Many thanks
> Martin Braid
>
> protected
> DIGIT : ('0'..'9');
>
> protected
> LETTER : ('a'..'z');
>
> protected
> SPECIAL : "_" ;
>
> protected
> EXPONENT : "e" ( PLUS | MINUS )? (DIGIT)+ ;
>
> protected
> INTEGER : (DIGIT)+;
>
> protected
> FLOAT : (INTEGER '.' INTEGER) => INTEGER '.' INTEGER (EXPONENT)?
> | (INTEGER '.' ) => INTEGER '.' (EXPONENT)?
> | ( '.' INTEGER) => '.' INTEGER (EXPONENT)?
> ;
>
> NUMBER : (FLOAT) => FLOAT {$setType(FLOAT);}
> | INTEGER {$setType(INTEGER);}
> | '.' {$setType(DOT);}
> ;
>
> IDENT options {testLiterals = true;}
> : (LETTER) ( SPECIAL | LETTER | DIGIT )*;
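The check can also live in the number rule itself rather than in a later pass: once the number has been consumed, peek at the next character and report an error if it is a letter, digit or underscore. Below is a hand-written Python sketch of that idea, not ANTLR code. `scan_number` is a hypothetical helper that loosely mirrors the quoted INTEGER/FLOAT/NUMBER rules; unlike the quoted grammar it also accepts an exponent on a plain integer, so that 123E3 stays one token, and it assumes an exponent is only taken when at least one digit follows the E.

```python
DIGITS = "0123456789"


def scan_number(text, pos=0):
    """Scan one numeric literal at text[pos]; return (kind, lexeme, end_pos).

    Hypothetical hand-written analogue of the quoted NUMBER/FLOAT/INTEGER rules.
    It also allows an exponent on a plain integer (123E3) and refuses a number
    that runs straight into a letter, digit or underscore.
    """
    start, n = pos, len(text)

    def skip_digits(p):
        while p < n and text[p] in DIGITS:
            p += 1
        return p

    pos = skip_digits(pos)
    kind = "INTEGER"
    if pos < n and text[pos] == ".":
        frac_end = skip_digits(pos + 1)
        if pos == start and frac_end == pos + 1:
            return ("DOT", ".", pos + 1)    # a lone '.', like the NUMBER rule's DOT
        pos, kind = frac_end, "FLOAT"       # 123.  123.45  .45
    elif pos == start:
        raise SyntaxError(f"expected a number at position {start}")

    # Optional exponent, consumed only when at least one digit follows it.
    if pos < n and text[pos] in "eE":
        p = pos + 1
        if p < n and text[p] in "+-":
            p += 1
        if p < n and text[p] in DIGITS:
            pos, kind = skip_digits(p), "FLOAT"

    # The extra check: the next character must not continue an identifier.
    if pos < n and (text[pos].isalnum() or text[pos] == "_"):
        raise SyntaxError(f"invalid token {text[start:pos + 1]!r} at position {start}")
    return (kind, text[start:pos], pos)
```

For example, `scan_number("SELECT 123 W", 7)` returns `("INTEGER", "123", 10)`, while `scan_number("SELECT 123W", 7)` raises `SyntaxError`. Raising an error here matches the SQL99 rule and PostgreSQL's behaviour; dropping the final check gives back the split into 123 and W that the quoted lexer produces.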