[antlr-interest] Re: Lexer makes 2 valid tokens when there is only 1 invalid one

Tue Apr 15 07:54:23 PDT 2003

Here is a valid SQL stmt: select 123 w from table;

Here is an invalid SQL stmt: select 123w from table;

In the first stmt, the "w" is a column alias for the constant "123"; 
in the 2nd stmt, "123w" is an invalid column name. My problem is that 
I need to weed out bad stmts, like the 2nd one, but I cannot do that 
if my lexer converts it to a valid stmt, like the first one. That's 
why the parser cannot capture this problem - it doesn't think there 
is one.
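
To illustrate the behaviour I am after, here is a minimal sketch in Python rather than ANTLR. The token names, the `BAD` rule and the `tokenize` helper are all hypothetical, made up for this example. The idea is to give the lexer an explicit rule for digits glued directly to letters, tried before NUMBER, so that "123w" is reported as one invalid lexeme instead of being split into two valid tokens.

```python
import re

# Hypothetical token set for illustration only; not taken from an ANTLR grammar.
# Order matters: BAD is tried before NUMBER so "123w" is caught as one lexeme.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<BAD>\d+[A-Za-z_]\w*)        # digits glued to letters, e.g. 123w
  | (?P<NUMBER>\d+)
  | (?P<IDENTIFIER>[A-Za-z_]\w*)    # identifiers must begin with an alpha
  | (?P<PUNCT>[;,*()])
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs; raise ValueError on a malformed lexeme."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "BAD":
            # Reject here, in the lexer, instead of emitting NUMBER + IDENTIFIER.
            raise ValueError(f"invalid identifier {m.group()!r} at {pos}")
        if kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("select 123 w from table;") returns a NUMBER followed by an IDENTIFIER as before, while tokenize("select 123w from table;") raises ValueError, so the bad stmt never reaches the parser.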

Thanks,
Martin

--- In antlr-interest at yahoogroups.com, "micheal_jor" <open.zone at v...> wrote:
> > I believe I have a reasonably standard lexer for the SQL language, a
> > language in which all identifiers have to begin with an alpha. It
> > therefore correctly identifies "W123" as an identifier, however, if I
> > give it "123W" the lexer figures there are two tokens: "123" (a
> > NUMBER) and "W" (an IDENTIFIER). This is wrong, it should reject this
> 
> The Lexer is working fine. It is tokenizing the stream of characters
> presented to it accurately. The parser grammar should be responsible
> for determining the validity of the tokens in whatever context they
> occur during parsing.
> 
> > (and because by chance this can be valid at the syntactic level, the
> > parser cannot do anything about it).
> 
> Could you explain this further please? Why do you believe the parser
> can't do anything about it? Perhaps examples of SQL text that
> illustrate the issue....
> 
> Cheers,
> 
> Micheal
