[antlr-interest] Re: Lexer makes 2 valid tokens when there is only 1 invalid one

Tue Apr 15 08:21:35 PDT 2003

You can do something like this:

NUMBER: ('0'..'9')+ {isWhitespace(LA(1))}? ;

The idea is to use a validating predicate to throw an exception when you
don't see whitespace following a number.  ANTLR sometimes doesn't like it
when you don't recognize things, however, so you may need to match the
inverse of whitespace explicitly and then throw the exception yourself.
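A minimal sketch of the same check in plain Java, outside ANTLR, assuming a toy tokenizer with hypothetical names (NumberLexer, tokenize, LexError). After the digits are consumed it looks at the next character, as LA(1) does in the predicate, and throws if that character is a letter. Note one deliberate swap: it rejects a number only when a letter follows, rather than requiring whitespace, so inputs such as "123;" or "123, w" still lex.

```java
import java.util.ArrayList;
import java.util.List;

class NumberLexer {
    // Hypothetical exception, standing in for the exception a failed
    // validating predicate raises in a generated ANTLR lexer.
    static class LexError extends RuntimeException {
        LexError(String msg) { super(msg); }
    }

    // Splits SQL-ish text into NUMBER, IDENT and single-char PUNCT tokens,
    // returned as "KIND:text" strings for easy inspection.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                // Validating check, analogous to the trailing {...}? predicate:
                // the character after the digits must not start an identifier.
                if (i < s.length() && Character.isLetter(s.charAt(i))) {
                    throw new LexError("number at offset " + start
                        + " is followed by '" + s.charAt(i) + "'");
                }
                out.add("NUMBER:" + s.substring(start, i));
            } else if (Character.isLetter(c)) {
                // Identifiers must begin with a letter; digits may follow.
                int start = i;
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                out.add("IDENT:" + s.substring(start, i));
            } else {
                out.add("PUNCT:" + c);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("select 123 w from table;"));
        try {
            tokenize("select 123w from table;");
        } catch (LexError e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints the token list for the valid statement and then "rejected: number at offset 7 is followed by 'w'" for the invalid one.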

Give it a try.  If you get stuck, send another message; I might find some
time today to play with it.

Monty

-----Original Message-----
From: martinkbraid [mailto:mbraid at sqlworks.com]
Sent: Tuesday, April 15, 2003 7:54 AM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Re: Lexer makes 2 valid tokens when there is only 1 invalid one

Here is a valid SQL stmt: select 123 w from table;

Here is an invalid SQL stmt: select 123w from table;

In the first stmt, the "w" is a column alias for the constant "123";
in the 2nd stmt, "123w" is an invalid column name. My problem is that
I need to weed out bad stmts like the 2nd one, but I cannot do that
if my lexer converts it into a valid stmt like the first one. That's
why the parser cannot catch this problem: it doesn't think there
is one.

Thanks,
Martin

--- In antlr-interest at yahoogroups.com, "micheal_jor" <open.zone at v...> wrote:
> > I believe I have a reasonably standard lexer for the SQL language, a
> > language in which all identifiers have to begin with an alpha. It
> > therefore correctly identifies "W123" as an identifier; however, if I
> > give it "123W" the lexer figures there are two tokens: "123" (a
> > NUMBER) and "W" (an IDENTIFIER). This is wrong; it should reject this
> 
> The lexer is working fine. It is accurately tokenizing the stream of
> characters presented to it. The parser grammar should be responsible
> for determining the validity of the tokens in whatever context they
> occur during parsing.
> 
> > (and because by chance this can be valid at the syntactic level, the
> > parser cannot do anything about it).
> 
> Could you explain this further, please? Why do you believe the parser
> can't do anything about it? Perhaps examples of SQL text that
> illustrate the issue...
> 
> Cheers,
> 
> Micheal
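Micheal's position (let the parser judge validity) can also handle Martin's case, provided the tokens keep their positions; ANTLR's tokens can record line and column. A minimal sketch, assuming a hypothetical Tok type that stores character offsets: the parser-side check flags a NUMBER whose end offset equals the start of the IDENT right after it, i.e. an alias with no separator. The names AdjacencyCheck, Tok and firstGluedAlias are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

class AdjacencyCheck {
    // Hypothetical token: kind, text, and character offsets [start, end).
    static final class Tok {
        final String kind, text;
        final int start, end;
        Tok(String kind, String text, int start) {
            this.kind = kind;
            this.text = text;
            this.start = start;
            this.end = start + text.length();
        }
    }

    // Returns the index of the first NUMBER that is immediately followed,
    // with no gap, by an IDENT; returns -1 if there is no such pair.
    static int firstGluedAlias(List<Tok> toks) {
        for (int i = 0; i + 1 < toks.size(); i++) {
            Tok a = toks.get(i), b = toks.get(i + 1);
            if (a.kind.equals("NUMBER") && b.kind.equals("IDENT") && a.end == b.start) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First three tokens of "select 123 w from table;" (space before w)
        List<Tok> ok = Arrays.asList(new Tok("IDENT", "select", 0),
            new Tok("NUMBER", "123", 7), new Tok("IDENT", "w", 11));
        // First three tokens of "select 123w from table;" (w starts where 123 ends)
        List<Tok> bad = Arrays.asList(new Tok("IDENT", "select", 0),
            new Tok("NUMBER", "123", 7), new Tok("IDENT", "w", 10));
        System.out.println(firstGluedAlias(ok));
        System.out.println(firstGluedAlias(bad));
    }
}
```

The trade-off is where the error surfaces: the lexer-side predicate rejects "123w" before parsing starts, while this check keeps the lexer simple and lets the column-alias rule decide, at the cost of carrying positions in every token.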
