[antlr-interest] Re: simple question

Thu Oct 30 05:23:21 PST 2003

Well, you are right: I did indeed misunderstand your intentions.

Back to the problem:
SELECT A field with spaces FROM ...

Apart from Loring's suggestion, I would try emitting WS as tokens
(unless your grammar is pretty complex) and dealing with them in
the parser. Just a guess for k=2 (based on your example):

WS : (SPACE | NEWLINE)+ ;
protected SPACE : ' ' | '\t' ;
protected NEWLINE : ( '\r' (options{greedy=true;}: '\n')? | '\n' )
    {newline();} ;

select {string tokText;}:
    SELECT WS tokText=idname {...} (COMMA tokText=idname {...})* WS
    FROM ...;

idname returns [string s] {s = "";}:
    t:ID { s += t.getText(); }
    (options{greedy=true;}:
        ws:WS { s += ws.getText(); } t2:ID! { s += t2.getText(); }
    )*;
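
To make the idea concrete outside of ANTLR, here is a minimal Python
sketch of the same approach: the lexer keeps whitespace as WS tokens,
and the parser glues ID WS ID runs back together using two tokens of
lookahead. The names tokenize and parse_select_list, the regular
expressions and the case-insensitive keyword check are assumptions made
for illustration; none of this is ANTLR API. It also accepts optional
whitespace after a comma, which the grammar above does not.

```python
import re

# Hypothetical token kinds mirroring SELECT / FROM / ID / WS / COMMA above.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<COMMA>,)
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_]*)
""", re.VERBOSE)

KEYWORDS = {"SELECT", "FROM"}


def tokenize(text):
    """Return (kind, text) pairs, keeping whitespace as WS tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        kind, value = m.lastgroup, m.group()
        if kind == "WORD":
            # Assumption: keywords are matched case-insensitively.
            kind = value.upper() if value.upper() in KEYWORDS else "ID"
        tokens.append((kind, value))
        pos = m.end()
    return tokens


def parse_select_list(tokens):
    """Parse SELECT WS idname (COMMA idname)* WS FROM; return the names."""
    pos = 0

    def peek(offset=0):
        i = pos + offset
        return tokens[i][0] if i < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise ValueError(f"expected {kind}, found {peek()}")
        value = tokens[pos][1]
        pos += 1
        return value

    def idname():
        name = expect("ID")
        # k=2 lookahead: continue only when WS is followed by another ID,
        # so the WS in front of FROM is left for the caller.
        while peek() == "WS" and peek(1) == "ID":
            name += expect("WS")   # keeps the exact run of spaces
            name += expect("ID")
        return name

    expect("SELECT")
    expect("WS")
    names = [idname()]
    while peek() == "COMMA":
        expect("COMMA")
        if peek() == "WS":         # assumption: allow "a, b"
            expect("WS")
        names.append(idname())
    expect("WS")
    expect("FROM")
    return names
```

For example, parse_select_list(tokenize("SELECT a field  name, other
FROM aTable")) yields the two names with their inner spacing intact,
including the double space in the first one.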

Regards,
Lubos.
P.S.: Most SQL implementations treat this sort of identifier as a
"delimited identifier" and require it to be enclosed in double
quotes, which makes tokenizing much easier.
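
As a rough illustration of why the quoted form is easier to tokenize,
here is a small Python sketch that reads one delimited identifier with a
single regular expression. The function name read_delimited_identifier
is hypothetical, and the sketch assumes that a double quote inside the
identifier is written as two double quotes; check your SQL dialect
before relying on that.

```python
import re

# Assumption: identifier is enclosed in double quotes, and an embedded
# double quote is escaped by doubling it ("").
DELIMITED_ID = re.compile(r'"((?:[^"]|"")+)"')


def read_delimited_identifier(text, pos=0):
    """Return (name, end_position), or None if no identifier starts at pos."""
    m = DELIMITED_ID.match(text, pos)
    if m is None:
        return None
    # Undo the quote doubling; spaces inside the quotes are kept verbatim.
    return m.group(1).replace('""', '"'), m.end()
```

Because the closing quote marks the end of the name, the lexer never
has to decide whether a space belongs to the identifier or separates
two tokens.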

--- In antlr-interest at yahoogroups.com, "lloyd_from_far" <ld at g...> 
wrote:
> I just understood you: you completely misunderstood me.
> I am absolutely NOT confusing parser & lexer in this particular case
> (but YOU did)
> I actually already do
> select: SELECT (ID)+ FROM
> 
> the problem is ' ' is NOT a separator (which is ','), it's part of 
> the name of the ID !!!
> 
> I did think of reconstructing the real ID with something like this
> (and this time, blurring the roles of lexer and parser, as
> you unknowingly suggested)
> 
> select
> {
>  string tokText;
> }:
>   SELECT tokText=idname{...} (COMMA tokText=idname {...})* FROM ...;
> 
> idname returns [string s]
> {
> s = "";
> }:
>   t:ID { s += t.getText(); }
>   (
>     t2:ID! { s += t2.getText(); }
>   )*;
> 
> 
> but I still don't have the exact number of spaces ... 
> it could be 2 !... (or 3 ?! or more ?!)
> 
> I thought of deactivating space skipping, but I did it once and
> quickly gave up on it ... (too awful)
> 
> 
> --- In antlr-interest at yahoogroups.com, "Lubos Vnuk" 
> <lubos.vnuk at r...> wrote:
> > You might be mixing up the meaning of lexer and parser somewhat.
> > 
> > I suggest you should get a single SELECT token, then a series of
> > ID tokens (ignoring the WS in the lexer), then a FROM token and an
> > ID token. This would be the lexer's task.
> > 
> > In your parser you define a rule to put it together, something
> > like this:
> > select_stmt: SELECT (ID)+ FROM ID;
> > 
> > I think you have a few SQL grammars at www.antlr.org to study
> > from.
> > 
> > HTH,
> > Lubos.
> > 
> > --- In antlr-interest at yahoogroups.com, "lloyd_from_far" <ld at g...> 
> > wrote:
> > > given this (or change these tokens as you see fit):
> > > SELECT: "SELECT" ;
> > > FROM: "FROM" ;
> > > NAME options { testLiterals=true; }:
> > > 	( 'a' .. 'z' )+;
> > > SPACE: (' ') +;
> > > 
> > > how would you cut the following string:
> > > "SELECT a field name with plenty of space FROM aTable"
> > > 
> > > into the 4 following Tokens:
> > > 'SELECT'
> > > 'a field name with plenty of space'
> > > 'FROM'
> > > 'aTable'
