[antlr-interest] Re: identifier with space

Thu Oct 30 16:01:05 PST 2003

As Loring suggests, a hidden token stream would probably do the 
trick. That does mean attaching the hidden tokens, which requires 
using that token class and either the matching AST class or a pass 
between lex and parse to drop them. To avoid that, you could keep 
the whitespace as ordinary tokens in the lexer, then put a token 
stream filter between lex and parse that rolls IDENT WS+ 
combinations together and strips any WS not following an IDENT.
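
A rough sketch of that filter idea, in plain Java rather than against 
the real ANTLR TokenStream interface. Tok, Kind and IdentWsFilter are 
made-up names for illustration only; a real filter would wrap the 
lexer and use its generated token type constants.

```java
import java.util.ArrayList;
import java.util.List;

class IdentWsFilter {
    // Hypothetical token kinds; a generated lexer would use int constants.
    enum Kind { IDENT, WS, KEYWORD }

    // Minimal stand-in for a token: a kind plus its text.
    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Roll each IDENT together with the WS run that follows it,
    // and drop every WS token that does not follow an IDENT.
    static List<Tok> filter(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            Tok t = in.get(i++);
            if (t.kind == Kind.WS) {
                continue; // WS not following an IDENT: strip it
            }
            if (t.kind == Kind.IDENT) {
                StringBuilder sb = new StringBuilder(t.text);
                // Absorb the whole WS+ run so the exact spacing is kept.
                while (i < in.size() && in.get(i).kind == Kind.WS) {
                    sb.append(in.get(i++).text);
                }
                out.add(new Tok(Kind.IDENT, sb.toString()));
            } else {
                out.add(t);
            }
        }
        return out;
    }
}
```

Note that the last IDENT before a keyword keeps its trailing 
whitespace too, so the parser would still have to trim that when it 
glues the pieces back together.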

Or could you perhaps recognise the spaces as part of the 
identifiers? Something like this (assuming for the example that 
idents are only lowercase letters):
IDENT_WITH_SPACES
    :
        ('a'..'z')+ 
        {
            $setType(testLiterals($getText, IDENT_WITH_SPACES));
        }
        (
            {
                 $getType == IDENT_WITH_SPACES // Don't do it if it's a keyword
            }?
            (WS)+
        )?
    ;

That's just off the top of my head, so I'm probably neglecting 
something. The code is almost certainly wrong (especially the test 
literals part) but should give the idea: allow trailing whitespace 
only on non-literal idents. You probably need to make the (WS)+ 
greedy to avoid ambiguity, though from memory ANTLR should match 
early and work anyway.
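
To show the intended behaviour of that rule without relying on my 
half-remembered ANTLR syntax, here is the same logic as a small 
hand-written matcher in Java. IdentWithSpaces, KEYWORDS and matchEnd 
are invented for this sketch, and the keyword set just stands in for 
the lexer's literals table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class IdentWithSpaces {
    // Hypothetical keyword table standing in for the literals table.
    static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("select", "from"));

    // Returns the end index (exclusive) of the token starting at pos,
    // or pos itself if no lowercase identifier starts there.
    static int matchEnd(String s, int pos) {
        int i = pos;
        // ('a'..'z')+
        while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') {
            i++;
        }
        if (i == pos) {
            return pos; // nothing matched
        }
        // Keywords keep their plain text; whitespace is left for the WS rule.
        if (KEYWORDS.contains(s.substring(pos, i))) {
            return i;
        }
        // Non-literal ident: greedily absorb the trailing whitespace run.
        while (i < s.length() && (s.charAt(i) == ' ' || s.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }
}
```

Only spaces and tabs count as whitespace here; what the real WS rule 
accepts depends on the grammar.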

Tom. 

--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> Lloyd--
> 
> Check out the "Token Streams" part of the ANTLR manual.  I think that 
> you can capture the whitespace as hidden tokens and then access that 
> for reconstructing the input.  I've not had occasion to use this 
> feature, but it was put in for just this purpose.
> 
> --Loring
> 
> 
> --- In antlr-interest at yahoogroups.com, "lloyd_from_far" <ld at g...> wrote:
> > Hi Loring,
> > 
> > I do want to separate "A Field" (1 space) from "A  Field" (2 spaces).
> > It's not my fault if I have to write an ADO.NET driver to a stupid & 
> > so-called "database".
> > 
> > Anyway, managing 1 space (and only one) is certainly better than no 
> > space at all!!
> > Obviously I hit a limitation of ANTLR here; I guess I have to do as 
> > you suggested, which would be better than nothing.
> > 
> > Thanks for the feedback ;-)
> > 
> > --- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> > > --- In antlr-interest at yahoogroups.com, "lloyd_from_far" <ld at g...> wrote:
> > > > Sorry, my example was bad.
> > > > Let's parse this:
> > > > 
> > > > SELECT A Field With Name FROM ATable
> > > > 
> > > 
> > > Lloyd--
> > > 
> > > You're trying to do too much in the lexer--spaces are significant 
> > > for separating tokens in your example.  If you really want "A Field 
> > > With Name" as a single AST node, you are probably better off 
> > > reconstructing it:
> > > 
> > > select
> > >     :
> > >     "SELECT" text "FROM" text
> > >     ;
> > > 
> > > text
> > > { String foo; }
> > >     :
> > >     a:IDENTIFIER { foo = $a.getText(); }
> > >     ( b:IDENTIFIER! { foo += " " + $b.getText(); } )*
> > >     { $a.setText(foo); }
> > >     ;
> > > 
> > > That also has the advantage of converting text to a canonical form 
> > > with single spaces--you really don't want "A    field" to be 
> > > different than "A field", do you?
> > >     
> > > --Loring
