[antlr-interest] Re: Antlr 3.0 spaces between tokens

lgcraymer lgc at mail1.jpl.nasa.gov
Wed Nov 10 22:32:03 PST 2004



Take a look at
<http://www.antlr.org/article/preserving.token.order/preserving.token.order.tml>

It's hard to see how ANTLR 3 could do better.

--Loring

--- In antlr-interest at yahoogroups.com, "matthew ford"
<Matthew.Ford at f...> wrote:
> Hi Ter,
> 
> Perhaps for Antlr 3.0 we can have a better means of handling white space.
> 
> Antlr provides an ignore whitespace capability that is appealing
>
> WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
>      { $setType(Token.SKIP); }
>    ;
>
> but every time I try and use it I come across a situation where I really
> want/need the white space in the parser.
> 
> So I end up having the lexer pass it back to the parser
> (or having a switch in the lexer that the parser uses to control the return
> of whitespace. I know this is a no-no, but it has worked for me in some
> cases).
> 
> The parser usually only needs to know about the whitespace in a few rules,
> but now has (WS)* all over the place to handle whitespace everywhere.
> 
> Basically, what I would like is to have the lexer pass all the whitespace
> back to the parser, and then in the parser be able to say
> a) for this rule ignore white space.
> or
> b) for this rule whitespace is important
> 
> Actually the second option is more likely.
> 
> matthew
> 
> ----- Original Message ----- 
> From: "Monty Zukowski" <monty at c...>
> To: <antlr-interest at yahoogroups.com>
> Sent: Thursday, November 11, 2004 3:38 AM
> Subject: Re: [antlr-interest] spaces between tokens
> 
> 
> >
> > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> >
> > >
> > > silverio.di at q... wrote:
> > >>
> > >>
> > >>
> > >>
> > >> Hi,
> > >> I've a big problem.
> > >>
> > >> In my grammar, as in many others, whitespace is skipped in the
> > >> lexer, but in some circumstances I need to check that no
> > >> spaces are present between tokens.
> > >>
> > >> Example :
> > >> WeekJobHour@Monday = 8
> > >>
> > >> would mean assign 8 (hours) to parameter Monday of structure
> > >> WeekJobHour.
> > >> I would like my lexer to extract the following tokens:
> > >>
> > >> IDENT ATSIGN IDENT
> > >>
> > >> but my problem is to check that no WS is present between
> > >> IDENT and ATSIGN, or between ATSIGN and IDENT, so
> > >>
> > >> WeekJobHour@Monday = 8        // is OK
> > >> WeekJobHour @Monday = 8       // is BAD
> > >> WeekJobHour@ Monday = 8       // is BAD
> > >> WeekJobHour  @ Monday = 8           // is BAD too !
> > >>
> > >> I could use the following lexer rule:
> > >>
> > >> STRUCT_PARAMETER
> > >>       :     ('A'..'Z' | 'a'..'z')+
> > >>             '@'
> > >>             ('A'..'Z' | 'a'..'z')+
> > >>       ;
> > >>
> > >> but then, in the parser, how can I extract the structure name
> > >> (WeekJobHour) and the structure parameter (Monday) from the
> > >> STRUCT_PARAMETER token?
> > >>
> > >> I think a similar issue is present in C/C++ structure construct
> > >>
> > >> Thank you for your suggestions about
> > >> Silverio Diquigiovanni
> > > Make a class which implements TokenStream and uses the Lexer.
> > > In the nextToken method, if the lexer returns a token of type
> > > STRUCT_PARAMETER, split the token into 3 tokens, where the first is
> > > of type STRUCT_NAME, the second STRUCT_AT and the third STRUCT_DAY,
> > > with the token texts WeekJobHour, @ and Monday respectively.
> > > Return the first token from the method and store the other 2.
> > > In the next 2 calls of nextToken return the stored ones.
> > >
> > > Pass the implementor of TokenStream instead of your Lexer to the
> > > parser.
> > >
> > > Anakreon
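
A minimal Java sketch of the token-splitting wrapper described above. It is
self-contained, so Tok and TokSource are hypothetical stand-ins for
antlr.Token and antlr.TokenStream, and the token type numbers are made up
for illustration (a real grammar would take them from its generated token
types). It only shows the queueing idea, not a drop-in ANTLR class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class SplittingStreamSketch {
    // Hypothetical token type codes, chosen only for this sketch.
    public static final int EOF = 1, STRUCT_PARAMETER = 4,
            STRUCT_NAME = 5, STRUCT_AT = 6, STRUCT_DAY = 7;

    /** Stand-in for antlr.Token: just a type and its text. */
    public static final class Tok {
        public final int type;
        public final String text;
        public Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for antlr.TokenStream: anything that hands out tokens. */
    public interface TokSource { Tok nextToken(); }

    /** Sits between lexer and parser and splits STRUCT_PARAMETER tokens. */
    public static final class SplittingStream implements TokSource {
        private final TokSource lexer;
        private final Deque<Tok> pending = new ArrayDeque<>();

        public SplittingStream(TokSource lexer) { this.lexer = lexer; }

        @Override
        public Tok nextToken() {
            // Hand out tokens stored by an earlier split first.
            if (!pending.isEmpty()) return pending.poll();
            Tok t = lexer.nextToken();
            if (t.type != STRUCT_PARAMETER) return t;
            // The lexer rule guarantees exactly one '@' inside the text.
            int at = t.text.indexOf('@');
            pending.add(new Tok(STRUCT_AT, "@"));
            pending.add(new Tok(STRUCT_DAY, t.text.substring(at + 1)));
            return new Tok(STRUCT_NAME, t.text.substring(0, at));
        }
    }

    /** Runs the wrapper over a canned token list; returns "type:text" items. */
    public static List<String> demo() {
        Iterator<Tok> canned = Arrays.asList(
                new Tok(STRUCT_PARAMETER, "WeekJobHour@Monday"),
                new Tok(EOF, "")).iterator();
        TokSource stream = new SplittingStream(canned::next);
        List<String> out = new ArrayList<>();
        for (Tok t = stream.nextToken(); t.type != EOF; t = stream.nextToken()) {
            out.add(t.type + ":" + t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With a real ANTLR 2 lexer, the parser would be constructed on the wrapper
instead of on the lexer, as described above.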
> > >
> >
> > I agree with the above approach; see also my ParserFilter paper on
> > my website, http://www.codetransform.com/filterexample.html
> >
> > I would recommend an alternative approach, which would be to not skip
> > whitespace in the lexer.  Instead, discard it in the parser filter.
> > That filter can still check that no whitespace occurs before or after
> > an @ between IDENTS.
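
A rough Java sketch of that filter idea, under stated assumptions: the lexer
returns WS tokens instead of skipping them, Tok and TokSource are
hypothetical stand-ins for antlr.Token and antlr.TokenStream, the token type
numbers are invented, and a plain IllegalStateException stands in for
whatever error reporting the real parser would use. For brevity it checks
the spacing around every AT token rather than only an AT between two IDENTs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class AtSpacingFilterSketch {
    // Hypothetical token type codes, chosen only for this sketch.
    public static final int EOF = 1, IDENT = 4, AT = 5, WS = 6, ASSIGN = 7, INT = 8;

    /** Stand-in for antlr.Token. */
    public static final class Tok {
        public final int type;
        public final String text;
        public Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for antlr.TokenStream. */
    public interface TokSource { Tok nextToken(); }

    /** Drops WS tokens, but rejects any WS directly before or after an AT. */
    public static final class AtSpacingFilter implements TokSource {
        private final TokSource lexer;
        private boolean lastWasAt = false; // previous token handed out was AT

        public AtSpacingFilter(TokSource lexer) { this.lexer = lexer; }

        @Override
        public Tok nextToken() {
            boolean skippedWs = false;
            Tok t = lexer.nextToken();
            while (t.type == WS) {          // discard whitespace, remember it
                skippedWs = true;
                t = lexer.nextToken();
            }
            if (skippedWs && t.type == AT) {
                throw new IllegalStateException("whitespace before '@'");
            }
            if (skippedWs && lastWasAt) {
                throw new IllegalStateException("whitespace after '@'");
            }
            lastWasAt = (t.type == AT);
            return t;
        }
    }

    /** Feeds raw token types through the filter; false if spacing is rejected. */
    public static boolean accepts(int... rawTypes) {
        List<Tok> raw = new ArrayList<>();
        for (int type : rawTypes) raw.add(new Tok(type, ""));
        raw.add(new Tok(EOF, ""));
        Iterator<Tok> it = raw.iterator();
        TokSource filter = new AtSpacingFilter(it::next);
        try {
            while (filter.nextToken().type != EOF) { /* drain the stream */ }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // WeekJobHour@Monday = 8
        System.out.println(accepts(IDENT, AT, IDENT, WS, ASSIGN, WS, INT));
        // WeekJobHour @Monday = 8
        System.out.println(accepts(IDENT, WS, AT, IDENT, WS, ASSIGN, WS, INT));
    }
}
```

The parser's own rules then never mention WS, since the filter has already
removed it by the time tokens reach the parser.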
> >
> > Alternately you could keep track of state in the lexer.  Set a boolean
> > variable in the makeToken() method if the token made was WS.  To see
> > what is coming after, inspect LA(1).  Assuming @ is not used in any
> > other way, you would have a rule similar to this, where
> > previousWasWhitespace is the variable set in makeToken().
> >
> > AT: '@' { !previousWasWhitespace && LA(1)!=' ' && LA(1)!='\t' }? ;
> >
> > Monty
> >
> > ANTLR & Java Consultant -- http://www.codetransform.com
> > ANSI C/GCC transformation toolkit -- 
> > http://www.codetransform.com/gcc.html
> > Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
> >





 