[antlr-interest] Re: Antlr 3.0 spaces between tokens

Wed Nov 10 22:42:43 PST 2004

Perhaps I am missing the point of that article, but in my case I don't
want to just keep the whitespace for printing.

For some (not all) parser rules, whitespace is actually important for the
parsing.
So I want the parser to see all the whitespace for some rules and not for
others.

So what I want is the Token.SKIP option on the parser side instead of on the
lexer side, controlled on a per-rule basis.
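
One way to picture this request is a small filter sitting between lexer and parser that a rule action can switch on and off. The following is only a sketch under assumptions: Tok, TokStream and ListStream are hypothetical stand-ins for antlr.Token, antlr.TokenStream and a real lexer, the token type codes are made up, and setWhitespaceVisible is an invented name, not an ANTLR API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class WhitespaceSwitchSketch {
    // Hypothetical token type codes; a real grammar generates its own.
    static final int EOF = 1, WS = 4, IDENT = 5;

    // Stand-in for antlr.Token.
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    // Stand-in for antlr.TokenStream.
    interface TokStream { Tok nextToken(); }

    // Plays the role of the lexer: replays a fixed list, then returns EOF forever.
    static final class ListStream implements TokStream {
        private final Iterator<Tok> it;
        ListStream(List<Tok> toks) { this.it = toks.iterator(); }
        public Tok nextToken() { return it.hasNext() ? it.next() : new Tok(EOF, "<EOF>"); }
    }

    // Drops WS tokens unless a parser rule has asked to see them.
    static final class WhitespaceSwitch implements TokStream {
        private final TokStream in;
        private boolean whitespaceVisible = false;
        WhitespaceSwitch(TokStream in) { this.in = in; }

        // Invented name: a rule action would call this on entry and exit.
        void setWhitespaceVisible(boolean visible) { whitespaceVisible = visible; }

        public Tok nextToken() {
            Tok t = in.nextToken();
            while (!whitespaceVisible && t.type == WS) {
                t = in.nextToken();   // discard hidden whitespace
            }
            return t;
        }
    }

    // Reads "a b c", making whitespace visible after `hiddenCount` tokens were returned.
    static List<String> run(int hiddenCount) {
        WhitespaceSwitch ws = new WhitespaceSwitch(new ListStream(Arrays.asList(
            new Tok(IDENT, "a"), new Tok(WS, " "),
            new Tok(IDENT, "b"), new Tok(WS, " "),
            new Tok(IDENT, "c"))));
        List<String> seen = new ArrayList<>();
        for (int n = 0; ; n++) {
            if (n == hiddenCount) ws.setWhitespaceVisible(true);
            Tok t = ws.nextToken();
            if (t.type == EOF) break;
            seen.add(t.text);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(2));
    }
}
```

A caveat with this kind of switch: a parser that buffers lookahead tokens may already have pulled tokens through the filter before the rule action runs, so the toggle only affects tokens that have not been fetched yet.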

matthew

----- Original Message ----- 
From: "lgcraymer" <lgc at mail1.jpl.nasa.gov>
To: <antlr-interest at yahoogroups.com>
Sent: Thursday, November 11, 2004 5:32 PM
Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens

>
>
> Take a look at
>
> <http://www.antlr.org/article/preserving.token.order/preserving.token.order.tml>
>
> It's hard to see how ANTLR 3 could do better.
>
> --Loring
>
> --- In antlr-interest at yahoogroups.com, "matthew ford"
> <Matthew.Ford at f...> wrote:
> > Hi Ter,
> >
> > Perhaps for Antlr 3.0 we can have a better means of handling white space.
> >
> > Antlr provides an ignore-whitespace capability that is appealing:
> > WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
> >      { $setType(Token.SKIP); }
> >    ;
> > but every time I try to use it I come across a situation where I really
> > want/need the whitespace in the parser.
> >
> > So I end up having the lexer pass it back to the parser
> > (or having a switch in the lexer that the parser uses to control the
> > return of whitespace. I know this is a no-no, but it has worked for me
> > in some cases).
> >
> > The parser usually only needs to know about the whitespace in a few
> > rules, but now has (WS)* all over the place to handle whitespace
> > everywhere.
> >
> > Basically, what I would like is to have the lexer pass all the
> > whitespace back to the parser and then, in the parser, be able to say
> > a) for this rule ignore whitespace,
> > or
> > b) for this rule whitespace is important.
> >
> > Actually the second option is more likely.
> >
> > matthew
> >
> > ----- Original Message ----- 
> > From: "Monty Zukowski" <monty at c...>
> > To: <antlr-interest at yahoogroups.com>
> > Sent: Thursday, November 11, 2004 3:38 AM
> > Subject: Re: [antlr-interest] spaces between tokens
> >
> >
> > >
> > > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> > >
> > > >
> > > > silverio.di at q... wrote:
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Hi,
> > > >> I've a big problem.
> > > >>
> > > >> In my grammar, as in many others, whitespace is skipped in the
> > > >> lexer, but in some circumstances I need to check that no spaces
> > > >> are present between tokens.
> > > >>
> > > >> Example :
> > > >> WeekJobHour@Monday = 8
> > > >>
> > > >> would mean assign 8 (hours) to parameter Monday of structure
> > > >> WeekJobHour.
> > > >> I would like my lexer to extract the following tokens:
> > > >>
> > > >> IDENT ATSIGN IDENT
> > > >>
> > > >> but my problem is to check that no WS is present between
> > > >> IDENT and ATSIGN or between ATSIGN and IDENT, so
> > > >>
> > > >> WeekJobHour@Monday = 8        // is OK
> > > >> WeekJobHour @Monday = 8       // is BAD
> > > >> WeekJobHour@ Monday = 8       // is BAD
> > > >> WeekJobHour  @ Monday = 8           // is BAD too !
> > > >>
> > > >> I could use the following lexer rule:
> > > >>
> > > >> STRUCT_PARAMETER
> > > >>       :     ('A'..'Z' | 'a'..'z')+
> > > >>             '@'
> > > >>             ('A'..'Z' | 'a'..'z')+
> > > >>       ;
> > > >>
> > > >> but in the parser, how can I extract the structure name (WeekJobHour)
> > > >> and the structure parameter (Monday) from the STRUCT_PARAMETER
> > > >> token?
> > > >>
> > > >> I think a similar issue is present in the C/C++ structure construct.
> > > >>
> > > >> Thank you for your suggestions,
> > > >> Silverio Diquigiovanni
> > > > Make a class which implements TokenStream and wraps the Lexer.
> > > > In the nextToken method, if the lexer returns a token of type
> > > > STRUCT_PARAMETER, split the token into 3 tokens, where the first is
> > > > of type STRUCT_NAME, the second STRUCT_AT and the third STRUCT_DAY,
> > > > with the token texts WeekJobHour, @ and Monday respectively.
> > > > Return the first token from the method and store the other 2.
> > > > In the next 2 calls of nextToken, return the stored ones.
> > > >
> > > > Pass this TokenStream implementation to the parser instead of your
> > > > Lexer.
> > > >
> > > > Anakreon
> > > >
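The splitting stream described above might look roughly like this. It is a sketch only: Tok, TokStream and ListStream are hypothetical stand-ins for antlr.Token, antlr.TokenStream and the generated lexer, and the numeric token type codes are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class SplitSketch {
    // Hypothetical token type codes; a real grammar generates its own.
    static final int EOF = 1, STRUCT_PARAMETER = 4, STRUCT_NAME = 5,
                     STRUCT_AT = 6, STRUCT_DAY = 7, ASSIGN = 8, NUMBER = 9;

    // Stand-in for antlr.Token.
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    // Stand-in for antlr.TokenStream.
    interface TokStream { Tok nextToken(); }

    // Plays the role of the lexer: replays a fixed list, then returns EOF forever.
    static final class ListStream implements TokStream {
        private final Iterator<Tok> it;
        ListStream(List<Tok> toks) { this.it = toks.iterator(); }
        public Tok nextToken() { return it.hasNext() ? it.next() : new Tok(EOF, "<EOF>"); }
    }

    // Wraps the lexer and turns one STRUCT_PARAMETER token into three tokens.
    static final class SplittingStream implements TokStream {
        private final TokStream in;
        private final Deque<Tok> pending = new ArrayDeque<>();  // tokens stored for later calls
        SplittingStream(TokStream in) { this.in = in; }

        public Tok nextToken() {
            if (!pending.isEmpty()) return pending.removeFirst();
            Tok t = in.nextToken();
            if (t.type != STRUCT_PARAMETER) return t;
            // The lexer rule guarantees exactly one '@' inside the token text.
            int at = t.text.indexOf('@');
            pending.addLast(new Tok(STRUCT_AT, "@"));
            pending.addLast(new Tok(STRUCT_DAY, t.text.substring(at + 1)));
            return new Tok(STRUCT_NAME, t.text.substring(0, at));
        }
    }

    // Feeds the tokens for "WeekJobHour@Monday = 8" through the splitter.
    static List<String> run() {
        TokStream s = new SplittingStream(new ListStream(Arrays.asList(
            new Tok(STRUCT_PARAMETER, "WeekJobHour@Monday"),
            new Tok(ASSIGN, "="),
            new Tok(NUMBER, "8"))));
        List<String> out = new ArrayList<>();
        for (Tok t = s.nextToken(); t.type != EOF; t = s.nextToken()) {
            out.add(t.type + ":" + t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With a real grammar, the wrapper would be handed to the parser's constructor in place of the lexer, and the split tokens would presumably also need line and column information copied from the original token.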
> > >
> > > I agree with the above approach; also read my ParserFilter paper on
> > > my website, http://www.codetransform.com/filterexample.html
> > >
> > > I would recommend an alternative approach, which would be to not skip
> > > whitespace in the lexer.  Instead, discard it in the parser filter.
> > > That filter can still check that no whitespace occurs before or after
> > > an @ between IDENTS.
> > >
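The filter idea in the paragraph above could be sketched as follows, assuming the lexer emits WS tokens instead of skipping them. Tok, TokStream, ListStream and the tiny lex helper are hypothetical stand-ins written only so the example runs by itself; a real filter would implement antlr.TokenStream and would probably report the problem through ANTLR's own exception types rather than IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class AtSpacingSketch {
    // Hypothetical token type codes; a real grammar generates its own.
    static final int EOF = 1, WS = 4, IDENT = 5, ATSIGN = 6, ASSIGN = 7, NUMBER = 8;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    interface TokStream { Tok nextToken(); }

    static final class ListStream implements TokStream {
        private final Iterator<Tok> it;
        ListStream(List<Tok> toks) { this.it = toks.iterator(); }
        public Tok nextToken() { return it.hasNext() ? it.next() : new Tok(EOF, "<EOF>"); }
    }

    // Discards WS for the parser, but rejects WS directly before or after '@'.
    static final class AtSpacingFilter implements TokStream {
        private final TokStream in;
        private boolean previousWasWhitespace = false;  // last raw token was WS
        private boolean previousWasAt = false;          // last raw token was '@'
        AtSpacingFilter(TokStream in) { this.in = in; }

        public Tok nextToken() {
            for (;;) {
                Tok t = in.nextToken();
                if (t.type == WS) {
                    if (previousWasAt) throw new IllegalStateException("whitespace after '@'");
                    previousWasWhitespace = true;
                    continue;  // whitespace never reaches the parser
                }
                if (t.type == ATSIGN && previousWasWhitespace) {
                    throw new IllegalStateException("whitespace before '@'");
                }
                previousWasAt = (t.type == ATSIGN);
                previousWasWhitespace = false;
                return t;
            }
        }
    }

    // Toy lexer for the demo: letters, digits, blanks, '@' and '=' only.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                while (i < s.length() && Character.isLetter(s.charAt(i))) i++;
                out.add(new Tok(IDENT, s.substring(start, i)));
            } else if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.add(new Tok(NUMBER, s.substring(start, i)));
            } else if (c == ' ' || c == '\t') {
                while (i < s.length() && (s.charAt(i) == ' ' || s.charAt(i) == '\t')) i++;
                out.add(new Tok(WS, s.substring(start, i)));
            } else if (c == '@') {
                i++;
                out.add(new Tok(ATSIGN, "@"));
            } else if (c == '=') {
                i++;
                out.add(new Tok(ASSIGN, "="));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    // True if the whole line passes through the filter without a spacing error.
    static boolean accepts(String line) {
        TokStream s = new AtSpacingFilter(new ListStream(lex(line)));
        try {
            while (s.nextToken().type != EOF) { /* drain */ }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("WeekJobHour@Monday = 8"));
        System.out.println(accepts("WeekJobHour @Monday = 8"));
    }
}
```

Because the check lives in the filter, the parser grammar itself stays free of (WS)* references.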
> > > Alternately you could keep track of state in the lexer.  Set a boolean
> > > variable in the makeToken() method if the token made was WS.  To see
> > > what is coming after, inspect LA(1).  Assuming @ is not used in any
> > > other way, you would have a rule similar to this, where
> > > previousWasWhitespace is the variable set in makeToken().
> > >
> > > AT: '@' { !previousWasWhitespace && LA(1)!=' ' && LA(1)!='\t' }? ;
> > >
> > > Monty
> > >
> > > ANTLR & Java Consultant -- http://www.codetransform.com
> > > ANSI C/GCC transformation toolkit -- 
> > > http://www.codetransform.com/gcc.html
> > > Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
> > >

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/