[antlr-interest] Re: Antlr 3.0 spaces between tokens

Wed Nov 10 22:12:35 PST 2004

Hi Ter,

Perhaps for Antlr 3.0 we can have a better means of handling white space.

ANTLR provides an appealing way to ignore whitespace in the lexer:

WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
     { $setType(Token.SKIP); }
   ;

But every time I try to use it, I come across a situation where I really
want/need the whitespace in the parser.

So I end up having the lexer pass it back to the parser (or I have a switch
in the lexer that the parser uses to control whether whitespace is returned.
I know this is a no-no, but it has worked for me in some cases).

The parser usually only needs to know about the whitespace in a few rules,
but now it has (WS)* all over the place to handle whitespace everywhere.

Basically, what I would like is to have the lexer pass all the whitespace
back to the parser, and then in the parser be able to say either
a) for this rule, ignore whitespace
or
b) for this rule, whitespace is important

Actually the second option is more likely.
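
To make the idea concrete, here is a minimal sketch in plain Java. It is not
ANTLR API: the Tok class, the type codes and the ToggleStream wrapper are all
made up for illustration. It assumes the lexer emits every WS token and that
the parser flips a flag while it is inside a whitespace-sensitive rule.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: the parser decides, rule by rule, whether it sees WS. */
final class WsToggleSketch {
    // Made-up token type codes, not taken from any generated ANTLR lexer.
    static final int EOF = -1, WS = 1, IDENT = 2;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Wraps a raw token source that always contains the WS tokens. */
    static final class ToggleStream {
        private final Iterator<Tok> raw;
        private boolean whitespaceSignificant = false;  // default: hide WS

        ToggleStream(List<Tok> rawTokens) { this.raw = rawTokens.iterator(); }

        /** A whitespace-sensitive rule would switch this on at entry, off at exit. */
        void setWhitespaceSignificant(boolean on) { whitespaceSignificant = on; }

        Tok nextToken() {
            while (raw.hasNext()) {
                Tok t = raw.next();
                if (t.type != WS || whitespaceSignificant) {
                    return t;
                }
                // WS while the flag is off: drop it and keep scanning.
            }
            return new Tok(EOF, "<EOF>");
        }
    }

    public static void main(String[] args) {
        ToggleStream s = new ToggleStream(Arrays.asList(
                new Tok(IDENT, "a"), new Tok(WS, " "), new Tok(IDENT, "b")));
        for (Tok t = s.nextToken(); t.type != EOF; t = s.nextToken()) {
            System.out.println(t.type + " '" + t.text + "'");
        }
    }
}
```

One caveat with this kind of switch: a parser that buffers lookahead may have
already pulled tokens through the stream before the flag changes, so the
toggle only affects tokens fetched after it is set.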

matthew

----- Original Message ----- 
From: "Monty Zukowski" <monty at codetransform.com>
To: <antlr-interest at yahoogroups.com>
Sent: Thursday, November 11, 2004 3:38 AM
Subject: Re: [antlr-interest] spaces between tokens

>
> On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
>
> >
> > silverio.di at qem.it wrote:
> >>
> >>
> >>
> >>
> >> Hi,
> >> I've a big problem.
> >>
> >> In my grammar, as in many others, whitespace is skipped in the lexer,
> >> but in some circumstances I need to check that no spaces are present
> >> between tokens.
> >>
> >> Example :
> >> WeekJobHour@Monday = 8
> >>
> >> This would mean: assign 8 (hours) to parameter Monday of structure
> >> WeekJobHour.
> >> I would like my lexer to extract the following tokens:
> >>
> >> IDENT ATSIGN IDENT
> >>
> >> but my problem is to check that no WS is present between
> >> IDENT and ATSIGN or between ATSIGN and IDENT, so
> >>
> >> WeekJobHour@Monday = 8        // is OK
> >> WeekJobHour @Monday = 8       // is BAD
> >> WeekJobHour@ Monday = 8       // is BAD
> >> WeekJobHour  @ Monday = 8           // is BAD too !
> >>
> >> I could use the following lexer rule:
> >>
> >> STRUCT_PARAMETER
> >>       :     ('A'..'Z' | 'a'..'z')+
> >>             '@'
> >>             ('A'..'Z' | 'a'..'z')+
> >>       ;
> >>
> >> but in the parser, how can I extract the structure name (WeekJobHour)
> >> and the structure parameter (Monday) from the STRUCT_PARAMETER
> >> token?
> >>
> >> I think a similar issue is present in the C/C++ structure construct.
> >>
> >> Thank you for your suggestions.
> >> Silverio Diquigiovanni
> > Make a class which implements TokenStream and wraps the Lexer.
> > In the nextToken method, if the lexer returns a token of type
> > STRUCT_PARAM, split it into 3 tokens: the first of type
> > STRUCT_NAME, the second STRUCT_AT and the third STRUCT_DAY,
> > with the text WeekJobHour, @ and Monday respectively.
> > Return the first token from the method and store the other 2.
> > In the next 2 calls of nextToken, return the stored ones.
> >
> > Pass the TokenStream implementation to the parser instead of your
> > Lexer.
> >
> > Anakreon
> >
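
A rough sketch of the splitting stream described above, in plain Java so it
runs without the ANTLR jar. The Tok class, the type codes and the
SplittingStream name are assumptions for illustration; in a real ANTLR 2
setup the wrapper would implement antlr.TokenStream and delegate to the
generated lexer instead of iterating over a list.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a token stream that splits STRUCT_PARAM into 3 tokens. */
final class SplitStreamSketch {
    // Made-up token type codes; a generated lexer would supply its own.
    static final int EOF = -1, STRUCT_PARAM = 1, STRUCT_NAME = 2,
            STRUCT_AT = 3, STRUCT_DAY = 4, OTHER = 5;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    static final class SplittingStream {
        private final Iterator<Tok> lexer;                        // stands in for the Lexer
        private final Deque<Tok> pending = new ArrayDeque<Tok>(); // stored tokens

        SplittingStream(List<Tok> lexerOutput) { this.lexer = lexerOutput.iterator(); }

        Tok nextToken() {
            // Hand out tokens stored by an earlier split first.
            if (!pending.isEmpty()) {
                return pending.removeFirst();
            }
            if (!lexer.hasNext()) {
                return new Tok(EOF, "<EOF>");
            }
            Tok t = lexer.next();
            int at = t.text.indexOf('@');
            if (t.type != STRUCT_PARAM || at < 0) {
                return t;  // anything else passes through untouched
            }
            // Store the '@' and the parameter, return the structure name now.
            pending.addLast(new Tok(STRUCT_AT, "@"));
            pending.addLast(new Tok(STRUCT_DAY, t.text.substring(at + 1)));
            return new Tok(STRUCT_NAME, t.text.substring(0, at));
        }
    }

    public static void main(String[] args) {
        SplittingStream s = new SplittingStream(Arrays.asList(
                new Tok(STRUCT_PARAM, "WeekJobHour@Monday"),
                new Tok(OTHER, "="), new Tok(OTHER, "8")));
        for (Tok t = s.nextToken(); t.type != EOF; t = s.nextToken()) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

Because the lexer rule only matches when there is no whitespace around the
'@', the parser sees the three split tokens only for well-formed input.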
>
> I agree with the above approach. Also see my ParserFilter paper on
> my website, http://www.codetransform.com/filterexample.html
>
> An alternative approach I would recommend is to not skip
> whitespace in the lexer.  Instead, discard it in the parser filter.
> That filter can still check that no whitespace occurs before or after
> an @ between IDENTs.
>
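
The filter idea could look roughly like the following sketch. It is plain
Java over a list of tokens, not the ParserFilter code from the paper; the Tok
class, the type codes and the filter method are invented here, and a real
filter would work token by token on a stream rather than on a whole list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: drop WS in a filter, but reject WS next to an '@'. */
final class AtAdjacencySketch {
    // Made-up token type codes for illustration only.
    static final int WS = 1, IDENT = 2, ATSIGN = 3, OTHER = 4;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /**
     * Returns the tokens with all WS removed.
     * Throws IllegalStateException if a WS token directly precedes or
     * follows an ATSIGN in the unfiltered input.
     */
    static List<Tok> filter(List<Tok> raw) {
        List<Tok> out = new ArrayList<Tok>();
        for (int i = 0; i < raw.size(); i++) {
            Tok t = raw.get(i);
            if (t.type == ATSIGN) {
                boolean wsBefore = i > 0 && raw.get(i - 1).type == WS;
                boolean wsAfter = i + 1 < raw.size() && raw.get(i + 1).type == WS;
                if (wsBefore || wsAfter) {
                    throw new IllegalStateException("whitespace next to '@'");
                }
            }
            if (t.type != WS) {
                out.add(t);  // the parser never sees WS
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> ok = Arrays.asList(
                new Tok(IDENT, "WeekJobHour"), new Tok(ATSIGN, "@"),
                new Tok(IDENT, "Monday"), new Tok(WS, " "),
                new Tok(OTHER, "="), new Tok(WS, " "), new Tok(OTHER, "8"));
        for (Tok t : filter(ok)) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

With this arrangement the grammar itself needs no (WS)* anywhere, and the
adjacency rule lives in one place.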
> Alternately you could keep track of state in the lexer.  Set a boolean
> variable in the makeToken() method if the token made was WS.  To see
> what is coming after the '@', inspect LA(1) once it has been matched.
> Assuming @ is not used in any other way, you would have a rule similar
> to this, where previousWasWhitespace is the variable set in makeToken().
>
> AT: '@' { !previousWasWhitespace && LA(1)!=' ' && LA(1)!='\t' }? ;
>
> Monty
>
> ANTLR & Java Consultant -- http://www.codetransform.com
> ANSI C/GCC transformation toolkit -- 
> http://www.codetransform.com/gcc.html
> Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
>
