[antlr-interest] Re: Antlr 3.0 spaces between tokens

lgcraymer lgc at mail1.jpl.nasa.gov
Wed Nov 10 22:51:32 PST 2004



The min/max of ASTMinMax gives you an index into the token stream. 
Look for neighboring whitespace tokens.  By carrying the token stream
index around, you carry around references to associated whitespace. 
It's a rather clever trick for solving the whitespace tracking problem.
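
As a rough illustration of the idea (this is not code from the article), the sketch below assumes the full token stream, whitespace included, is kept in a list and that a tree node remembers an index into it. IndexedToken, WhitespaceLookup and the "WS" type string are hypothetical names chosen for the example; a real grammar would use its own token class and token type constants.

```java
import java.util.List;

/** Hypothetical minimal token; a stand-in for your real token class. */
final class IndexedToken {
    final String type; // e.g. "IDENT" or "WS" (assumed type names)
    final String text;
    IndexedToken(String type, String text) { this.type = type; this.text = text; }
}

final class WhitespaceLookup {
    /** Text of the whitespace run directly before stream[index], or "" if none. */
    static String before(List<IndexedToken> stream, int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = index - 1; i >= 0 && stream.get(i).type.equals("WS"); i--) {
            sb.insert(0, stream.get(i).text); // walking backwards, so prepend
        }
        return sb.toString();
    }

    /** Text of the whitespace run directly after stream[index], or "" if none. */
    static String after(List<IndexedToken> stream, int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = index + 1; i < stream.size() && stream.get(i).type.equals("WS"); i++) {
            sb.append(stream.get(i).text);
        }
        return sb.toString();
    }
}
```

Given a node's min index, WhitespaceLookup.before(stream, min) would recover the whitespace in front of it, and the max index works the same way with after(...).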

--Loring

--- In antlr-interest at yahoogroups.com, "matthew ford"
<Matthew.Ford at f...> wrote:
> Perhaps I am missing the point of that article, but in my case I don't
> want to just keep the whitespace for printing.
> 
> For some (not all) parser rules, whitespace is actually important for the
> parsing. So I want the parser to see all the whitespace for some rules and
> not others.
> 
> So what I want is the Token.SKIP option on the parser side instead of on
> the lexer side, and controlled on a per-rule basis.
> 
> matthew
> 
> ----- Original Message ----- 
> From: "lgcraymer" <lgc at m...>
> To: <antlr-interest at yahoogroups.com>
> Sent: Thursday, November 11, 2004 5:32 PM
> Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> 
> 
> >
> >
> > Take a look at
> >
> > <http://www.antlr.org/article/preserving.token.order/preserving.token.order.tml>
> >
> > It's hard to see how ANTLR 3 could do better.
> >
> > --Loring
> >
> > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > <Matthew.Ford at f...> wrote:
> > > Hi Ter,
> > >
> > > Perhaps for Antlr 3.0 we can have a better means of handling
> > > white space.
> > >
> > > Antlr provides an ignore-whitespace capability that is appealing:
> > > WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
> > >      { $setType(Token.SKIP); }
> > >    ;
> > > but every time I try to use it I come across a situation where I
> > > really want/need the white space in the parser.
> > >
> > > So I end up having the lexer pass it back to the parser
> > > (or having a switch in the lexer that the parser uses to control the
> > > return of whitespace. I know this is a no-no, but it has worked for me
> > > in some cases).
> > >
> > > The parser usually only needs to know about the whitespace in a few
> > > rules, but now has (WS)* all over the place to handle whitespace
> > > everywhere.
> > >
> > > Basically, what I would like is to have the lexer pass all the
> > > whitespace back to the parser, and then in the parser be able to say
> > > a) for this rule, ignore whitespace,
> > > or
> > > b) for this rule, whitespace is important.
> > >
> > > Actually the second option is more likely.
> > >
> > > matthew
> > >
> > > ----- Original Message ----- 
> > > From: "Monty Zukowski" <monty at c...>
> > > To: <antlr-interest at yahoogroups.com>
> > > Sent: Thursday, November 11, 2004 3:38 AM
> > > Subject: Re: [antlr-interest] spaces between tokens
> > >
> > >
> > > >
> > > > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> > > >
> > > > >
> > > > > silverio.di at q... wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> Hi,
> > > > >> I've a big problem.
> > > > >>
> > > > > >> In my grammar, as in many others, whitespace is skipped in the
> > > > > >> lexer, but in some circumstances I need to check that no
> > > > > >> spaces are present between tokens.
> > > > >>
> > > > >> Example :
> > > > > >> WeekJobHour@Monday = 8
> > > > > >>
> > > > > >> would mean assign 8 (hours) to parameter Monday of structure
> > > > > >> WeekJobHour.
> > > > > >> I would like my lexer to extract the following tokens:
> > > > >>
> > > > >> IDENT ATSIGN IDENT
> > > > >>
> > > > > >> but my problem is to check that no WS is present between
> > > > > >> IDENT and ATSIGN, or between ATSIGN and IDENT, so
> > > > > >>
> > > > > >> WeekJobHour@Monday = 8        // is OK
> > > > > >> WeekJobHour @Monday = 8       // is BAD
> > > > > >> WeekJobHour@ Monday = 8       // is BAD
> > > > > >> WeekJobHour  @ Monday = 8     // is BAD too !
> > > > >>
> > > > > >> I could use the following lexer rule:
> > > > > >>
> > > > > >> STRUCT_PARAMETER
> > > > > >>       :     ('A'..'Z' | 'a'..'z')+
> > > > > >>             '@'
> > > > > >>             ('A'..'Z' | 'a'..'z')+
> > > > > >>       ;
> > > > >>
> > > > > >> but in the parser, how can I extract the structure name
> > > > > >> (WeekJobHour) and the structure parameter (Monday) from the
> > > > > >> STRUCT_PARAMETER token?
> > > > > >>
> > > > > >> I think a similar issue is present in the C/C++ structure construct.
> > > > > >>
> > > > > >> Thank you for your suggestions.
> > > > > >> Silverio Diquigiovanni
> > > > > > Make a class which implements TokenStream and which uses the Lexer.
> > > > > > In the nextToken method, if the lexer returns a token of type
> > > > > > STRUCT_PARAMETER, split the token into 3 tokens, where the first
> > > > > > would be of type STRUCT_NAME, the second STRUCT_AT and the third
> > > > > > STRUCT_DAY, with the text of the tokens being WeekJobHour, @ and
> > > > > > Monday respectively. Return the first token from the method and
> > > > > > store the other 2. In the next 2 calls of nextToken, return the
> > > > > > stored ones.
> > > > >
> > > > > Pass the implementor of TokenStream instead of your Lexer to the
> > > > > parser.
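
The splitting stream described above could be sketched roughly as follows. To keep the example self-contained it does not use the ANTLR classes: SimpleToken and SimpleTokenStream are hypothetical stand-ins for the real token and TokenStream types, and the token types are plain strings rather than generated integer constants.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical minimal token; a stand-in for the real ANTLR token class. */
final class SimpleToken {
    final String type;
    final String text;
    SimpleToken(String type, String text) { this.type = type; this.text = text; }
}

/** Hypothetical one-method interface playing the role of TokenStream. */
interface SimpleTokenStream {
    SimpleToken nextToken();
}

/** Wraps a lexer and splits each STRUCT_PARAMETER token into three tokens. */
final class SplittingStream implements SimpleTokenStream {
    private final SimpleTokenStream lexer;
    private final Deque<SimpleToken> pending = new ArrayDeque<>();

    SplittingStream(SimpleTokenStream lexer) { this.lexer = lexer; }

    @Override
    public SimpleToken nextToken() {
        // Hand out tokens stored by an earlier split before asking the lexer again.
        if (!pending.isEmpty()) {
            return pending.removeFirst();
        }
        SimpleToken t = lexer.nextToken();
        if (t == null || !t.type.equals("STRUCT_PARAMETER")) {
            return t; // everything else passes through unchanged
        }
        int at = t.text.indexOf('@');
        if (at < 0) {
            return t; // should not happen if the lexer rule requires '@'
        }
        // Store the second and third tokens; return the first one now.
        pending.addLast(new SimpleToken("STRUCT_AT", "@"));
        pending.addLast(new SimpleToken("STRUCT_DAY", t.text.substring(at + 1)));
        return new SimpleToken("STRUCT_NAME", t.text.substring(0, at));
    }
}
```

The parser would then be constructed with the SplittingStream instead of the lexer, so it only ever sees STRUCT_NAME, STRUCT_AT and STRUCT_DAY.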
> > > > >
> > > > > Anakreon
> > > > >
> > > >
> > > > I agree with the above approach, and also read my ParserFilter paper
> > > > on my website, http://www.codetransform.com/filterexample.html
> > > >
> > > > I would recommend an alternative approach, which would be to not skip
> > > > whitespace in the lexer.  Instead, discard it in the parser filter.
> > > > That filter can still check that no whitespace occurs before or after
> > > > an @ between IDENTS.
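
A minimal sketch of such a filter is shown below. It is a simplification that works on a complete list of tokens rather than on a live stream, and it is not taken from the ParserFilter paper; FilterToken, AtSignFilter and the "WS" and "ATSIGN" type strings are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal token; a stand-in for the real token class. */
final class FilterToken {
    final String type;
    final String text;
    FilterToken(String type, String text) { this.type = type; this.text = text; }
}

final class AtSignFilter {
    /**
     * Drops WS tokens so the parser never sees them, but first rejects any
     * ATSIGN that has a WS token directly before or after it.
     */
    static List<FilterToken> filter(List<FilterToken> raw) {
        List<FilterToken> kept = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            if (isWs(raw, i)) {
                continue; // discard whitespace here instead of in the lexer
            }
            FilterToken t = raw.get(i);
            if (t.type.equals("ATSIGN") && (isWs(raw, i - 1) || isWs(raw, i + 1))) {
                throw new IllegalStateException("whitespace is not allowed around '@'");
            }
            kept.add(t);
        }
        return kept;
    }

    /** True if index i is in range and holds a WS token. */
    private static boolean isWs(List<FilterToken> raw, int i) {
        return i >= 0 && i < raw.size() && raw.get(i).type.equals("WS");
    }
}
```

Because the check happens before the whitespace is thrown away, the parser grammar itself needs no (WS)* references.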
> > > >
> > > > Alternately you could keep track of state in the lexer.  Set a boolean
> > > > variable in the makeToken() method if the token made was WS.  To see
> > > > what is coming after, inspect LA(1).  Assuming @ is not used in any
> > > > other way, you would have a rule similar to this, where
> > > > previousWasWhitespace is the variable set in makeToken().
> > > >
> > > > AT: '@' { !previousWasWhitespace && LA(1)!=' ' && LA(1)!='\t' }? ;
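
To show the state-tracking idea outside of a grammar, here is a small hand-written scanner, offered only as a sketch. FlagLexer is a hypothetical class, not generated ANTLR code; it keeps a previousWasWhitespace flag and peeks at the character after '@', which plays the role of the lookahead check in the rule above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written scanner that tracks whether whitespace was just seen. */
final class FlagLexer {
    private final String input;
    private int pos = 0;
    private boolean previousWasWhitespace = false;

    FlagLexer(String input) { this.input = input; }

    private boolean wsAt(int i) {
        return i < input.length() && (input.charAt(i) == ' ' || input.charAt(i) == '\t');
    }

    /** Returns token texts with whitespace skipped; rejects whitespace next to '@'. */
    List<String> tokenize() {
        List<String> out = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (wsAt(pos)) {
                while (wsAt(pos)) pos++;          // skip the whole whitespace run
                previousWasWhitespace = true;     // remember it for the next token
                continue;
            }
            if (c == '@') {
                pos++;
                // After consuming '@', the next character is the lookahead.
                if (previousWasWhitespace || wsAt(pos)) {
                    throw new IllegalArgumentException("whitespace is not allowed around '@'");
                }
                out.add("@");
            } else if (Character.isLetterOrDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
                out.add(input.substring(start, pos));
            } else {
                out.add(String.valueOf(c));       // any other single character
                pos++;
            }
            previousWasWhitespace = false;        // a real token was just produced
        }
        return out;
    }
}
```

With this sketch, new FlagLexer("WeekJobHour@Monday = 8").tokenize() yields the five tokens WeekJobHour, @, Monday, = and 8, while any of the spaced variants throws.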
> > > >
> > > > Monty
> > > >
> > > > ANTLR & Java Consultant -- http://www.codetransform.com
> > > > ANSI C/GCC transformation toolkit -- http://www.codetransform.com/gcc.html
> > > > Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
> > > >





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 




