[antlr-interest] Re: Antlr 3.0 spaces between tokens

lgcraymer lgc at mail1.jpl.nasa.gov
Wed Nov 10 23:13:00 PST 2004



As usual--you ignore whitespace during parsing.  Then when you need
the whitespace around a token, you peek into the token stream around
the point of interest.  It doesn't help if whitespace is really a
syntax feature and not just a token separator.
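
A rough sketch of that "peek around the token" idea in plain Java. The `Tok` class and the toy `lex` method are hypothetical stand-ins, not ANTLR classes; the only assumption is that the full token stream, whitespace included, is kept and that each token of interest is known by its index into it.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespacePeek {
    // Hypothetical minimal token; not ANTLR's Token class.
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    // Toy lexer that keeps every token, including whitespace runs.
    static List<Tok> lex(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                out.add(new Tok("WS", input.substring(start, i)));
            } else if (Character.isLetterOrDigit(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                out.add(new Tok("WORD", input.substring(start, i)));
            } else {
                i++;
                out.add(new Tok("SYM", input.substring(start, i)));
            }
        }
        return out;
    }

    // Peek at the neighbours of the token at index i in the full stream.
    static boolean wsBefore(List<Tok> stream, int i) {
        return i > 0 && stream.get(i - 1).type.equals("WS");
    }

    static boolean wsAfter(List<Tok> stream, int i) {
        return i + 1 < stream.size() && stream.get(i + 1).type.equals("WS");
    }

    public static void main(String[] args) {
        List<Tok> stream = lex("WeekJobHour@ Monday");
        int at = 1; // index of "@" in the full stream
        System.out.println(wsBefore(stream, at) + " " + wsAfter(stream, at)); // prints "false true"
    }
}
```

The parser would work on the non-whitespace tokens only and consult the full stream by index when a rule cares about spacing.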

--Loring


--- In antlr-interest at yahoogroups.com, "matthew ford"
<Matthew.Ford at f...> wrote:
> A bit too clever for me. How do you write the parser rules?
> matthew
> 
> ----- Original Message ----- 
> From: "lgcraymer" <lgc at m...>
> To: <antlr-interest at yahoogroups.com>
> Sent: Thursday, November 11, 2004 5:51 PM
> Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> 
> 
> >
> >
> > The min/max of ASTMinMax gives you an index into the token stream.
> > Look for neighboring whitespace tokens.  By carrying the token stream
> > index around, you carry around references to associated whitespace.
> > It's a rather clever trick for solving the whitespace tracking
> > problem.
> >
> > --Loring
> >
> > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > <Matthew.Ford at f...> wrote:
> > > Perhaps I am missing the point of that article, but in my case I
> > > don't want to just keep the whitespace for printing.
> > >
> > > For some (not all) parser rules, whitespace is actually important
> > > for the parsing.
> > > So I want the parser to see all the whitespace for some rules and
> > > not others.
> > >
> > > So what I want is the Token.SKIP option on the parser side instead
> > > of on the lexer side, controlled on a per-rule basis.
> > >
> > > matthew
> > >
> > > ----- Original Message ----- 
> > > From: "lgcraymer" <lgc at m...>
> > > To: <antlr-interest at yahoogroups.com>
> > > Sent: Thursday, November 11, 2004 5:32 PM
> > > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> > >
> > >
> > > >
> > > >
> > > > Take a look at
> > > >
> > >
> >
>
<http://www.antlr.org/article/preserving.token.order/preserving.token.order.
> > > tml>
> > > >
> > > > It's hard to see how ANTLR 3 could do better.
> > > >
> > > > --Loring
> > > >
> > > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > > <Matthew.Ford at f...> wrote:
> > > > > Hi Ter,
> > > > >
> > > > > Perhaps for Antlr 3.0 we can have a better means of handling
> > > > > white space.
> > > > >
> > > > > Antlr provides an ignore-whitespace capability that is appealing:
> > > > > WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
> > > > >      { $setType(Token.SKIP); }
> > > > >    ;
> > > > > but every time I try to use it I come across a situation where I
> > > > > really want/need the white space in the parser.
> > > > >
> > > > > So I end up having the lexer pass it back to the parser
> > > > > (or having a switch in the lexer that the parser uses to control
> > > > > the return of whitespace.  I know this is a no-no but it has
> > > > > worked for me in some cases).
> > > > >
> > > > > The parser usually only needs to know about the whitespace in a
> > > > > few rules, but now has (WS)* all over the place to handle
> > > > > whitespace everywhere.
> > > > >
> > > > > Basically what I would like is to have the lexer pass all the
> > > > > whitespace back to the parser, and then in the parser be able
> > > > > to say
> > > > > a) for this rule ignore white space,
> > > > > or
> > > > > b) for this rule whitespace is important.
> > > > >
> > > > > Actually the second option is more likely.
> > > > >
> > > > > matthew
> > > > >
> > > > > ----- Original Message ----- 
> > > > > From: "Monty Zukowski" <monty at c...>
> > > > > To: <antlr-interest at yahoogroups.com>
> > > > > Sent: Thursday, November 11, 2004 3:38 AM
> > > > > Subject: Re: [antlr-interest] spaces between tokens
> > > > >
> > > > >
> > > > > >
> > > > > > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> > > > > >
> > > > > > >
> > > > > > > silverio.di at q... wrote:
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> Hi,
> > > > > > >> I've a big problem.
> > > > > > >>
> > > > > > > >> In my grammar, as in many others, whitespace is skipped in
> > > > > > > >> the lexer, but in some circumstances I need to check that
> > > > > > > >> no spaces are present between tokens.
> > > > > > >>
> > > > > > >> Example :
> > > > > > > >> WeekJobHour@Monday = 8
> > > > > > > >>
> > > > > > > >> would mean assign 8 (hours) to parameter Monday of structure
> > > > > > > >> WeekJobHour.
> > > > > > > >> I would like my lexer to extract the following tokens:
> > > > > > >>
> > > > > > >> IDENT ATSIGN IDENT
> > > > > > >>
> > > > > > > >> but my problem is to check that no WS is present between
> > > > > > > >> IDENT and ATSIGN or between ATSIGN and IDENT, so
> > > > > > > >>
> > > > > > > >> WeekJobHour@Monday = 8        // is OK
> > > > > > >> WeekJobHour @Monday = 8       // is BAD
> > > > > > >> WeekJobHour@ Monday = 8       // is BAD
> > > > > > >> WeekJobHour  @ Monday = 8           // is BAD too !
> > > > > > >>
> > > > > > >> I could use following lexer rule:
> > > > > > >>
> > > > > > > >> STRUCT_PARAMETER
> > > > > > > >>       :     ('A'..'Z' | 'a'..'z')+
> > > > > > > >>             '@'
> > > > > > > >>             ('A'..'Z' | 'a'..'z')+
> > > > > > > >>       ;
> > > > > > >>
> > > > > > > >> but in the parser, how can I extract the structure name
> > > > > > > >> (WeekJobHour) and the structure parameter (Monday) from the
> > > > > > > >> STRUCT_PARAMETER token?
> > > > > > > >>
> > > > > > > >> I think a similar issue is present in the C/C++ structure
> > > > > > > >> construct.
> > > > > > >>
> > > > > > >> Thank you for your suggestions about
> > > > > > >> Silverio Diquigiovanni
> > > > > > > > Make a class which implements TokenStream and wraps the Lexer.
> > > > > > > > In the nextToken method, if the lexer returns a token of type
> > > > > > > > STRUCT_PARAMETER, split it into 3 tokens, where the first is
> > > > > > > > of type STRUCT_NAME, the second STRUCT_AT and the third
> > > > > > > > STRUCT_DAY, with the texts WeekJobHour, @ and Monday
> > > > > > > > respectively.
> > > > > > > > Return the first token from the method and store the other 2.
> > > > > > > > In the next 2 calls of nextToken, return the stored ones.
> > > > > > > >
> > > > > > > > Pass this TokenStream implementation to the parser instead of
> > > > > > > > your Lexer.
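
The splitting wrapper described above could look roughly like this. It is a sketch in plain Java rather than against the ANTLR runtime: `Tok` and `Source` are hypothetical stand-ins for ANTLR's Token and TokenStream, and end of input is signalled with null here purely to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

class SplittingStream {
    // Hypothetical minimal token, standing in for ANTLR's Token.
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical stand-in for the TokenStream interface; null means end of input.
    interface Source { Tok nextToken(); }

    private final Source in;
    private final Deque<Tok> pending = new ArrayDeque<>();

    SplittingStream(Source in) { this.in = in; }

    Tok nextToken() {
        // Hand out stored tokens from an earlier split first.
        if (!pending.isEmpty()) return pending.poll();
        Tok t = in.nextToken();
        if (t == null || !t.type.equals("STRUCT_PARAMETER")) return t;
        // The lexer rule guarantees exactly one '@' inside the token text.
        int at = t.text.indexOf('@');
        pending.add(new Tok("STRUCT_AT", "@"));
        pending.add(new Tok("STRUCT_DAY", t.text.substring(at + 1)));
        return new Tok("STRUCT_NAME", t.text.substring(0, at));
    }

    public static void main(String[] args) {
        Iterator<Tok> raw = Arrays.asList(
            new Tok("STRUCT_PARAMETER", "WeekJobHour@Monday"),
            new Tok("ASSIGN", "="),
            new Tok("INT", "8")).iterator();
        SplittingStream s = new SplittingStream(() -> raw.hasNext() ? raw.next() : null);
        for (Tok t = s.nextToken(); t != null; t = s.nextToken()) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

With a real lexer, the wrapper would implement the runtime's TokenStream interface and be handed to the parser constructor in place of the lexer.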
> > > > > > >
> > > > > > > Anakreon
> > > > > > >
> > > > > >
> > > > > > > I agree with the above approach; also read my ParserFilter paper
> > > > > > > on my website, http://www.codetransform.com/filterexample.html
> > > > > > >
> > > > > > > I would recommend an alternative approach, which would be to not
> > > > > > > skip whitespace in the lexer.  Instead, discard it in the parser
> > > > > > > filter.  That filter can still check that no whitespace occurs
> > > > > > > before or after an @ between IDENTs.
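
One way such a filter might be sketched, again in plain Java with a hypothetical `Tok` class rather than the ANTLR runtime types: the lexer emits WS tokens, and the filter rejects any ATSIGN with a WS neighbour before dropping the whitespace for the parser. Throwing IllegalStateException is an arbitrary choice for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AtSignFilter {
    // Hypothetical minimal token, standing in for ANTLR's Token.
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    // Returns the stream without WS tokens; fails if WS touches an ATSIGN.
    static List<Tok> filter(List<Tok> all) {
        List<Tok> kept = new ArrayList<>();
        for (int i = 0; i < all.size(); i++) {
            Tok t = all.get(i);
            if (t.type.equals("ATSIGN")) {
                boolean before = i > 0 && all.get(i - 1).type.equals("WS");
                boolean after = i + 1 < all.size() && all.get(i + 1).type.equals("WS");
                if (before || after) {
                    throw new IllegalStateException("whitespace next to '@' at token " + i);
                }
            }
            if (!t.type.equals("WS")) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Tok> ok = Arrays.asList(
            new Tok("IDENT", "WeekJobHour"), new Tok("ATSIGN", "@"),
            new Tok("IDENT", "Monday"), new Tok("WS", " "),
            new Tok("ASSIGN", "="), new Tok("WS", " "), new Tok("INT", "8"));
        System.out.println(filter(ok).size()); // prints 5
    }
}
```

The parser then sees only IDENT ATSIGN IDENT ASSIGN INT and needs no (WS)* in its rules.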
> > > > > >
> > > > > > > Alternatively, you could keep track of state in the lexer.  Set a
> > > > > > > boolean variable in the makeToken() method if the token made was
> > > > > > > WS.  To see what is coming after, inspect LA(1) once the '@' has
> > > > > > > been matched.  Assuming @ is not used in any other way, you would
> > > > > > > have a rule similar to this, where previousWasWhitespace is the
> > > > > > > variable set in makeToken().
> > > > > > >
> > > > > > > AT: { !previousWasWhitespace }? '@'
> > > > > > >     { LA(1)!=' ' && LA(1)!='\t' }? ;
> > > > > >
> > > > > > Monty
> > > > > >
> > > > > > > ANTLR & Java Consultant -- http://www.codetransform.com
> > > > > > > ANSI C/GCC transformation toolkit -- http://www.codetransform.com/gcc.html
> > > > > > > Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Yahoo! Groups Links
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 




