[antlr-interest] Re: Antlr 3.0 spaces between tokens

matthew ford Matthew.Ford at forward.com.au
Thu Nov 11 02:25:54 PST 2004


Yes, I did much the same thing.

But what about a SKIP on the parser side?
I cannot see any conceptual problem with providing it.

matthew

----- Original Message ----- 
From: "lgcraymer" <lgc at mail1.jpl.nasa.gov>
To: <antlr-interest at yahoogroups.com>
Sent: Thursday, November 11, 2004 8:19 PM
Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens


>
>
> Context-dependent lexing is a nasty problem.  ANTLR 3 probably won't
> solve it.  I ran into exactly the same problem in an expression
> grammar for spacecraft sequencing.  The cleanest approach I could come
> up with was to have a counter that was incremented by LBRACKET and
> decremented by RBRACKET.  If the counter was zero, then whitespace
> tokens were marked "SKIP"; if it was positive, then they were "WS" and
> recognized by the parser.  That helped simplify the grammar.
>
> --Loring
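
The counter idea described above could be sketched roughly as follows. This is a hand-rolled scanner for illustration only, not ANTLR-generated code and not the grammar used for the spacecraft sequencing language; the names BracketDepthLexer, Tok and Type are made up for the example, and every non-bracket, non-whitespace run is lumped into a single OTHER token.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: whitespace becomes a WS token inside [ ], and is skipped elsewhere. */
final class BracketDepthLexer {
    enum Type { LBRACKET, RBRACKET, WS, OTHER }

    static final class Tok {
        final Type type;
        final String text;
        Tok(Type type, String text) { this.type = type; this.text = text; }
    }

    /** Returns only the tokens a parser would see. */
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int depth = 0;                       // incremented by '[', decremented by ']'
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '[') {
                depth++;
                out.add(new Tok(Type.LBRACKET, "["));
                i++;
            } else if (c == ']') {
                if (depth > 0) depth--;      // tolerate an unbalanced ']'
                out.add(new Tok(Type.RBRACKET, "]"));
                i++;
            } else if (Character.isWhitespace(c)) {
                int start = i;
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                // Counter zero: whitespace is dropped (the "SKIP" case).
                // Counter positive: whitespace is passed on as a WS token.
                if (depth > 0) out.add(new Tok(Type.WS, input.substring(start, i)));
            } else {
                int start = i;
                while (i < input.length()) {
                    char d = input.charAt(i);
                    if (Character.isWhitespace(d) || d == '[' || d == ']') break;
                    i++;
                }
                out.add(new Tok(Type.OTHER, input.substring(start, i)));
            }
        }
        return out;
    }

    /** Convenience for inspection: the token type names, space separated. */
    static String types(String input) {
        List<String> names = new ArrayList<>();
        for (Tok t : tokenize(input)) names.add(t.type.name());
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        System.out.println(types("a = m[1 2] + b"));
    }
}
```

With this arrangement the parser only has to mention WS in the rules that deal with bracketed index lists.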
>
>
> --- In antlr-interest at yahoogroups.com, "matthew ford"
> <Matthew.Ford at f...> wrote:
> > That is what I am talking about:
> > whitespace as a syntax feature and not just a token separator.
> > This usually applies to only a small number of rules.
> > One example I had was a math language where whitespace was significant
> > inside [ ] when indexing matrices, but elsewhere it was just a token
> > separator.
> >
> > matthew
> >
> > ----- Original Message ----- 
> > From: "lgcraymer" <lgc at m...>
> > To: <antlr-interest at yahoogroups.com>
> > Sent: Thursday, November 11, 2004 6:13 PM
> > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> >
> >
> > >
> > >
> > > As usual--you ignore whitespace during parsing.  Then when you need
> > > the whitespace around a token, you peek into the token stream around
> > > the point of interest.  It doesn't help if whitespace is really a
> > > syntax feature and not just a token separator.
> > >
> > > --Loring
> > >
> > >
> > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > <Matthew.Ford at f...> wrote:
> > > > A bit too clever for me.  How do you write the parser rules?
> > > > matthew
> > > >
> > > > ----- Original Message ----- 
> > > > From: "lgcraymer" <lgc at m...>
> > > > To: <antlr-interest at yahoogroups.com>
> > > > Sent: Thursday, November 11, 2004 5:51 PM
> > > > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> > > >
> > > >
> > > > >
> > > > >
> > > > > > The min/max of ASTMinMax gives you an index into the token stream.
> > > > > > Look for neighboring whitespace tokens.  By carrying the token
> > > > > > stream index around, you carry around references to associated
> > > > > > whitespace.  It's a rather clever trick for solving the whitespace
> > > > > > tracking problem.
> > > > >
> > > > > --Loring
> > > > >
> > > > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > > > <Matthew.Ford at f...> wrote:
> > > > > > > Perhaps I am missing the point of that article, but in my case I
> > > > > > > don't want to just keep the whitespace for printing.
> > > > > > >
> > > > > > > For some (not all) parser rules, whitespace is actually important
> > > > > > > for the parsing.
> > > > > > > So I want the parser to see all the whitespace for some rules and
> > > > > > > not others.
> > > > > > >
> > > > > > > So what I want is the Token.SKIP option on the parser side instead
> > > > > > > of on the lexer side, and controlled on a rule basis.
> > > > > >
> > > > > > matthew
> > > > > >
> > > > > > ----- Original Message ----- 
> > > > > > From: "lgcraymer" <lgc at m...>
> > > > > > To: <antlr-interest at yahoogroups.com>
> > > > > > Sent: Thursday, November 11, 2004 5:32 PM
> > > > > > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Take a look at
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
<http://www.antlr.org/article/preserving.token.order/preserving.token.order.
> > > > > > tml>
> > > > > > >
> > > > > > > It's hard to see how ANTLR 3 could do better.
> > > > > > >
> > > > > > > --Loring
> > > > > > >
> > > > > > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > > > > > <Matthew.Ford at f...> wrote:
> > > > > > > > Hi Ter,
> > > > > > > >
> > > > > > > > > Perhaps for Antlr 3.0 we can have a better means of handling
> > > > > > > > > white space.
> > > > > > > > >
> > > > > > > > > Antlr provides an ignore-whitespace capability that is appealing:
> > > > > > > > > WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
> > > > > > > > >      { $setType(Token.SKIP); }
> > > > > > > > >    ;
> > > > > > > > > but every time I try to use it I come across a situation where
> > > > > > > > > I really want/need the white space in the parser.
> > > > > > > >
> > > > > > > > > So I end up having the lexer pass it back to the parser
> > > > > > > > > (or have a switch in the lexer that the parser uses to control
> > > > > > > > > the return of whitespace.  I know this is a no-no but it has
> > > > > > > > > worked for me in some cases).
> > > > > > > > >
> > > > > > > > > The parser usually only needs to know about the whitespace in a
> > > > > > > > > few rules but now has (WS)* all over the place to handle
> > > > > > > > > whitespace everywhere.
> > > > > > > >
> > > > > > > > > Basically what I would like is to have the lexer pass all the
> > > > > > > > > whitespace back to the parser, and then in the parser be able
> > > > > > > > > to say
> > > > > > > > a) for this rule ignore white space.
> > > > > > > > or
> > > > > > > > b) for this rule whitespace is important
> > > > > > > >
> > > > > > > > Actually the second option is more likely.
> > > > > > > >
> > > > > > > > matthew
> > > > > > > >
> > > > > > > > ----- Original Message ----- 
> > > > > > > > From: "Monty Zukowski" <monty at c...>
> > > > > > > > To: <antlr-interest at yahoogroups.com>
> > > > > > > > Sent: Thursday, November 11, 2004 3:38 AM
> > > > > > > > Subject: Re: [antlr-interest] spaces between tokens
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > silverio.di at q... wrote:
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Hi,
> > > > > > > > > >> I've a big problem.
> > > > > > > > > >>
> > > > > > > > > > >> In my grammar, as in many others, whitespace is skipped in
> > > > > > > > > > >> the lexer, but in some circumstances I need to check that
> > > > > > > > > > >> no spaces are present between tokens.
> > > > > > > > > >>
> > > > > > > > > > >> Example:
> > > > > > > > > > >> WeekJobHour@Monday = 8
> > > > > > > > > > >>
> > > > > > > > > > >> would mean: assign 8 (hours) to parameter Monday of structure
> > > > > > > > > > >> WeekJobHour.
> > > > > > > > > > >> I would like my lexer to extract the following tokens:
> > > > > > > > > > >>
> > > > > > > > > > >> IDENT ATSIGN IDENT
> > > > > > > > > > >>
> > > > > > > > > > >> but my problem is to check that no WS is present between
> > > > > > > > > > >> IDENT and ATSIGN or between ATSIGN and IDENT, so
> > > > > > > > > > >>
> > > > > > > > > > >> WeekJobHour@Monday = 8        // is OK
> > > > > > > > > > >> WeekJobHour @Monday = 8       // is BAD
> > > > > > > > > > >> WeekJobHour@ Monday = 8       // is BAD
> > > > > > > > > > >> WeekJobHour  @ Monday = 8           // is BAD too !
> > > > > > > > > > >> I could use the following lexer rule:
> > > > > > > > > > >>
> > > > > > > > > > >> STRUCT_PARAMETER
> > > > > > > > > > >>       :     ('A'..'Z' | 'a'..'z')+
> > > > > > > > > > >>             '@'
> > > > > > > > > > >>             ('A'..'Z' | 'a'..'z')+
> > > > > > > > > > >>       ;
> > > > > > > > > > >>
> > > > > > > > > > >> but in the parser, how can I extract the structure name
> > > > > > > > > > >> (WeekJobHour) and the structure parameter (Monday) from the
> > > > > > > > > > >> STRUCT_PARAMETER token?
> > > > > > > > > >>
> > > > > > > > > > >> I think a similar issue is present in the C/C++ structure
> > > > > > > > > > >> construct.
> > > > > > > > > > >>
> > > > > > > > > > >> Thank you for your suggestions.
> > > > > > > > > >> Silverio Diquigiovanni
> > > > > > > > > > > Make a class which implements TokenStream and uses the Lexer.
> > > > > > > > > > > In the nextToken method, if the lexer returns a token of type
> > > > > > > > > > > STRUCT_PARAMETER, split the token into 3 tokens, where the
> > > > > > > > > > > first would be of type STRUCT_NAME, the second STRUCT_AT and
> > > > > > > > > > > the third STRUCT_DAY, with the token texts WeekJobHour, @ and
> > > > > > > > > > > Monday respectively.
> > > > > > > > > > > Return the first token from the method and store the other 2.
> > > > > > > > > > > In the next 2 calls of nextToken, return the stored ones.
> > > > > > > > > > >
> > > > > > > > > > > Pass the implementor of TokenStream instead of your Lexer to
> > > > > > > > > > > the parser.
> > > > > > > > > >
> > > > > > > > > > Anakreon
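
A minimal sketch of the token-splitting stream described above, written in plain Java without the ANTLR runtime. The TokSource interface stands in for ANTLR's TokenStream, and the names SplittingStreamSketch, Tok, Splitter, fixed and drain are hypothetical helpers for the example. The combined token type is taken to be STRUCT_PARAMETER, as in the lexer rule from the question, and end of input is modelled as null purely for brevity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

final class SplittingStreamSketch {
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for a token stream; returns null when the input is exhausted. */
    interface TokSource { Tok nextToken(); }

    /** Wraps a lexer and splits each STRUCT_PARAMETER token into three tokens. */
    static final class Splitter implements TokSource {
        private final TokSource lexer;
        private final Deque<Tok> pending = new ArrayDeque<>();

        Splitter(TokSource lexer) { this.lexer = lexer; }

        @Override public Tok nextToken() {
            // Hand out tokens stored by an earlier split first.
            if (!pending.isEmpty()) return pending.removeFirst();
            Tok t = lexer.nextToken();
            if (t == null || !t.type.equals("STRUCT_PARAMETER")) return t;
            int at = t.text.indexOf('@');
            if (at < 0) return t;            // not of the expected shape: pass through
            pending.addLast(new Tok("STRUCT_AT", "@"));
            pending.addLast(new Tok("STRUCT_DAY", t.text.substring(at + 1)));
            return new Tok("STRUCT_NAME", t.text.substring(0, at));
        }
    }

    /** Hypothetical fake lexer that replays a fixed list of tokens. */
    static TokSource fixed(Tok... toks) {
        Iterator<Tok> it = Arrays.asList(toks).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Reads a source to the end and renders it as "TYPE:text" pairs. */
    static String drain(TokSource s) {
        StringBuilder sb = new StringBuilder();
        for (Tok t = s.nextToken(); t != null; t = s.nextToken()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.type).append(':').append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TokSource lexer = fixed(new Tok("STRUCT_PARAMETER", "WeekJobHour@Monday"),
                                new Tok("ASSIGN", "="), new Tok("INT", "8"));
        System.out.println(drain(new Splitter(lexer)));
    }
}
```

Because the lexer rule admits no whitespace inside STRUCT_PARAMETER, the adjacency check is done by the lexer and the parser only ever sees the three split tokens.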
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > I agree with the above approach, and also read my ParserFilter
> > > > > > > > > > paper on my website,
> > > > > > > > > > http://www.codetransform.com/filterexample.html
> > > > > > > > >
> > > > > > > > > > I would recommend an alternative approach, which would be to
> > > > > > > > > > not skip whitespace in the lexer.  Instead, discard it in the
> > > > > > > > > > parser filter.  That filter can still check that no whitespace
> > > > > > > > > > occurs before or after an @ between IDENTS.
> > > > > > > > >
> > > > > > > > > > Alternately you could keep track of state in the lexer.  Set a
> > > > > > > > > > boolean variable in the makeToken() method if the token made
> > > > > > > > > > was WS.  To see what is coming after the @, inspect LA(1) once
> > > > > > > > > > the @ has been matched.  Assuming @ is not used in any other
> > > > > > > > > > way, you would have a rule similar to this, where
> > > > > > > > > > previousWasWhitespace is the variable set in makeToken().
> > > > > > > > > >
> > > > > > > > > > AT: '@' { !previousWasWhitespace && !(LA(1)==' ' || LA(1)=='\t') }? ;
> > > > > > > > >
> > > > > > > > > Monty
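
The filter approach recommended above (keep whitespace in the lexer, discard it in a filter, and reject whitespace next to an @) might look something like the sketch below. It is plain Java over a list of tokens rather than a real ANTLR TokenStream filter, and AtAdjacencyFilter, Tok, tok and filter are names invented for the example; the token type names IDENT, AT and WS are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AtAdjacencyFilter {
    static final class Tok {
        final String type;
        final String text;
        Tok(String type, String text) { this.type = type; this.text = text; }
    }

    static Tok tok(String type, String text) { return new Tok(type, text); }

    /**
     * Drops WS tokens so the parser never sees them, but throws if a WS
     * token sits directly before or after an AT token in the raw stream.
     */
    static List<Tok> filter(List<Tok> raw) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            Tok t = raw.get(i);
            if (t.type.equals("AT")) {
                boolean wsBefore = i > 0 && raw.get(i - 1).type.equals("WS");
                boolean wsAfter = i + 1 < raw.size() && raw.get(i + 1).type.equals("WS");
                if (wsBefore || wsAfter) {
                    throw new IllegalArgumentException("whitespace next to '@' at token " + i);
                }
            }
            if (!t.type.equals("WS")) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw tokens for: WeekJobHour@Monday = 8
        List<Tok> raw = Arrays.asList(
            tok("IDENT", "WeekJobHour"), tok("AT", "@"), tok("IDENT", "Monday"),
            tok("WS", " "), tok("ASSIGN", "="), tok("WS", " "), tok("INT", "8"));
        System.out.println(filter(raw).size() + " tokens reach the parser");
    }
}
```

The grammar itself then needs no (WS)* references at all, since the only whitespace-sensitive check lives in the filter.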
> > > > > > > > >
> > > > > > > > > ANTLR & Java Consultant -- http://www.codetransform.com
> > > > > > > > > > ANSI C/GCC transformation toolkit -- http://www.codetransform.com/gcc.html
> > > > > > > > > > Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yahoo! Groups Links
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Yahoo! Groups Links
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Yahoo! Groups Links
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Yahoo! Groups Links
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>
>
>
>
>
>
> Yahoo! Groups Links
>
>
>
>
>
>
>



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 




