[antlr-interest] Re: Antlr 3.0 spaces between tokens

lgcraymer lgc at mail1.jpl.nasa.gov
Thu Nov 11 01:19:02 PST 2004



Context-dependent lexing is a nasty problem.  ANTLR 3 probably won't
solve it.  I ran into exactly the same problem in an expression
grammar for spacecraft sequencing.  The cleanest approach I could come
up with was to have a counter that was incremented by LBRACKET and
decremented by RBRACKET.  If the counter was zero, then whitespace
tokens were marked "SKIP"; if it was positive, then they were "WS" and
recognized by the parser.  That helped simplify the grammar.
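
A minimal sketch of that counter idea, written as a hand-rolled Java
tokenizer rather than as ANTLR grammar actions.  The class, the token kinds
and the kinds() helper are made up for illustration; in a generated lexer
the same counter would be a lexer member updated in the LBRACKET and
RBRACKET rules.

```java
import java.util.ArrayList;
import java.util.List;

class BracketAwareLexer {
    /** Token kinds; the names are illustrative, not from a real grammar. */
    enum Kind { LBRACKET, RBRACKET, WS, ATOM }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Tokenizes input, emitting WS tokens only while inside [ ]. */
    static List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        int depth = 0;                       // ++ on '[', -- on ']'
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '[') {
                depth++;
                out.add(new Token(Kind.LBRACKET, "["));
                i++;
            } else if (c == ']') {
                if (depth > 0) depth--;      // tolerate a stray ']'
                out.add(new Token(Kind.RBRACKET, "]"));
                i++;
            } else if (Character.isWhitespace(c)) {
                int start = i;
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                // depth == 0 plays the role of SKIP; depth > 0 yields a WS token
                if (depth > 0) out.add(new Token(Kind.WS, input.substring(start, i)));
            } else {
                int start = i;
                while (i < input.length()
                        && !Character.isWhitespace(input.charAt(i))
                        && input.charAt(i) != '[' && input.charAt(i) != ']') i++;
                out.add(new Token(Kind.ATOM, input.substring(start, i)));
            }
        }
        return out;
    }

    /** Space-separated token kinds, convenient for eyeballing the result. */
    static String kinds(String input) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokenize(input)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.kind);
        }
        return sb.toString();
    }
}
```

The parser then only has to mention WS in the rules that sit between
brackets; everywhere else it never sees whitespace.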

--Loring


--- In antlr-interest at yahoogroups.com, "matthew ford"
<Matthew.Ford at f...> wrote:
> That is what I am talking about: whitespace as a syntax feature and
> not just a token separator.  This is usually needed in only a small
> number of rules.  One example I had was a math language where
> whitespace was significant inside [ ] when indexing matrices, but
> elsewhere it was just a token separator.
> 
> matthew
> 
> ----- Original Message ----- 
> From: "lgcraymer" <lgc at m...>
> To: <antlr-interest at yahoogroups.com>
> Sent: Thursday, November 11, 2004 6:13 PM
> Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> 
> 
> >
> >
> > As usual--you ignore whitespace during parsing.  Then when you need
> > the whitespace around a token, you peek into the token stream around
> > the point of interest.  It doesn't help if whitespace is really a
> > syntax feature and not just a token separator.
> >
> > --Loring
> >
> >
> > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > <Matthew.Ford at f...> wrote:
> > > > A bit too clever for me.  How do you write the parser rules?
> > > matthew
> > >
> > > ----- Original Message ----- 
> > > From: "lgcraymer" <lgc at m...>
> > > To: <antlr-interest at yahoogroups.com>
> > > Sent: Thursday, November 11, 2004 5:51 PM
> > > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> > >
> > >
> > > >
> > > >
> > > > > The min/max of ASTMinMax gives you an index into the token stream.
> > > > > Look for neighboring whitespace tokens.  By carrying the token
> > > > > stream index around, you carry around references to associated
> > > > > whitespace.  It's a rather clever trick for solving the whitespace
> > > > > tracking problem.
> > > >
> > > > --Loring
> > > >
> > > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > > <Matthew.Ford at f...> wrote:
> > > > > > Perhaps I am missing the point of that article, but in my case I
> > > > > > don't want to just keep the whitespace for printing.
> > > > > >
> > > > > > For some (not all) parser rules, whitespace is actually important
> > > > > > for the parsing.
> > > > > > So I want the parser to see all the whitespace for some rules and
> > > > > > not others.
> > > > > >
> > > > > > So what I want is the Token.SKIP option on the parser side instead
> > > > > > of on the lexer side, and controlled on a per-rule basis.
> > > > >
> > > > > matthew
> > > > >
> > > > > ----- Original Message ----- 
> > > > > From: "lgcraymer" <lgc at m...>
> > > > > To: <antlr-interest at yahoogroups.com>
> > > > > Sent: Thursday, November 11, 2004 5:32 PM
> > > > > Subject: [antlr-interest] Re: Antlr 3.0 spaces between tokens
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > Take a look at
> > > > > >
> > > > >
> > > >
> > >
> >
>
<http://www.antlr.org/article/preserving.token.order/preserving.token.order.
> > > > > tml>
> > > > > >
> > > > > > It's hard to see how ANTLR 3 could do better.
> > > > > >
> > > > > > --Loring
> > > > > >
> > > > > > --- In antlr-interest at yahoogroups.com, "matthew ford"
> > > > > > <Matthew.Ford at f...> wrote:
> > > > > > > Hi Ter,
> > > > > > >
> > > > > > > > Perhaps for Antlr 3.0 we can have a better means of handling
> > > > > > > > white space.
> > > > > > > >
> > > > > > > > Antlr provides an ignore-whitespace capability that is appealing:
> > > > > > > >
> > > > > > > > WS : ( ' ' | '\t' | '\n' { newline(); } | '\r' )+
> > > > > > > >      { $setType(Token.SKIP); }
> > > > > > > >    ;
> > > > > > > >
> > > > > > > > but every time I try to use it I come across a situation where I
> > > > > > > > really want/need the white space in the parser.
> > > > > > >
> > > > > > > > So I end up having the lexer pass it back to the parser
> > > > > > > > (or having a switch in the lexer that the parser uses to control
> > > > > > > > the return of whitespace.  I know this is a no-no, but it has
> > > > > > > > worked for me in some cases).
> > > > > > >
> > > > > > > > The parser usually only needs to know about the whitespace in a
> > > > > > > > few rules, but now has (WS)* all over the place to handle
> > > > > > > > whitespace everywhere.
> > > > > > >
> > > > > > > > Basically, what I would like is to have the lexer pass all the
> > > > > > > > whitespace back to the parser, and then in the parser be able to
> > > > > > > > say
> > > > > > > > a) for this rule, ignore whitespace
> > > > > > > > or
> > > > > > > > b) for this rule, whitespace is important.
> > > > > > >
> > > > > > > Actually the second option is more likely.
> > > > > > >
> > > > > > > matthew
> > > > > > >
> > > > > > > ----- Original Message ----- 
> > > > > > > From: "Monty Zukowski" <monty at c...>
> > > > > > > To: <antlr-interest at yahoogroups.com>
> > > > > > > Sent: Thursday, November 11, 2004 3:38 AM
> > > > > > > Subject: Re: [antlr-interest] spaces between tokens
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > On Nov 10, 2004, at 7:39 AM, Anakreon wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > silverio.di at q... wrote:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Hi,
> > > > > > > > > >> I have a big problem.
> > > > > > > > > >>
> > > > > > > > > >> In my grammar, as in many others, whitespace is skipped in the
> > > > > > > > > >> lexer, but in some circumstances I need to check that no
> > > > > > > > > >> spaces are present between tokens.
> > > > > > > > > >>
> > > > > > > > > >> Example:
> > > > > > > > > >> WeekJobHour@Monday = 8
> > > > > > > > > >>
> > > > > > > > > >> would mean: assign 8 (hours) to parameter Monday of structure
> > > > > > > > > >> WeekJobHour.
> > > > > > > > > >> I would like my lexer to extract the following tokens:
> > > > > > > > > >>
> > > > > > > > > >> IDENT ATSIGN IDENT
> > > > > > > > > >>
> > > > > > > > > >> but my problem is to check that no WS is present between
> > > > > > > > > >> IDENT and ATSIGN or between ATSIGN and IDENT, so
> > > > > > > > > >>
> > > > > > > > > >> WeekJobHour@Monday = 8        // is OK
> > > > > > > > > >> WeekJobHour @Monday = 8       // is BAD
> > > > > > > > > >> WeekJobHour@ Monday = 8       // is BAD
> > > > > > > > > >> WeekJobHour  @ Monday = 8     // is BAD too!
> > > > > > > > > >> I could use the following lexer rule:
> > > > > > > > > >>
> > > > > > > > > >> STRUCT_PARAMETER
> > > > > > > > > >>       :     ('A'..'Z' | 'a'..'z')+
> > > > > > > > > >>             '@'
> > > > > > > > > >>             ('A'..'Z' | 'a'..'z')+
> > > > > > > > > >>       ;
> > > > > > > > > >>
> > > > > > > > > >> but in the parser, how can I extract the structure name
> > > > > > > > > >> (WeekJobHour) and the structure parameter (Monday) from the
> > > > > > > > > >> STRUCT_PARAMETER token?
> > > > > > > > > >>
> > > > > > > > > >> I think a similar issue is present in the C/C++ structure
> > > > > > > > > >> construct.
> > > > > > > > > >>
> > > > > > > > > >> Thank you for your suggestions,
> > > > > > > > > >> Silverio Diquigiovanni
> > > > > > > > > > Make a class which implements TokenStream and which uses the
> > > > > > > > > > Lexer.  In the nextToken method, if the lexer returns a token
> > > > > > > > > > of type STRUCT_PARAMETER, split the token into 3 tokens, where
> > > > > > > > > > the first is of type STRUCT_NAME, the second STRUCT_AT and the
> > > > > > > > > > third STRUCT_DAY, with the token texts WeekJobHour, @ and
> > > > > > > > > > Monday respectively.  Return the first token from the method
> > > > > > > > > > and store the other 2.  In the next 2 calls of nextToken,
> > > > > > > > > > return the stored ones.
> > > > > > > > > >
> > > > > > > > > > Pass the implementor of TokenStream to the parser instead of
> > > > > > > > > > your Lexer.
> > > > > > > > >
> > > > > > > > > Anakreon
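
The splitting stream described above could look roughly like the following
Java sketch.  To keep it self-contained it does not use the ANTLR runtime:
SimpleTokenStream and Tok are hypothetical stand-ins for ANTLR's TokenStream
and Token, and the integer token types are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

class SplittingStreamDemo {
    // Invented token type numbers; a real grammar would generate these.
    static final int EOF = 0, STRUCT_PARAMETER = 1, STRUCT_NAME = 2,
                     STRUCT_AT = 3, STRUCT_DAY = 4, OTHER = 5;

    /** Stand-in for a lexer token: just a type and its text. */
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for the stream interface the parser pulls tokens from. */
    interface SimpleTokenStream { Tok nextToken(); }

    /** Wraps a lexer and splits each STRUCT_PARAMETER into three tokens. */
    static final class Splitter implements SimpleTokenStream {
        private final SimpleTokenStream lexer;
        private final Deque<Tok> pending = new ArrayDeque<>();

        Splitter(SimpleTokenStream lexer) { this.lexer = lexer; }

        public Tok nextToken() {
            // Hand out stored tokens before asking the lexer for more.
            if (!pending.isEmpty()) return pending.removeFirst();
            Tok t = lexer.nextToken();
            if (t.type != STRUCT_PARAMETER) return t;
            // The lexer rule guarantees exactly one '@' in the text.
            int at = t.text.indexOf('@');
            pending.addLast(new Tok(STRUCT_AT, "@"));
            pending.addLast(new Tok(STRUCT_DAY, t.text.substring(at + 1)));
            return new Tok(STRUCT_NAME, t.text.substring(0, at));
        }
    }

    /** Runs the splitter over a fixed token list and renders "type:text" pairs. */
    static String drain(Tok... input) {
        Iterator<Tok> it = Arrays.asList(input).iterator();
        SimpleTokenStream fakeLexer = () -> it.hasNext() ? it.next() : new Tok(EOF, "");
        Splitter splitter = new Splitter(fakeLexer);
        StringBuilder sb = new StringBuilder();
        for (Tok t = splitter.nextToken(); t.type != EOF; t = splitter.nextToken()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t.type).append(':').append(t.text);
        }
        return sb.toString();
    }
}
```

Because the lexer rule only matches when nothing separates the two
identifiers from the '@', the no-whitespace check is done before the
splitter ever sees the token.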
> > > > > > > > >
> > > > > > > >
> > > > > > > > > I agree with the above approach; see also my ParserFilter paper
> > > > > > > > > on my website, http://www.codetransform.com/filterexample.html
> > > > > > > > >
> > > > > > > > > I would recommend an alternative approach, which would be to not
> > > > > > > > > skip whitespace in the lexer.  Instead, discard it in the parser
> > > > > > > > > filter.  That filter can still check that no whitespace occurs
> > > > > > > > > before or after an @ between IDENTs.
> > > > > > > >
> > > > > > > > > Alternatively, you could keep track of state in the lexer.  Set a
> > > > > > > > > boolean variable in the makeToken() method if the token made was
> > > > > > > > > WS.  To see what comes after the '@', inspect the lookahead: in a
> > > > > > > > > predicate placed in front of '@', LA(1) is the '@' itself, so the
> > > > > > > > > following character is LA(2).  Assuming @ is not used in any
> > > > > > > > > other way, you would have a rule similar to this, where
> > > > > > > > > previousWasWhitespace is the variable set in makeToken().
> > > > > > > > >
> > > > > > > > > AT: { !previousWasWhitespace && LA(2)!=' ' && LA(2)!='\t' }? '@' ;
> > > > > > > >
> > > > > > > > Monty
> > > > > > > >
> > > > > > > > ANTLR & Java Consultant -- http://www.codetransform.com
> > > > > > > > ANSI C/GCC transformation toolkit -- 
> > > > > > > > http://www.codetransform.com/gcc.html
> > > > > > > > Embrace the Decay --
> > > > http://www.codetransform.com/EmbraceDecay.html
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Yahoo! Groups Links
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Yahoo! Groups Links
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Yahoo! Groups Links
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> >
> >
> >
> >
> >
> >
> > Yahoo! Groups Links
> >
> >
> >
> >
> >
> >
> >





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/
 