[antlr-interest] Ignore Whitespace

Tue Nov 5 03:31:43 PST 2002

Hello,

	I'm very sorry to be a bit thick here, but if I set it to ignore
the : character then it starts a new token.  I don't want it to do that;
I want the token to be returned with the colon character just missing.
For example, I want:

Assay:                            ,

	To be returned as (underscores are just there to mark out start
and end of token):

_Assay                             _

	However - if I try to ignore the : then I get :

_Assay_
_                             _
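
	As a rough illustration of the difference between the two behaviours,
here is a small sketch in plain Java, outside ANTLR altogether.  The class
and method names (ColonDemo, dropInPlace, splitOnColon) are made up for this
example and are not part of ANTLR or of the grammar under discussion; the
sketch only mimics the two outcomes and says nothing about how the lexer
produces them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not ANTLR API.
final class ColonDemo {

    // Desired behaviour: drop ':' from the text but keep a single token.
    static String dropInPlace(String field) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c != ':') {
                text.append(c);
            }
        }
        return text.toString();
    }

    // Behaviour being reported: ':' ends the current token and starts a new one.
    static List<String> splitOnColon(String field) {
        List<String> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == ':') {
                tokens.add(text.toString());
                text.setLength(0);
            } else {
                text.append(c);
            }
        }
        tokens.add(text.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String field = "Assay:      ";
        // One token with the colon removed.
        System.out.println("_" + dropInPlace(field) + "_");
        // Two tokens, split where the colon was.
        for (String token : splitOnColon(field)) {
            System.out.println("_" + token + "_");
        }
    }
}
```

	The first method is the result I am after; the second is what I am
getting at the moment.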

	Once again thanks for your help!

Cheers,

Neil Benn
Senior Automation Informatics Scientist

Cambridge Antibody Technology
The Science Park
Melbourn
Cambridgeshire
SG8 6JJ, UK

Telephone: + 44 (0) 1763 263233
Facsimile + 44 (0) 1763 263413
Email: mailto:neil.benn at cambridgeantibody.com
http://www.cambridgeantibody.com

Cambridge Antibody Technology Limited *
Registered Office: The Science Park, Melbourn, Cambridgeshire, SG8 6JJ, UK
Registered in England and Wales number 2451177
(* Cambridge Antibody Technology Limited is a member of the Cambridge
Antibody Technology Group of Companies)

Confidentiality Note: This information and any attachments is confidential
and only for use by the individual or entity to whom it has been sent. Any
unauthorised dissemination, distribution or copying of this message is
strictly prohibited. If you are not the intended recipient please inform the
sender immediately by reply e-mail and delete this message from your system.
Thank you for your co-operation.

-----Original Message-----
From: Anakreon Mejdi [mailto:amejdi at ertonline.gr] 
Sent: 05 November 2002 11:20
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] Ignore Whitespace

If both whitespaces and ':' should be ignored then:
class CSVLexer extends Lexer;

options { filter=IGNORE; }

// Filter rule: input that matches no other token rule is consumed here
// and discarded.
protected IGNORE
  : '\t'
  | ' '
  | '\n' {newline();}
  | '\r' '\n' {newline();}
  | ':'
  ;

No need to manually set Token type to SKIP or anything else. The Parser
will never know that whitespaces existed or tabs or ...

Neil Benn wrote:
> Hello,
> 
>         I'm sorry to post another newbie question but I'm stumped!  I'm
> looking at the example to ignore whitespace.  The text I'm trying to
> tokenise is:-
> 
>   Assay:                                                       , std Alphascreen 384                                          , Description:                                                 ,
>   Software:                                                    , Fusion 3.50                                                  , Instrument Serial:                                           , ---------
>   Sample Map:                                                  , demo                                                         , Description:                                                 ,
>   Detection Mode:                                              , Alpha                                                        , Shaking:                                                     , Disabled
>   Plate Type:                                                  , Packard OptiPlate 384                                        , Temperature Control:                                         , Off
> 
>     If I tokenize this on comma and newline then I will get the tokens I
> wish.  However this will also include the whitespace trailing each
> comment.  I can get rid of this by calling a trim in the parser but I'm
> trying to learn how to do this in the lexer.  I looked at the ignore
> whitespace section in the docs but it doesn't seem to ignore the
> trailing whitespace.  The code is something like :-
> 
> -----------------------------------------------------------
>  
> class CSVLexer extends Lexer;
>  
> options{filter=IGNORE;}
> 
> DISCARD: ( '\t'
>          | ','
>          | '\n' {newline();}
>          | '\r' '\n' {newline();}
>          )+
>          {$setType(Token.SKIP);}
> ;
> 
> KEEP
> options { ignore=WS; }
> : ( '\u0020' .. '\u002B'
>     | '\u002D' .. '\u0039'
>     | '\u003B' .. '\u00FF')+
> ;
> 
> protected
> 
> IGNORE: (':');
> WS: (' ' | '\t');
> 
> ------------------------------------------------
> 
>     The code compiles OK but the trailing whitespace doesn't get
> removed.  Is this issue something I'm best dealing with in the parser or
> is there a way I can deal with it in the lexer?
> 
>  
>     Thanks in advance for your assistance.
>  
> Cheers,
> 
> Neil Benn
> 
> Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service
> <http://docs.yahoo.com/info/terms/>.
