[antlr-interest] Ignore Whitespace

Neil Benn neil.benn at cambridgeantibody.com
Tue Nov 5 02:55:49 PST 2002


Hello,

        I'm sorry to post another newbie question but I'm stumped!  I'm
looking at the example to ignore whitespace.  The text I'm trying to
tokenise is:-

  Assay:                                                       , std Alphascreen 384                                          , Description: ,
  Software:                                                    , Fusion 3.50                                                  , Instrument Serial:                                           , ---------
  Sample Map:                                                  , demo , Description:                                                 ,
  Detection Mode:                                              , Alpha , Shaking:                                                     , Disabled
  Plate Type:                                                  , Packard OptiPlate 384                                        , Temperature Control:                                         , Off


    If I tokenise this on comma and newline then I get the tokens I
want.  However, each token then also includes the whitespace that
trails it.  I can get rid of this by calling a trim in the parser, but
I'm trying to learn how to do this in the lexer.  I looked at the ignore
whitespace section in the docs, but following it doesn't seem to remove
the trailing whitespace.  The code is something like :-

-----------------------------------------------------------
 
class CSVLexer extends Lexer;
 
options{filter=IGNORE;}

DISCARD: ( '\t'
         | ','
         | '\n' {newline();}
         | '\r' '\n' {newline();}
         )+
         {$setType(Token.SKIP);}
;

KEEP 
options { ignore=WS; }
: ( '\u0020' .. '\u002B' 
    | '\u002D' .. '\u0039'
    | '\u003B' .. '\u00FF')+
; 

protected

IGNORE: (':');
WS: (' ' | '\t');

------------------------------------------------

    The code compiles OK but the trailing whitespace doesn't get removed.
Is this issue something I'm best dealing with in the parser or is there
a way I can deal with it in the lexer?
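
    For reference, the split-then-trim behaviour described above can be
sketched in plain Java, without ANTLR.  This is only an illustration of
the output being aimed for, not a lexer solution: the class and method
names (FieldSplitter, splitAndTrim) are made up for this example, and
unlike the grammar above it leaves the ':' characters in place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: shows the desired tokens only.
class FieldSplitter {

    // Splits the text on runs of commas and line breaks, trims leading and
    // trailing whitespace from each piece, and drops pieces that end up empty.
    static List<String> splitAndTrim(String text) {
        List<String> fields = new ArrayList<>();
        for (String piece : text.split("[,\\r\\n]+")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields;
    }
}
```

    Calling FieldSplitter.splitAndTrim on the first line of the sample
data would give "Assay:", "std Alphascreen 384" and "Description:",
with no padding left on any of them.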

 
    Thanks in advance for your assistance.
 
Cheers,

Neil Benn
Senior Automation Informatics Scientist

Cambridge Antibody Technology
The Science Park
Melbourn
Cambridgeshire
SG8 6JJ, UK

Telephone: + 44 (0) 1763 263233
Facsimile: + 44 (0) 1763 263413
Email: neil.benn at cambridgeantibody.com
http://www.cambridgeantibody.com

Cambridge Antibody Technology Limited *
Registered Office: The Science Park, Melbourn, Cambridgeshire, SG8 6JJ,
UK
Registered in England and Wales number 2451177
(* Cambridge Antibody Technology Limited is a member of the Cambridge
Antibody Technology Group of Companies)


