[antlr-interest] Ignore Whitespace

Tue Nov 5 03:20:25 PST 2002

If both whitespace and ':' should be ignored, then:

class CSVLexer extends Lexer;

options { filter=IGNORE; }

protected IGNORE
    : '\t'
    | ' '
    | '\n' {newline();}
    | '\r' '\n' {newline();}
    | ':'
    ;

There is no need to set the token type to SKIP manually.
The parser will never see the whitespace, the tabs, or anything else the filter rule matches.
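
A rough sketch, in plain Java rather than ANTLR-generated code, of what the parser would see once the IGNORE characters are filtered out. Everything here is an assumption made for illustration: the class FilterSketch and its methods are hypothetical, "," is treated as a token of its own, and every other run of non-ignored characters is treated as one token. It also skips a lone '\r', whereas the rule above only matches '\r' followed by '\n'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not what ANTLR generates.
class FilterSketch {
    // Characters the IGNORE rule is meant to discard (simplified: a lone '\r' is skipped too).
    static boolean isIgnored(char c) {
        return c == '\t' || c == ' ' || c == '\n' || c == '\r' || c == ':';
    }

    // Returns the token texts that would reach the parser.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isIgnored(c)) {
                i++;                      // filtered out, never becomes a token
            } else if (c == ',') {
                tokens.add(",");          // assumed separator token
                i++;
            } else {
                int start = i;            // assumed "word" token: runs until an ignored char or ','
                while (i < input.length()
                        && !isIgnored(input.charAt(i))
                        && input.charAt(i) != ',') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  Assay:      , std Alphascreen 384     , \r\n"));
    }
}
```

Under these assumptions the trailing blanks and the ':' disappear, but a field such as "std Alphascreen 384" arrives as three separate tokens, so the parser would have to put such fields back together.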

Neil Benn wrote:
> Hello,
> 
>         I'm sorry to post another newbie question but I'm stumped!  I'm 
> looking at the example to ignore whitespace.  The text I'm trying to 
> tokenise is:-
> 
>   Assay:                                                       , std 
> Alphascreen 384                                          , 
> Description:                                                 , 
>   Software:                                                    , Fusion 
> 3.50                                                  , Instrument 
> Serial:                                           , ---------
>   Sample Map:                                                  , 
> demo                                                         , 
> Description:                                                 , 
>   Detection Mode:                                              , 
> Alpha                                                        , 
> Shaking:                                                     , Disabled
>   Plate Type:                                                  , Packard 
> OptiPlate 384                                        , Temperature 
> Control:                                         , Off
> 
>     If I tokenize this on comma and newline then I will get the tokens I 
> wish.  However this will also include the whitespace trailing each 
> comment.  I can get rid of this by calling a trim in the parser but I'm 
> trying to learn how to do this in the lexer.  I looked at the ignore 
> whitespace section in the docs but it doesn't seem to ignore the 
> trailing whitespace.  The code is something like :-
> 
> -----------------------------------------------------------
>  
> class CSVLexer extends Lexer;
>  
> options{filter=IGNORE;}
> 
> DISCARD: ( '\t'
>          | ','
>          | '\n' {newline();}
>          | '\r' '\n' {newline();}
>          )+
>          {$setType(Token.SKIP);}
> ;
> 
> KEEP
> options { ignore=WS; }
> : ( '\u0020' .. '\u002B'
>     | '\u002D' .. '\u0039'
>     | '\u003B' .. '\u00FF')+
> ;
> 
> protected
> 
> IGNORE: (':');
> WS: (' ' | '\t');
> 
> ------------------------------------------------
> 
>     The code compiles OK but the trailing whitespace doesn't get 
> removed.  Is this issue something I'm best dealing with in the parser or 
> is there a way I can deal with it in the lexer?
> 
>  
>     Thanks in advance for your assistance.
>  
> Cheers,
> 
> Neil Benn
> Senior Automation Informatics Scientist
> 
> Cambridge Antibody Technology
> The Science Park
> Melbourn
> Cambridgeshire
> SG8 6JJ, UK
> 
> Telephone: + 44 (0) 1763 263233
> Facsimile + 44 (0) 1763 263413
> Email: mailto:neil.benn at cambridgeantibody.com
> http://www.cambridgeantibody.com
> 
> Cambridge Antibody Technology Limited *
> Registered Office: The Science Park, Melbourn, Cambridgeshire, SG8 6JJ, UK
> Registered in England and Wales number 2451177
> (* Cambridge Antibody Technology Limited is a member of the Cambridge
> Antibody Technology Group of Companies)
> 
> Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service 
> <http://docs.yahoo.com/info/terms/>.