[antlr-interest] Ignore Whitespace

Tue Nov 5 03:27:19 PST 2002

Hi,

On Tue, Nov 05, 2002 at 10:55:49AM -0000, Neil Benn wrote:
>     If I tokenize this on comma and newline then I will get the tokens I
> wish.  However this will also include the whitespace trailing each
> comment.  I can get rid of this by calling a trim in the parser but I'm
> trying to learn how to do this in the lexer.

Just call trim in the lexer before returning the token :)
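
Roughly what I have in mind, as an untested sketch of your KEEP rule. It
assumes Java as the target language, leaves out the ignore=WS option, and
uses the $getText and $setText action shorthands of ANTLR 2 lexers:

```
KEEP
    : ( '\u0020' .. '\u002B'
      | '\u002D' .. '\u0039'
      | '\u003B' .. '\u00FF'
      )+
      // replace the matched text with a trimmed copy before the token is built
      { $setText($getText.trim()); }
    ;
```

Note that Java's String.trim() strips leading as well as trailing
whitespace, and that a field made up only of spaces would end up as a token
with empty text.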

> class CSVLexer extends Lexer;
>
> options{filter=IGNORE;}
>
> DISCARD: ( '\t'
>          | ','
>          | '\n' {newline();}
>          | '\r' '\n' {newline();}
>          )+
>          {$setType(Token.SKIP);}
> ;
>
> KEEP
> options { ignore=WS; }
> : ( '\u0020' .. '\u002B'
      ^^^^^^^ That's a space, isn't it? It also gives a nondeterminism warning
here. Reading the generated code makes me suspect it does not do the right
thing. Changing this to '\u0021' fixes the warning, though I did not test it.

>     | '\u002D' .. '\u0039'
>     | '\u003B' .. '\u00FF')+
> ;
>
> protected
>
> IGNORE: (':');
> WS: (' ' | '\t');

Shouldn't WS be protected as well?
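
In ANTLR 2 the protected modifier covers only the rule that directly follows
it, so as quoted it applies to IGNORE but not to WS. A minimal, untested
sketch with both helper rules marked protected:

```
// helper rules: only called from other lexer rules, never returned as tokens
protected
IGNORE : ':' ;

protected
WS : ' ' | '\t' ;
```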

>     The code compiles OK but the trailing whitespace doesn't get removed.
> Is this issue something I'm best dealing with in the parser or is there
> a way I can deal with it in the lexer?

The eternal answer applies here: 'it depends'. For what you are doing,
handling it in the lexer could probably work.

Cheers,

Ric

PS I'm not an expert on the use of the ignore thing. Hardly use it myself.
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- klaren at cs.utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
  Where chaos meets order, chaos usually wins, because it is
  better organized. --- Friedrich Nietzsche
