[antlr-interest] skipping whitespaces in code and avoiding it in comments

Sun Mar 8 13:03:42 PDT 2009

Maciej Gawinecki wrote:
> Hello,
> 
> Lexer can /skip/ or send to /hidden channel/ the tokens that are 
> whitespaces.
> 
> However I would like it not to skip them when the parser recognizes a 
> comment fragment, because I want to buffer comments including their 
> whitespaces.
<SNIP>
> value 	:	DIGIT+ ;
> id 	:	LETTER+ ;
> 	
> comment
> @init{ isComment = true; }
> @after{ isComment = false; }
> 	:	(LETTER|DIGIT)* ;
> 	
> LETTER 	:	'a'..'z'|'A'..'Z' ;
> 
> DIGIT
> 	:	'0'..'9' ;
> 
> WS
>      :   (
>               ' '
>          |    '\r'
>          |    '\t'
>          |    '\n'
>          )
>              {
>              	if (!isComment)               	
>                  	skip();
>              }
>      ;		

It's far more common to make VALUE, ID, and COMMENT token types in the 
lexer, with COMMENT defined differently from what you have now: running 
from // to the newline, inclusive, is more usual. Then you put the 
comments and the WS on the hidden channel, so the parser never sees them 
but each comment token still carries its text, whitespace included. 
Tokens are usually complete lexical elements, not single characters. 
Otherwise, the parser may as well be working on the input stream rather 
than a token stream.
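
A minimal sketch of that layout, assuming ANTLR 3 syntax and the Java 
target. The grammar name and the "pair" parser rule are hypothetical, 
added only so the lexer rules have something to feed; the rest of your 
grammar was snipped, so adapt as needed.

```
grammar HiddenSketch;   // hypothetical grammar name

// Hypothetical parser rule, only here so the tokens have a consumer.
pair    : ID VALUE ;

// Whole lexical elements rather than single characters.
VALUE   : ('0'..'9')+ ;
ID      : ('a'..'z' | 'A'..'Z')+ ;

// From '//' through the end of the line, newline included.
// Whitespace inside the comment is part of the COMMENT token's text.
COMMENT : '//' ~('\n' | '\r')* '\r'? '\n' { $channel = HIDDEN; } ;

// Whitespace outside comments: kept in the token stream, but off the
// parser's channel.
WS      : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Since these tokens are sent to the hidden channel rather than dropped 
with skip(), they should still be in the token stream's buffer if you 
want to collect the comments afterwards. Note that, as written, a 
comment on the last line with no trailing newline would not match.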

-- 

Sam Barnett-Cormack