[antlr-interest] skipping whitespaces in code and avoiding it in comments
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Sun Mar 8 13:03:42 PDT 2009
Maciej Gawinecki wrote:
> Hello,
>
> Lexer can /skip/ or send to /hidden channel/ the tokens that are
> whitespaces.
>
> However I would like it not to skip them when the parser recognizes a
> comment fragment, because I want to buffer comments including their
> whitespaces.
<SNIP>
> value : DIGIT+ ;
> id : LETTER+ ;
>
> comment
> @init{ isComment = true; }
> @after{ isComment = false; }
> : (LETTER|DIGIT)* ;
>
> LETTER : 'a'..'z'|'A'..'Z' ;
>
> DIGIT
> : '0'..'9' ;
>
> WS
> : (
> ' '
> | '\r'
> | '\t'
> | '\n'
> )
> {
> if (!isComment)
> skip();
> }
> ;
It's far more common to make VALUE, ID, and COMMENT token types in the
lexer, and to define a comment differently from what you have now:
everything from // up to and including the newline is the usual form.
Then you put both the comments and the WS on the hidden channel. The
parser never sees them, but they stay in the token stream, and each
comment token keeps its whitespace as part of its text. Tokens are
usually complete lexical elements, not single characters. Otherwise,
the parser may as well be working on the input stream rather than a
token stream.
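
A rough sketch of that layout in ANTLR 3 syntax, assuming the default
Java target. The grammar name Sketch and the parser rule items are made
up for illustration, and the // comment form is the conventional one
suggested above, not something taken from your grammar:

```antlr
grammar Sketch;   // hypothetical name

// Hypothetical parser rule: it only ever sees VALUE and ID tokens,
// because COMMENT and WS are sent to the hidden channel.
items : (VALUE | ID)* EOF ;

// Whole lexical elements are tokens, not single characters.
VALUE : DIGIT+ ;
ID    : LETTER+ ;

// From '//' up to and including the newline. Any spaces or tabs
// inside the comment are part of this one token's text.
COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' { $channel = HIDDEN; }
    ;

// Whitespace outside comments is kept, but hidden from the parser.
WS
    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
    ;

// Fragments are helpers for other lexer rules and never become
// tokens on their own.
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
fragment DIGIT  : '0'..'9' ;
```

With this, no isComment flag is needed: the lexer matches a comment as
one token, so the whitespace inside it is never offered to the WS rule.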
--
Sam Barnett-Cormack