[antlr-interest] Reading block of arbitrary text delimited by curly braces
Mike Lischke
mike at lischke-online.de
Wed Jul 18 00:39:45 PDT 2012
> I missed a quote in the previous message, our single token block reader was this:
>
> BLOCK : 'BLOCK' (' '|'\t'|'\r'|'\n')* '{' (~'}')* '}' ;
>
You didn't say why your original lexer rule is not ideal. However, I'd suggest a slightly changed variant:
BLOCK: 'BLOCK' '{' (options { greedy = false; }: .)* '}';
There's no need to explicitly match whitespace between the BLOCK keyword and the opening curly brace if you declare a whitespace rule like this:
WS: (' ' | '\t' | '\n' | '\r')+ { $channel=TokenChannels.Hidden; };
Additionally, the implementation of .* is clever enough to exclude the token(s) following the .* expression (here the closing curly brace). You just have to make the match non-greedy; otherwise the scanner will consume everything (including closing curly braces) up to the last closing curly brace in the input.
Btw, this is a very typical lexer rule for collecting C multi-line comments and similar constructs.
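For illustration, such a comment rule might look like the sketch below. The rule name ML_COMMENT is just a placeholder, and the action assumes the same target runtime as the WS rule above, so the channel constant may need adjusting for other targets:
// Sketch only: ML_COMMENT is a placeholder name, not part of the original grammar.
ML_COMMENT: '/*' (options { greedy = false; }: .)* '*/' { $channel=TokenChannels.Hidden; };
As with BLOCK, the non-greedy loop stops at the first '*/' it meets rather than at the last one in the input.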
Mike
--
www.soft-gems.net