[antlr-interest] Reading block of arbitrary text delimited by curly braces

Mike Lischke mike at lischke-online.de
Wed Jul 18 00:39:45 PDT 2012

> I missed a quote in the previous message, our single token block reader was this:
> BLOCK : 'BLOCK' (' '|'\t'|'\r'|'\n')* '{' (~'}')*  '}' ;

You didn't say why your original lexer rule is not ideal, but I'd suggest a slightly changed variant:

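BLOCK: 'BLOCK' (' ' | '\t' | '\r' | '\n')* '{' (options { greedy = false; }: .)* '}';

Keep the whitespace loop between the BLOCK keyword and the opening curly brace. A whitespace rule like the following is still useful for the rest of the input, but it only discards whitespace between tokens. The lexer never applies it inside a single rule such as BLOCK: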

WS: (' ' | '\t' | '\n' | '\r')+ { $channel=TokenChannels.Hidden; };

The .* loop is also clever enough to stop before whatever follows it in the rule (here the closing curly brace). You just have to make the loop non-greedy. Otherwise the scanner tries to match everything, closing curly braces included, up to the last one in the input.
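To make the greedy/non-greedy difference concrete, here is a small Python model of the BLOCK rule, including the whitespace loop from your original rule. It is not how ANTLR implements the loop. It only reproduces the behaviour described above: the non-greedy version stops at the first closing brace, and the greedy version runs to the last one in the input. The name scan_block and its greedy flag are made up for this illustration.

```python
from typing import Optional


def scan_block(text: str, pos: int = 0, greedy: bool = False) -> Optional[str]:
    """Model of: 'BLOCK' (' '|'\\t'|'\\r'|'\\n')* '{' .* '}' starting at text[pos].

    Returns the matched text, or None if the input does not match.
    """
    if not text.startswith("BLOCK", pos):
        return None
    i = pos + len("BLOCK")
    # Optional whitespace between the keyword and the opening brace.
    while i < len(text) and text[i] in " \t\r\n":
        i += 1
    if i >= len(text) or text[i] != "{":
        return None
    # Non-greedy: stop at the first '}' after '{'.
    # Greedy (modelled as described above): run to the last '}' in the input.
    end = text.rfind("}", i + 1) if greedy else text.find("}", i + 1)
    if end == -1:
        return None  # no closing brace at all
    return text[pos:end + 1]


src = "BLOCK { a = 1; } other { b }"
print(scan_block(src))               # BLOCK { a = 1; }
print(scan_block(src, greedy=True))  # BLOCK { a = 1; } other { b }
```

Neither variant handles nested braces. With "BLOCK { a { b } c }" the non-greedy version stops at the inner '}', just like the (~'}')* version does.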

BTW, this is the typical way to write lexer rules that collect C multi-line comments and similar constructs.
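Python's re module expresses the same non-greedy idea with *?, which makes the C comment case easy to try out. Below is a small sketch that assumes the input is a plain string of C source. The name c_block_comments is made up here. Because it is a plain regex, a "/*" inside a string literal would be matched as well.

```python
import re
from typing import List

# '.*?' is non-greedy: each match ends at the first '*/' after its '/*'.
# re.DOTALL lets '.' match newlines, so multi-line comments are found too.
_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def c_block_comments(source: str) -> List[str]:
    """Return every /* ... */ comment in source, in order of appearance."""
    return _COMMENT.findall(source)


print(c_block_comments("int a; /* one */ int b; /* two */"))
```

With a greedy .* instead, the first match would run from the first "/*" to the last "*/" and swallow the code in between.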

