[antlr-interest] How to write this lexer rule?
Gavin Lambert
antlr at mirality.co.nz
Tue Jan 13 03:11:47 PST 2009
At 22:10 13/01/2009, chain one wrote:
>I want to recognize a function definition and skip it before
>passing tokens to the parser.
>The function definition starts with "FUNCTION" ,ends with
>"END_FUNCTION".
[...]
>FUNCTION_DECL
>: 'FUNCTION'
>{
> $channel=HIDDEN;
> }
> ( options {greedy=false;} : . )* FUNCTION_DECL (
> options {greedy=false;} : . )* 'END_FUNCTION' SEMI
>;
You might need to be more explicit about it:
FUNCTION_DECL
  : 'FUNCTION' { $channel = HIDDEN; }
    ( FUNCTION_DECL
    | ~'E' | 'E' ~'N' | 'EN' ~'D' | 'END' ~'_'
    | 'END_' ~'F' | 'END_F' ~'U' | 'END_FU' ~'N' | 'END_FUN' ~'C'
    | 'END_FUNC' ~'T' | 'END_FUNCT' ~'I' | 'END_FUNCTI' ~'O'
    | 'END_FUNCTIO' ~'N' | 'END_FUNCTION' ~SEMI
    )*
    'END_FUNCTION' SEMI
  ;
(This assumes that whitespace isn't permitted between END_FUNCTION
and the semicolon.)
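To make the intended behaviour concrete, here is a rough hand-written sketch in Python (not ANTLR) of what such a rule is meant to do: drop every FUNCTION ... END_FUNCTION; block, including nested ones, and keep everything else. The name strip_function_blocks is hypothetical, and the sketch assumes no word-boundary checks and no whitespace before the semicolon. It only approximates the lexer rule and is not generated from it.

```python
OPEN = "FUNCTION"
END = "END_FUNCTION"

def strip_function_blocks(text):
    """Return text with every FUNCTION ... END_FUNCTION; block removed.

    Hypothetical helper: nesting is tracked with a depth counter, and
    END_FUNCTION only closes a block when a ';' follows it immediately.
    """
    kept = []
    depth = 0
    i = 0
    n = len(text)
    while i < n:
        # Check END_FUNCTION first, because it contains FUNCTION as a suffix.
        if text.startswith(END, i):
            j = i + len(END)
            if depth > 0 and text.startswith(";", j):
                depth -= 1          # terminator of the innermost open block
                i = j + 1
                continue
            # Not a terminator here: treat the word as ordinary text.
            if depth == 0:
                kept.append(END)
            i = j
            continue
        if text.startswith(OPEN, i):
            depth += 1              # a (possibly nested) block starts
            i += len(OPEN)
            continue
        if depth == 0:
            kept.append(text[i])    # outside any block: keep the character
        i += 1
    if depth:
        raise ValueError("unterminated FUNCTION block")
    return "".join(kept)
```

For example, strip_function_blocks("FUNCTION a END_FUNCTION b END_FUNCTION;c") returns "c": the first END_FUNCTION has no semicolon directly after it, so it is treated as part of the body.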
Also, if you want to skip over large chunks of your input,
you might want to look into filtering lexers (the filter=true
lexer option).
>This also could not work : ( :
>
>fragment
>FUNCTION:
>'FUNCTION'
>;
[...]
>FUNCTION_DECL
>:FUNCTION
>{
> SKIP();
> }
> ( ~(FUNCTION|END_FUNCTION)
> |
> FUNCTION_DECL
> )* END_FUNCTION SEMI
>;
The reason why that doesn't work is that ~ can only take the
inverse of sets, and sets in a lexer rule are alternatives of
individual characters. FUNCTION and END_FUNCTION are not sets,
they're sequences, so it's illegal to use ~ on them.
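The set-versus-sequence distinction can be illustrated by analogy with Python regular expressions. This is not ANTLR syntax, and the names NOT_E_OR_F, BODY and body_length are made up for the illustration. A negated character class is a complement of a set and always matches exactly one character; there is no equivalent complement for a multi-character sequence, so the closest idiom has to test the sequence separately at each position.

```python
import re

# Complement of a character SET: matches exactly one character
# that is neither 'E' nor 'F'.
NOT_E_OR_F = re.compile(r"[^EF]")

# There is no set complement for a SEQUENCE. The nearest regex idiom
# consumes one character at a time, each guarded by a negative lookahead
# that rejects positions where the whole sequence starts.
BODY = re.compile(r"(?:(?!END_FUNCTION).)*", re.DOTALL)

def body_length(text):
    """Count the leading characters before the first END_FUNCTION.

    If the sequence never occurs, the whole text is counted.
    """
    return BODY.match(text).end()
```

Here body_length("abc END_FUNCTION;") returns 4, the length of "abc ". The explicit chain of alternatives in the rule I suggested is a way of spelling out that same per-character check by hand.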