[antlr-interest] How to write this lexer rule?

Gavin Lambert antlr at mirality.co.nz
Tue Jan 13 03:11:47 PST 2009


At 22:10 13/01/2009, chain one wrote:
>I want to recognize a function definition and skip it before 
>passing tokens to the parser.
>The function definition starts with "FUNCTION" and ends with 
>"END_FUNCTION".
[...]
>FUNCTION_DECL
>: 'FUNCTION'
>{
>                        $channel=HIDDEN;
>          }
>          ( options {greedy=false;} : . )*  FUNCTION_DECL
>          ( options {greedy=false;} : . )*  'END_FUNCTION' SEMI
>;

You might need to be more explicit about it:

FUNCTION_DECL
   : 'FUNCTION' { $channel = HIDDEN; }
     (FUNCTION_DECL | ~'E' | 'E' ~'N' | 'EN' ~'D' | 'END' ~'_' |
      'END_' ~'F' | 'END_F' ~'U' | 'END_FU' ~'N' | 'END_FUN' ~'C' |
      'END_FUNC' ~'T' | 'END_FUNCT' ~'I' | 'END_FUNCTI' ~'O' |
      'END_FUNCTIO' ~'N' | 'END_FUNCTION' ~SEMI)*
     'END_FUNCTION' SEMI
   ;

(This assumes that whitespace isn't permitted between END_FUNCTION 
and the semicolon.)
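
To make the intended behaviour concrete, here is a rough hand-written
sketch in Python of what the rule above is meant to consume. It is not
ANTLR output and not how the generated lexer works internally; the
name skip_function_decl is made up for illustration, and it assumes
(like the rule) that the keywords are matched as plain character
sequences with no whitespace before the semicolon.

```python
def skip_function_decl(text, start=0):
    """Return the index just past the FUNCTION ... END_FUNCTION; block
    that begins at `start`, skipping nested blocks along the way.

    Hypothetical helper for illustration only. Raises ValueError if
    there is no FUNCTION at `start` or the block is never closed.
    """
    OPEN, END = "FUNCTION", "END_FUNCTION"
    if not text.startswith(OPEN, start):
        raise ValueError("no FUNCTION at position %d" % start)
    depth = 1
    i = start + len(OPEN)
    while i < len(text):
        # Check END_FUNCTION before FUNCTION, because the terminator
        # itself contains the text "FUNCTION".
        if text.startswith(END, i):
            i += len(END)
            if text.startswith(";", i):
                # A real terminator: close one nesting level.
                i += 1
                depth -= 1
                if depth == 0:
                    return i
            # END_FUNCTION without ';' is just ordinary body text.
        elif text.startswith(OPEN, i):
            # A nested function definition opens another level.
            depth += 1
            i += len(OPEN)
        else:
            i += 1
    raise ValueError("unterminated FUNCTION block")


src = "a := 1; FUNCTION f; FUNCTION g; END_FUNCTION; x END_FUNCTION; b := 2;"
begin = src.index("FUNCTION")
end = skip_function_decl(src, begin)
remaining = src[:begin] + src[end:]
```

Here `remaining` keeps only the text outside the outermost function
definition, which is what the parser would see once the token is on
the hidden channel.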

Also, if you want to skip over large chunks of your input, it may 
be worth looking into filtering lexers (the filter=true lexer 
option).

>This also could not work : ( :
>
>fragment
>FUNCTION:
>'FUNCTION'
>;
[...]
>FUNCTION_DECL
>:FUNCTION
>{
>                        SKIP();
>          }
>          ( ~(FUNCTION|END_FUNCTION)
>          |
>          FUNCTION_DECL
>          )*  END_FUNCTION SEMI
>;

That doesn't work because ~ can only invert sets, and in a lexer 
rule a set is a group of alternatives that each match a single 
character.  FUNCTION and END_FUNCTION are sequences of characters, 
not sets, so it's illegal to apply ~ to them.


