[antlr-interest] How to write this lexer rule?

chain one chainone at gmail.com
Tue Jan 13 03:29:42 PST 2009


Hi Gavin Lambert, thanks for your reply.
I tried the lexer rule you gave me, but it produces the following error:

Alternative 155: after matching input such as
'F''U''N''C''T''I''O''N''F''U''N''C''T''I''O''N''F''U''N''C''T''I''O''N''E''N''D''_''F''U''N''C''T''I''O''N'{'0'..'9',
'A'..'Z', '_',
'a'..'z'}'F''U''N''C''T''I''O''N''E''N''D''_''F''U''N''C''T''I''O''N'{'0'..'9',
'A'..'Z', '_', 'a'..'z'}'F''U''N''C''T''I''O'{'\u0000'..'/', ':'..'@', 'N',
'['..'^', '`', '{'..'\uFFFF'} decision cannot predict what comes next due to
recursion overflow to FUNCTION_DECL from FUNCTION_DECL
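
The message suggests the lookahead analysis gives up on the recursive reference to FUNCTION_DECL inside the loop. One way to avoid recursion altogether is to count nesting depth by hand. The sketch below shows that idea in plain Python rather than in ANTLR, purely as an illustration. The function name strip_function_blocks is hypothetical. The code assumes keywords are whole words, that no whitespace separates END_FUNCTION from the semicolon, and that keywords never occur inside strings or comments.

```python
import re

# END_FUNCTION; closes a block, FUNCTION opens one. The \b anchors keep the
# "FUNCTION" inside "END_FUNCTION" (or "MY_FUNCTION") from counting as an opener.
_KEYWORD = re.compile(r"\bEND_FUNCTION;|\bFUNCTION\b")


def strip_function_blocks(text):
    """Return text with every (possibly nested) FUNCTION ... END_FUNCTION; removed."""
    kept = []   # pieces of text that lie outside any function block
    depth = 0   # current nesting level
    pos = 0     # start of the text that has not been copied or skipped yet
    for match in _KEYWORD.finditer(text):
        if match.group() == "FUNCTION":
            if depth == 0:
                # Entering an outermost block: keep everything before it.
                kept.append(text[pos:match.start()])
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                # Outermost block closed: resume copying after the semicolon.
                pos = match.end()
        # A closer seen at depth 0 is unmatched and is left in place as text.
    if depth > 0:
        raise ValueError("unterminated FUNCTION block")
    kept.append(text[pos:])
    return "".join(kept)
```

The same counter could in principle live in a lexer action that consumes characters until the depth returns to zero, but that variant is untested here.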

On Tue, Jan 13, 2009 at 7:11 PM, Gavin Lambert <antlr at mirality.co.nz> wrote:

> At 22:10 13/01/2009, chain one wrote:
>
>> I want to recognize a function definition and skip it before passing
>> tokens to the parser.
>> The function definition starts with "FUNCTION" ,ends with "END_FUNCTION".
>>
> [...]
>
>> FUNCTION_DECL
>> : 'FUNCTION'
>> {
>>                       $channel=HIDDEN;
>>         }
>>         ( options {greedy=false;} : . )*  FUNCTION_DECL
>>         ( options {greedy=false;} : . )*  'END_FUNCTION' SEMI
>> ;
>>
>
> You might need to be more explicit about it:
>
> FUNCTION_DECL
>  : 'FUNCTION' { $channel = HIDDEN; }
>    (FUNCTION_DECL | ~'E' | 'E' ~'N' | 'EN' ~'D' | 'END' ~'_' |
>     'END_' ~'F' | 'END_F' ~'U' | 'END_FU' ~'N' | 'END_FUN' ~'C' |
>     'END_FUNC' ~'T' | 'END_FUNCT' ~'I' | 'END_FUNCTI' ~'O' |
>     'END_FUNCTIO' ~'N' | 'END_FUNCTION' ~SEMI)*
>    'END_FUNCTION' SEMI
>  ;
>
> (This assumes that whitespace isn't permitted between END_FUNCTION and the
> semicolon.)
>
> Also, if you're wanting to skip over large chunks of your input, then you
> might want to investigate filtering lexers.
>
>> This also could not work :(
>>
>> fragment
>> FUNCTION:
>> 'FUNCTION'
>> ;
>>
> [...]
>
>> FUNCTION_DECL
>> :FUNCTION
>> {
>>                       SKIP();
>>         }
>>         ( ~(FUNCTION|END_FUNCTION)
>>         |
>>         FUNCTION_DECL
>>         )*  END_FUNCTION SEMI
>> ;
>>
>
> The reason why that doesn't work is that ~ can only take the inverse of
> sets, and sets in a lexer rule are alternatives of individual characters.
>  FUNCTION and END_FUNCTION are not sets, they're sequences, so it's illegal
> to use ~ on them.
>
>
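
Because ~ applies only to single-character sets, "anything that is not END_FUNCTION" has to be spelled out as one alternative per prefix of the keyword, as in the explicit rule quoted above. A minimal Python sketch that generates those alternatives as text is given here for illustration. The helper name negated_prefix_alternatives is hypothetical, and the code assumes the keyword contains no characters that need escaping inside an ANTLR literal.

```python
def negated_prefix_alternatives(keyword, terminator=None):
    """Build ANTLR-style alternatives matching input that falls short of keyword."""
    alts = []
    for i, ch in enumerate(keyword):
        prefix = keyword[:i]
        negated = "~'%s'" % ch
        # With no prefix yet, the alternative is just the negated first character.
        alts.append("'%s' %s" % (prefix, negated) if prefix else negated)
    if terminator is not None:
        # Full keyword followed by something other than the terminator rule.
        alts.append("'%s' ~%s" % (keyword, terminator))
    return alts


if __name__ == "__main__":
    print(" | ".join(negated_prefix_alternatives("END_FUNCTION", "SEMI")))
```

Joining the result with " | " gives the body of the loop in the explicit rule, apart from the nested FUNCTION_DECL alternative.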