[antlr-interest] Problems with Pre-processing instructions of C#

Jim Idle jimi at temporal-wave.com
Sun Sep 20 17:03:06 PDT 2009


Yeah. But the # character is overloaded in the lexer, so you need
to keep state for that, and then you must keep on/off state for
pass-through, which you must implement as a stack. You need gated
predicates for directives etc. It took me a good few days of wrangling
to get this correct. The best approach is to get #if working, then add
#else and so on.
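
To make the on/off stack concrete, here is a minimal sketch in Java (the
default ANTLR target). It is not the code from my grammar: the class name
ConditionalStack and its methods are hypothetical, and it assumes the
directive's condition has already been evaluated to a boolean elsewhere.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of ANTLR: tracks whether source text is
// currently passed through or skipped inside nested #if blocks.
final class ConditionalStack {
    private static final class Frame {
        final boolean parentActive; // was the enclosing region passing text through?
        boolean branchTaken;        // has a branch of this #if already been selected?
        boolean active;             // is the current branch passing text through?

        Frame(boolean parentActive, boolean condition) {
            this.parentActive = parentActive;
            this.active = parentActive && condition;
            this.branchTaken = this.active;
        }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();

    // True when ordinary tokens should be emitted rather than skipped.
    boolean isActive() {
        return frames.isEmpty() || frames.peek().active;
    }

    // #if: open a new frame; it can only be on if the enclosing region is on.
    void onIf(boolean condition) {
        frames.push(new Frame(isActive(), condition));
    }

    // #elif: on only if no earlier branch of this #if was selected.
    void onElif(boolean condition) {
        Frame f = top("#elif");
        f.active = f.parentActive && !f.branchTaken && condition;
        f.branchTaken |= f.active;
    }

    // #else: on only if no earlier branch of this #if was selected.
    void onElse() {
        Frame f = top("#else");
        f.active = f.parentActive && !f.branchTaken;
        f.branchTaken = true;
    }

    // #endif: close the innermost frame.
    void onEndif() {
        top("#endif");
        frames.pop();
    }

    private Frame top(String directive) {
        Frame f = frames.peek();
        if (f == null) {
            throw new IllegalStateException(directive + " without matching #if");
        }
        return f;
    }
}
```

One way to wire this up, as an assumption rather than a recipe, is to hold
an instance in the lexer's @members section, call onIf/onElse/onEndif from
the actions of the directive rules, and have the gated predicates on the
other rules consult isActive(). Starting with onIf and onEndif alone and
then adding onElse matches the order suggested above.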

Jim

On Sep 20, 2009, at 4:54 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

> Eduard Ralph wrote:
>> Hi community,
>>
>> I'm fighting with the processing of pre-processing instructions  
>> according to C# specs. The BNF is:
>>
>> Whitespace(opt) '#' Whitespace(opt) 'error' input-characters
>> Whitespace(opt) '#' Whitespace(opt) 'warning' input-characters
>> Whitespace(opt) '#' Whitespace(opt) 'line'  ...
>>
>> where
>> Whitespace(opt) can be optionally one or more spaces  
>> ('\u0020','\u00A0', and a few more)
>> Input-characters is anything except newline ('\n', and a few more)
>>
>> I wrote in the Lexer, where the other rules are fragments
>>
>>
>> PP_DIAGNOSTIC
>>     : (WHITESPACE* HASH WHITESPACE* 'error')=>
>>           WHITESPACE* HASH WHITESPACE* ERROR INPUT_CHARACTER*
>>     | (WHITESPACE* HASH WHITESPACE* 'warning')=>
>>           WHITESPACE* HASH WHITESPACE* WARNING INPUT_CHARACTER*
>>     ;
>
> These probably need NEWLINEs at the end.
>
>> PP_LINE
>>     : (WHITESPACE* HASH WHITESPACE* 'line')=>
>>           WHITESPACE* HASH WHITESPACE* LINE PP_LINE_INDICATOR NEWLINE
>>     ;
>
> This will not skip whitespace between LINE and PP_LINE_INDICATOR or
> between PP_LINE_INDICATOR and NEWLINE.
>
> I think you probably want
>  ... => WHITESPACE* HASH WHITESPACE* LINE WHITESPACE* PP_LINE_INDICATOR
>           WHITESPACE* NEWLINE
>
> but that is likely independent of your problem with the lexer not
> recognising which rule applies.
>
>> fragment PP_LINE_INDICATOR
>>     : INTEGER_LITERAL PP_FILE_NAME?
>>     | IDENTIFIER_OR_KEYWORD
>>     ;
>>
>> fragment PP_FILE_NAME              :        STRING_LITERAL
>>                                               ;
>>
>> fragment HASH                          :        '#';
>
> I would suggest left-factoring and using actions to change the token type:
>
>  fragment PP_DIAGNOSTIC : ;
>  fragment PP_LINE : ;
>
>  PP_UNRECOGNIZED
>    : WHITESPACE* HASH WHITESPACE*
>      ( (ERROR | WARNING)=> INPUT_CHARACTER* { $type = PP_DIAGNOSTIC; }
>      | (LINE)=> LINE WHITESPACE* PP_LINE_INDICATOR WHITESPACE*
>                                             { $type = PP_LINE; }
>      | INPUT_CHARACTER* // leave as type PP_UNRECOGNIZED [1]
>      )? NEWLINE
>    ;
>
>
> [1] omit this line if you want an unrecognized instruction to be a lexer
>    mismatch, but I would suggest leaving it for better error recovery.
>
> -- 
> David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com