[antlr-interest] Problems with Pre-processing instructions of C#
Jim Idle
jimi at temporal-wave.com
Sun Sep 20 17:03:06 PDT 2009
Yeah. But the # character is overloaded in the lexer, so you need
to keep state for that, and then you must keep on/off state for pass
through, which you must implement as a stack. You need gated
predicates for directives etc. It took me a good few days of wrangling
to get this correct. The best approach is to get #if working, then add
#else and so on.
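
A minimal sketch of the on/off stack idea, written in Python rather than ANTLR so that it runs standalone. The names `ConditionalStack` and `filter_lines` are hypothetical, `#elif` is left out, and a condition is assumed to be a single symbol that is either defined or not. It is only meant to show why nested #if blocks need a stack, not how the lexer itself should be written:

```python
class ConditionalStack:
    """Tracks whether source text is currently passed through or skipped."""

    def __init__(self):
        # One frame per open #if: (parent_active, branch_taken, active_now)
        self._frames = []

    @property
    def active(self):
        # Outside any #if, everything passes through.
        return self._frames[-1][2] if self._frames else True

    def on_if(self, condition):
        parent = self.active
        live = parent and condition
        self._frames.append((parent, live, live))

    def on_else(self):
        if not self._frames:
            raise SyntaxError("#else without #if")
        parent, taken, _ = self._frames.pop()
        # The #else branch is live only if the parent is live and
        # no earlier branch of this #if was taken.
        live = parent and not taken
        self._frames.append((parent, taken or live, live))

    def on_endif(self):
        if not self._frames:
            raise SyntaxError("#endif without #if")
        self._frames.pop()


def filter_lines(lines, symbols):
    """Return the lines that survive #if/#else/#endif processing."""
    state = ConditionalStack()
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split()
            name = parts[0] if parts else ""
            if name == "if":
                state.on_if(len(parts) > 1 and parts[1] in symbols)
            elif name == "else":
                state.on_else()
            elif name == "endif":
                state.on_endif()
            # Any other directive is simply dropped in this sketch.
            continue
        if state.active:
            out.append(line)
    return out
```

Starting with `on_if` and `on_endif` alone and adding `on_else` afterwards mirrors the incremental approach: the frame layout does not change, only the rule for flipping the active flag.
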
Jim
On Sep 20, 2009, at 4:54 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:
> Eduard Ralph wrote:
>> Hi community,
>>
>> I'm fighting with the processing of pre-processing instructions
>> according to C# specs. The BNF is:
>>
>> Whitespace(opt) '#' Whitespace(opt) 'error' input-characters
>> Whitespace(opt) '#' Whitespace(opt) 'warning' input-characters
>> Whitespace(opt) '#' Whitespace(opt) 'line' ...
>>
>> where
>> Whitespace(opt) can be optionally one or more spaces
>> ('\u0020','\u00A0', and a few more)
>> Input-characters is anything except newline ('\n', and a few more)
>>
>> I wrote in the Lexer, where the other rules are fragments
>>
>>
>> PP_DIAGNOSTIC
>>   : (WHITESPACE* HASH WHITESPACE* 'error')=>
>>       WHITESPACE* HASH WHITESPACE* ERROR INPUT_CHARACTER*
>>   | (WHITESPACE* HASH WHITESPACE* 'warning')=>
>>       WHITESPACE* HASH WHITESPACE* WARNING INPUT_CHARACTER*
>>   ;
>
> These probably need NEWLINEs at the end.
>
>> PP_LINE
>>   : (WHITESPACE* HASH WHITESPACE* 'line')=>
>>       WHITESPACE* HASH WHITESPACE* LINE PP_LINE_INDICATOR NEWLINE
>>   ;
>
> This will not skip whitespace between LINE and PP_LINE_INDICATOR or
> between PP_LINE_INDICATOR and NEWLINE.
>
> I think you probably want
>   ... => WHITESPACE* HASH WHITESPACE* LINE WHITESPACE*
>          PP_LINE_INDICATOR WHITESPACE* NEWLINE
>
> but that is likely independent of your problem with the lexer not
> recognising which rule applies.
>
>> fragment PP_LINE_INDICATOR
>>   : INTEGER_LITERAL PP_FILE_NAME?
>>   | IDENTIFIER_OR_KEYWORD
>>   ;
>>
>> fragment PP_FILE_NAME : STRING_LITERAL
>> ;
>>
>> fragment HASH : '#';
>
> I would suggest left-factoring and using actions to change the token
> type:
>
> fragment PP_DIAGNOSTIC : ;
> fragment PP_LINE : ;
>
> PP_UNRECOGNIZED
>   : WHITESPACE* HASH WHITESPACE*
>     ( (ERROR | WARNING)=> INPUT_CHARACTER* { $type = PP_DIAGNOSTIC; }
>     | (LINE)=> LINE WHITESPACE* PP_LINE_INDICATOR WHITESPACE*
>                { $type = PP_LINE; }
>     | INPUT_CHARACTER* // leave as type PP_UNRECOGNIZED [1]
>     )? NEWLINE
>   ;
>
>
> [1] omit this line if you want an unrecognized instruction to be a lexer
> mismatch, but I would suggest leaving it for better error recovery.
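
The left-factoring idea can be tried outside ANTLR with a rough Python sketch: match the shared `WHITESPACE* '#' WHITESPACE*` prefix once, then pick the token type from the keyword that follows. The function name `classify` and the regular expression are assumptions made for illustration; whitespace is simplified to space and tab, and the keyword must match exactly, which is stricter than the syntactic predicates in the rule above:

```python
import re

# Shared prefix matched once: optional whitespace, '#', optional whitespace,
# then an optional keyword followed by the rest of the line.
_DIRECTIVE = re.compile(r"^[ \t]*#[ \t]*([A-Za-z_]\w*)?(.*)$")

# Keyword -> token type, using the token names from the grammar above.
_TYPES = {
    "error": "PP_DIAGNOSTIC",
    "warning": "PP_DIAGNOSTIC",
    "line": "PP_LINE",
}


def classify(line):
    """Return (token_type, rest_of_line), or None if not a directive."""
    m = _DIRECTIVE.match(line)
    if m is None:
        return None
    keyword, rest = m.group(1), m.group(2)
    # Unknown or missing keywords keep the default type, which corresponds
    # to leaving the token as PP_UNRECOGNIZED for error recovery.
    token_type = _TYPES.get(keyword, "PP_UNRECOGNIZED")
    return token_type, rest.strip()
```

Because the prefix is consumed before the decision is made, adding a new directive only means adding an entry to the keyword table, not another alternative that repeats the prefix.
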
>
> --
> David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com