[antlr-interest] Problems with Pre-processing instructions of C#

David-Sarah Hopwood david-sarah at jacaranda.org
Sun Sep 20 16:54:25 PDT 2009


Eduard Ralph wrote:
> Hi community,
> 
> I'm struggling to lex pre-processing instructions according to the C# specification. The BNF is:
> 
> Whitespace(opt) '#' Whitespace(opt) 'error' input-characters
> Whitespace(opt) '#' Whitespace(opt) 'warning' input-characters
> Whitespace(opt) '#' Whitespace(opt) 'line'  ...
> 
> where
>  Whitespace(opt) is zero or more whitespace characters ('\u0020', '\u00A0', and a few more)
>  input-characters is any sequence of characters other than a newline ('\n', and a few more)
> 
> I wrote the following in the lexer, where the other rules are fragments:
> 
> 
> PP_DIAGNOSTIC      :        (WHITESPACE* HASH WHITESPACE* 'error')=>WHITESPACE* HASH WHITESPACE* ERROR INPUT_CHARACTER*
>                             |        (WHITESPACE* HASH WHITESPACE* 'warning')=>WHITESPACE* HASH WHITESPACE* WARNING INPUT_CHARACTER*
>                             ;

These probably need NEWLINEs at the end.

> PP_LINE                 :        (WHITESPACE* HASH WHITESPACE* 'line')=> WHITESPACE* HASH WHITESPACE* LINE PP_LINE_INDICATOR NEWLINE
>                             ;

This will not skip whitespace between LINE and PP_LINE_INDICATOR or
between PP_LINE_INDICATOR and NEWLINE.

I think you probably want
  ... => WHITESPACE* HASH WHITESPACE* LINE WHITESPACE* PP_LINE_INDICATOR
           WHITESPACE* NEWLINE

but that is likely independent of your problem with the lexer not
recognising which rule applies.
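To see why the whitespace matters, here is a rough Python analogue of the two PP_LINE shapes, written with regular expressions. It is only an illustration, not what ANTLR generates. The whitespace class is a hypothetical subset (space, tab, no-break space). PP_LINE_INDICATOR is simplified to digits with an optional quoted file name, or a single identifier such as `hidden`. The relaxed variant also allows whitespace between the number and the file name.

```python
import re

# Hypothetical subset of C# whitespace: space, tab, no-break space.
WS = "[ \t\u00a0]*"


def build_pp_line(skip_ws):
    """Regex analogue of PP_LINE; skip_ws adds WHITESPACE* between parts."""
    gap = WS if skip_ws else ""
    # Simplified PP_LINE_INDICATOR: digits plus an optional quoted file name,
    # or a single identifier such as `hidden`.
    indicator = (
        r"(?:(\d+)(?:" + gap + r'"([^"\n]*)")?'
        + r"|([A-Za-z_][A-Za-z0-9_]*))"
    )
    return re.compile(WS + "#" + WS + "line" + gap + indicator + gap + r"\r?\n")


ORIGINAL = build_pp_line(skip_ws=False)   # PP_LINE as first posted
SUGGESTED = build_pp_line(skip_ws=True)   # with WHITESPACE* between the parts


def match_pp_line(pattern, text):
    """Return (line_number, file_name, identifier), or None if no match."""
    m = pattern.match(text)
    if m is None:
        return None
    number = int(m.group(1)) if m.group(1) is not None else None
    return number, m.group(2), m.group(3)
```

Here `match_pp_line(ORIGINAL, "#line 200\n")` returns `None` because nothing consumes the space after `line`. By contrast, `match_pp_line(SUGGESTED, "#line 200\n")` returns `(200, None, None)`.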

> fragment PP_LINE_INDICATOR      :        INTEGER_LITERAL PP_FILE_NAME?
>                                                |        IDENTIFIER_OR_KEYWORD
>                                                ;
> 
> fragment PP_FILE_NAME              :        STRING_LITERAL
>                                                ;
> 
> fragment HASH                          :        '#';

I would suggest left-factoring and using actions to change the token type:

  fragment PP_DIAGNOSTIC : ;
  fragment PP_LINE : ;

  PP_UNRECOGNIZED
    : WHITESPACE* HASH WHITESPACE*
      ( (ERROR | WARNING)=> INPUT_CHARACTER* { $type = PP_DIAGNOSTIC; }
      | (LINE)=> LINE WHITESPACE* PP_LINE_INDICATOR WHITESPACE*
                                             { $type = PP_LINE; }
      | INPUT_CHARACTER* // leave as type PP_UNRECOGNIZED [1]
      )? NEWLINE
    ;


[1] omit this line if you want an unrecognized instruction to be a lexer
    mismatch, but I would suggest leaving it for better error recovery.
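The left-factoring idea can be mimicked by hand. The shared prefix WHITESPACE* HASH WHITESPACE* is scanned once, and the token type is then chosen by peeking at what follows, with PP_UNRECOGNIZED as the default. This Python sketch is an illustrative analogue, not ANTLR's generated lexer. The function name and the whitespace subset are assumptions, and the sketch does not validate the #line indicator. Like the predicate (ERROR | WARNING)=>, the keyword check is a plain prefix test, so `#errors` would also be classified as a diagnostic.

```python
import re

# Hypothetical subset of C# whitespace: space, tab, no-break space.
WS = "[ \t\u00a0]*"

# Shared prefix, scanned exactly once: WHITESPACE* HASH WHITESPACE*
PREFIX = re.compile(WS + "#" + WS)
# INPUT_CHARACTER*: everything up to the line terminator.
REST = re.compile(r"[^\r\n]*")
NEWLINE = re.compile(r"\r?\n")


def lex_directive(text):
    """Classify one directive line.

    Returns (token_type, body), where body is the text after the prefix.
    Returns None if the line does not start with the directive prefix or
    lacks a terminating newline.
    """
    m = PREFIX.match(text)
    if m is None:
        return None
    rest = REST.match(text, m.end())
    if NEWLINE.match(text, rest.end()) is None:
        return None  # the rule ends with a mandatory NEWLINE
    body = rest.group()
    # Counterpart of the syntactic predicates: peek at the keyword without
    # rescanning the prefix. The #line indicator is not validated here.
    if body.startswith(("error", "warning")):
        token_type = "PP_DIAGNOSTIC"
    elif body.startswith("line"):
        token_type = "PP_LINE"
    else:
        token_type = "PP_UNRECOGNIZED"  # the alternative marked [1]
    return token_type, body
```

For example, `lex_directive("#pragma foo\n")` gives `("PP_UNRECOGNIZED", "pragma foo")` rather than failing. This is the error-recovery benefit of keeping the alternative marked [1].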

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


