[antlr-interest] C# parser grammar problem
Terence Parr
parrt at cs.usfca.edu
Tue Mar 6 10:44:50 PST 2007
Hi. That line in the code indicates a malformed \uxxxx char ref. Do
you see one in your code?
Ter
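One way to act on Ter's question is to scan the grammar for single-quoted literals in which a backslash-u is not followed by exactly four hex digits. The class below is a hypothetical helper written for this purpose (it is not part of ANTLR or ANTLRWorks). Its literal pattern is a simplification: it does not skip comments or model every ANTLR literal form, so its output is a list of candidates to inspect rather than a verdict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MalformedUnicodeEscapeFinder {
    // Simplified single-quoted literal: a quote, then any mix of ordinary
    // characters and backslash-escaped characters, then a closing quote.
    private static final Pattern LITERAL =
        Pattern.compile("'(?:[^'\\\\\\r\\n]|\\\\.)*'");

    /** Lists "line N: literal" for each literal with a malformed backslash-u escape. */
    public static List<String> find(String grammarText) {
        List<String> problems = new ArrayList<>();
        String[] lines = grammarText.split("\\R", -1);
        for (int n = 0; n < lines.length; n++) {
            Matcher m = LITERAL.matcher(lines[n]);
            while (m.find()) {
                String literal = m.group();
                String body = literal.substring(1, literal.length() - 1);
                if (hasMalformedUnicodeEscape(body)) {
                    problems.add("line " + (n + 1) + ": " + literal);
                }
            }
        }
        return problems;
    }

    // True if some backslash-u in the literal body lacks four hex digits.
    static boolean hasMalformedUnicodeEscape(String body) {
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) != '\\' || i + 1 >= body.length()) continue;
            if (body.charAt(i + 1) != 'u') {
                i++; // skip the escaped character, e.g. the second backslash of a pair
                continue;
            }
            if (i + 6 > body.length()) return true;
            for (int k = i + 2; k < i + 6; k++) {
                if (Character.digit(body.charAt(k), 16) < 0) return true;
            }
            i += 5; // skip the letter u plus four hex digits
        }
        return false;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "fragment WS : '\\u0009' ;",
            "fragment unicode_escape_sequence",
            "    : '\\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT",
            "    ;");
        find(sample).forEach(System.out::println); // prints one entry, for line 3
    }
}
```

Run over the attached grammar, a scan like this would be expected to flag the `'\u' HEX_DIGIT ...` alternative of unicode_escape_sequence. In ANTLR 3 a literal backslash is written `\\` inside quotes, so `'\\u'` should match the two characters backslash and `u`.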
On Mar 6, 2007, at 9:33 AM, Johannes Luber wrote:
> Hello,
>
> I've converted all the rules in chapter 9 of the Ecma-334 PDF, so I
> wanted to check whether I wrote the grammar correctly so far. The grammar
> check is successful, but I still can't generate the corresponding Java
> files. The console prints the following exception:
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: 7
> at java.lang.String.substring(Unknown Source)
> at org.antlr.tool.Grammar.getUnescapedStringFromGrammarStringLiteral(Grammar.java:1432)
> at org.antlr.tool.ANTLRLexer.mCHAR_LITERAL(ANTLRLexer.java:957)
> at org.antlr.tool.ANTLRLexer.nextToken(ANTLRLexer.java:215)
> at antlr.TokenStreamRewriteEngine.nextToken(TokenStreamRewriteEngine.java:161)
> at antlr.TokenBuffer.fill(TokenBuffer.java:69)
> at antlr.TokenBuffer.LA(TokenBuffer.java:80)
> at antlr.LLkParser.LA(LLkParser.java:52)
> at org.antlr.tool.ANTLRParser.ruleScopeSpec(ANTLRParser.java:1509)
> at org.antlr.tool.ANTLRParser.rule(ANTLRParser.java:1310)
> at org.antlr.tool.ANTLRParser.rules(ANTLRParser.java:702)
> at org.antlr.tool.ANTLRParser.grammar(ANTLRParser.java:392)
> at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:507)
> at org.antlr.tool.Grammar.setGrammarContent(Grammar.java:484)
> at org.antlr.works.grammar.EngineGrammar.createNewGrammar(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.createCombinedGrammar(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.createGrammars(Unknown Source)
> at org.antlr.works.grammar.EngineGrammar.getParserGrammar(Unknown Source)
> at org.antlr.works.generate.CodeGenerate.getGrammarLanguage(Unknown Source)
> at org.antlr.works.menu.MenuGenerate.isKnownLanguage(Unknown Source)
> at org.antlr.works.menu.MenuGenerate.checkLanguage(Unknown Source)
> at org.antlr.works.menu.MenuGenerate.generateCodeProcessContinued(Unknown Source)
> at org.antlr.works.menu.MenuGenerate.checkGrammarDidEnd(Unknown Source)
> at org.antlr.works.grammar.CheckGrammar.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
>
> I have no idea where my mistake could lie. I hope that someone can
> shed some light on this. The grammar is attached to the email.
>
> Thanks in advance,
> Johannes Luber
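The trace above bottoms out in `String.substring`, called from `Grammar.getUnescapedStringFromGrammarStringLiteral`, which fits Ter's diagnosis. The sketch below is not ANTLR's source; it is a hypothetical, deliberately naive unescaper that assumes four hex digits always follow a backslash-u. Given the four-character literal `'\u'`, it requests `substring(3, 7)` and throws the same exception type. That the end index is 7, like the "range: 7" in the trace, is suggestive but only a guess about ANTLR's internals.

```java
public class NaiveUnescape {
    // Hypothetical simplified unescaper (not ANTLR's code): strips the
    // surrounding quotes and decodes backslash-u escapes by blindly taking
    // the next four characters, with no bounds check.
    public static String unescape(String literal) {
        StringBuilder out = new StringBuilder();
        int close = literal.length() - 1; // index of the closing quote
        for (int i = 1; i < close; i++) {
            char c = literal.charAt(i);
            if (c == '\\' && literal.charAt(i + 1) == 'u') {
                String hex = literal.substring(i + 2, i + 6); // may run past the end
                out.append((char) Integer.parseInt(hex, 16));
                i += 5;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("'\\u0041'")); // A
        try {
            unescape("'\\u'"); // 4 characters, but substring(3, 7) is requested
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("malformed literal: " + e.getMessage());
        }
    }
}
```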
> /* By Johannes Luber, 2007. All rights reserved.
>
> Converted the original grammar in Ecma-334 into ANTLR syntax, removed
> left recursion, and collapsed rules like A: B?, B: C+ into A: C*.
>
> TBD: Convert rules containing only token references, like 'd' or
> 'a' | 'b', into lexer rules (ALL_UPPER_CASE).
>
> */
>
> grammar CSharp3;
>
> // Grammar ambiguities are described in §9.2.3 of Ecma-334
>
> // Intrinsic datatypes: object, string, bool, char, decimal, sbyte, short,
> // int, long, byte, ushort, uint, ulong, float, double
>
> options {
> language=CSharp;
> output=template;
> //namespace = "CSharpML.CSharpParser";
> }
>
> @header {
>
> }
>
> input
> : input_section*
> ;
>
>
> input_section
> : input_element* NEW_LINE
> | pp_directive
> ;
>
>
> input_element
> : whitespace
> | comment
> | token
> ;
>
> whitespace
> : WHITESPACE_CHARACTER*
> ;
>
> fragment WHITESPACE_CHARACTER
> : UNICODE_CLASS_Zs
> | '\u0009' // Horizontal tab character
> | '\u000B' // Vertical tab character
> | '\u000C' // Form feed character
> ;
>
> NEW_LINE
> : '\u000D' // Carriage return character
> | '\u000A' // Line feed character
> | '\u000D\u000A' // Carriage return character followed by line feed character
> | '\u0085' // Next line character
> | '\u2028' // Line separator character
> | '\u2029' // Paragraph separator character
> ;
>
> comment
> : single_line_comment
> | delimited_comment
> ;
>
> single_line_comment
> : '//' INPUT_CHARACTER*
> ;
>
>
> fragment INPUT_CHARACTER
> : ~NEW_LINE_CHARACTER // Any Unicode character except a new_line_character
> ;
>
> NEW_LINE_CHARACTER
> : '\u000D' // Carriage return character
> | '\u000A' // Line feed character
> | '\u0085' // Next line character
> | '\u2028' // Line separator character
> | '\u2029' // Paragraph separator character
> ;
>
> delimited_comment
> : '/*' DELIMITED_COMMENT_SECTION* ASTERISKS '/'
> ;
>
> fragment DELIMITED_COMMENT_SECTION
> : NOT_ASTERISK
> | ASTERISKS NOT_SLASH
> ;
>
> fragment ASTERISKS
> : ('*') ('*')*
> ;
>
> fragment NOT_ASTERISK
> : ~'*' // Any Unicode character except *
> ;
>
> fragment NOT_SLASH
> : ~'/' // Any Unicode character except /
> ;
>
> fragment UNICODE_CLASS_Zs // Any character with Unicode class Zs (18 characters known)
> : '\u0020' // SPACE
> | '\u00A0' // NO_BREAK SPACE
> | '\u1680' // OGHAM SPACE MARK
> | '\u180E' // MONGOLIAN VOWEL SEPARATOR
> | '\u2000' // EN QUAD
> | '\u2001' // EM QUAD
> | '\u2002' // EN SPACE
> | '\u2003' // EM SPACE
> | '\u2004' // THREE_PER_EM SPACE
> | '\u2005' // FOUR_PER_EM SPACE
> | '\u2006' // SIX_PER_EM SPACE
> | '\u2007' // FIGURE SPACE
> | '\u2008' // PUNCTUATION SPACE
> | '\u2009' // THIN SPACE
> | '\u200A' // HAIR SPACE
> | '\u202F' // NARROW NO_BREAK SPACE
> | '\u3000' // IDEOGRAPHIC SPACE
> | '\u205F' // MEDIUM MATHEMATICAL SPACE
> ;
>
> // TBD: Inclusion of all uppercase letter characters. Replace this rule with the one in UnicodeClassLu.g.
> fragment UNICODE_CLASS_Lu
> : '\u0041'..'\u005A' // LATIN CAPITAL LETTER A_Z
> | '\u00C0'..'\u00D6' // ACCENTED CAPITAL LETTERS
> | '\u00D8'..'\u00DE' // (U+00D7 MULTIPLICATION SIGN is not in class Lu)
> ;
>
> // TBD: Inclusion of all lowercase letter characters. Replace this rule with the one in UnicodeClassLl.g.
> fragment UNICODE_CLASS_Ll
> : '\u0061'..'\u007A' // LATIN SMALL LETTER a_z
> ;
>
> // TBD: Inclusion of all titlecase letter characters. Replace this rule with the one in UnicodeClassLt.g.
> fragment UNICODE_CLASS_Lt
> : '\u01C5' // LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
> | '\u01C8' // LATIN CAPITAL LETTER L WITH SMALL LETTER J
> | '\u01CB' // LATIN CAPITAL LETTER N WITH SMALL LETTER J
> | '\u01F2' // LATIN CAPITAL LETTER D WITH SMALL LETTER Z
> ;
>
> // TBD: Inclusion of all modifier letter characters. Replace this rule with the one in UnicodeClassLm.g.
> fragment UNICODE_CLASS_Lm
> : '\u02B0'..'\u02EE' // MODIFIER LETTERS
> ;
>
> // TBD: Inclusion of all other letter characters. Replace this rule with the one in UnicodeClassLo.g.
> fragment UNICODE_CLASS_Lo
> : '\u01BB' // LATIN LETTER TWO WITH STROKE
> | '\u01C0' // LATIN LETTER DENTAL CLICK
> | '\u01C1' // LATIN LETTER LATERAL CLICK
> | '\u01C2' // LATIN LETTER ALVEOLAR CLICK
> | '\u01C3' // LATIN LETTER RETROFLEX CLICK
> | '\u0294' // LATIN LETTER GLOTTAL STOP
> ;
>
> // TBD: Inclusion of all letter number characters. Replace this rule with the one in UnicodeClassNl.g.
> fragment UNICODE_CLASS_Nl
> : '\u16EE' // RUNIC ARLAUG SYMBOL
> | '\u16EF' // RUNIC TVIMADUR SYMBOL
> | '\u16F0' // RUNIC BELGTHOR SYMBOL
> | '\u2160' // ROMAN NUMERAL ONE
> | '\u2161' // ROMAN NUMERAL TWO
> | '\u2162' // ROMAN NUMERAL THREE
> | '\u2163' // ROMAN NUMERAL FOUR
> | '\u2164' // ROMAN NUMERAL FIVE
> | '\u2165' // ROMAN NUMERAL SIX
> | '\u2166' // ROMAN NUMERAL SEVEN
> | '\u2167' // ROMAN NUMERAL EIGHT
> | '\u2168' // ROMAN NUMERAL NINE
> | '\u2169' // ROMAN NUMERAL TEN
> | '\u216A' // ROMAN NUMERAL ELEVEN
> | '\u216B' // ROMAN NUMERAL TWELVE
> | '\u216C' // ROMAN NUMERAL FIFTY
> | '\u216D' // ROMAN NUMERAL ONE HUNDRED
> | '\u216E' // ROMAN NUMERAL FIVE HUNDRED
> | '\u216F' // ROMAN NUMERAL ONE THOUSAND
> ;
>
> // TBD: Inclusion of all nonspacing mark characters. Replace this rule with the one in UnicodeClassMn.g.
> fragment UNICODE_CLASS_Mn
> : '\u0300' // COMBINING GRAVE ACCENT
> | '\u0301' // COMBINING ACUTE ACCENT
> | '\u0302' // COMBINING CIRCUMFLEX ACCENT
> | '\u0303' // COMBINING TILDE
> | '\u0304' // COMBINING MACRON
> | '\u0305' // COMBINING OVERLINE
> | '\u0306' // COMBINING BREVE
> | '\u0307' // COMBINING DOT ABOVE
> | '\u0308' // COMBINING DIAERESIS
> | '\u0309' // COMBINING HOOK ABOVE
> | '\u030A' // COMBINING RING ABOVE
> | '\u030B' // COMBINING DOUBLE ACUTE ACCENT
> | '\u030C' // COMBINING CARON
> | '\u030D' // COMBINING VERTICAL LINE ABOVE
> | '\u030E' // COMBINING DOUBLE VERTICAL LINE ABOVE
> | '\u030F' // COMBINING DOUBLE GRAVE ACCENT
> | '\u0310' // COMBINING CANDRABINDU
> ;
>
> // TBD: Inclusion of all spacing combining mark characters. Replace this rule with the one in UnicodeClassMc.g.
> fragment UNICODE_CLASS_Mc
> : '\u0903' // DEVANAGARI SIGN VISARGA
> | '\u093E' // DEVANAGARI VOWEL SIGN AA
> | '\u093F' // DEVANAGARI VOWEL SIGN I
> | '\u0940' // DEVANAGARI VOWEL SIGN II
> | '\u0949' // DEVANAGARI VOWEL SIGN CANDRA O
> | '\u094A' // DEVANAGARI VOWEL SIGN SHORT O
> | '\u094B' // DEVANAGARI VOWEL SIGN O
> | '\u094C' // DEVANAGARI VOWEL SIGN AU
> ;
>
> // TBD: Inclusion of all format characters. Replace this rule with the one in UnicodeClassCf.g.
> fragment UNICODE_CLASS_Cf
> : '\u00AD' // SOFT HYPHEN
> | '\u0600' // ARABIC NUMBER SIGN
> | '\u0601' // ARABIC SIGN SANAH
> | '\u0602' // ARABIC FOOTNOTE MARKER
> | '\u0603' // ARABIC SIGN SAFHA
> | '\u06DD' // ARABIC END OF AYAH
> ;
>
> // This definition contains all known characters
> fragment UNICODE_CLASS_Pc
> : '\u005F' // LOW LINE
> | '\u203F' // UNDERTIE
> | '\u2040' // CHARACTER TIE
> | '\u2054' // INVERTED UNDERTIE
> | '\uFE33' // PRESENTATION FORM FOR VERTICAL LOW LINE
> | '\uFE34' // PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
> | '\uFE4D' // DASHED LOW LINE
> | '\uFE4E' // CENTRELINE LOW LINE
> | '\uFE4F' // WAVY LOW LINE
> | '\uFF3F' // FULLWIDTH LOW LINE
> ;
>
> // TBD: Inclusion of all decimal digit characters. Replace this rule with the one in UnicodeClassNd.g.
> fragment UNICODE_CLASS_Nd
> : '\u0030' // DIGIT ZERO
> | '\u0031' // DIGIT ONE
> | '\u0032' // DIGIT TWO
> | '\u0033' // DIGIT THREE
> | '\u0034' // DIGIT FOUR
> | '\u0035' // DIGIT FIVE
> | '\u0036' // DIGIT SIX
> | '\u0037' // DIGIT SEVEN
> | '\u0038' // DIGIT EIGHT
> | '\u0039' // DIGIT NINE
> ;
>
> token
> : identifier
> | KEYWORD[true] // Use all keywords
> | integer_literal
> | real_literal
> | character_literal
> | string_literal
> | OPREATER_OR_PUNCTUATOR
> ;
>
> identifier
> : available_identifier
> | '@' identifier_or_keyword[true]
> ;
>
> fragment available_identifier
> : identifier_or_keyword[false] // An identifier_or_keyword that is not a keyword
> ;
>
> // The boolean allowKeywords determines whether identifier_or_keyword may actually include keywords in the current context.
> fragment identifier_or_keyword[bool allowKeywords]
> : identifier_start_character identifier_part_character*
> ;
>
> fragment identifier_start_character
> : letter_character
> | '_' // (the underscore character U+005F)
> ;
>
> fragment identifier_part_character
> : letter_character
> | decimal_digit_character
> | connecting_character
> | combining_character
> | formatting_character
> ;
>
> fragment letter_character
> : UNICODE_CLASS_Lu // A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
> | UNICODE_CLASS_Ll
> | UNICODE_CLASS_Lt
> | UNICODE_CLASS_Lm
> | UNICODE_CLASS_Lo
> | UNICODE_CLASS_Nl
> | unicode_escape_sequence["LAndNl"] // An encoded character of classes Lu, Ll, Lt, Lm, Lo, or Nl
> ;
>
> fragment combining_character
> : UNICODE_CLASS_Mn // A Unicode character of classes Mn or Mc
> | UNICODE_CLASS_Mc
> | unicode_escape_sequence["MnAndMc"] // An encoded character of classes Mn or Mc
> ;
>
> fragment decimal_digit_character
> : UNICODE_CLASS_Nd // A Unicode character of the class Nd
> | unicode_escape_sequence["Nd"] // An encoded character of class Nd
> ;
>
> fragment connecting_character
> : UNICODE_CLASS_Pc // A Unicode character of the class Pc
> | unicode_escape_sequence["Pc"] // An encoded character of class Pc
> ;
>
> fragment formatting_character
> : UNICODE_CLASS_Cf // A Unicode character of the class Cf
> | unicode_escape_sequence["Cf"] // An encoded character of class Cf
> ;
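The letter_character, combining_character, decimal_digit_character, connecting_character and formatting_character fragments enumerate only a few members of each Unicode class, hence the TBD notes. For comparison, Java exposes the general category of a character through `Character.getType`; the sketch below classifies identifier characters by category, following the class comments in those fragments. It is a sketch under assumptions: it works on single UTF-16 `char`s, ignores backslash-u escapes and the `@` prefix, and the JDK's Unicode tables are newer than the ones Ecma-334 cites, so edge cases may differ.

```java
public class CSharpIdentifierChars {
    // letter_character: Unicode classes Lu, Ll, Lt, Lm, Lo, Nl.
    static boolean isLetterCharacter(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:   // Lu
            case Character.LOWERCASE_LETTER:   // Ll
            case Character.TITLECASE_LETTER:   // Lt
            case Character.MODIFIER_LETTER:    // Lm
            case Character.OTHER_LETTER:       // Lo
            case Character.LETTER_NUMBER:      // Nl
                return true;
            default:
                return false;
        }
    }

    // identifier_start_character: letter_character or '_'.
    public static boolean isIdentifierStart(char c) {
        return isLetterCharacter(c) || c == '_';
    }

    // identifier_part_character: letter, Nd, Pc, Mn, Mc or Cf.
    public static boolean isIdentifierPart(char c) {
        if (isLetterCharacter(c)) return true;
        switch (Character.getType(c)) {
            case Character.DECIMAL_DIGIT_NUMBER:   // Nd
            case Character.CONNECTOR_PUNCTUATION:  // Pc, includes '_'
            case Character.NON_SPACING_MARK:       // Mn
            case Character.COMBINING_SPACING_MARK: // Mc
            case Character.FORMAT:                 // Cf
                return true;
            default:
                return false;
        }
    }

    // identifier_or_keyword: a start character followed by part characters.
    public static boolean isIdentifierOrKeyword(String s) {
        if (s.isEmpty() || !isIdentifierStart(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!isIdentifierPart(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifierOrKeyword("_count2"));        // true
        System.out.println(isIdentifierOrKeyword("2count"));         // false
        System.out.println(isIdentifierOrKeyword("\u00C9t\u00E9")); // true
    }
}
```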
>
> // Allowed unicodeClasses values are "LAndNl", "MnAndMc", "Nd", "Pc", "Cf" and "SingleCharacter".
> // The classes restrict the possible Unicode values according to the Unicode standard.
> // "SingleCharacter" allows every value between U+0000 and U+FFFF inclusive.
> // Detect if '\' is followed by a character not of this group: ', ", \, 0, a, b, f, n, r, t, u, U, x, v
> fragment unicode_escape_sequence[string unicodeClasses]
> : '\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
> | '\U' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
> HEX_DIGIT HEX_DIGIT
> ;
>
> // This boolean allows the exclusion of the keywords 'true' and 'false'
> KEYWORD[bool useBooleanKeywords]
> : 'abstract'
> | 'as'
> | 'base'
> | 'bool'
> | 'break'
> | 'byte'
> | 'case'
> | 'catch'
> | 'char'
> | 'checked'
> | 'class'
> | 'const'
> | 'continue'
> | 'decimal'
> | 'default'
> | 'delegate'
> | 'do'
> | 'double'
> | 'else'
> | 'enum'
> | 'event'
> | 'explicit'
> | 'extern'
> | 'false'
> | 'finally'
> | 'fixed'
> | 'float'
> | 'for'
> | 'foreach'
> | 'goto'
> | 'if'
> | 'implicit'
> | 'in'
> | 'int'
> | 'interface'
> | 'internal'
> | 'is'
> | 'lock'
> | 'long'
> | 'namespace'
> | 'new'
> | 'null'
> | 'object'
> | 'operator'
> | 'out'
> | 'override'
> | 'params'
> | 'private'
> | 'protected'
> | 'public'
> | 'readonly'
> | 'ref'
> | 'return'
> | 'sbyte'
> | 'sealed'
> | 'short'
> | 'sizeof'
> | 'stackalloc'
> | 'static'
> | 'string'
> | 'struct'
> | 'switch'
> | 'this'
> | 'throw'
> | 'true'
> | 'try'
> | 'typeof'
> | 'uint'
> | 'ulong'
> | 'unchecked'
> | 'unsafe'
> | 'ushort'
> | 'using'
> | 'virtual'
> | 'void'
> | 'volatile'
> | 'while'
> ;
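The split between identifier, available_identifier and the KEYWORD[bool] parameter can be hard to follow in grammar form. Here is a hypothetical Java sketch of the same decision: a word is an identifier if it is not a keyword, unless it carries the `@` prefix, and `KEYWORD[false]` simply drops `true` and `false` from the keyword set. The keyword set is abbreviated and identifier characters are limited to ASCII, both purely to keep the example short.

```java
import java.util.Set;

public class IdentifierCheck {
    // Abbreviated for the example: only a few entries from the KEYWORD rule.
    private static final Set<String> KEYWORDS = Set.of(
        "abstract", "class", "false", "if", "int", "namespace",
        "null", "return", "string", "true", "void", "while");

    // Mirrors KEYWORD[bool useBooleanKeywords]: with false, 'true' and
    // 'false' are not treated as keywords (as in conditional_symbol).
    public static boolean isKeyword(String word, boolean useBooleanKeywords) {
        if (!useBooleanKeywords && (word.equals("true") || word.equals("false"))) {
            return false;
        }
        return KEYWORDS.contains(word);
    }

    // identifier: an available_identifier (an identifier_or_keyword that is
    // not a keyword), or '@' followed by any identifier_or_keyword.
    // Characters are restricted to ASCII here to keep the example short.
    public static boolean isIdentifier(String text) {
        boolean verbatim = text.startsWith("@");
        String word = verbatim ? text.substring(1) : text;
        if (!word.matches("[A-Za-z_][A-Za-z0-9_]*")) return false;
        return verbatim || !isKeyword(word, true);
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("count"));    // true
        System.out.println(isIdentifier("class"));    // false
        System.out.println(isIdentifier("@class"));   // true
        System.out.println(isKeyword("true", false)); // false
    }
}
```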
>
> BOOLEAN_LITERAL
> : 'true'
> | 'false'
> ;
>
> integer_literal
> : decimal_integer_literal
> | hexadecimal_integer_literal
> ;
>
> fragment decimal_integer_literal
> : DECIMAL_DIGIT+ INTEGER_TYPE_SUFFIX?
> ;
>
> fragment DECIMAL_DIGIT
> : '0'..'9'
> ;
>
> fragment INTEGER_TYPE_SUFFIX
> : 'U'
> | 'u'
> | 'L'
> | 'l'
> | 'UL'
> | 'Ul'
> | 'uL'
> | 'ul'
> | 'LU'
> | 'Lu'
> | 'lU'
> | 'lu'
> ;
>
> fragment hexadecimal_integer_literal
> : '0x' HEX_DIGIT+ INTEGER_TYPE_SUFFIX?
> | '0X' HEX_DIGIT+ INTEGER_TYPE_SUFFIX?
> ;
>
> fragment HEX_DIGIT
> : '0'..'9'
> | 'A'..'F'
> | 'a'..'f'
> ;
>
> real_literal
> : DECIMAL_DIGIT+ '.' DECIMAL_DIGIT+ exponent_part? REAL_TYPE_SUFFIX?
> | '.' DECIMAL_DIGIT+ exponent_part? REAL_TYPE_SUFFIX?
> | DECIMAL_DIGIT+ exponent_part REAL_TYPE_SUFFIX?
> | DECIMAL_DIGIT+ REAL_TYPE_SUFFIX
> ;
>
> fragment exponent_part
> : 'e' SIGN? DECIMAL_DIGIT+
> | 'E' SIGN? DECIMAL_DIGIT+
> ;
>
> fragment SIGN
> : '+'
> | '-'
> ;
>
> fragment REAL_TYPE_SUFFIX
> : 'F'
> | 'f'
> | 'D'
> | 'd'
> | 'M'
> | 'm'
> ;
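The integer_literal and real_literal rules (with INTEGER_TYPE_SUFFIX, exponent_part and REAL_TYPE_SUFFIX) describe regular languages, so one way to cross-check them outside ANTLR is with Java regular expressions. The sketch below is an independent restatement of those rules for testing, not generated code; it checks only the shape of a literal, not whether its value fits the type.

```java
import java.util.regex.Pattern;

public class NumericLiterals {
    // INTEGER_TYPE_SUFFIX: U or L alone, or U and L in either order, any case.
    private static final String INT_SUFFIX = "(?:[uU][lL]?|[lL][uU]?)";
    private static final String EXPONENT = "[eE][+-]?[0-9]+";   // exponent_part
    private static final String REAL_SUFFIX = "[FfDdMm]";       // REAL_TYPE_SUFFIX

    // decimal_integer_literal | hexadecimal_integer_literal
    private static final Pattern INTEGER = Pattern.compile(
        "(?:[0-9]+|0[xX][0-9A-Fa-f]+)" + INT_SUFFIX + "?");

    // One alternative per alternative of real_literal, in the same order.
    private static final Pattern REAL = Pattern.compile(
        "[0-9]+\\.[0-9]+(?:" + EXPONENT + ")?" + REAL_SUFFIX + "?"
        + "|\\.[0-9]+(?:" + EXPONENT + ")?" + REAL_SUFFIX + "?"
        + "|[0-9]+" + EXPONENT + REAL_SUFFIX + "?"
        + "|[0-9]+" + REAL_SUFFIX);

    public static boolean isIntegerLiteral(String s) { return INTEGER.matcher(s).matches(); }

    public static boolean isRealLiteral(String s) { return REAL.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "0x1F", "10UL", "3.14", ".5e-3", "7f", "1."}) {
            System.out.println(s + " integer=" + isIntegerLiteral(s) + " real=" + isRealLiteral(s));
        }
    }
}
```

As in the grammar, a trailing dot with no digits after it (`1.`) is neither kind of literal.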
>
> character_literal
> : ''' character '''
> ;
>
> fragment character
> : SINGLE_CHARACTER
> | SIMPLE_ESCAPE_SEQUENCE
> | hexadecimal_escape_sequence
> | unicode_escape_sequence
> ;
>
> fragment SINGLE_CHARACTER
> : ~(''' | '\' | NEW_LINE_CHARACTER )
> ;
>
> // Detect if '\' is followed by a character not of this group: ', ", \, 0, a, b, f, n, r, t, u, U, x, v
> fragment SIMPLE_ESCAPE_SEQUENCE
> : '\''
> | '\"'
> | '\\'
> | '\0'
> | '\a'
> | '\b'
> | '\f'
> | '\n'
> | '\r'
> | '\t'
> | '\v'
> ;
>
> // Detect if '\' is followed by a character not of this group: ', ", \, 0, a, b, f, n, r, t, u, U, x, v
> fragment hexadecimal_escape_sequence
> : '\x' HEX_DIGIT HEX_DIGIT? HEX_DIGIT? HEX_DIGIT?
> ;
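The character, SIMPLE_ESCAPE_SEQUENCE, hexadecimal_escape_sequence and unicode_escape_sequence rules describe what an escape looks like; a later stage still has to turn it into a character value. The sketch below is one way to decode the body of a C# character or regular string literal in Java, under these assumptions: the surrounding quotes are already stripped, an unrecognised escape is an error, and backslash-x takes up to four hex digits greedily, as the chain of optional HEX_DIGITs in the rule does. Unlike a substring-based shortcut, it checks bounds before reading any hex digits.

```java
public class EscapeDecoder {
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // SIMPLE_ESCAPE_SEQUENCE: the character after the backslash, mapped to its value.
    private static int simpleEscape(char c) {
        switch (c) {
            case '\'': return '\'';
            case '"':  return '"';
            case '\\': return '\\';
            case '0':  return 0x00;
            case 'a':  return 0x07; // alert
            case 'b':  return 0x08; // backspace
            case 'f':  return 0x0C; // form feed
            case 'n':  return 0x0A; // line feed
            case 'r':  return 0x0D; // carriage return
            case 't':  return 0x09; // horizontal tab
            case 'v':  return 0x0B; // vertical tab
            default:   return -1;   // not an escape listed in the grammar
        }
    }

    /** Decodes the text between the quotes of a C# character or regular string literal. */
    public static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            if (i + 1 >= body.length()) throw new IllegalArgumentException("dangling backslash");
            char kind = body.charAt(i + 1);
            i += 2; // i now points just past the escape letter
            if (kind == 'x') {
                // hexadecimal_escape_sequence: one to four hex digits, taken greedily
                int start = i;
                while (i < body.length() && i - start < 4 && isHex(body.charAt(i))) i++;
                if (i == start) throw new IllegalArgumentException("x escape without hex digits");
                out.append((char) Integer.parseInt(body.substring(start, i), 16));
            } else if (kind == 'u' || kind == 'U') {
                // unicode_escape_sequence: exactly 4 (lower-case u) or 8 (upper-case U) hex digits
                int n = (kind == 'u') ? 4 : 8;
                if (i + n > body.length()) throw new IllegalArgumentException("truncated escape");
                for (int k = i; k < i + n; k++) {
                    if (!isHex(body.charAt(k))) throw new IllegalArgumentException("bad hex digit");
                }
                long cp = Long.parseLong(body.substring(i, i + n), 16);
                if (cp > 0x10FFFF) throw new IllegalArgumentException("code point out of range");
                out.appendCodePoint((int) cp);
                i += n;
            } else {
                int value = simpleEscape(kind);
                if (value < 0) throw new IllegalArgumentException("unknown escape: " + kind);
                out.append((char) value);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x41\\u0042\\x43")); // ABC
    }
}
```

The greedy backslash-x is easy to trip over: `\x41B` decodes to the single character U+041B, not to "AB".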
>
> string_literal
> : regular_string_literal
> | verbatim_string_literal
> ;
>
> regular_string_literal
> : '"' regular_string_literal_character* '"'
> ;
>
> fragment regular_string_literal_character
> : SINGLE_REGULAR_STRING_LITERAL_CHARACTER
> | SIMPLE_ESCAPE_SEQUENCE
> | hexadecimal_escape_sequence
> | unicode_escape_sequence
> ;
>
> fragment SINGLE_REGULAR_STRING_LITERAL_CHARACTER
> : ~( '"' | '\' | NEW_LINE_CHARACTER )
> ;
>
> verbatim_string_literal
> : '@"' verbatim_string_literal_character* '"'
> ;
>
> fragment verbatim_string_literal_character
> : SINGLE_VERBATIM_STRING_LITERAL_CHARACTER
> | QUTOE_ESCAPE_SEQUENCE
> ;
>
> fragment SINGLE_VERBATIM_STRING_LITERAL_CHARACTER
> : ~'"'
> ;
>
> fragment QUTOE_ESCAPE_SEQUENCE
> : '""'
> ;
>
> NULL_LITERAL
> : 'null'
> ;
>
> OPREATER_OR_PUNCTUATOR
> : '{'
> | '}'
> | '['
> | ']'
> | '('
> | ')'
> | '.'
> | ','
> | ':'
> | ';'
> | '+'
> | '-'
> | '*'
> | '/'
> | '%'
> | '&'
> | '|'
> | '^'
> | '!'
> | '~'
> | '='
> | '<'
> | '>'
> | '?'
> | '??'
> | '::'
> | '++'
> | '--'
> | '&&'
> | '||'
> | '->'
> | '=='
> | '!='
> | '<='
> | '>='
> | '+='
> | '-='
> | '*='
> | '/='
> | '%='
> | '&='
> | '|='
> | '^='
> | '<<'
> | '<<='
> ;
>
> fragment right_shift
> : '>' '>'
> ;
>
> fragment right_shift_assignment
> : '>' '>='
> ;
>
> // The compiler has to detect whether some preprocessor directives are missing or out of order (regions and conditionals)
> pp_directive
> : pp_declaration
> | pp_conditional
> | pp_line
> | pp_diagnostic
> | pp_region
> | pp_pragma
> ;
>
> conditional_symbol
> : identifier
> | KEYWORD[false] // Any keyword except 'true' or 'false'
> ;
>
> pp_expression
> : whitespace? pp_or_expression whitespace?
> ;
>
> pp_or_expression
> : pp_and_expression (whitespace? '||' whitespace? pp_and_expression)*
> ;
>
> pp_and_expression
> : pp_equality_expression (whitespace? '&&' whitespace? pp_equality_expression)*
> ;
>
> pp_equality_expression
> : pp_unary_expression
> ( whitespace? '==' whitespace? pp_unary_expression
> | whitespace? '!=' whitespace? pp_unary_expression
> )*
> ;
>
> pp_unary_expression
> : pp_primary_expression
> | '!' whitespace? pp_unary_expression
> ;
>
> pp_primary_expression
> : 'true'
> | 'false'
> | conditional_symbol
> | '(' whitespace? pp_expression whitespace? ')'
> ;
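Read top-down, pp_or_expression through pp_primary_expression fix the precedence of preprocessing expressions: `||` binds loosest, then `&&`, then `==` and `!=`, then unary `!`. In C#, a conditional symbol evaluates to true when it is defined and false otherwise. The recursive-descent evaluator below follows that rule structure with one method per rule. It is a sketch: it works on a plain string, treats any run of letters, digits and underscores as a symbol, and does not handle a trailing single-line comment.

```java
import java.util.Set;

public class PpExpression {
    private final String src;
    private final Set<String> defined;
    private int pos = 0;

    private PpExpression(String src, Set<String> defined) {
        this.src = src;
        this.defined = defined;
    }

    /** Evaluates a #if or #elif expression; a symbol is true iff it is defined. */
    public static boolean evaluate(String expr, Set<String> defined) {
        PpExpression p = new PpExpression(expr, defined);
        boolean value = p.orExpression();             // pp_expression
        p.skipWhitespace();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected text at offset " + p.pos);
        }
        return value;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    // Consumes op (after optional whitespace) if it comes next.
    private boolean accept(String op) {
        skipWhitespace();
        if (!src.startsWith(op, pos)) return false;
        pos += op.length();
        return true;
    }

    private boolean orExpression() {                  // pp_or_expression
        boolean value = andExpression();
        while (accept("||")) {
            boolean rhs = andExpression();            // always parse the operand
            value = value || rhs;
        }
        return value;
    }

    private boolean andExpression() {                 // pp_and_expression
        boolean value = equalityExpression();
        while (accept("&&")) {
            boolean rhs = equalityExpression();
            value = value && rhs;
        }
        return value;
    }

    private boolean equalityExpression() {            // pp_equality_expression
        boolean value = unaryExpression();
        while (true) {
            if (accept("==")) {
                value = (value == unaryExpression());
            } else if (accept("!=")) {
                value = (value != unaryExpression());
            } else {
                return value;
            }
        }
    }

    private boolean unaryExpression() {               // pp_unary_expression
        if (accept("!")) return !unaryExpression();
        return primaryExpression();
    }

    private boolean primaryExpression() {             // pp_primary_expression
        if (accept("(")) {
            boolean value = orExpression();
            if (!accept(")")) throw new IllegalArgumentException("missing ')' at offset " + pos);
            return value;
        }
        skipWhitespace();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected a symbol at offset " + pos);
        String word = src.substring(start, pos);
        if (word.equals("true")) return true;
        if (word.equals("false")) return false;
        return defined.contains(word);                // conditional_symbol
    }

    public static void main(String[] args) {
        Set<String> defined = Set.of("DEBUG", "TRACE");
        System.out.println(evaluate("DEBUG && !RELEASE", defined));           // true
        System.out.println(evaluate("(DEBUG || RELEASE) == false", defined)); // false
    }
}
```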
>
> /*
> The processing of a #define directive causes the given conditional compilation symbol to
> become defined, starting with the source line that follows the directive. Likewise, the
> processing of a #undef directive causes the given conditional compilation symbol to become
> undefined, starting with the source line that follows the directive.
>
> Any #define and #undef directives in a source file shall occur before the first token
> (§9.4) in the source file; otherwise a compile-time error occurs. In intuitive terms,
> #define and #undef directives shall precede any “real code” in the source file.
> */
> pp_declaration
> : whitespace? '#' whitespace? 'define' whitespace
> conditional_symbol pp_new_line
> | whitespace? '#' whitespace? 'undef' whitespace
> conditional_symbol pp_new_line
> ;
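The two rules quoted above (a symbol's state changes from the following line on, and #define/#undef must precede the first token) amount to a small piece of state that a lexer driver could keep. The class below is a hypothetical sketch; its method names are invented for illustration, and it reports the ordering violation as an exception rather than as a compiler diagnostic.

```java
import java.util.HashSet;
import java.util.Set;

public class ConditionalSymbols {
    private final Set<String> defined = new HashSet<>();
    private boolean seenFirstToken = false;

    /** To be called by the lexer driver when it emits the first real token. */
    public void firstTokenSeen() { seenFirstToken = true; }

    /** #define: the symbol is defined from the following source line on. */
    public void define(String symbol) {
        requireBeforeFirstToken("#define");
        defined.add(symbol);
    }

    /** #undef: the symbol is undefined from the following source line on. */
    public void undef(String symbol) {
        requireBeforeFirstToken("#undef");
        defined.remove(symbol);
    }

    public boolean isDefined(String symbol) { return defined.contains(symbol); }

    private void requireBeforeFirstToken(String directive) {
        if (seenFirstToken) {
            throw new IllegalStateException(directive + " after the first token");
        }
    }

    public static void main(String[] args) {
        ConditionalSymbols symbols = new ConditionalSymbols();
        symbols.define("DEBUG");
        symbols.define("TRACE");
        symbols.undef("TRACE");
        System.out.println(symbols.isDefined("DEBUG") + " " + symbols.isDefined("TRACE")); // true false
        symbols.firstTokenSeen();
        try {
            symbols.define("LATE");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // #define after the first token
        }
    }
}
```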
>
> pp_new_line
> : whitespace? single_line_comment? NEW_LINE
> ;
>
> /*
> A pp-conditional selects at most one of the contained conditional-sections for normal
> lexical processing:
>
> - The pp-expressions of the #if and #elif directives are evaluated in order until one
>   yields true. If an expression yields true, the conditional-section of the corresponding
>   directive is selected.
> - If all pp-expressions yield false, and if a #else directive is present, the
>   conditional-section of the #else directive is selected.
> - Otherwise, no conditional-section is selected.
>
> The selected conditional-section, if any, is processed as a normal input-section: the
> source code contained in the section shall adhere to the lexical grammar; tokens are
> generated from the source code in the section; and pre-processing directives in the
> section have the prescribed effects.
>
> The remaining conditional-sections, if any, are processed as skipped-sections: except for
> pre-processing directives, the source code in the section need not adhere to the lexical
> grammar; no tokens are generated from the source code in the section; and pre-processing
> directives in the section shall be lexically correct but are not otherwise processed.
> Within a conditional-section that is being processed as a skipped-section, any nested
> conditional-sections (contained in nested #if...#endif and #region...#endregion
> constructs) are also processed as skipped-sections.
> */
> pp_conditional
> : pp_if_section pp_elif_section* pp_else_section? pp_endif
> ;
>
> pp_if_section
> : whitespace? '#' whitespace? 'if' whitespace pp_expression
> pp_new_line conditional_section?
> ;
>
> pp_elif_section
> : whitespace? '#' whitespace? 'elif' whitespace pp_expression
> pp_new_line conditional_section?
> ;
>
> pp_else_section
> : whitespace? '#' whitespace? 'else' pp_new_line conditional_section?
> ;
>
> pp_endif
> : whitespace? '#' whitespace? 'endif' pp_new_line
> ;
>
> conditional_section
> : input_section
> | skipped_section+
> ;
>
>
> skipped_section
> : whitespace? skipped_characters? NEW_LINE
> | pp_directive
> ;
>
> skipped_characters
> : NOT_NUMBER_SIGN INPUT_CHARACTER*
> ;
>
> NOT_NUMBER_SIGN
> : ~'#' // Any input_character except #
> ;
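The selection rule in the comment above pp_conditional can be stated in a few lines. In this hypothetical sketch the #if and #elif conditions are passed as `BooleanSupplier`s so that, as the specification text says, conditions after the first true one are never evaluated. The returned index identifies the section to process normally; every other section would be handled as a skipped_section.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

public class ConditionalSelection {
    /**
     * conditions holds the #if condition followed by each #elif condition.
     * Returns the index of the selected conditional-section: the position of
     * the first true condition, conditions.size() for the #else section, or
     * -1 when no section is selected.
     */
    public static int select(List<BooleanSupplier> conditions, boolean hasElse) {
        for (int i = 0; i < conditions.size(); i++) {
            if (conditions.get(i).getAsBoolean()) {
                return i; // later conditions are not evaluated
            }
        }
        return hasElse ? conditions.size() : -1;
    }

    public static void main(String[] args) {
        List<BooleanSupplier> firstElifTrue = List.of(() -> false, () -> true, () -> true);
        List<BooleanSupplier> allFalse = List.of(() -> false);
        System.out.println(select(firstElifTrue, true)); // 1: the first #elif
        System.out.println(select(allFalse, true));      // 1: the #else
        System.out.println(select(allFalse, false));     // -1: nothing selected
    }
}
```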
>
> pp_diagnostic
> : whitespace? '#' whitespace? 'error' pp_message
> | whitespace? '#' whitespace? 'warning' pp_message
> ;
>
> pp_message
> : NEW_LINE
> | whitespace INPUT_CHARACTER* NEW_LINE
> ;
>
> /*
> No semantic meaning is attached to a region; regions are intended for use by the
> programmer or by automated tools to mark a section of source code. The message specified
> in a #region or #endregion directive likewise has no semantic meaning; it merely serves
> to identify the region. Matching #region and #endregion directives can have different
> pp-messages.
>
> The lexical processing of a region:
>
> #region
> ...
> #endregion
>
> corresponds exactly to the lexical processing of a conditional compilation directive of
> the form:
>
> #if true
> ...
> #endif
> */
> pp_region
> : pp_start_region conditional_section? pp_end_region
> ;
>
> pp_start_region
> : whitespace? '#' whitespace? 'region' pp_message
> ;
>
> pp_end_region
> : whitespace? '#' whitespace? 'endregion' pp_message
> ;
>
> /*
> When no #line directives are present, the compiler reports true line numbers and source
> file names in its output. When processing a #line directive that includes a
> line-indicator that is not identifier-or-keyword, the compiler treats the line after the
> directive as having the given line number (and file name, if specified).
>
> A #line directive in which the line-indicator is an identifier-or-keyword whose value
> equals default (using equality as specified in §9.4.2) reverses the effect of all
> preceding #line directives. The compiler reports true line information for subsequent
> lines, precisely as if no #line directives had been processed.
>
> The purpose of a line-indicator with an identifier-or-keyword whose value does not equal
> default is implementation-defined. An implementation that does not recognize such an
> identifier-or-keyword in a line-indicator shall issue a warning.
> */
> pp_line
> : whitespace? '#' whitespace? 'line' whitespace line_indicator
> pp_new_line
> ;
>
> line_indicator
> : DECIMAL_DIGIT+ whitespace file_name
> | DECIMAL_DIGIT+
> | identifier_or_keyword
> ;
>
> file_name
> : '"' FILE_NAME_CHARACTER+ '"'
> ;
>
> FILE_NAME_CHARACTER
> : ~( '"' | NEW_LINE_CHARACTER ) // Any character except " (U+0022) and new_line_character
> ;
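The #line semantics quoted above reduce to an offset between physical and reported line numbers plus a reported file name. The class below is a hypothetical sketch of that bookkeeping, with names invented for the example. It compares the identifier with `default` using plain `String.equals`, whereas the specification uses the identifier equality of §9.4.2, and it leaves the warning for unrecognised identifiers to the caller.

```java
public class LineDirectives {
    private final String actualFile;
    private String reportedFile;
    private int offset = 0; // reported line = physical line + offset

    public LineDirectives(String actualFile) {
        this.actualFile = actualFile;
        this.reportedFile = actualFile;
    }

    /** "#line n" or "#line n file" on physicalLine: the NEXT line is reported as n. */
    public void setLine(int physicalLine, int n, String fileOrNull) {
        offset = n - (physicalLine + 1);
        if (fileOrNull != null) reportedFile = fileOrNull;
    }

    /**
     * "#line identifier": "default" restores true line information. Returns
     * false for any other identifier, whose meaning is implementation-defined;
     * a caller that does not recognise it should issue a warning.
     */
    public boolean applyIdentifier(String identifier) {
        if (!identifier.equals("default")) return false;
        offset = 0;
        reportedFile = actualFile;
        return true;
    }

    public int reportedLine(int physicalLine) { return physicalLine + offset; }

    public String reportedFile() { return reportedFile; }

    public static void main(String[] args) {
        LineDirectives d = new LineDirectives("Program.cs");
        d.setLine(10, 200, "Generated.cs"); // directive on physical line 10
        System.out.println(d.reportedFile() + ":" + d.reportedLine(11)); // Generated.cs:200
        d.applyIdentifier("default");
        System.out.println(d.reportedFile() + ":" + d.reportedLine(15)); // Program.cs:15
    }
}
```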
>
> pp_pragma
> : whitespace? '#' whitespace? 'pragma' pp_pragma_text
> ;
>
> pp_pragma_text
> : NEW_LINE
> | whitespace INPUT_CHARACTER* NEW_LINE
> ;