[antlr-interest] ANTLR3 and C# 2.0 Lexer troubles

Igor Trofimov iamhere2 at gmail.com
Sun Oct 8 10:43:13 PDT 2006


Hi, All!

I'm newbie in syntax analysis / lexers / parsers / etc.
Nevertheless, i try build C# lexer/parser with ANTLR. ;)

For a start, i describe _lexer_ grammar from C# language specification in
ANTLR3 grammar syntax, including Unciode classes definitions.
I remove some left-recursion, simplify some definitions using java.g as
template.
ANTLRWorks reports no invalid rules itself (without Cgrammar Check command).

And now, i have some troubles/questions with this LEXER grammar. I hope, you
help me with it.

1. There are some rules in specification in informal form:
    <single-character> ::= Any character except ' (U+0027), \ (U+005C), and
<new-line-character>

    I try define such rules, using '~' syntax:

    SINGLE_CHARACTER :  ~ (NOT_SINGLE_CHARACTER);
    NOT_SINGLE_CHARACTER : '\u0027' | '\u005c' | NEW_LINE_CHARACTER;

    But it dont work properly :(
    Fortunately, expanded version seems to be worked:
     SINGLE_CHARACTER :  ~( '\u000D' | '\u000A' | '\u0085' | '\u2028' |
'\u2029');

    Why the first variant not works? Is it invalid grammar syntax or ANTLR
bug?

2.  There are some rules in specification, which requires some additional
logic to be difined, e.g:

     DECIMAL_DIGIT_CHARACTER
              : UNICODE_CATEGORY_DECIMALDIGITNUMBER  // A Unicode character
of the class Nd
              | UNICODE_CHARACTER_ESCAPE_SEQUENCE    // representing a
character of the class Nd -- ??? How to check this ???

     CONDITIONAL_SYMBOL
               : IDENTIFIER_OR_KEYWORD        // Any identifier-or-keyword
except true or false ??? How to check this ???

3. C# target seems unfinished? It miss some evident "override" keywords, and
some DFA definitions :(

4. And, the last, but most important. My grammar dont works absolutely :(
    And there are no errors reported in grammar, but in ANTLR itself.
   ANTLR tool prints the message:
=====================================================
ANTLR Parser Generator   Early Access Version 3.0b4 (??, 2006)  1989-2006
internal error: org.antlr.tool.Grammar.getCharValueFromGrammarCharLiteral(
Grammar.java:1519): invalid char literal: ''
internal error: org.antlr.tool.Grammar.getCharValueFromGrammarCharLiteral(
Grammar.java:1519): invalid char literal: ''

internal error: CSharp.g : java.lang.NullPointerException
org.antlr.analysis.NFAToDFAConverter.convertToAcceptState(
NFAToDFAConverter.java:989)
org.antlr.analysis.NFAToDFAConverter.addDFAStateToWorkList(
NFAToDFAConverter.java:953)
org.antlr.analysis.NFAToDFAConverter.findNewDFAStatesAndAddDFATransitions(
NFAToDFAConverter.java:291)
org.antlr.analysis.NFAToDFAConverter.convert(NFAToDFAConverter.java:101)
org.antlr.analysis.DFA.<init>(DFA.java:214)
org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:763)
org.antlr.tool.Grammar.createLookaheadDFAs(Grammar.java:711)
org.antlr.codegen.Target.performGrammarAnalysis(Target.java:111)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:284)
org.antlr.Tool.processGrammar(Tool.java:320)
org.antlr.Tool.process(Tool.java:251)
org.antlr.Tool.main(Tool.java:70)

=====================================================

I attach my grammar to this post. May be, i have some terrible fundamental
errors in grammar?
Please, give me the direction to further progress...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20061008/a9dca816/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grammar.zip
Type: application/zip
Size: 12305 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20061008/a9dca816/attachment-0001.zip 


More information about the antlr-interest mailing list