[antlr-interest] XML Character Classes

Alan Gutierrez alan-antlr-interest at engrm.com
Fri Jan 21 09:06:28 PST 2005


    This is my first ANTLR grammar.

    I'm attempting to create a regular expression language that matches
    SAX event streams. I'm starting by defining the character classes
    specified in the XML spec.

    I'm getting these error messages:

    lexical nondeterminism between rules LETTER and BASE_CHAR upon
        k==1:'A'..'Z','a'..'z','\u00c0'..'\u00d6' ... and so on

    lexical nondeterminism between rules LETTER and IDEOGRAPHIC upon

    I found the article below. Is there anything else I can read on this
    subject? Did I really mess something up?

    http://www.jguru.com/faq/view.jsp?EID=64316

    My grammar looks like this:

    class XPatternParser extends Parser;
    options {
        k = 2;
    }

    pattern :  letter:LETTER
            {
                System.out.println("<" + letter.getText() + ">");
            }
            ;

    class XPatternLexer extends Lexer;

    LETTER
        : (BASE_CHAR | IDEOGRAPHIC)
        ;

    BASE_CHAR
        : 
        ( '\u0041'..'\u005A' | '\u0061'..'\u007A' | '\u00C0'..'\u00D6' 
        | '\u00D8'..'\u00F6' | '\u00F8'..'\u00FF' | '\u0100'..'\u0131'
        | '\u0134'..'\u013E' | '\u0141'..'\u0148' | '\u014A'..'\u017E'

            /* snip */

        | '\u30A1'..'\u30FA' | '\u3105'..'\u312C' | '\uAC00'..'\uD7A3'
        )
        ;

    IDEOGRAPHIC
      :
      ( '\u4E00'..'\u9FA5' | '\u3007' | '\u3021'..'\u3029'
      )
      ;
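
    A possible fix, sketched under the assumption that this is ANTLR 2.x
    (the "class ... extends Lexer" syntax suggests so): every lexer rule
    not marked "protected" is treated as a token the lexer may return, so
    LETTER, BASE_CHAR and IDEOGRAPHIC all compete for the same first
    character. Marking the two helper rules "protected" should make them
    callable only from other lexer rules and so remove the warnings. The
    ranges are abbreviated from the grammar above, and the snipped part
    stays snipped.

    class XPatternLexer extends Lexer;

    // The only rule the parser sees as a token.
    LETTER
        : (BASE_CHAR | IDEOGRAPHIC)
        ;

    // Helper rule: "protected" keeps it out of the lexer's token set.
    protected
    BASE_CHAR
        : ( '\u0041'..'\u005A' | '\u0061'..'\u007A'
          /* remaining ranges as in the grammar above */
          )
        ;

    // Helper rule, likewise only reachable through LETTER.
    protected
    IDEOGRAPHIC
        : ( '\u4E00'..'\u9FA5' | '\u3007' | '\u3021'..'\u3029' )
        ;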

--
Alan Gutierrez - alan at engrm.com

