[antlr-interest] XML Character Classes
Alan Gutierrez
alan-antlr-interest at engrm.com
Fri Jan 21 09:06:28 PST 2005
This is my first ANTLR grammar.
I'm attempting to create a regular expression language for matching SAX
event streams, and I'm starting by defining the character classes
specified in the XML spec.
I'm getting these error messages:
lexical nondeterminism between rules LETTER and BASE_CHAR upon
k==1:'A'..'Z','a'..'z','\u00c0'..'\u00d6' ... and so on
lexical nondeterminism between rules LETTER and IDEOGRAPHIC upon
I found this article:
http://www.jguru.com/faq/view.jsp?EID=64316
Is there anything else I can read on this subject? Did I really mess
something up?
My grammar looks like this:
class XPatternParser extends Parser;

options {
    k = 2;
}

pattern
    : letter:LETTER
        {
            System.out.println("<" + letter.getText() + ">");
        }
    ;

class XPatternLexer extends Lexer;

LETTER
    : (BASE_CHAR | IDEOGRAPHIC)
    ;

BASE_CHAR
    :
    ( '\u0041'..'\u005A' | '\u0061'..'\u007A' | '\u00C0'..'\u00D6'
    | '\u00D8'..'\u00F6' | '\u00F8'..'\u00FF' | '\u0100'..'\u0131'
    | '\u0134'..'\u013E' | '\u0141'..'\u0148' | '\u014A'..'\u017E'
    /* snip */
    | '\u30A1'..'\u30FA' | '\u3105'..'\u312C' | '\uAC00'..'\uD7A3'
    )
    ;

IDEOGRAPHIC
    :
    ( '\u4E00'..'\u9FA5' | '\u3007' | '\u3021'..'\u3029'
    )
    ;
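
My reading of the error messages, as a rough sketch rather than a description of how ANTLR itself computes anything: every lexer rule in the grammar above is a candidate token, and LETTER can begin with exactly the same characters as BASE_CHAR and IDEOGRAPHIC, so one character of lookahead cannot pick a single rule. The Java below only models that overlap with the ranges visible above (the snipped ranges are left out). The class name LexerOverlapSketch and its helper methods are hypothetical names for illustration, not part of ANTLR.

```java
import java.util.ArrayList;
import java.util.List;

public class LexerOverlapSketch {
    // Inclusive code point ranges copied from the visible part of BASE_CHAR.
    static final int[][] BASE_CHAR = {
        {0x0041, 0x005A}, {0x0061, 0x007A}, {0x00C0, 0x00D6},
        {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x0131},
        {0x0134, 0x013E}, {0x0141, 0x0148}, {0x014A, 0x017E},
        {0x30A1, 0x30FA}, {0x3105, 0x312C}, {0xAC00, 0xD7A3},
    };

    // Inclusive code point ranges copied from IDEOGRAPHIC.
    static final int[][] IDEOGRAPHIC = {
        {0x4E00, 0x9FA5}, {0x3007, 0x3007}, {0x3021, 0x3029},
    };

    // True if c falls inside any of the given inclusive ranges.
    static boolean inRanges(int c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // LETTER : (BASE_CHAR | IDEOGRAPHIC), so it can start with
    // anything that either of the two rules can start with.
    static boolean canStartLetter(int c) {
        return inRanges(c, BASE_CHAR) || inRanges(c, IDEOGRAPHIC);
    }

    // Names of the rules that could begin with c when all three
    // rules are treated as independent token candidates.
    static List<String> candidateRules(int c) {
        List<String> names = new ArrayList<>();
        if (canStartLetter(c)) {
            names.add("LETTER");
        }
        if (inRanges(c, BASE_CHAR)) {
            names.add("BASE_CHAR");
        }
        if (inRanges(c, IDEOGRAPHIC)) {
            names.add("IDEOGRAPHIC");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateRules('A'));     // prints [LETTER, BASE_CHAR]
        System.out.println(candidateRules(0x4E00));  // prints [LETTER, IDEOGRAPHIC]
    }
}
```

Any character with more than one candidate is a character the lexer cannot decide on at k==1, which matches the pairs named in the messages. As far as I understand, ANTLR 2 lets a lexer rule be marked protected so that it is only called from other rules and is not a token candidate on its own; if so, marking BASE_CHAR and IDEOGRAPHIC that way would leave LETTER as the only candidate here.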
--
Alan Gutierrez - alan at engrm.com