[antlr-interest] C# lexer and unicode
Rodrigo B. de Oliveira
rbo at acm.org
Sat Jan 31 04:36:56 PST 2004
They work OK for me (for Latin characters such as çãéõü), but the
input files must be UTF-8 encoded.
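
To illustrate why the encoding matters, here is a minimal sketch in Python (not ANTLR or C# code) that decodes the same UTF-8 bytes two ways. The sample text and variable names are illustrative assumptions. Reading each byte as its own character, as a Latin-1 decoder does, yields one unrelated character per byte instead of the intended ones.

```python
# Illustrative only: the sample text is an assumption, not a real input file.
text = "基金代码"

# Each of these four characters takes three bytes in UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoding with the matching encoding recovers the original characters.
decoded_ok = utf8_bytes.decode("utf-8")

# Treating every byte as a separate character (Latin-1) gives one
# character per byte, none of which is the intended CJK character.
decoded_bytewise = utf8_bytes.decode("latin-1")

print(len(decoded_ok), len(decoded_bytewise))  # prints: 4 12
```

A lexer that receives the byte-wise form never sees the original characters, whatever its character vocabulary says.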
Best wishes,
Rodrigo
----- Original Message -----
From: "maaxxxcal" <maaxxxcal at yahoo.com>
To: <antlr-interest at yahoogroups.com>
Sent: Saturday, January 31, 2004 2:17 AM
Subject: [antlr-interest] C# lexer and unicode
I would like to know if ANTLR's C# parser generator supports Unicode.
I have an input that contains some Chinese/Japanese identifiers, and
they are not being lexed properly. They are simply skipped from the
stream and never show up in the lexer's nextToken() method.
I wonder whether something is wrong in my lexer or whether Unicode is
just not fully supported yet.
I have:
charVocabulary = '\u0000'..'\ufffe';
Here's my whitespace rule:
// Whitespace -- ignored
WS  :   ( options { generateAmbigWarnings = false; }
        :   ' '                             // blank
        |   '\t'                            // tab
        |   "\r\n" {newline();}             // Windows
        |   ('\r'|'\n') {newline();}        // Unix or Mac
        |   '\f'                            // form feed
        |   ('\0'..'\10'|'\16'..'\37')      // control characters
        ) {$setType(Token.SKIP);}
    ;
Here's my rule for identifiers:
IDENT
options {
    testLiterals=true;
    paraphrase="an identifier";
}
    :   ('\u0080'..'\ufffe'|'a'..'z'|'_')
        ('\u0080'..'\ufffe'|'a'..'z'|'_'|'$'|'0'..'9')*
    ;
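
As a rough check of the character ranges in the IDENT rule, the following Python sketch tests whether a string fits the start and continuation sets. The function names are hypothetical, and the code models only the ranges as written above, not ANTLR's actual lexing, literal testing, or any other grammar options.

```python
def is_ident_start(ch):
    # First alternative set of the rule: '\u0080'..'\ufffe', 'a'..'z', '_'
    return "\u0080" <= ch <= "\ufffe" or "a" <= ch <= "z" or ch == "_"


def is_ident_part(ch):
    # Continuation set: the start set plus '$' and '0'..'9'
    return is_ident_start(ch) or ch == "$" or "0" <= ch <= "9"


def matches_ident(s):
    # A non-empty string whose first character is in the start set and
    # whose remaining characters are all in the continuation set.
    return bool(s) and is_ident_start(s[0]) and all(is_ident_part(c) for c in s[1:])


print(matches_ident("基金代码"))  # CJK characters lie inside '\u0080'..'\ufffe'
print(matches_ident("1abc"))      # a digit is not in the start set
```

Under these assumptions, correctly decoded CJK characters fall inside the ranges, which suggests looking at how the input reaches the lexer rather than at the rule itself.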
And here's the string I'm trying to parse:
基金代码 VARCHAR(6) NOT NULL
Yahoo! Groups Links
To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/