[antlr-interest] C# lexer and unicode

Rodrigo B. de Oliveira rbo at acm.org
Sat Jan 31 04:36:56 PST 2004


Unicode input works OK for me (for Latin characters such as çãéõü), but the
input files must be UTF-8 encoded.
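
If the source file is in some other encoding, one option is to transcode it
to UTF-8 before handing it to the lexer. A minimal Python sketch, assuming
the original encoding is known; `transcode_to_utf8` is a hypothetical helper
and not part of ANTLR:

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    # Assumption: the caller knows the real source encoding. A wrong guess
    # either raises UnicodeDecodeError or silently produces mojibake.
    return data.decode(source_encoding).encode("utf-8")


# The Latin characters mentioned above, as a Latin-1 encoded file stores them.
latin1_bytes = "çãéõü".encode("latin-1")
utf8_bytes = transcode_to_utf8(latin1_bytes, "latin-1")
```

Each of these characters takes one byte in Latin-1 and two bytes in UTF-8, so
the two files differ even though the text looks the same.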

Best wishes,
Rodrigo

----- Original Message ----- 
From: "maaxxxcal" <maaxxxcal at yahoo.com>
To: <antlr-interest at yahoogroups.com>
Sent: Saturday, January 31, 2004 2:17 AM
Subject: [antlr-interest] C# lexer and unicode


I would like to know if ANTLR's C# parser generator supports Unicode.
I have an input that contains some Chinese/Japanese identifiers and
they are not being lexed properly. They are simply being skipped from
the stream. They don't even show up in the lexer's nextToken() method.

I wonder if this is because there is something wrong in my lexer or
just because it's not yet fully supported.

I have:

  charVocabulary = '\u0000'..'\ufffe';

Here's my whitespace rule:

// Whitespace -- ignored
WS      : ( options { generateAmbigWarnings = false; }
  : ' ' // blank
  | '\t' // tab
  | "\r\n"    {newline();} // Windows
  | ('\r'|'\n') {newline();} // Unix or Mac
  | '\f'      // form feed
  | ('\0'..'\10'|'\16'..'\37')  // control characters
  ) {$setType(Token.SKIP);}
;

Here's my rule for identifiers:

IDENT
options {testLiterals=true;
         paraphrase="an identifier";}
: ('\u0080'..'\ufffe'|'a'..'z'|'_')
('\u0080'..'\ufffe'|'a'..'z'|'_'|'$'|'0'..'9')*
;
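
As a rough cross-check of the character ranges, the rule can be approximated
by a regular expression. This is a Python sketch, not ANTLR output: `IDENT`
and `leading_ident` are names chosen for illustration, and it ignores
testLiterals and any case-sensitivity options set elsewhere in the grammar.

```python
import re

# Assumed translation of the IDENT rule:
#   first character: U+0080..U+FFFE, 'a'..'z', or '_'
#   following characters: the same set plus '$' and '0'..'9'
IDENT = re.compile(r"[\u0080-\ufffea-z_][\u0080-\ufffea-z_$0-9]*")


def leading_ident(text: str):
    """Return the identifier at the start of text, or None if there is none."""
    match = IDENT.match(text)
    return match.group(0) if match else None
```

With properly decoded input, `leading_ident("基金代码 VARCHAR(6) NOT NULL")`
returns the four CJK characters, which suggests the ranges themselves accept
such identifiers once the lexer receives decoded characters rather than
mis-decoded bytes.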

And here's the string I'm trying to parse:

基金代码 VARCHAR(6) NOT NULL
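
A minimal Python sketch of how a mismatched encoding mangles such a line.
It assumes the file was saved as GB2312, which is only a guess from how the
sample looks; the thread does not say. Under that assumption, decoding the
bytes as Latin-1 turns 基金代码 into »ù½ð´úÂë.

```python
# Assumption: the file was saved as GB2312; the thread does not confirm this.
line = "基金代码 VARCHAR(6) NOT NULL"
raw = line.encode("gb2312")

# Wrong: reading the GB2312 bytes as Latin-1 yields one character per byte,
# so each two-byte Chinese character becomes two unrelated Latin characters.
mangled = raw.decode("latin-1")

# Right: decode with the encoding the bytes were actually written in.
restored = raw.decode("gb2312")
```

The same mismatch can happen in any reader that is not told the file's real
encoding, so it is worth checking how the input stream is opened before
changing the grammar.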





Yahoo! Groups Links

To visit your group on the web, go to:
 http://groups.yahoo.com/group/antlr-interest/

To unsubscribe from this group, send an email to:
 antlr-interest-unsubscribe at yahoogroups.com

Your use of Yahoo! Groups is subject to:
 http://docs.yahoo.com/info/terms/