[antlr-interest] Unicode handling

Thu Apr 22 10:10:29 PDT 2004

Mark, 

> This seems much more preferable to me than extending the C++ 
> support with some Unicode library (like IBM's icu or some such).

Hmm, UTF-8 is more expensive to work with once you actually need to process the text (case folding, composition
etc.), so this transformation format is generally not recommended for that purpose. It is a format for transport
rather than for processing. For antlr, though, it seems to fit well, except that defining identifiers might be a bit
unintuitive, presumably because every character outside ASCII turns into a multi-byte sequence in the lexer rules.
Debugging the lexer could also become a pain. Otherwise, I don't see anything that speaks against your approach.
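
To show what I mean by unintuitive, here is a minimal sketch of how a single code point maps to UTF-8 bytes. The
function name encode_utf8 is my own and not part of antlr; it only illustrates that a byte-oriented lexer rule for,
say, U+00E9 would have to match the two bytes 0xC3 0xA9 rather than one character.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8.
// Hypothetical helper for illustration only; not part of antlr.
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Surrogate code points are not valid scalar values.
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

An identifier rule written against bytes therefore has to spell out lead and continuation byte ranges instead of
character ranges, which is where I would expect both the grammar and the debugging to get awkward.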

For antlr, though, it would be great if there were some more generic support. Identifier start and middle chars,
numbers/digits etc. could be predefined instead of having to declare them over and over again in lexer grammars.
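
As a rough idea of what such predefined classes could look like, here is a small sketch working on code points. All
names are hypothetical and nothing like this exists in antlr as far as I know; the ranges are a deliberately tiny
subset (ASCII plus the Latin-1 letter blocks), not the full Unicode identifier definition.

```cpp
// Hypothetical predefined character classes, for illustration only.
// Covers ASCII and the Latin-1 letter ranges; a real version would be
// generated from the Unicode character database.

// True for characters that may start an identifier.
bool is_ident_start(char32_t cp) {
    return (cp >= U'A' && cp <= U'Z')
        || (cp >= U'a' && cp <= U'z')
        || cp == U'_'
        || (cp >= 0xC0 && cp <= 0xD6)   // skips U+00D7 (multiplication sign)
        || (cp >= 0xD8 && cp <= 0xF6)   // skips U+00F7 (division sign)
        || (cp >= 0xF8 && cp <= 0xFF);
}

// True for the ASCII decimal digits 0-9.
bool is_ascii_digit(char32_t cp) {
    return cp >= U'0' && cp <= U'9';
}

// True for characters that may appear after the first identifier character.
bool is_ident_part(char32_t cp) {
    return is_ident_start(cp) || is_ascii_digit(cp);
}
```

With something along these lines shipped once, a lexer grammar would only need to refer to the class instead of
repeating the ranges.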

Mike
--
www.soft-gems.net
