[antlr-interest] Unicode handling
Mike Lischke
lists at lischke-online.de
Thu Apr 22 10:10:29 PDT 2004
Mark,
> This seems much more preferable to me than extending the C++
> support with some Unicode library (like IBM's icu or some such).
Hmm, UTF-8 is more expensive to handle once you need to process the text (case folding, composition, etc.), so this
transformation format is not recommended for that purpose. It is a format for transport rather than for processing. In
the case of antlr, though, it seems to fit well, except that defining identifiers might be a bit unintuitive. Debugging
the lexer could also become a pain. Otherwise, I don't see anything that speaks against your approach.
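To illustrate why identifier definitions get unintuitive when the lexer sees UTF-8 bytes instead of characters, here is a rough sketch in Python (not antlr code; the function names are made up for this example). It assumes identifiers may start with an ASCII letter, an underscore, or a character in U+00C0..U+00FF, which is a simplification since that range also contains the multiplication and division signs.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character as a list of byte values."""
    return list(ch.encode("utf-8"))


def ident_start_len(data, pos):
    """Hypothetical byte-level check: return how many bytes an identifier-start
    character occupies at data[pos], or 0 if none starts there.

    Assumption for this sketch: identifier starts are ASCII letters, '_',
    and the range U+00C0..U+00FF only.
    """
    if pos >= len(data):
        return 0
    b0 = data[pos]
    # ASCII letters and underscore are a single byte each.
    if 0x41 <= b0 <= 0x5A or 0x61 <= b0 <= 0x7A or b0 == 0x5F:
        return 1
    # U+00C0..U+00FF is encoded as the lead byte 0xC3 followed by 0x80..0xBF,
    # so one "character range" turns into a two-byte pattern.
    if b0 == 0xC3 and pos + 1 < len(data) and 0x80 <= data[pos + 1] <= 0xBF:
        return 2
    return 0


if __name__ == "__main__":
    print([hex(b) for b in utf8_bytes("\u00e9")])
    print(ident_start_len("\u00e9lan".encode("utf-8"), 0))
```

Every further block of letters needs its own lead-byte/trail-byte pattern, and a debugger stepping through such a lexer shows bytes like 0xC3 0xA9 rather than the character itself.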
For antlr, though, it would be great if there were some more generic support. Identifier start and middle chars,
numbers/digits, etc. could be predefined instead of having to declare them over and over again in lexer grammars.
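As a rough idea of what such predefined classes could look like, here is a small Python sketch built on the Unicode general categories from the standard `unicodedata` module. The helper names and the exact category sets are assumptions for illustration only (loosely following the usual identifier conventions); nothing like this exists in antlr as far as this sketch is concerned.

```python
import unicodedata

# Assumed category sets for this sketch: letters and letter numbers may start
# an identifier; marks, decimal digits and connector punctuation may continue it.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_PART_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch may start an identifier (underscore allowed by assumption)."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_part(ch):
    """True if ch may appear after the first character of an identifier."""
    return unicodedata.category(ch) in ID_PART_CATEGORIES


def is_digit(ch):
    """True if ch is a decimal digit in any script."""
    return unicodedata.category(ch) == "Nd"


def scan_identifier(text, pos):
    """Return the end index of the identifier starting at pos, or pos if none."""
    if pos >= len(text) or not is_id_start(text[pos]):
        return pos
    end = pos + 1
    while end < len(text) and is_id_part(text[end]):
        end += 1
    return end


if __name__ == "__main__":
    source = "caf\u00e9_1 + x"
    print(source[:scan_identifier(source, 0)])
```

With classes like these available once, a lexer grammar would only need to refer to them by name rather than spell out the character ranges each time.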
Mike
--
www.soft-gems.net