[antlr-interest] Unicode escapes in C++

Kochismo kochismo at gmail.com
Tue Nov 7 07:36:08 PST 2006


Hi,

I'm interested in parsing a plain ASCII file that represents Unicode
characters as escaped hex digits.  For example:

blah\uff20\uff30blah

is the string "blah", then Unicode character U+FF20, then U+FF30, then
"blah".  Recognising it with the lexer is simple enough, but the lexer
returns token text as C++ std::strings rather than Unicode-friendly
std::wstrings.  Is there a way to handle this from within the lexer, or
will I have to write code to convert the token string into a wstring?
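If the conversion does end up outside the lexer, it is short.  A minimal
sketch, assuming the token text arrives as a std::string containing
literal \uXXXX escapes with exactly four hex digits, and that wchar_t is
at least 16 bits wide (true on the usual platforms).  The function name
unescapeUnicode is hypothetical, not part of ANTLR; it only handles the
Basic Multilingual Plane and does not combine surrogate pairs.  Malformed
escapes are passed through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the value of one hex digit, or -1 if c is
// not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: converts token text such as "blah\\uff20\\uff30blah"
// into a std::wstring, decoding each \uXXXX escape into one wchar_t.
// Assumes exactly four hex digits per escape (BMP code points only).
std::wstring unescapeUnicode(const std::string& text) {
    std::wstring out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        // An escape needs six characters: '\', 'u', and four hex digits,
        // so the last one sits at index i + 5.
        if (text[i] == '\\' && i + 5 < text.size() && text[i + 1] == 'u') {
            unsigned long code = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                int v = hexValue(text[k]);
                if (v < 0) { ok = false; break; }
                code = code * 16 + static_cast<unsigned long>(v);
            }
            if (ok) {
                out.push_back(static_cast<wchar_t>(code));
                i += 5;  // skip the rest of "\uXXXX"; the loop's ++i moves past it
                continue;
            }
        }
        // Ordinary ASCII byte (or a malformed escape): widen it unchanged.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(text[i])));
    }
    return out;
}
```

In a parser action you could then call something like
unescapeUnicode(tok->getText()) on the matched token.  Doing it as a
post-processing step keeps the grammar itself working on plain
std::string, which is what the generated lexer already produces.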