[antlr-interest] Unicode escapes in C++
Ric Klaren
ric.klaren at gmail.com
Tue Nov 7 12:06:38 PST 2006
Hi,
Kochismo wrote:
> I'm interested in parsing a plain ascii file which represents unicode
> characters as escaped hex digits. For example:
>
> blah\uff20\uff30blah
>
> is the string blah, unicode character #ff20, unicode character #ff30, then
> blah. Recognising it with the lexer is simple enough, but the lexer returns
> tokens as C++ strings, rather than unicode friendly wstrings. Is there a
> way I can handle this from within the lexer? Or will I have to write code
> to convert the string token into a wstring?
You can probably get some inspiration for this from the Unicode C++
example in the ANTLR distribution. You only need to pay attention to
the part where the strings for the tokens are collected.
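If you take the other route from the question and convert the token text
after lexing, a minimal sketch could look like the following. It is not
part of the ANTLR C++ runtime. The helper names `hexValue` and
`decodeUnicodeEscapes` are hypothetical, and it assumes every escape is
exactly `\u` followed by four hex digits naming a BMP code point, which
fits in `wchar_t` whether that type is 16 or 32 bits wide. Surrogate
pairs are not combined.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not hex.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not an ANTLR API): turns plain ASCII token text
// containing "\uXXXX" escapes into a std::wstring. Assumes exactly four
// hex digits per escape; throws std::invalid_argument otherwise.
std::wstring decodeUnicodeEscapes(const std::string& text) {
    std::wstring out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == 'u') {
            if (i + 6 > text.size())
                throw std::invalid_argument("truncated \\u escape");
            unsigned long code = 0;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                int v = hexValue(text[k]);
                if (v < 0)
                    throw std::invalid_argument("bad hex digit in \\u escape");
                code = code * 16 + static_cast<unsigned long>(v);
            }
            out.push_back(static_cast<wchar_t>(code));
            i += 6;  // skip the backslash, 'u' and four hex digits
        } else {
            // Plain ASCII byte: widen it unchanged.
            out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(text[i])));
            ++i;
        }
    }
    return out;
}
```

For the example in the question, `decodeUnicodeEscapes("blah\\uff20\\uff30blah")`
yields a ten-character wide string: "blah", U+FF20, U+FF30, "blah".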
Cheers,
Ric