[antlr-interest] misunderstanding channel HIDDEN
Gavin Lambert
antlr at mirality.co.nz
Wed Aug 26 14:07:26 PDT 2009
At 06:13 27/08/2009, Ian Eyberg wrote:
>I have text that looks like:
>
> 'b^@l^@a^@h^@'
>
>(most of the time the text is simply 'blah')
>and then it should come out like this:
>
> 'blah'
[...]
> UCODE : '\u0000'{ $channel = HIDDEN; };
>
>I'm reading in through antlrinputstream as "UTF8" as I do
>want to support multi-byte chars and I have rules to help
>that such as:
I think you're going about this the wrong way. The input above
looks like UTF-16 (little-endian, since each NUL byte follows its
character); you should detect that case and use a UTF-16 file
stream instead of a UTF-8 one. (UTF-16 files normally start with a
byte-order mark, or BOM, that you can use for auto-detection.)
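A minimal sketch of that detection in Java follows. It assumes the whole file fits in memory as a byte array. The class and method names (`EncodingSniffer`, `detect`, `decode`) are hypothetical. The no-BOM fallback is an assumed heuristic: it treats text whose every odd byte is NUL as UTF-16LE, so it only works for mostly-ASCII text like 'b^@l^@a^@h^@'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {

    /** Hypothetical result type: charset to decode with, and how many BOM bytes to skip. */
    static final class Detected {
        final Charset charset;
        final int bomLength;

        Detected(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    static Detected detect(byte[] data) {
        // 1. A byte-order mark, if present, settles the question.
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return new Detected(StandardCharsets.UTF_8, 3);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return new Detected(StandardCharsets.UTF_16LE, 2);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return new Detected(StandardCharsets.UTF_16BE, 2);
        }
        // 2. No BOM (assumed heuristic): if every byte at an odd offset is NUL,
        //    the text is probably ASCII-range characters encoded as UTF-16LE.
        if (data.length >= 2 && data.length % 2 == 0) {
            boolean oddBytesAllNul = true;
            for (int i = 1; i < data.length; i += 2) {
                if (data[i] != 0) {
                    oddBytesAllNul = false;
                    break;
                }
            }
            if (oddBytesAllNul) {
                return new Detected(StandardCharsets.UTF_16LE, 0);
            }
        }
        // 3. Otherwise fall back to UTF-8.
        return new Detected(StandardCharsets.UTF_8, 0);
    }

    /** Decodes the bytes with the detected charset, skipping any BOM. */
    static String decode(byte[] data) {
        Detected d = detect(data);
        return new String(data, d.bomLength, data.length - d.bomLength, d.charset);
    }

    public static void main(String[] args) {
        byte[] utf16 = {'b', 0, 'l', 0, 'a', 0, 'h', 0};
        byte[] utf8 = {'b', 'l', 'a', 'h'};
        System.out.println(detect(utf16).charset + " -> " + decode(utf16));
        System.out.println(detect(utf8).charset + " -> " + decode(utf8));
    }
}
```

With the ANTLR 3 Java runtime, the detected charset name could then be passed to the `ANTLRInputStream(InputStream, String encoding)` constructor after skipping `bomLength` bytes, instead of hard-coding "UTF8". The sketch skips the BOM explicitly because Java's UTF-8, UTF-16LE and UTF-16BE decoders leave it in the decoded text as U+FEFF.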
UTF-16 and UTF-8 encode non-ASCII characters quite differently:
'é' (U+00E9), for example, is the two bytes E9 00 in UTF-16LE but
C3 A9 in UTF-8. So if your input can include such characters, then
reading it as UTF-8 and just throwing away the nulls definitely
isn't going to work.
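The following small Java comparison shows why stripping NULs only works for ASCII. It pits the approach from the question against decoding with the charset the bytes were actually written in. The class name is hypothetical, and the input is assumed to be UTF-16LE.

```java
import java.nio.charset.StandardCharsets;

public class NulStrippingDemo {

    // The approach from the question: decode as UTF-8, then drop every U+0000.
    static String naive(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).replace("\0", "");
    }

    // Decode with the encoding the bytes were really written in.
    static String correct(byte[] data) {
        return new String(data, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] blah = "blah".getBytes(StandardCharsets.UTF_16LE);
        byte[] cafe = "caf\u00e9".getBytes(StandardCharsets.UTF_16LE);
        // ASCII-only text survives the naive approach...
        System.out.println(naive(blah).equals(correct(blah)));
        // ...but the lone E9 byte is malformed UTF-8, so the naive result differs.
        System.out.println(naive(cafe).equals(correct(cafe)));
    }
}
```

For "blah" both methods agree. For "café", the naive version produces a U+FFFD replacement character where the é should be, because the byte E9 followed by 00 is not valid UTF-8.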