[stringtemplate-interest] [ST4] Specify the encoding in the template group file
Terence Parr
parrt at cs.usfca.edu
Sun Jan 30 10:24:45 PST 2011
I think I prefer asking the coder to specify the encoding of the file. UTF-8 won't work for every US machine; the default encoding there is ASCII.
Ter
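Ter's preference, making the caller state the encoding instead of relying on the platform default, could look roughly like the following Java sketch (Java being ST4's own language). The class `ExplicitEncodingLoader` and its `load` method are hypothetical names for illustration, not part of the ST4 API; the only point is that the charset is a required parameter rather than an implicit machine setting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical loader that makes the caller name the group file's encoding. */
final class ExplicitEncodingLoader {
    private ExplicitEncodingLoader() {}

    /**
     * Decodes the whole stream with the caller's charset. Calling
     * new InputStreamReader(in) without a charset would instead use the
     * platform default charset, which (before JDK 18) depended on the
     * machine's locale settings.
     */
    static String load(InputStream in, Charset encoding) {
        try (Reader reader = new InputStreamReader(in, encoding)) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[4096];
            for (int n; (n = reader.read(buf)) != -1; ) {
                text.append(buf, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same bytes decode differently depending on the charset the caller passes, which is exactly why the choice is left to the coder in this sketch.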
On Jan 29, 2011, at 4:17 AM, Udo Borkowski wrote:
> Hi Ter,
>
>> Hi. Don't we need to know what the encoding is before we can load the file?
>
> Actually, not when we begin loading the file.
>
> The whole approach is explained in detail in the XML specification (Appendix F, "Autodetection of Character Encodings"). Here is the basic idea:
>
> - Read the first 4 bytes (raw, no encoding needed)
> - Because we know which characters the file must start with if there is a prolog ("<st("), we can now differentiate between these encodings:
> - UCS-4
> - UTF-16
> - UTF-8
> (this also covers little/big-endian byte order, unusual octet orders, and a Byte Order Mark)
> - Once we know this, we continue reading in that encoding until we find the ")>". (All characters in the prolog are ASCII.)
> - If there is an encoding="…", we now know the exact encoding (e.g. when in UTF-8 mode we may find "ISO-8859-1").
> - The rest of the file is read in the encoding we determined from the prolog.
>
> If you like I can work out some code for this. Please let me know.
>
> Udo
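The three steps Udo describes can be sketched in Java roughly as follows. This is an editor's illustration, not Udo's code and not an existing ST4 feature: the `<st( ... )>` prolog and its `encoding="..."` attribute are the syntax proposed in this thread, and `PrologEncodingSniffer`, `guess`, `bomLength`, and `decode` are hypothetical names. Assumptions worth flagging: UCS-4 is handled through Java's `UTF-32BE`/`UTF-32LE` charsets, the rarer 2143/3412 octet orders and EBCDIC from the XML appendix are skipped, and a file with neither a BOM nor a prolog falls back to a charset the caller supplies.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: XML-style encoding autodetection for a "<st( ... )>" prolog. */
final class PrologEncodingSniffer {
    // Java calls UCS-4 "UTF-32"; these charsets ship with the JDK but have no StandardCharsets constant.
    private static final Charset UTF_32BE = Charset.forName("UTF-32BE");
    private static final Charset UTF_32LE = Charset.forName("UTF-32LE");
    // Assumed attribute syntax: encoding="NAME" somewhere inside the prolog.
    private static final Pattern ENCODING_ATTR =
            Pattern.compile("encoding\\s*=\\s*\"([A-Za-z0-9._:-]+)\"");

    private PrologEncodingSniffer() {}

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Step 1: guess the encoding family from the first raw bytes; nothing is decoded yet. */
    static Charset guess(byte[] b) {
        // Byte Order Marks. FF FE 00 00 (UTF-32LE) must be tested before FF FE (UTF-16LE).
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return UTF_32BE;
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return UTF_32LE;
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        // No BOM: compare against how the start of "<st(" looks in each encoding.
        if (startsWith(b, 0x00, 0x00, 0x00, '<')) return UTF_32BE;
        if (startsWith(b, '<', 0x00, 0x00, 0x00)) return UTF_32LE;
        if (startsWith(b, 0x00, '<', 0x00, 's')) return StandardCharsets.UTF_16BE;
        if (startsWith(b, '<', 0x00, 's', 0x00)) return StandardCharsets.UTF_16LE;
        // ASCII-compatible bytes (including "<st(" itself): provisionally read as UTF-8.
        return StandardCharsets.UTF_8;
    }

    /** Number of Byte Order Mark bytes to skip before the text starts. */
    static int bomLength(byte[] b) {
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF) || startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return 4;
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return 3;
        if (startsWith(b, 0xFE, 0xFF) || startsWith(b, 0xFF, 0xFE)) return 2;
        return 0;
    }

    /**
     * Steps 2 and 3: read the prolog in the guessed encoding, honor its encoding="...",
     * and decode the remaining bytes. Returns the text after the prolog. With neither
     * a prolog nor a BOM, the caller-supplied fallback decodes the whole file.
     */
    static String decode(byte[] data, Charset fallback) {
        Charset guessed = guess(data);
        int skip = bomLength(data);
        // Good enough for locating the prolog: the prolog is ASCII-only, and body bytes
        // that are invalid in the guessed charset only become U+FFFD in this string.
        String provisional = new String(data, skip, data.length - skip, guessed);
        if (!provisional.startsWith("<st(")) {
            return skip > 0 ? provisional : new String(data, fallback);
        }
        int end = provisional.indexOf(")>");
        if (end < 0) throw new IllegalArgumentException("unterminated <st( prolog");
        String prolog = provisional.substring(0, end + 2);
        Matcher m = ENCODING_ATTR.matcher(prolog);
        Charset actual = m.find() ? Charset.forName(m.group(1)) : guessed;
        // A declared "UTF-16"/"UTF-32" names only the family; keep the byte order already detected.
        if (guessed.name().startsWith(actual.name())) actual = guessed;
        // The prolog is ASCII, so re-encoding it in the guessed charset gives its exact byte length.
        int bodyStart = skip + prolog.getBytes(guessed).length;
        return new String(data, bodyStart, data.length - bodyStart, actual);
    }
}
```

For example, a file whose bytes start with `<st( encoding="ISO-8859-1" )>` is first read as UTF-8, the declaration switches the decoder to ISO-8859-1, and only the bytes after `)>` are decoded with it. Returning the text after the prolog is a design choice of this sketch; a real loader might keep the prolog for error reporting.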