[stringtemplate-interest] [ST4] Specify the encoding in the template group file

Terence Parr parrt at cs.usfca.edu
Sun Jan 30 10:24:45 PST 2011


I think I prefer asking the coder to specify the encoding of the file. UTF-8 won't work for any US machine; the encoding is ASCII by default.

Ter

On Jan 29, 2011, at 4:17 AM, Udo Borkowski wrote:

> Hi Ter,
> 
>> Hi. Don't we need to know what the encoding is before we can load the file?
> 
> Actually, not when we begin loading the file.
> 
> The whole approach is explained in detail in the XML reference documentation. Here is the basic idea:
> 
> - Read the first 4 bytes (raw, no encoding needed)
> - Because we know what characters this should be if there is a prolog ("<st(") we can now differentiate between these encodings:
> 	- UCS-4
> 	- UTF-16
> 	- UTF-8
> 	(this also covers things like little/big endian, octet order and Byte Order Mark)
> - Once we know this we continue reading in the given encoding until we find the ")>". (All characters in the prolog are in ASCII.)
> - If there is an encoding="…" we now know the exact encoding (e.g. when in UTF-8 mode we may find "ISO-8859-1").
> - The rest of the file is read in the encoding we determined from the prolog.
> 
> If you like I can work out some code for this. Please let me know.
> 
> Udo
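
The detection steps Udo lists above can be sketched roughly as follows. This is only an illustration in Python (ST4 itself is Java), not code from StringTemplate. The prolog form <st(encoding="...")>, the 512-byte prolog limit, and the names sniff_family and detect_encoding are assumptions made for the example. The caller-supplied default mirrors the idea of letting the coder specify the encoding when no prolog is present.

```python
import codecs
import re

# Byte Order Marks. The UTF-32 marks come first because the UTF-32LE BOM
# begins with the same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# How the start of the assumed prolog "<st(" looks in the first four raw
# bytes when there is no BOM.
_PROLOG_PATTERNS = [
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00s\x00", "utf-16-le"),
    (b"\x00<\x00s", "utf-16-be"),
    (b"<st(", "utf-8"),  # any ASCII-compatible single-byte start
]

# Hypothetical attribute syntax inside the prolog.
_ENCODING_ATTR = re.compile(r'encoding\s*=\s*"([^"]+)"')


def sniff_family(data):
    """Guess (codec, bom_length) from the first raw bytes, or return None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    for prefix, codec in _PROLOG_PATTERNS:
        if data.startswith(prefix):
            return codec, 0
    return None


def detect_encoding(data, default="utf-8"):
    """Return the encoding name to use for the rest of the file."""
    guess = sniff_family(data)
    if guess is None:
        return default  # no BOM and no prolog: fall back to the caller's choice
    codec, skip = guess
    # The prolog is pure ASCII, so a lenient decode of a small window is enough.
    head = data[skip:skip + 512].decode(codec, errors="replace")
    if not head.startswith("<st("):
        return codec  # BOM but no prolog
    end = head.find(")>")
    if end == -1:
        return codec  # unterminated prolog: keep the sniffed encoding
    match = _ENCODING_ATTR.search(head[:end])
    return match.group(1) if match else codec
```

For example, detect_encoding(b'<st(encoding="ISO-8859-1")>...') would first sniff an ASCII-compatible start, read the prolog, and then report ISO-8859-1 for the remainder of the file.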


