[stringtemplate-interest] [ST4] Specify the encoding in the template group file

Terence Parr parrt at cs.usfca.edu
Sun Jan 30 10:32:53 PST 2011


yeah, sounds familiar; hmm...maybe a UTF-8 default encoding could work.  When I ship a system, though, I am shipping files from my dev box, so they should be the same across platforms since they don't change as I copy them.  Encoding only matters if you have stuff in multiple languages, in which case you might need multiple encodings per application.  You'd have to specify it anyway.

UTF-8 might be harmless though...
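
A quick way to see why a UTF-8 default would be harmless for ASCII-only group files is to compare the bytes directly. This is only a sketch in Python, not ST4 code; the helper name is_pure_ascii and the sample template text are made up for illustration.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


# Hypothetical template text that uses only ASCII characters.
ascii_template = 'greet(name) ::= "Hello, <name>!"'

# For ASCII-only text the two encodings produce identical bytes,
# so reading such a file as UTF-8 changes nothing.
same_bytes = ascii_template.encode("ascii") == ascii_template.encode("utf-8")

# As soon as a non-ASCII character shows up, the encodings diverge,
# which is the case where the encoding has to be stated somewhere.
word = "Gr\u00fc\u00dfe"
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("iso-8859-1")
```

The default only becomes a real decision once a file contains bytes above 0x7F, which matches the multiple-languages case.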

Ter
On Jan 30, 2011, at 10:29 AM, Jim Idle wrote:

> Ter,
> 
> UTF-8 and ASCII are the same thing when only those code points in the
> ASCII spec are considered. This is deliberate for exactly this reason :-).
> Making the default encoding UTF-8 is therefore exactly equivalent to
> pure 7-bit ASCII for such files and is safe to assume. It is the default
> for just about everything these days.
> 
> Jim
> 
>> -----Original Message-----
>> From: stringtemplate-interest-bounces at antlr.org [mailto:stringtemplate-
>> interest-bounces at antlr.org] On Behalf Of Terence Parr
>> Sent: Sunday, January 30, 2011 10:25 AM
>> To: stringtemplate-interest at antlr.org List
>> Subject: Re: [stringtemplate-interest] [ST4] Specify the encoding in
>> the template group file
>> 
>> I think I prefer asking the coder to specify the encoding of the file.
>> UTF-8 won't work for any US machine; encoding is ascii by default.
>> 
>> Ter
>> 
>> On Jan 29, 2011, at 4:17 AM, Udo Borkowski wrote:
>> 
>>> Hi Ter,
>>> 
>>>> Hi. Don't we need to know what the encoding is before we can load the file?
>>> 
>>> Actually not when we begin loading the file.
>>> 
>>> The whole approach is explained in detail in the XML reference documentation. Here is the basic idea:
>>> 
>>> - Read the first 4 bytes (raw, no encoding needed).
>>> - Because we know what characters these should be if there is a prolog ("<st("), we can now differentiate between these encodings:
>>> 	- UCS-4
>>> 	- UTF-16
>>> 	- UTF-8
>>> 	(this also covers things like little/big endian, octet order, and Byte Order Mark)
>>> - Once we know this, we continue reading in the given encoding until we find the ")>". (All characters in the prolog are in ASCII.)
>>> - If there is an encoding="." we now know the exact encoding (e.g. when in UTF-8 mode we may find "ISO-8859-1").
>>> - The rest of the file is read in the encoding we determined from the prolog.
>>> 
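
The steps above could be sketched roughly as follows. This is a Python illustration only, not ST4 code: the "<st(" ... ")>" prolog and its encoding="..." attribute are the syntax proposed in this thread, the names sniff_family and detect_encoding are hypothetical, and UCS-4 is approximated with Python's UTF-32 codecs.

```python
import codecs
import re

PROLOG_START = "<st("
PROLOG_END = ")>"

# UTF-32 BOMs are checked first because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_family(data: bytes):
    """Guess (codec, bom_length) from the raw leading bytes."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    # No BOM: compare the first 4 bytes with how "<st(" would start
    # in each candidate encoding.
    for codec in ("utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le"):
        if data[:4] == PROLOG_START.encode(codec)[:4]:
            return codec, 0
    return "utf-8", 0  # assumed fallback when nothing else matches


def detect_encoding(data: bytes) -> str:
    """Return the encoding named in the prolog, else the sniffed family."""
    family, bom_len = sniff_family(data)
    if not data.startswith(PROLOG_START.encode(family), bom_len):
        return family  # no prolog present
    end = data.find(PROLOG_END.encode(family), bom_len)
    if end < 0:
        return family  # unterminated prolog: keep the sniffed family
    # The prolog is pure ASCII, so decoding it in the sniffed family is safe
    # even if the rest of the file uses a different encoding.
    prolog = data[bom_len:end].decode(family)
    match = re.search(r'encoding="([^"]+)"', prolog)
    return match.group(1) if match else family
```

For example, detect_encoding(b'<st(encoding="ISO-8859-1")>...') would report "ISO-8859-1", and the rest of the file could then be decoded with that name. Only the prolog is decoded with the sniffed family, so bytes after ")>" that are not valid in that family do not get in the way.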
>>> If you like I can work out some code for this. Please let me know.
>>> 
>>> Udo
>> 
>> _______________________________________________
>> stringtemplate-interest mailing list
>> stringtemplate-interest at antlr.org
>> http://www.antlr.org/mailman/listinfo/stringtemplate-interest


