[stringtemplate-interest] [ST4] Specify the encoding in the template group file
Terence Parr
parrt at cs.usfca.edu
Sun Jan 30 10:32:53 PST 2011
Yeah, sounds familiar; hmm... maybe a UTF-8 default encoding could work. When I ship a system, though, I am shipping files from my dev box, so they should be the same across platforms since they don't change as I copy them. Encoding only matters if you have stuff in multiple languages, in which case you might need multiple encodings per application. You'll have to specify it anyway.
UTF-8 might be harmless though...
Ter
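
The point that a UTF-8 default is harmless for ASCII-only files can be checked directly by comparing bytes. A minimal sketch follows; the class and method names are hypothetical and not part of StringTemplate:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiUtf8Check {
    /** True if every char is in the 7-bit ASCII range (U+0000..U+007F). */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    /** True if the string encodes to identical bytes under US-ASCII and UTF-8. */
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_8));
    }
}
```

For any string where isAscii returns true, sameBytes should also return true; the two encodings only diverge once a character above U+007F appears.
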
On Jan 30, 2011, at 10:29 AM, Jim Idle wrote:
> Ter,
>
> UTF-8 and ASCII are the same thing when only those code points in the
> ASCII spec are considered. This is deliberate for exactly this reason :-).
> Making the default encoding be UTF-8 is therefore exactly equivalent for a
> pure 7-bit ASCII encoding and is safe to assume. It is the default for
> just about everything these days.
>
> Jim
>
>> -----Original Message-----
>> From: stringtemplate-interest-bounces at antlr.org [mailto:stringtemplate-
>> interest-bounces at antlr.org] On Behalf Of Terence Parr
>> Sent: Sunday, January 30, 2011 10:25 AM
>> To: stringtemplate-interest at antlr.org List
>> Subject: Re: [stringtemplate-interest] [ST4] Specify the encoding in
>> the template group file
>>
>> I think I prefer asking the coder to specify the encoding of the file.
>> UTF-8 won't work for any US machine; encoding is ascii by default.
>>
>> Ter
>>
>> On Jan 29, 2011, at 4:17 AM, Udo Borkowski wrote:
>>
>>> Hi Ter,
>>>
>>>> Hi. Don't we need to know what the encoding is before we can load
>>>> the file?
>>>
>>> Actually not when we begin loading the file.
>>>
>>> The whole approach is explained in detail in the XML reference
>>> documentation. Here is the basic idea:
>>>
>>> - Read the first 4 bytes (raw, no encoding needed).
>>> - Because we know what characters these should be if there is a prolog
>>>   ("<st("), we can now differentiate between these encodings:
>>>     - UCS-4
>>>     - UTF-16
>>>     - UTF-8
>>>   (This also covers things like little/big endian, octet order, and
>>>   Byte Order Mark.)
>>> - Once we know this, we continue reading in the given encoding until
>>>   we find the ")>". (All characters in the prolog are in ASCII.)
>>> - If there is an encoding="." we now know the exact encoding (e.g.
>>>   when in UTF-8 mode we may find "ISO-8859-1").
>>> - The rest of the file is read in the encoding we determined from the
>>>   prolog.
>>>
>>> If you like I can work out some code for this. Please let me know.
>>>
>>> Udo
>>
>> _______________________________________________
>> stringtemplate-interest mailing list
>> stringtemplate-interest at antlr.org
>> http://www.antlr.org/mailman/listinfo/stringtemplate-interest
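
The detection steps Udo outlines in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not StringTemplate code: the class PrologSniffer, its methods, and the exact prolog form <st(encoding="NAME")> are hypothetical, and the UTF-32 charsets (standing in for UCS-4) are assumed to be available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrologSniffer {
    // Assumed prolog form: <st(encoding="NAME")> at the very start of the file.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*\"([^\"]+)\"");

    // Compare the leading bytes (as unsigned values) against a prefix.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Step 1 and 2: guess the encoding family from a BOM or the bytes of "<s". */
    static Charset sniffFamily(byte[] b) {
        // Byte Order Marks; the 4-byte forms must be tested before the 2-byte ones.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return Charset.forName("UTF-32BE");
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return Charset.forName("UTF-32LE");
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // No BOM: see how the first characters of "<st(" are laid out.
        if (startsWith(b, 0x00, 0x00, 0x00, '<')) return Charset.forName("UTF-32BE");
        if (startsWith(b, '<', 0x00, 0x00, 0x00)) return Charset.forName("UTF-32LE");
        if (startsWith(b, 0x00, '<', 0x00, 's')) return StandardCharsets.UTF_16BE;
        if (startsWith(b, '<', 0x00, 's', 0x00)) return StandardCharsets.UTF_16LE;
        // Anything else is treated as an ASCII-compatible encoding, read as UTF-8.
        return StandardCharsets.UTF_8;
    }

    /** Steps 3 and 4: read the prolog and honor encoding="NAME" if present. */
    static Charset detect(byte[] bytes) {
        Charset family = sniffFamily(bytes);
        // For 16- and 32-bit families the sniffed charset is already exact.
        if (!family.equals(StandardCharsets.UTF_8)) return family;
        // The prolog is pure ASCII, so a bounded prefix decodes safely as UTF-8.
        String head = new String(bytes, 0, Math.min(bytes.length, 256), family);
        if (head.startsWith("\uFEFF")) head = head.substring(1);
        if (!head.startsWith("<st(")) return family;
        int end = head.indexOf(")>");
        if (end < 0) return family;
        Matcher m = ENCODING.matcher(head.substring(0, end));
        // Charset.forName throws for unknown or malformed names.
        return m.find() ? Charset.forName(m.group(1)) : family;
    }

    /** Step 5: decode the whole file in the detected encoding, dropping any BOM. */
    static String decode(byte[] bytes) {
        String text = new String(bytes, detect(bytes));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

A caller would read the raw file bytes (for example with java.nio.file.Files.readAllBytes) and pass them to PrologSniffer.decode. Files without a prolog fall back to UTF-8 in this sketch, which matches the default discussed in the thread.
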