[stringtemplate-interest] UTF-8 not displaying correctly

Leo R. Lundgren leo at finalresort.org
Mon Mar 15 15:58:47 PDT 2010


To clarify, here are the combinations of
StringTemplateGroup.setFileCharEncoding() and HTML meta charset I have
tried, and their results:

	setFileCharEncoding("UTF-8")
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	Result: Characters NOT displayed correctly in browser (a "<?>" sign
	is displayed where there should be one "ä").

	setFileCharEncoding("ISO-8859-1")
	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
	Result: Characters NOT displayed correctly in browser (two junk
	characters where there should be one "ä").

	setFileCharEncoding("UTF-8")
	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
	Result: Characters displayed correctly in browser.

	setFileCharEncoding("ISO-8859-1")
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	Result: Characters displayed correctly in browser.

Pretty confusing.
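
One way to make sense of the four results is to model the whole chain in plain Java. The sketch below is not ST or servlet code. It assumes the template file is saved as UTF-8 and, more importantly, that the servlet's response writer encodes its output as ISO-8859-1; that second point is an assumption, not something verified in this setup. The class name CharsetRoundTrip and the simulate() helper are made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    // "ä" written as an escape so the example does not depend on
    // the encoding of this source file.
    static final String A_UMLAUT = "\u00E4";

    /**
     * Models the chain: template saved on disk as UTF-8 (assumed)
     * -> decoded with readCharset (the setFileCharEncoding value)
     * -> encoded with writerCharset (the response writer, assumed)
     * -> decoded by the browser with metaCharset (the HTML meta tag).
     * Returns what the browser would end up displaying.
     */
    static String simulate(String readCharset, String writerCharset, String metaCharset) {
        byte[] onDisk = A_UMLAUT.getBytes(Charset.forName("UTF-8"));
        String inMemory = new String(onDisk, Charset.forName(readCharset));
        byte[] onWire = inMemory.getBytes(Charset.forName(writerCharset));
        return new String(onWire, Charset.forName(metaCharset));
    }

    public static void main(String[] args) {
        // { setFileCharEncoding value, assumed writer charset, meta charset }
        String[][] cases = {
            { "UTF-8",      "ISO-8859-1", "UTF-8" },
            { "ISO-8859-1", "ISO-8859-1", "ISO-8859-1" },
            { "UTF-8",      "ISO-8859-1", "ISO-8859-1" },
            { "ISO-8859-1", "ISO-8859-1", "UTF-8" },
            // Hypothetical fifth case: the writer also uses UTF-8.
            { "UTF-8",      "UTF-8",      "UTF-8" },
        };
        for (String[] c : cases) {
            boolean ok = A_UMLAUT.equals(simulate(c[0], c[1], c[2]));
            System.out.println("read=" + c[0] + " write=" + c[1] + " meta=" + c[2]
                    + " -> " + (ok ? "correct" : "garbled"));
        }
    }
}
```

Under that assumption the first four cases come out the same way as the results above, and the fifth case suggests that the response writer's encoding is the piece to look at. In the Servlet API that encoding is normally chosen by calling response.setCharacterEncoding("UTF-8") (or response.setContentType("text/html; charset=UTF-8")) before response.getWriter(). Whether that is the actual cause here remains to be tested.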

// Leo


On 15 Mar 2010, at 23:48, Leo R. Lundgren wrote:

> Hi,
>
> I found StringTemplateGroup.setFileCharEncoding(), which takes a
> parameter that seems to be the same charset name that
> java.io.InputStreamReader(InputStream in, String charsetName) accepts.
> I added it to my ViewHandler:
>
> 	public class ViewHandler {
> 		private StringTemplateGroup templateGroup;
> 		private Map<String, String> attributes = new HashMap<String, String>();
>
> 		public ViewHandler(String viewBasePath) {
> 			templateGroup = new StringTemplateGroup("default", viewBasePath);
> 			System.out.println(templateGroup.getFileCharEncoding());
> 			templateGroup.setFileCharEncoding("UTF-8");
> 			System.out.println(templateGroup.getFileCharEncoding());
> 		}
>
> 		public void setAttribute(String name, String value) {
> 			attributes.put(name, value);
> 		}
>
> 		public String getOutput(String viewName) {
> 			StringTemplate view = templateGroup.getInstanceOf(viewName, attributes);
> 			return view.toString();
> 		}
>
> 		public void render(Writer out, String viewName) throws IOException {
> 			out.write(getOutput(viewName));
> 		}
> 	}
>
> Watching the console at the time of a request, it seems that UTF-8 is
> already the default in the system. In any case, that is what the
> option is set to. Still no luck with the output, however; the
> encoding issue remains.
>
> I have checked all encoding settings in the files' properties and they
> all say UTF-8 (inherited from container).
> I also tried templateGroup.setFileCharEncoding("ISO-8859-1") instead,
> and it changed the <?> to a couple of junk characters, so
> that's not right either.
> I'd also like to clarify that my previous information regarding the
> HTTP response headers carrying a charset in them was wrong; there is
> no such header sent. However, the browser adheres to the HTML meta tag
> defining a charset, that I am sure of.
>
> After some testing, I've found that there is /one/ thing that makes
> the page display correctly: if I set the charset in the template's
> HTML to iso-8859-1 instead of utf-8, so that the browser parses the
> contents as latin1, it displays correctly. The only conclusion I can
> draw from this is that what the browser is sent is encoded as
> latin1?
>
> At http://www.stringtemplate.org/api/org/antlr/stringtemplate/PathGroupLoader.html
> I found the description "A brain dead loader that looks only in the
> directory(ies) you specify in the ctor. You may specify the char
> encoding. NOTE: this does not work when you jar things up! Use
> CommonGroupLoader instead in that case".
>
> Reading the note in the description, and also reading
> http://www.stringtemplate.org/api/org/antlr/stringtemplate/CommonGroupLoader.html,
> I get the feeling that it's not the actual char encoding that
> doesn't work when "jar'ed up", but rather the loader class itself. But
> is this something I should try anyway? If so, how do I use the group
> loader?
>
> I did check with some Eclipse guys and they didn't think that
> Eclipse was saving the files incorrectly. Personally, I don't know,
> since I haven't used Eclipse long enough to form an opinion based on
> experience with it.
>
> Silly question maybe, but could it be that ST just *reads* the
> template files using UTF-8 (or the set encoding), but then outputs it
> using Latin1?
>
> For reference, here's the beginning of the index HTML template:
>
> 	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
> 	<html lang="sv-SE">
> 		<head>
> 			<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> 			<title>MyApp</title>
> 			<link rel="stylesheet" type="text/css" href="css/common.css">
> 		</head>
> 		<body>
> 			å ä ö <!-- test characters -->
> 			$(contentTemplate)()$
> 		</body>
> 	</html>
>
> Many thanks,
>
> Regards, Leo
>
>
> On 15 Mar 2010, at 19:50, Terence Parr wrote:
>
>> Hi. You have to tell ST to use a UTF-8 encoding. There should be an
>> option on StringTemplateGroup or something.
>> Ter
>> On Mar 15, 2010, at 10:11 AM, Leo R. Lundgren wrote:
>>
>>> Hi,
>>>
>>> I am building a small servlet application using Eclipse, Tomcat 6,
>>> JRE 1.6, ST 3.2. Here is a ViewHandler I'm using to wrap ST
>>> functionality:
>>>
>>> 	public class ViewHandler {
>>> 		private StringTemplateGroup templateGroup;
>>> 		private Map<String, String> attributes = new HashMap<String, String>();
>>>
>>> 		public ViewHandler(String viewBasePath) {
>>> 			templateGroup = new StringTemplateGroup("default", viewBasePath);
>>> 		}
>>>
>>> 		public void setAttribute(String name, String value) {
>>> 			attributes.put(name, value);
>>> 		}
>>>
>>> 		public String getOutput(String viewName) {
>>> 			StringTemplate view = templateGroup.getInstanceOf(viewName);
>>> 			view.setAttributes(attributes);
>>> 			return view.toString();
>>> 		}
>>>
>>> 		public void render(Writer out, String viewName) throws IOException {
>>> 			out.write(getOutput(viewName));
>>> 		}
>>> 	}
>>>
>>> The handler is used like this in a servlet:
>>>
>>> 	protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
>>> 		super.doGet(request, response);
>>> 		
>>> 		String viewBasePath = getServletContext().getRealPath("/WEB-INF/view");
>>> 		ViewHandler viewHandler = new ViewHandler(viewBasePath);
>>> 		viewHandler.setAttribute("fileName", "test.png");
>>> 		viewHandler.setAttribute("contentTemplate", "uploadFile");
>>>
>>> 		viewHandler.render(response.getWriter(), "index");
>>> 	}
>>>
>>> It does what it is supposed to; the output I get is the contents of
>>> the index.st template, with attributes replaced like they should be,
>>> and the content template included as expected.
>>>
>>> However, Swedish characters such as åäö that are part of static
>>> strings in the template files are shown in the browser(s) as question
>>> marks. I know this indicates encoding/charset problems. An example
>>> string (from the template files) that is not displayed correctly is:
>>>
>>> 	<input type="button" class="cancelUploadButton" value="Avbryt insättning">
>>>
>>> The 'ä' in the last word becomes a question mark in the browser.
>>>
>>>
>>> So, I have:
>>> - Checked the encoding settings in Eclipse, in all places I can find
>>> that seem to relate to the source files and/or template files.
>>> - Checked the encoding of the related template files (both in their
>>> properties and using an external editor that loads them fine as
>>> UTF-8).
>>> - Verified that the HTTP response headers say UTF-8 as the charset.
>>> The same goes for the HTML code itself, it's UTF-8 all the way.
>>>
>>> The only thing I haven't confirmed to be fine shows up when I open
>>> the .java files from my project in another editor (TextMate, which
>>> has always handled encodings fine for me). Normally TextMate displays
>>> the encoding used/discovered when loading the file (for the template
>>> files it says UTF-8), but for the Java source files it doesn't
>>> display anything.
>>> However, there are no static strings in the source files other than
>>> template names and attributes, so I'm not sure that would matter. But
>>> maybe it does, assuming there's something wrong with how the source
>>> files are saved by Eclipse.
>>>
>>> Can someone shed some light on this issue? As I see it I've got UTF-8
>>> everywhere (apart from possibly the Java source files, which I guess
>>> could be the issue), and it should work. But maybe I need to change
>>> something with regards to ST to have it work with UTF-8? If not, any
>>> other ideas?
>>>
>>> Thank you,
>>>
>>> // Leo
>>>
>>> _______________________________________________
>>> stringtemplate-interest mailing list
>>> stringtemplate-interest at antlr.org
>>> http://www.antlr.org/mailman/listinfo/stringtemplate-interest
>>