[stringtemplate-interest] UTF-8 not displaying correctly

Leo R. Lundgren leo at finalresort.org
Mon Mar 15 15:48:06 PDT 2010


Hi,

I found StringTemplateGroup.setFileCharEncoding(), which takes a
parameter that seems to be the same charset name that
java.io.InputStreamReader(InputStream in, String charsetName) accepts.
I added it to my ViewHandler:

	public class ViewHandler {
		private StringTemplateGroup templateGroup;
		private Map<String, String> attributes = new HashMap<String, String>();

		public ViewHandler(String viewBasePath) {
			templateGroup = new StringTemplateGroup("default", viewBasePath);
			System.out.println(templateGroup.getFileCharEncoding());
			templateGroup.setFileCharEncoding("UTF-8");
			System.out.println(templateGroup.getFileCharEncoding());
		}

		public void setAttribute(String name, String value) {
			attributes.put(name, value);
		}

		public String getOutput(String viewName) {
			StringTemplate view = templateGroup.getInstanceOf(viewName, attributes);
			return view.toString();
		}

		public void render(Writer out, String viewName) throws IOException {
			out.write(getOutput(viewName));
		}
	}

Watching the console during a request, it seems that UTF-8 is
already the default on this system. In any case, that is what the
option is now set to. The output is still wrong, however: the encoding
issue remains.

I have checked the encoding settings in the properties of all the files,
and they all say UTF-8 (inherited from container).
I also tried templateGroup.setFileCharEncoding("ISO-8859-1") instead;
that changed the <?> into a couple of junk characters, so
that is not right either.
I'd also like to clarify that my previous information regarding the  
HTTP response headers carrying a charset in them was wrong; there is  
no such header sent. However, the browser adheres to the HTML meta tag  
defining a charset, that I am sure of.

After some testing, I've found that there is /one/ thing that makes
the page display correctly: if I set the charset in the template's HTML
to iso-8859-1 instead of utf-8, so that the browser parses the
contents as latin1, it displays correctly. Does that mean that what
the browser is sent is actually encoded as latin1? I can't really draw
any other conclusion from this.
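
To illustrate the reasoning above, here is a minimal sketch of what a charset mismatch does to a single 'ä'. It only demonstrates the general mechanism with plain JDK classes; the class and method names are made up for illustration, and whether this is what actually happens inside the servlet is an assumption, not something the test above proves.

```java
import java.nio.charset.Charset;

public class CharsetMismatchDemo {
    public static final Charset UTF8 = Charset.forName("UTF-8");
    public static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    /** Encodes text with one charset, then decodes the same bytes with another. */
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    /** Lists the code units of a string in hex, so no console encoding is involved. */
    public static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // the letter 'ä'

        // Sent as latin1 but parsed as UTF-8: the single byte 0xE4 is not
        // valid UTF-8, so the decoder substitutes U+FFFD (shown as a '?' glyph).
        System.out.println(hex(roundTrip(original, LATIN1, UTF8))); // fffd

        // Sent as UTF-8 but parsed as latin1: the two bytes 0xC3 0xA4
        // turn into two separate junk characters.
        System.out.println(hex(roundTrip(original, UTF8, LATIN1))); // c3 a4
    }
}
```

The first case matches a page that declares utf-8 but shows question marks; the second shows how a wrong charset in the other direction yields a pair of junk characters instead.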

At http://www.stringtemplate.org/api/org/antlr/stringtemplate/PathGroupLoader.html 
  I found the description "A brain dead loader that looks only in the  
directory(ies) you specify in the ctor. You may specify the char  
encoding. NOTE: this does not work when you jar things up! Use  
CommonGroupLoader instead in that case".

Reading the note in the description, and also reading
http://www.stringtemplate.org/api/org/antlr/stringtemplate/CommonGroupLoader.html,
I get the feeling that it's not the actual char encoding that
doesn't work when "jar'ed up", but rather the loader class itself. But
is this something I should try anyway? If so, how do I use the group
loader?

I did check with some Eclipse guys, and they didn't think that
Eclipse was saving the files incorrectly. Personally, I don't know, since I
haven't used Eclipse long enough to form an opinion based on experience
with it.

Silly question maybe, but could it be that ST just *reads* the  
template files using UTF-8 (or the set encoding), but then outputs it  
using Latin1?
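
One way to look at this question: view.toString() returns a Java String, which has no byte encoding of its own. The bytes are chosen by whichever Writer the string is written to. The sketch below shows that with plain JDK classes; the class and method names are hypothetical, and the string merely stands in for rendered template output. In a servlet, the Writer comes from response.getWriter(), and per the Servlet API its encoding defaults to ISO-8859-1 unless response.setCharacterEncoding() or a content type with a charset is set before getWriter() is called, so that may be worth checking here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterEncodingDemo {
    /** Writes text through a Writer bound to the given charset and returns the raw bytes. */
    public static byte[] encodeWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer out = new OutputStreamWriter(sink, Charset.forName(charsetName));
            out.write(text);
            out.close(); // flushes the encoder into the byte sink
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by view.toString().
        String rendered = "ins\u00e4ttning";

        // Same String, different Writers, different bytes on the wire.
        System.out.println(encodeWith(rendered, "UTF-8").length);      // 11
        System.out.println(encodeWith(rendered, "ISO-8859-1").length); // 10
    }
}
```

If the bytes really are latin1, the template reading side could be correct and the mismatch would sit between the Writer and the charset declared in the HTML.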

For reference, here's the beginning of the index HTML template:

	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
	<html lang="sv-SE">
		<head>
			<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
			<title>MyApp</title>
			<link rel="stylesheet" type="text/css" href="css/common.css">
		</head>
		<body>
			å ä ö <!-- test characters -->
			$(contentTemplate)()$
		</body>
	</html>

Many thanks,

Regards, Leo


On 15 Mar 2010, at 19:50, Terence Parr wrote:

> Hi. You have to tell ST to use a UTF-8 encoding. There should be an
> option on StringTemplateGroup or something.
> Ter
> On Mar 15, 2010, at 10:11 AM, Leo R. Lundgren wrote:
>
>> Hi,
>>
>> I am building a small servlet application using Eclipse, Tomcat 6, JRE
>> 1.6, and ST 3.2. Here is a ViewHandler I'm using to wrap ST
>> functionality:
>>
>> 	public class ViewHandler {
>> 		private StringTemplateGroup templateGroup;
>> 		private Map<String, String> attributes = new HashMap<String, String>();
>>
>> 		public ViewHandler(String viewBasePath) {
>> 			templateGroup = new StringTemplateGroup("default", viewBasePath);
>> 		}
>>
>> 		public void setAttribute(String name, String value) {
>> 			attributes.put(name, value);
>> 		}
>>
>> 		public String getOutput(String viewName) {
>> 			StringTemplate view = templateGroup.getInstanceOf(viewName);
>> 			view.setAttributes(attributes);
>> 			return view.toString();
>> 		}
>>
>> 		public void render(Writer out, String viewName) throws IOException {
>> 			out.write(getOutput(viewName));
>> 		}
>> 	}
>>
>> The handler is used like this in a servlet:
>>
>> 	protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
>> 		super.doGet(request, response);
>> 		
>> 		String viewBasePath = getServletContext().getRealPath("/WEB-INF/view");
>> 		ViewHandler viewHandler = new ViewHandler(viewBasePath);
>> 		viewHandler.setAttribute("fileName", "test.png");
>> 		viewHandler.setAttribute("contentTemplate", "uploadFile");
>>
>> 		viewHandler.render(response.getWriter(), "index");
>> 	}
>>
>> It does what it is supposed to: the output I get is the contents of
>> the index.st template, with attributes replaced as they should be
>> and the content template included as expected.
>>
>> However, Swedish characters such as åäö that are part of static
>> strings in the template files are shown in the browser(s) as question
>> marks. I know this indicates an encoding/charset problem. An example
>> string (from the template files) that is not displayed correctly is:
>>
>> 	<input type="button" class="cancelUploadButton" value="Avbryt insättning">
>>
>> The 'ä' in the last word becomes a question mark in the browser.
>>
>>
>> So, I have:
>> - Checked the encoding settings in Eclipse, in all places I can find
>> that seem to relate to the source files and/or template files.
>> - Checked the encoding of the related template files (both in their
>> properties and using an external editor that loads them fine as  
>> UTF-8).
>> - Verified that the HTTP response headers say UTF-8 as the charset.
>> The same goes for the HTML code itself, it's UTF-8 all the way.
>>
>> The only thing I haven't been able to confirm is fine shows up when I open
>> the .java files from my project in another editor (TextMate, which
>> has always handled encodings fine for me). Normally TextMate displays
>> the encoding it detected when loading the file (for the template
>> files it says UTF-8), but for the Java source files it doesn't display
>> anything.
>> However, there are no static strings in the source files other than
>> template names and attributes, so I'm not sure that would matter. But
>> maybe it does, assuming there's something wrong with how the source
>> files are saved by Eclipse.
>>
>> Can someone shed some light on this issue? As I see it I've got UTF-8
>> everywhere (apart from possibly the Java source files, which I guess
>> could be the issue), and it should work. But maybe I need to change
>> something with regards to ST to have it work with UTF-8? If not, any
>> other ideas?
>>
>> Thank you,
>>
>> // Leo
>>
>> _______________________________________________
>> stringtemplate-interest mailing list
>> stringtemplate-interest at antlr.org
>> http://www.antlr.org/mailman/listinfo/stringtemplate-interest
>


