[antlr-interest] Problem parsing unit symbols

Jim Idle jimi at temporal-wave.com
Fri Nov 6 09:55:26 PST 2009



> -----Original Message-----
> From: Mark van Assem [mailto:mark at cs.vu.nl]
> Sent: Friday, November 06, 2009 4:19 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Problem parsing unit symbols
> 
> Hi Jim,
> 
> > So either the lexer specs are incorrect, or the characters you pasted
> > here are not in an encoding that matches what Java is looking for. Send
> > them in UTF8 format. The UTF8 version of Ohm is 0xE2 0x84 0xA6 for
> > instance. What encoding are you sending in? When you come to read input
> > files, then you will need to tell the file stream what the file
> > encoding is.
> 
> How can I accomplish this? 

new ANTLRFileStream(x, "UTF8");
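
To see why the encoding argument matters, here is a minimal sketch that uses only the JDK, not the ANTLR runtime. The class and method names (OhmEncodingDemo, decode, readAs) are made up for illustration. It writes the three UTF-8 bytes of the Ohm sign quoted above to a temporary file, then decodes them once as UTF-8 and once as ISO-8859-1, which stands in here for a mismatched default encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of ANTLR.
public class OhmEncodingDemo {
    // UTF-8 bytes for U+2126 OHM SIGN, as quoted in the thread.
    static final byte[] OHM_UTF8 = {(byte) 0xE2, (byte) 0x84, (byte) 0xA6};

    // Decode raw bytes with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Read a whole file, decoding its bytes with the given charset.
    static String readAs(Path file, Charset charset) throws IOException {
        return decode(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ohm", ".txt");
        try {
            Files.write(tmp, OHM_UTF8);

            // Correct encoding: the three bytes become one character.
            String utf8 = readAs(tmp, StandardCharsets.UTF_8);
            System.out.println(utf8.length() + " char, U+"
                    + Integer.toHexString(utf8.charAt(0)).toUpperCase());

            // Wrong encoding: each byte becomes its own character,
            // so a lexer rule for the Ohm sign can never match.
            String latin1 = readAs(tmp, StandardCharsets.ISO_8859_1);
            System.out.println(latin1.length() + " chars");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Read as UTF-8 the file yields the single character U+2126; read as ISO-8859-1 it yields three unrelated characters, which is the kind of mismatch the lexer would trip over.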

> E.g. Notepad allows me to save a file in UTF8,
> but how do I get the right character encodings in? If I e.g. copy them
> from a website this won't work of course.

Web pages are usually in UTF-8, so it probably would, but cutting and pasting mangles it. I use vim/gvim myself.

> 
> In your second mail you say that you "hacked ANTLRWorks to set UTF8
> encoding on file input rather than default and your example stuff
> works". This sounds like something that is useful for many people and
> me in particular. Can I somehow get this new version?

It is only reading files of course, but you can download the source tarball for ANTLRWorks and just change the template that generates the driver stub. You can see the change here:

http://fisheye2.atlassian.com/browse/antlrworks

You need Maven to build it, but once it is installed you just type:

mvn

And a new jar is made under the target directory.

Jim




