[antlr-interest] UTF-8, charVocabulary in options in 3.3

Fri Jun 29 05:06:13 PDT 2012

On Fri, Jun 29, 2012 at 12:26 PM, Matej Mailing <mailing at tam.si> wrote:

> Hi,
>
> I am new to antlr but already have an issue. I have an input file that
> contains some UTF-8 characters (like U+0161 -
> http://www.fileformat.info/info/unicode/char/161/index.htm) and I am
> using ANTLRFileStream(inputfile, "UTF-8") to get the input which is in
> UTF-8 as it should be. However, when I do
> "RES      : '\u0161' ;"
>
> it never matches - I get input1 line 1:0 no viable alternative at
> character 'š' message.
>
> When I add the following segment to the grammar file:
>
> "options
> {
>           charVocabulary='\u0000'..'\uFFFE';
> }"
>
> I get an error:
> "internal error:  : java.lang.Error: Error parsing grammar.g: '\uFFFE'
> not expected ';'"
> ...
> error(100): grammar.g:5:24: syntax error: antlr: grammar.g:5:24:
> expecting SEMI, found '..'
> error(133): grammar.g:3:1: illegal option charVocabulary"
>
> I have been googling around for quite some time and none of the
> solutions seems to be working. What am I doing wrong?
>
>
charVocabulary is an (old) ANTLR v2 option; ANTLR v3 doesn't need it: v3
accepts the range 0x0000..0xFFFF by default. So remove the charVocabulary
option.

My guess is that you didn't save the input file containing 0x0161 properly
(I'm guessing it's saved as plain ASCII). Make sure you save it in the same
encoding you pass to ANTLRFileStream (UTF-8 in your case).
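
One way to test that guess is to look at the raw bytes of the file: in
UTF-8, U+0161 is encoded as the two bytes 0xC5 0xA1. Below is a minimal
sketch in plain Java (no ANTLR involved). The class and method names are
made up for illustration, and it assumes Java 7 or later for java.nio.file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR: checks how U+0161 is stored on disk.
public class EncodingCheck {

    // Returns true if the file's raw bytes contain 0xC5 0xA1,
    // which is the UTF-8 encoding of U+0161.
    public static boolean containsUtf8SCaron(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        for (int i = 0; i + 1 < bytes.length; i++) {
            if ((bytes[i] & 0xFF) == 0xC5 && (bytes[i + 1] & 0xFF) == 0xA1) {
                return true;
            }
        }
        return false;
    }

    // Returns true if the file, decoded as UTF-8, contains the char U+0161.
    // This is the character the lexer rule '\u0161' has to see.
    public static boolean decodesToSCaron(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return text.indexOf('\u0161') >= 0;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of your input file as the first argument.
        Path file = Paths.get(args[0]);
        System.out.println("UTF-8 bytes for U+0161 present: " + containsUtf8SCaron(file));
        System.out.println("Decodes to U+0161 as UTF-8:     " + decodesToSCaron(file));
    }
}
```

If both checks print false for your input file, the file was most likely
saved in another encoding, and re-saving it as UTF-8 from your editor should
be the next thing to try.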

Regards,

Bart.