[antlr-interest] UTF-8, charVocabulary in options in 3.3

Matej Mailing mailing at tam.si
Sat Jun 30 02:54:14 PDT 2012


I have edited the input file with PuTTY in a Linux console, with the
session encoding set to UTF-8. I have now also created the file with
Notepad++, again with the encoding set to UTF-8, and I see the same
behaviour. Is there an easy way to print out the contents of the
ANTLRFileStream? I suspect that I am looking for the wrong character
code in the grammar file ...
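
One way to check what the lexer will actually receive, sketched here
under a few assumptions: decode the file's bytes as UTF-8 with plain JDK
classes and print every code point. The class name CodePointDump and its
dump helper are made up for illustration and are not part of ANTLR. If
the output starts with U+FEFF, the editor wrote a byte order mark; a
U+FFFD suggests the file is not valid UTF-8 at all.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR: lists the code points of a UTF-8 file.
class CodePointDump {

    // Decodes the bytes as UTF-8 and returns labels such as "U+0161",
    // separated by single spaces. The JDK decodes malformed input to U+FFFD
    // and does not strip a leading byte order mark (U+FEFF).
    static String dump(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java CodePointDump <input-file>");
            return;
        }
        System.out.println(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A file that contains only š and was saved as UTF-8 without a byte order
mark should print U+0161.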


2012/6/29 Bart Kiers <bkiers at gmail.com>:
> On Fri, Jun 29, 2012 at 12:26 PM, Matej Mailing <mailing at tam.si> wrote:
>> Hi,
>> I am new to ANTLR and have already run into an issue. I have an input
>> file that contains some non-ASCII characters (like U+0161 -
>> http://www.fileformat.info/info/unicode/char/161/index.htm) and I am
>> using ANTLRFileStream(inputfile, "UTF-8") to read the input, which is
>> encoded in UTF-8 as it should be. However, when I define
>> "RES      : '\u0161' ;"
>> it never matches - I get the message "input1 line 1:0 no viable
>> alternative at character 'š'".
>> When I add the following segment to the grammar file:
>> "options
>> {
>>           charVocabulary='\u0000'..'\uFFFE';
>> }"
>> I get an error:
>> "internal error:  : java.lang.Error: Error parsing grammar.g: '\uFFFE'
>> not expected ';'"
>> ...
>> error(100): grammar.g:5:24: syntax error: antlr: grammar.g:5:24:
>> expecting SEMI, found '..'
>> error(133): grammar.g:3:1: illegal option charVocabulary"
>> I have been googling around for quite some time and none of the
>> solutions seems to be working. What am I doing wrong?
> charVocabulary is an (old) ANTLR v2 option; ANTLR v3 doesn't need it: v3
> accepts the range 0x0000..0xFFFF by default. So remove the charVocabulary
> option.
> My guess is that you didn't save the input file containing 0x0161 properly
> (I'm guessing it's saved as plain ASCII). Make sure you save it as
> Unicode/UTF-xx.
> Regards,
> Bart.
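
To rule out the editor entirely, the test input can also be written from
Java with an explicit charset. This is only a minimal sketch: the class
name WriteUtf8Sample and its encode helper are hypothetical, and the
output path is assumed to be passed on the command line.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR: writes a one-character UTF-8 test file.
class WriteUtf8Sample {

    // Encodes the text as UTF-8; String.getBytes does not add a byte order mark.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WriteUtf8Sample <output-file>");
            return;
        }
        // U+0161 is the character the RES rule in the grammar is meant to match.
        Files.write(Paths.get(args[0]), encode("\u0161"));
    }
}
```

The resulting file holds the two bytes 0xC5 0xA1, the UTF-8 encoding of
U+0161, with no byte order mark in front.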
