[antlr-interest] UTF-8, charVocabulary in options in 3.3

Matej Mailing mailing at tam.si
Sun Jul 1 02:44:02 PDT 2012


Hi,

I was able to solve it. Bart, you were right: after double-checking,
the file was NOT saved as UTF-8. Now that it is, it works like a
charm!

Thanks!
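
For anyone who hits the same problem, here is a minimal sketch of one way to
check whether a file's bytes are well-formed UTF-8 before handing it to ANTLR.
It uses only the JDK. The class name Utf8Check and the method isValidUtf8 are
made up for this example and are not part of ANTLR. Note that a pure ASCII
file also passes, since ASCII is a subset of UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR.
class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed
    // sequence. REPORT makes the decoder throw instead of silently
    // substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

In UTF-8, U+0161 is the two bytes 0xC5 0xA1. A file saved in a single-byte
encoding stores it as one byte instead (0x9A in Windows-1252, for example),
which this check rejects.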

2012/6/30 Matej Mailing <mailing at tam.si>:
> Hi,
>
> I edited the input file with PuTTY in a Linux console with the
> session encoding set to UTF-8. I have now also created the file with
> Notepad++, again with the encoding set to UTF-8, and I see the same
> behaviour. Is there an easy way to print out the contents of
> ANTLRFileStream? I suspect that I am looking for the wrong character
> code in the grammar file ...
>
> TIA,
> Matej
>
>
>
>
> 2012/6/29 Bart Kiers <bkiers at gmail.com>:
>> On Fri, Jun 29, 2012 at 12:26 PM, Matej Mailing <mailing at tam.si> wrote:
>>>
>>> Hi,
>>>
>>> I am new to ANTLR but already have an issue. I have an input file that
>>> contains some non-ASCII characters (like U+0161 -
>>> http://www.fileformat.info/info/unicode/char/161/index.htm) and I am
>>> using ANTLRFileStream(inputfile, "UTF-8") to read the input, which
>>> should be in UTF-8. However, when I do
>>> "RES      : '\u0161' ;"
>>>
>>> it never matches. I get the message "input1 line 1:0 no viable
>>> alternative at character 'š'".
>>>
>>> When I add the following segment to the grammar file:
>>>
>>> "options
>>> {
>>>           charVocabulary='\u0000'..'\uFFFE';
>>> }"
>>>
>>> I get an error:
>>> "internal error:  : java.lang.Error: Error parsing grammar.g: '\uFFFE'
>>> not expected ';'"
>>> ...
>>> error(100): grammar.g:5:24: syntax error: antlr: grammar.g:5:24:
>>> expecting SEMI, found '..'
>>> error(133): grammar.g:3:1: illegal option charVocabulary"
>>>
>>> I have been googling around for quite some time and none of the
>>> solutions seems to be working. What am I doing wrong?
>>>
>>
>> charVocabulary is an (old) ANTLR v2 option; ANTLR v3 doesn't need it: v3
>> accepts the range 0x0000..0xFFFF by default. So remove the charVocabulary
>> option.
>>
>> My guess is that you didn't save the input file containing 0x0161 properly
>> (I'm guessing it's saved as plain ASCII). Make sure you save it as
>> Unicode/UTF-xx.
>>
>> Regards,
>>
>> Bart.
>>
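
On the question raised earlier in the thread about printing out what the
stream actually contains: a rough sketch, using only the JDK, is to decode the
file as UTF-8 and list its code points. This does not go through
ANTLRFileStream itself; it only assumes the same file and the same "UTF-8"
encoding name. The class name DumpCodePoints and the method describe are
invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of ANTLR.
class DumpCodePoints {
    // Formats every code point of the text as U+XXXX, separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DumpCodePoints <file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // new String(...) replaces malformed byte sequences with U+FFFD,
        // so a badly encoded file shows up as U+FFFD in the output.
        System.out.println(describe(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

If the output shows U+0161 where the character is expected, the file and the
grammar literal '\u0161' agree. If it shows U+FFFD or some other code point,
the file was most likely not saved as UTF-8.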

