[antlr-interest] Python target v3 unicode problems

Viðar Svansson vidarsvans at gmail.com
Mon Sep 10 09:28:45 PDT 2007


Hi,

I am trying to scan a UTF-8 file using the Python target, but with no luck.
I usually get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
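For reference, here is a minimal reproduction of that message. It assumes the input contains 'ð' (as in my name), whose UTF-8 encoding starts with byte 0xc3, and that some step decodes the raw bytes with the default ASCII codec. It is written for Python 3, where the implicit conversion that Python 2 performs is spelled out as an explicit `.decode()` call; `decode_as_ascii` is just a hypothetical helper for the demo.

```python
# 'ð' (U+00F0) is encoded in UTF-8 as the two bytes 0xC3 0xB0.
raw = "ð".encode("utf-8")


def decode_as_ascii(data):
    """Hypothetical helper: decode data as ASCII, or return the error message."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


print(raw)                   # b'\xc3\xb0'
print(decode_as_ascii(raw))  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
print(raw.decode("utf-8"))   # ð
```

Decoding the same bytes as UTF-8 succeeds, which suggests the problem is which codec is applied, not the file contents.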

I have tried converting the strings to unicode everywhere, but nothing
seems to work. I have also tried many different declarations of the
unicode tokens, with no success there either.
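One common pattern, sketched here as an assumption rather than a known fix for the v3 target, is to decode the file exactly once where it is read instead of converting strings in many places. The sketch is Python 3 and stdlib only; `read_utf8` is a hypothetical name. With the v3 Python runtime the decoded string would then be what gets handed to the lexer's input stream (for example `ANTLRStringStream`), but that wiring is not shown or tested here.

```python
import os
import tempfile


def read_utf8(path):
    """Read a whole file and decode it from UTF-8 exactly once.

    The returned str holds Unicode code points, so code that consumes it
    (e.g. a lexer's character stream) never has to decode raw bytes.
    """
    with open(path, encoding="utf-8") as f:
        return f.read()


# Self-contained demo: write a small UTF-8 file, then read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("Viðar".encode("utf-8"))  # 6 bytes on disk
    path = tmp.name

try:
    text = read_utf8(path)
finally:
    os.remove(path)

print(text, len(text))  # Viðar 5
```

The file holds six bytes but the decoded string has five characters, because 'ð' takes two bytes in UTF-8 and one code point once decoded.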

I found reference to something like this:

class L extends Lexer;
options {
	charVocabulary = '\3'..'\377' | '\u1000'..'\u1fff';
}
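To illustrate what that option describes, here is a small Python sketch of the set of characters it admits. It reads '\3' and '\377' as octal escapes (3 and 255), which is how I understand v2 character literals, and treats each `..` range as inclusive; `VOCABULARY` and `in_vocabulary` are hypothetical names, not part of any ANTLR runtime.

```python
# Hypothetical mirror of the v2 option
#   charVocabulary = '\3'..'\377' | '\u1000'..'\u1fff';
# assuming '\3' and '\377' are octal escapes (3 and 255) and ranges are inclusive.
VOCABULARY = [(0o3, 0o377), (0x1000, 0x1FFF)]


def in_vocabulary(ch):
    """Return True if the code point of ch lies in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VOCABULARY)


for ch in ["a", "ð", "\u20ac", "\u1234"]:
    print(repr(ch), in_vocabulary(ch))
```

Under these assumptions 'a' and 'ð' (U+00F0) fall in the first range and U+1234 in the second, while the euro sign U+20AC falls in neither.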

But I think this is v2 syntax, correct?
Does anyone have a working Unicode lexer in Python?

Viðar

