[antlr-interest] Python target v3 unicode problems
Viðar Svansson
vidarsvans at gmail.com
Mon Sep 10 09:28:45 PDT 2007
Hi,
I am trying to scan a UTF-8 file using the Python target, but with no luck.
I usually get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I have tried casting the strings to unicode everywhere, but nothing seems
to work. I have also tried many different declarations of the unicode
tokens, with no luck there either.
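For context, here is a minimal sketch (Python 3, assuming the input file is UTF-8) of where this error comes from. Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence, so it lies outside ASCII's 0..127 range. In Python 2, `unicode(s)` without an explicit encoding decodes with the ASCII default codec, so casting everywhere reproduces the same error. Decoding the raw bytes explicitly as UTF-8 before they reach the lexer avoids it. The sample word and the `decode_input` helper are hypothetical.

```python
# Minimal sketch (Python 3). Assumption: the input is UTF-8 encoded.
# The sample word and decode_input() are hypothetical, for illustration only.

def decode_input(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw input bytes with an explicit codec instead of an ASCII default."""
    return raw.decode(encoding)


# 'Í' (U+00CD) is encoded in UTF-8 as the two bytes 0xc3 0x8d,
# so the very first byte is already outside ASCII's 0..127 range.
raw = "Íslenska".encode("utf-8")

# Decoding as ASCII fails in the same way as the error reported above.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding explicitly as UTF-8 yields proper text for the lexer to consume.
text = decode_input(raw)
print(ascii_error)
```

When reading from a file, the equivalent is `open(path, encoding="utf-8").read()` on Python 3, or `codecs.open(path, encoding="utf-8")` on Python 2. The decoded text would then be handed to the runtime's character stream (for example `antlr3.ANTLRStringStream`, assumed here to accept a unicode string; check your runtime version).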
I found reference to something like this:
class L extends Lexer;
options {
charVocabulary = '\3'..'\377' | '\u1000'..'\u1fff';
}
But I think this is v2 syntax, correct?
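To make the quoted v2 option concrete: as I read it, the two ranges cover code points 3 to 255 (the `\3` and `\377` escapes are octal, as in Java) plus U+1000 to U+1FFF. The sketch below uses a hypothetical `in_vocabulary` helper that only models those ranges in Python. It is not how ANTLR itself checks its input.

```python
# Hypothetical model of the ANTLR v2 option
#     charVocabulary = '\3'..'\377' | '\u1000'..'\u1fff';
# as a list of inclusive code-point ranges (not ANTLR's own implementation).
VOCABULARY = [
    (0o3, 0o377),      # '\3'..'\377': octal escapes, i.e. 3..255
    (0x1000, 0x1FFF),  # '\u1000'..'\u1fff'
]


def in_vocabulary(ch: str) -> bool:
    """Return True if the character's code point lies in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VOCABULARY)


# Print each sample code point and whether it is covered.
for ch in ["a", "\u00f0", "\u00cd", "\u20ac"]:
    print(hex(ord(ch)), in_vocabulary(ch))
```

Icelandic letters such as ð (U+00F0) and Í (U+00CD) fall inside the first range, while a character like € (U+20AC) falls outside both.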
Does anyone have a working unicode lexer in python?
Viðar