[antlr-interest] Python target v3 unicode problems

Gavin Lambert antlr at mirality.co.nz
Fri Sep 14 03:29:19 PDT 2007


At 09:55 14/09/2007, Viðar Svansson wrote:
 >Here, the transform function scans, and parses the string. Before
 >I used ANTLRFileStream(path, encoding='utf-8'), it would fail on
 >the scanning. Now the lexer works but the test fails in the end
 >with this output:
 >
 >Failed example:
 >    tree
 >Expected:
 >    {u'author' : u'Viðar Svansson'}
 >Got:
 >    {u'author': u'Vi\xc3\xb0ar Svansson'}
 >
 >I am not sure what is wrong here, never seen hex values inside
 >a unicode encoding string. Any ideas?

Well, I haven't actually looked it up to see if it matches, but
C3 B0 looks like a two-byte UTF-8 sequence to me (presumably the
encoding of the ð), and thus exactly what it should be doing, since
you told it to deal with UTF-8.
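
A quick way to check that guess is the rough sketch below, in plain
Python with no ANTLR involved. It confirms that ð (U+00F0) encodes to
the bytes C3 B0 in UTF-8. The Latin-1 step at the end is only an
assumption about how the u'\xc3\xb0' in the "Got" output could arise
(each UTF-8 byte ending up as its own character); nothing here shows
where in the lexer or the stream that would actually happen. The
variable names are made up for the illustration.

```python
# The expected author string; \u00f0 is the letter eth.
name = u"Vi\u00f0ar Svansson"

# Encode to UTF-8: eth becomes the two bytes C3 B0.
utf8_bytes = name.encode("utf-8")
eth_bytes = u"\u00f0".encode("utf-8")

# Decoding those bytes as UTF-8 gives the original string back.
round_trip = utf8_bytes.decode("utf-8")

# Assumption: if each byte is instead mapped to one character
# (as a Latin-1 decode does), the result matches the "Got" output.
mis_decoded = utf8_bytes.decode("latin-1")

print(repr(eth_bytes))
print(round_trip == name)
print(mis_decoded == u"Vi\xc3\xb0ar Svansson")
```

If the last comparison holds, the string in the test output contains
the two characters U+00C3 and U+00B0 rather than the single character
U+00F0, which would explain why it does not compare equal to the
expected value.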
