[antlr-interest] Python target v3 unicode problems
Gavin Lambert
antlr at mirality.co.nz
Fri Sep 14 03:29:19 PDT 2007
At 09:55 14/09/2007, Viðar Svansson wrote:
>Here, the transform function scans, and parses the string. Before
>I used ANTLRFileStream(path, encoding='utf-8'), it would fail on
>the scanning. Now the lexer works but the test fails in the end
>with this output:
>
>Failed example:
> tree
>Expected:
> {u'author' : u'Viðar Svansson'}
>Got:
> {u'author': u'Vi\xc3\xb0ar Svansson'}
>
>I am not sure what is wrong here, never seen hex values inside
>a unicode encoding string. Any ideas?
Well, I haven't actually looked it up to see if it matches, but
C3 B0 looks like a two-byte UTF-8 sequence to me, and thus exactly
what it should be doing, since you told it to deal with UTF-8.
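
For anyone wanting to check the byte values, here is a minimal sketch in Python 3 syntax (the thread itself predates it). It does not touch ANTLR at all, and the variable names are made up for illustration. It only confirms that C3 B0 is the UTF-8 encoding of "ð" (U+00F0), and that turning each UTF-8 byte into its own code point (here by decoding as Latin-1, which is an assumption about what happened, not something verified against the Python target) gives a string like the one in the "Got:" output.

```python
# Illustration only: the names below are hypothetical, not part of ANTLR.
name = "Vi\u00f0ar Svansson"          # the expected text, with U+00F0 (eth)

# Encoding to UTF-8 turns the single character U+00F0 into two bytes.
eth_bytes = "\u00f0".encode("utf-8")  # b'\xc3\xb0'
utf8_bytes = name.encode("utf-8")

# Treating each UTF-8 byte as a separate code point (Latin-1 maps
# byte values 0-255 straight onto U+0000-U+00FF) reproduces the
# string shown in the failing test.
mojibake = utf8_bytes.decode("latin-1")

# Reversing that step and decoding as UTF-8 recovers the original.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(repr(eth_bytes))
print(repr(mojibake))
print(repaired == name)
```

If that assumption holds, the bytes in the result are correct UTF-8, but they were never decoded into characters before ending up in the unicode string.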