[antlr-interest] Python target v3 unicode problems
Viðar Svansson
vidarsvans at gmail.com
Thu Sep 13 14:55:43 PDT 2007
Hi,
On 9/13/07, Benjamin Niemann <pink at odahoda.de> wrote:
> You must feed unicode data into the lexer. So if you are using
> ANTLRFileStream, use something like
> ANTLRFileStream(path, encoding='utf-8')
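The advice above comes down to decoding the input bytes into unicode once, before the lexer sees them. Below is a minimal Python 3 sketch of that step that uses only the standard library. The helper name `read_source` is hypothetical. With the ANTLR 3 Python runtime, the resulting string would then be wrapped in a character stream such as `ANTLRStringStream`, which is roughly what the quoted `ANTLRFileStream(path, encoding='utf-8')` call handles for you.

```python
import io
import os
import tempfile


def read_source(path, encoding="utf-8"):
    """Return the file's contents decoded to a unicode (str) object.

    Hypothetical helper: decoding here, exactly once, means the lexer
    only ever sees characters, never raw UTF-8 bytes.
    """
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Demo: write a UTF-8 file containing a non-ASCII character, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write('author : "Viðar Svansson" ; '.encode("utf-8"))

text = read_source(path)
os.remove(path)

# len() counts characters here, so "ð" is one element rather than two bytes.
print(len(text))
```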
I tried this, and now the strings load successfully, thanks. However,
they come out wrong after the parse. Here is my doctest:
>>> unicode_str = u'author : "Viðar Svansson" ; '
>>> tree = transform(unicode_str,SymbolTable(), Decorator, Linker)
>>> tree
{u'author' : u'Viðar Svansson'}
Here, the transform function scans and parses the string. Before I
used ANTLRFileStream(path, encoding='utf-8'), it failed during
scanning. Now the lexer works, but the test fails at the end with this
output:
Failed example:
tree
Expected:
{u'author' : u'Viðar Svansson'}
Got:
{u'author': u'Vi\xc3\xb0ar Svansson'}
I am not sure what is wrong here. I have never seen hex escapes like
these inside a unicode string. Any ideas?
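The escapes in the Got line are exactly the two UTF-8 bytes of "ð" (0xC3 0xB0), each turned into a separate code point. This is the result when UTF-8 bytes are decoded as Latin-1. The following minimal Python 3 sketch reproduces the symptom and reverses it. The variable names are only illustrative, and the post does not establish where the wrong decode happens in this particular setup.

```python
# Reproduce the symptom: the UTF-8 bytes of "ð" read back one byte per character.
name = "Viðar Svansson"

utf8_bytes = name.encode("utf-8")       # "ð" (U+00F0) becomes the two bytes 0xC3 0xB0
garbled = utf8_bytes.decode("latin-1")  # each byte maps to the code point of the same value

print(ascii(garbled))           # 'Vi\xc3\xb0ar Svansson', matching the value in the Got line
print(len(name), len(garbled))  # 14 15: one extra character per two-byte UTF-8 sequence

# Undo the damage by reversing the wrong step, then decoding correctly.
repaired = garbled.encode("latin-1").decode("utf-8")
assert repaired == name
```

In practice, the fix is to make sure the text is decoded exactly once, as UTF-8, somewhere before it reaches the lexer, rather than repairing it afterwards.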
Viðar