[antlr-interest] Python target v3 unicode problems

Thu Sep 13 14:55:43 PDT 2007

Hi,

On 9/13/07, Benjamin Niemann <pink at odahoda.de> wrote:
> You must feed unicode data into the lexer. So if you are using
> ANTLRFileStream, use something like
>   ANTLRFileStream(path, encoding='utf-8')
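A minimal stdlib-only sketch of what the `encoding` argument changes, assuming Python 3 (which postdates this thread) and a hypothetical file name. It does not use the antlr3 runtime. It only shows that decoding the file once with the right codec gives one code point per character, while the raw bytes spread the "ð" over two elements.

```python
import os
import tempfile

# Hypothetical input: the same text as in the doctest, saved as UTF-8.
# "\u00f0" is the "ð" character.
text = u'author : "Vi\u00f0ar Svansson" ; '

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Raw bytes: the "ð" occupies two bytes (0xC3 0xB0).
    with open(path, "rb") as f:
        raw = f.read()

    # Decoded with the right codec: one code point per character,
    # which is what a lexer working on unicode input expects.
    with open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

print(len(raw), len(decoded))  # the byte string is one element longer
```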

I tried this, and now I can load the strings successfully, thanks. However,
they seem to come out wrong after the parse. Here is my doctest:

    >>> unicode_str = u'author : "Viðar Svansson" ; '
    >>> tree = transform(unicode_str, SymbolTable(), Decorator, Linker)
    >>> tree
    {u'author' : u'Viðar Svansson'}

Here, the transform function scans and parses the string. Before I
switched to ANTLRFileStream(path, encoding='utf-8'), it failed during
scanning. Now the lexer works, but the test fails at the end with this
output:

Failed example:
    tree
Expected:
    {u'author' : u'Viðar Svansson'}
Got:
    {u'author': u'Vi\xc3\xb0ar Svansson'}

I am not sure what is wrong here; I have never seen hex values inside a
unicode string before. Any ideas?
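One plausible explanation, sketched in Python 3 (an assumption about the cause, not a confirmed diagnosis): `\xc3\xb0` are exactly the two UTF-8 bytes of "ð". That pattern appears when UTF-8 bytes are decoded one byte per character, for example as Latin-1, instead of as UTF-8. The output alone does not show where in the pipeline that happens.

```python
# The string the doctest expects ("ð" is U+00F0, LATIN SMALL LETTER ETH).
expected = u"Vi\u00f0ar Svansson"

# In UTF-8, U+00F0 is the two bytes 0xC3 0xB0.
utf8_bytes = expected.encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to its own code point,
# giving the two characters U+00C3 U+00B0 instead of one "ð".
got = utf8_bytes.decode("latin-1")

# ascii() escapes non-ASCII characters, much like Python 2's repr() did
# for unicode strings.
print(ascii(got))       # 'Vi\xc3\xb0ar Svansson'
print(ascii(expected))  # 'Vi\xf0ar Svansson'

# Reversing the wrong step recovers the intended text.
assert got.encode("latin-1").decode("utf-8") == expected
```

Python 2's repr() also escapes non-ASCII characters, so even a correctly decoded result would print as u'Vi\xf0ar Svansson' rather than u'Viðar Svansson'.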

Viðar