[antlr-interest] Changing the Lexer based on parsing the first part of a file for Python 2.6

fwierzbicki at gmail.com fwierzbicki at gmail.com
Tue Feb 14 11:02:46 PST 2012


Hi all,

Python 2.6 has syntax to change lexing behavior. Specifically:

from __future__ import unicode_literals

If this statement is present the lexing of strings changes. Without
this directive,

foo = "bar"

assigns foo a str value. With the __future__ statement, foo gets a
unicode value. Also the __future__ statement causes

foo = u"bar"

to be an illegal statement. Essentially this allows you to write a 2.x
program that will look more like a Python 3 program.

So my question - what is a reasonable way to get my ANTLR3 grammar to
signal the lexer to change? Though it seems ugly, my first thought is
to pass a reference to the lexer to the parser and just set a boolean
on the lexer so it has the correct behavior from then on. The reason
that this *may* work is that Python only allows "from __future__"
statements at the very top of the file and so no string/unicode/etc
tokens are possible until after all "from __future__" statements have
occurred. Will I get into trouble with tokens that have already been
lexed and buffered before the parser sets the flag? Or is there a
better way to do this sort of thing?
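
To make the buffering concern concrete, here is a minimal sketch in plain
Python rather than ANTLR. ToyLexer, parse, and the token names are all
hypothetical and only model the idea of a parser flipping a boolean on the
lexer; they say nothing about how any particular ANTLR token stream
actually buffers. The flag only helps if the string token is produced
after the parser has seen the __future__ import:

```python
import re

# Hypothetical token pattern: double-quoted strings, names, and '='.
TOKEN_RE = re.compile(r'(?P<STRING>"[^"]*")|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>=)')


class ToyLexer:
    """Yields (kind, text) pairs one at a time; the string kind depends on a flag."""

    def __init__(self, source):
        self.source = source
        self.unicode_literals = False  # flipped by the parser, as proposed above

    def tokens(self):
        for m in TOKEN_RE.finditer(self.source):
            kind, text = m.lastgroup, m.group()
            if kind == "STRING":
                # The flag is read at the moment the token is produced.
                kind = "UNICODE" if self.unicode_literals else "STR"
            yield kind, text


def parse(lexer, stream):
    """Toy 'parser': sets the lexer flag once it has seen the __future__ import."""
    seen = []
    window = []
    for kind, text in stream:
        seen.append((kind, text))
        window = (window + [text])[-4:]  # last four token texts
        if window == ["from", "__future__", "import", "unicode_literals"]:
            lexer.unicode_literals = True
    return seen


SOURCE = 'from __future__ import unicode_literals\nfoo = "bar"\n'

# Lazy: tokens are pulled on demand, so the flag is set before "bar" is lexed.
lazy_lexer = ToyLexer(SOURCE)
lazy_result = parse(lazy_lexer, lazy_lexer.tokens())

# Eager: every token is buffered before parsing starts, so the flag comes too late.
eager_lexer = ToyLexer(SOURCE)
eager_result = parse(eager_lexer, list(eager_lexer.tokens()))

print(lazy_result[-1])
print(eager_result[-1])
```

In this toy model the on-demand run classifies "bar" as UNICODE while the
fully buffered run still classifies it as STR, which is exactly the
failure mode I am worried about.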

Kind regards,

-Frank Wierzbicki

