[antlr-interest] python-lang parser to python target

Benjamin Niemann pink at odahoda.de
Wed Jul 23 10:18:23 PDT 2008


Hi Aaron,

On Wed, Jul 23, 2008 at 4:32 AM, Aaron Maxwell <amax at redsymbol.net> wrote:
> On Sunday 20 July 2008 02:35:16 am Johannes Luber wrote:
>> Benjamin Niemann schrieb:
>> > there is a Python grammar in examples package. It's the 2.3 grammar,
>> > but you may use parts of both to get a working Python2.5 grammar.
>
>> Regarding the sample grammars: In the repository there are sample
>> grammars for these languages as well. Ter is probably planning to update
>
> Hi all,
>
> Benjamin, Johannes, thanks for the advice.  Using it, I have a
> partial port of the python 2.5 grammar to python language target, for
> antlr 3.1:
>
> http://redsymbol.net/files/antlr/Python-python-2.5-2008-07-22.tgz

Cool! I'll have a look at it later.

> One big problem I see is that the generated PythonLexer.py has a
> dangling elif clause - it prints this:
>
> {{{
>            elif alt28 == 2:
>                # Python.g:615:10:
> <that's it - no statements in the block>
> }}}
>
> Due to Python's block structure by indentation, this is not correct
> python syntax - there needs to be a "pass" statement, or the elif
> clause needs to be omitted altogether.  The offending rule in the
> grammar is:
>
> {{{
> CONTINUED_LINE
>    :    '\\' ('\r')? '\n' (' '|'\t')*  { $channel=HIDDEN; }
>         ( nl=NEWLINE {self.emit(ClassicToken(type=NEWLINE,text=nl.getText()))}
>         |
>         )
>    ;
> }}}
>
> I tried removing the empty "|" line, like this:
>
> {{{
> CONTINUED_LINE
>    :    '\\' ('\r')? '\n' (' '|'\t')*  { $channel=HIDDEN; }
>         ( nl=NEWLINE {self.emit(ClassicToken(type=NEWLINE,text=nl.getText()))}
>         )
>    ;
> }}}
>
> Then the lexer code's syntax is correct.  However, the parser then
> cannot correctly parse lines that are broken by a backslash (i.e., one
> logical line split over two lines) -- for example:
> {{{
> ** ./CPythonLib/plat-sunos5/STROPTS.py
> line 836:8 required (...)+ loop did not match anything at character u'('
> line 1396:24 required (...)+ loop did not match anything at character u'A'
> line 1397:24 required (...)+ loop did not match anything at character u'A'
>
> }}}
>
> Can someone suggest a fix?  I tried just putting { pass;} in there,
> but it is not placed at the correct indentation level.  Plus that is just
> hackish.

Mmm.. that should work. An explicit { pass } action is currently the only
way to get empty alternatives to work with the Python target, and I've used
it several times without problems. I'll have a look at the code and find out
why it doesn't work in this case, but probably not before the weekend.
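
In the meantime, the syntax problem itself is easy to reproduce outside
ANTLR. This is only a minimal sketch, assuming a hand-reduced version of the
generated code: the two elif lines are taken from the output quoted above,
while the surrounding if branch, the compiles() helper and the mis-indented
variant are made up for illustration and may not match what the lexer
template really emits.

```python
# Hand-reduced stand-in for the generated lexer code (illustrative only).
# Only the "elif alt28 == 2:" line and the comment come from the real output.
BROKEN = (
    "alt28 = 2\n"
    "if alt28 == 1:\n"
    "    pass\n"
    "elif alt28 == 2:\n"
    "    # Python.g:615:10:\n"   # a comment alone does not form a block
)

# Same code with a 'pass' emitted at the block's indentation level.
FIXED = BROKEN + "    pass\n"

# Assumption: one way a wrongly indented 'pass' action could come out.
MISINDENTED = BROKEN + "pass\n"


def compiles(source):
    """Return True if 'source' is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(BROKEN))       # False
print(compiles(FIXED))        # True
print(compiles(MISINDENTED))  # False
```

So a pass in the empty alternative only helps if the code generator writes
it at the same indentation as the rest of the elif body.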

-Ben

