[antlr-interest] python-lang parser to python target
Aaron Maxwell
amax at redsymbol.net
Tue Jul 22 19:32:02 PDT 2008
On Sunday 20 July 2008 02:35:16 am Johannes Luber wrote:
> Benjamin Niemann schrieb:
> > there is a Python grammar in examples package. It's the 2.3 grammar,
> > but you may use parts of both to get a working Python2.5 grammar.
> Regarding the sample grammars: In the repository there are sample
> grammars for these languages as well. Ter is probably planning to update
Hi all,
Benjamin, Johannes, thanks for the advice. Using it, I have a
partial port of the Python 2.5 grammar to the Python language target,
for ANTLR 3.1:
http://redsymbol.net/files/antlr/Python-python-2.5-2008-07-22.tgz
I'd like to get this into a state where other people can use it. Please
let me know of any changes you think are needed. So far I have tested
only the parsing, not any code generation based on it; exactly what I
did is described in the tarball's README.
I tested the resulting parser on the 2034 Python files in a recent Jython ASM
branch checkout. The README goes into detail; basically, two of those
files triggered errors, and all the others parsed without errors, though
at least a few percent produced one or more warnings.
One big problem I see is that the generated PythonLexer.py has a
dangling elif clause - it prints this:
{{{
elif alt28 == 2:
    # Python.g:615:10:
    <that's it - no statements in the block>
}}}
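The problem can be reproduced without ANTLR. The sketch below is only an illustration: the name alt28 comes from the generated lexer, but the surrounding if/elif chain and the trailing statement are made up. It compiles two source strings, one whose elif body holds nothing but a comment and one with a "pass" added.

```python
# Minimal reproduction, independent of ANTLR. Only the name alt28 and the
# comment text come from the generated lexer; the rest is hypothetical.

BROKEN = (
    "alt28 = 2\n"
    "if alt28 == 1:\n"
    "    pass\n"
    "elif alt28 == 2:\n"
    "    # Python.g:615:10:\n"
    "done = True\n"
)

# Same source with a 'pass' statement added to the elif body.
FIXED = BROKEN.replace(
    "    # Python.g:615:10:\n",
    "    # Python.g:615:10:\n    pass\n",
)


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(compiles(BROKEN))  # False
print(compiles(FIXED))   # True
```

A comment does not count as a statement, so the elif body in the first string is empty and the compiler rejects it.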
Due to Python's block structure by indentation, this is not correct
python syntax - there needs to be a "pass" statement, or the elif
clause needs to be omitted altogether. The offending rule in the
grammar is:
{{{
CONTINUED_LINE
    :   '\\' ('\r')? '\n' (' '|'\t')* { $channel=HIDDEN; }
        ( nl=NEWLINE
          {self.emit(ClassicToken(type=NEWLINE,text=nl.getText()))}
        |
        )
    ;
}}}
I tried removing the empty "|" line, like this:
{{{
CONTINUED_LINE
    :   '\\' ('\r')? '\n' (' '|'\t')* { $channel=HIDDEN; }
        ( nl=NEWLINE
          {self.emit(ClassicToken(type=NEWLINE,text=nl.getText()))}
        )
    ;
}}}
Then the lexer code's syntax is correct. However, the parser then
cannot correctly parse lines that are broken by a backslash (i.e., one
logical line split over two lines) -- for example:
{{{
** ./CPythonLib/plat-sunos5/STROPTS.py
line 836:8 required (...)+ loop did not match anything at character u'('
line 1396:24 required (...)+ loop did not match anything at character u'A'
line 1397:24 required (...)+ loop did not match anything at character u'A'
}}}
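One possible reading of these errors is that, without the empty alternative, a NEWLINE directly after the continuation becomes mandatory. The sketch below is a rough regex model of the two rule variants, not the generated lexer. It assumes NEWLINE is one or more line breaks (a (...)+ loop); the actual NEWLINE rule is in the tarball's grammar and may differ. The sample line is made up.

```python
import re

# Assumption: NEWLINE is modelled as one or more line breaks, i.e. a (...)+ loop.
NEWLINE = r"(?:\r?\n)+"
# Backslash, optional CR, LF, then any leading spaces or tabs.
CONTINUATION = r"\\\r?\n[ \t]*"

# Rule with the empty alternative: a NEWLINE after the continuation is optional.
with_empty_alt = re.compile(CONTINUATION + "(?:" + NEWLINE + ")?")
# Rule without the empty alternative: a NEWLINE is now required.
without_empty_alt = re.compile(CONTINUATION + NEWLINE)

# Hypothetical logical line split over two physical lines.
line = "x = 1 + \\\n    (2)"
start = line.index("\\")

print(bool(with_empty_alt.match(line, start)))     # True
print(bool(without_empty_alt.match(line, start)))  # False
```

In this model the second variant fails at the "(" that follows the continuation, which looks consistent with the "loop did not match anything at character u'('" message, though this has not been checked against the generated lexer.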
Can someone suggest a fix? I tried just putting { pass;} in there,
but the generated statement does not end up at the correct indentation
level. Besides, that is just a hack.
Any other feedback appreciated.
Thanks,
Aaron
--
Aaron Maxwell
http://redsymbol.net