[antlr-interest] Bug in Python target while using multiple lexers + island grammar
Bob Adolf
rdadolf at gmail.com
Tue Mar 30 15:30:06 PDT 2010
I probably should've posted this a while ago, but I didn't get around
to it.
There is a bug in the CommonTokenStream.getTokens() function in the
Python target version 3.1.2 (it looks like there is a 3.2 version in
the bug database, but 3.1.2 is the released Python runtime and it
shouldn't affect the bug anyway). The slice that selects
which tokens to return is:
    self.tokens[start:stop]
which drops the last token, since a Python slice excludes its end
index. My guess is that in normal cases, this is
overlooked because the last token is EOF, and if you're calling
getTokens() after the fact, EOF has already served its primary purpose
and terminated the lexer. A cursory look at the Java code makes me
think that it doesn't have this problem, but I have not tested it. A port
of the included reproducer could answer that.
The workaround is to use the tokens list inside the CommonTokenStream
class directly instead of getTokens().
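To make the off-by-one concrete, here is a minimal sketch that uses a plain
list as a stand-in for the stream's token buffer. The helper names
get_tokens_exclusive and get_tokens_inclusive are hypothetical and are not
part of the ANTLR runtime; the sketch assumes that stop is meant to be the
index of the last token to return.

```python
# Stand-in for the stream's token buffer: plain strings instead of Token objects.
tokens = ["ID", "WS", "INT", "EOF"]

def get_tokens_exclusive(buf, start, stop):
    """Mirrors the slice from the report: buf[stop] itself is left out."""
    return buf[start:stop]

def get_tokens_inclusive(buf, start, stop):
    """Treats stop as an inclusive index, so buf[stop] is returned too."""
    return buf[start:stop + 1]

last = len(tokens) - 1
print(get_tokens_exclusive(tokens, 0, last))  # the final token is missing
print(get_tokens_inclusive(tokens, 0, last))  # the final token is included
```

Reading the tokens list directly, as in the workaround above, sidesteps the
slice entirely.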
On a side note, there is also a bug (sort of) in the code given on the
wiki page for emitting multiple tokens
(http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497).
The method proposed builds up an array of tokens and then emits
them one by one as it continues to go through the file. This is fine
for non-island grammars, but if you use multiple-emit inside an island
grammar, the lexer will happily continue munching input as it cleans
out its emit buffer even after the EOF token is "emitted". This either
leads to the island lexer throwing away input (since it terminates on
EOF and tosses the remaining multi-emit buffer) or throwing an error
(if it runs across input that it cannot understand).
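The behaviour described above can be sketched with a toy lexer. Nothing here
is ANTLR code: MultiEmitLexer, next_token_eager, next_token_lazy and
run_island are hypothetical names, and each "rule match" is modelled as a
ready-made list of tokens. The lazy variant is one possible workaround (only
read more input once the emit buffer is empty) and is not necessarily the one
used in the attached reproducer.

```python
from collections import deque

EOF = "EOF"

class MultiEmitLexer:
    """Toy lexer: each input item is the list of tokens one rule match emits."""

    def __init__(self, matches):
        self.matches = deque(matches)  # unconsumed "input"
        self.pending = deque()         # multi-emit buffer

    def _match_one(self):
        # Consume one rule match from the input and queue everything it emits.
        if self.matches:
            self.pending.extend(self.matches.popleft())
        else:
            self.pending.append(EOF)

    def next_token_eager(self):
        # Lexes more input on every call, even while the buffer is non-empty.
        self._match_one()
        return self.pending.popleft()

    def next_token_lazy(self):
        # Only touches the input once the buffer has been drained.
        if not self.pending:
            self._match_one()
        return self.pending.popleft()

def run_island(lexer, step):
    """Pull tokens until EOF; return the tokens seen and the input left unread."""
    seen = []
    while True:
        tok = step()
        seen.append(tok)
        if tok == EOF:
            return seen, list(lexer.matches)

# The first match emits two tokens plus the island-terminating EOF;
# the remaining matches belong to the outer lexer.
source = [["A", "B", EOF], ["outer1"], ["outer2"]]

eager = MultiEmitLexer(source)
print(run_island(eager, eager.next_token_eager))  # outer input gets consumed

lazy = MultiEmitLexer(source)
print(run_island(lazy, lazy.next_token_lazy))     # outer input is left alone
```

In the eager run the island lexer has eaten the outer input by the time EOF
comes out of the buffer; in the lazy run that input is still available to the
outer lexer.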
I've included a reproducer (in Python) which demonstrates both problems
and gives a workaround for each.
Thanks,
-Bob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BUG_python_island_grammar.tgz
Type: application/octet-stream
Size: 1918 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100330/f9738592/attachment.obj