[antlr-interest] Bug in C runtime column handling
Wincent Colaiuta
win at wincent.com
Mon Jul 2 06:21:07 PDT 2007
I posted about this issue previously
(<http://www.antlr.org/pipermail/antlr-interest/2007-June/021331.html>), but while unit
testing a parser built with the ANTLR 3 C target I discovered that
the behaviour is a little more inconsistent than I had initially
thought. I'm CC'ing you directly on this one, Jim, since I don't know
whether you saw my post on the topic last time.
BACKGROUND:
In lexer rules it can be handy to use a predicate to match certain
tokens only when they appear in a certain column (often the first
column). The C runtime provides a getCharPositionInLine() function
that can be used to access the current column information.
Unfortunately, as described in my previous post, the value returned
by this function will be -1 at the start of the input, unlike the
Java runtime, where it is 0. This is because the antlr3InputReset()
function in antlr3inputstream.c explicitly sets charPositionInLine
to -1.
As noted in that post, I worked around this with a helper function in
my lexer:
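ANTLR3_UINT32 ANTLR3_INLINE char_position_in_line(pWikiTextLexer ctx)
{
    /* The runtime reports -1 at the very start of the input */
    ANTLR3_UINT32 pos = ctx->pLexer->getCharPositionInLine(ctx->pLexer);

    /* Treat that initial -1 (stored in an unsigned variable) as column 0 */
    return pos == (ANTLR3_UINT32)-1 ? 0 : pos;
}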
But I just discovered that this still won't work consistently on the
first line of the input. Because the counter starts at -1, consuming
the first character on the first line only brings it up to 0, so every
column on that line is reported one lower than it should be; all the
other lines are correct.
To illustrate, given input like the following:
    foobar
    foobar
On the first line "f" will be at column -1, "o" will be at column 0,
and so on for columns 1, 2, 3, 4.
On the second line "f" will be at column 0, "o" at column 1, and so
on for columns 2, 3, 4, 5.
Since this is inconsistent, it could easily catch people out, so it
looks like a bug to me.
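The numbering above can be reproduced with a small standalone model of the counter. This is a hypothetical sketch, not the C runtime's actual consume code: the function record_columns and its parameters are made up for illustration, and the model assumes the counter starts at whatever value the reset leaves behind (-1 in the C runtime, 0 in the Java runtime), advances by one for each ordinary character, and goes back to 0 whenever a newline is consumed.

```c
#include <stddef.h>

/*
 * Hypothetical model of how the column counter advances; this is NOT the
 * ANTLR C runtime's code, just an illustration of the behaviour described.
 *
 * start: value of charPositionInLine right after the input is reset
 *        (-1 in the C runtime as reported, 0 in the Java runtime).
 * cols:  receives the column reported for each character of text.
 * max:   capacity of cols.
 *
 * Returns the number of columns written.
 */
static size_t record_columns(const char *text, int start, int *cols, size_t max)
{
    int pos = start;   /* column before anything has been consumed */
    size_t n = 0;

    for (const char *p = text; *p != '\0' && n < max; p++) {
        cols[n++] = pos;   /* column reported while *p is the next char */
        if (*p == '\n')
            pos = 0;       /* consuming a newline restarts the count at 0 */
        else
            pos++;         /* any other character advances it by one */
    }
    return n;
}
```

With start set to -1, record_columns("foobar\nfoobar", -1, cols, 16) reports -1 through 4 for the letters on the first line and 0 through 5 for the second, matching the columns listed above; with start set to 0, both lines report 0 through 5.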
SOLUTION:
To work around this bug, the helper function now has to check the line
number first to decide whether the column number needs adjusting.
Alternatively, the following patch can be applied to the runtime
(antlr3inputstream.c):
166c166
< input->charPositionInLine = -1;
---
> input->charPositionInLine = 0;
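For the first option, the line-number check might look something like the following minimal sketch, reduced to a plain function so the logic can be seen in isolation. The name adjusted_column is hypothetical; in the real lexer the two arguments would presumably come from the lexer's getLine() and getCharPositionInLine() functions, which is an assumption about the runtime interface. It also assumes lines are numbered from 1 and that only line 1 is affected.

```c
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are illustrative only): given the
 * current line number and the raw column reported by the runtime, return a
 * 0-based column that is consistent across all lines.
 *
 * Assumes lines are numbered from 1 and that only line 1 is off by one,
 * because only there does the counter start at -1 instead of 0.
 */
static uint32_t adjusted_column(uint32_t line, uint32_t raw_pos)
{
    if (line == 1)
        return raw_pos + 1u;  /* -1 (as UINT32_MAX), 0, 1, ... becomes 0, 1, 2, ... */
    return raw_pos;           /* later lines are already 0-based */
}
```

Taking the raw column as an unsigned value mirrors the helper above: a -1 from the runtime arrives as the largest 32-bit unsigned value, and adding 1 wraps it to 0, which is well defined for unsigned arithmetic in C. If the runtime patch is applied instead, the line 1 special case has to be dropped, since the first line would then already start at 0.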
Cheers,
Wincent