[antlr-interest] Bug in C runtime column handling
Jim Idle
jimi at temporal-wave.com
Mon Jul 2 09:09:12 PDT 2007
It was -1 in the Java runtime at some point and I think Ter changed it.
I have not had time to work out the implications of changing it here,
because I think the input code tests for -1 in places. I will get to it.
Jim
> -----Original Message-----
> From: Wincent Colaiuta [mailto:win at wincent.com]
> Sent: Monday, July 02, 2007 6:21 AM
> To: ANTLR Interest
> Cc: Jim Idle
> Subject: Bug in C runtime column handling
>
> I posted about this issue previously
> (<http://www.antlr.org/pipermail/antlr-interest/2007-June/021331.html>),
> but while unit testing a parser built with the ANTLR 3 C target I
> discovered that the behaviour is a little more inconsistent than I had
> initially thought. CC'ing you directly on this one, Jim, seeing as I
> don't know if you saw my post on the topic last time.
>
> BACKGROUND:
>
> In lexer rules it can be handy to use a predicate to match certain
> tokens only when they appear in a certain column (often the first
> column). The C runtime provides a getCharPositionInLine() function
> that can be used to access the current column information.
> Unfortunately, as described in my previous post, the value returned
> by this function will be -1 at the start of the input, unlike the
> Java runtime where it is 0. This is because in the antlr3InputReset()
> function in antlr3inputstream.c the charPositionInLine is explicitly
> set to -1.
>
> As noted earlier, I worked around this by using a helper method in my
> lexer:
>
> ANTLR3_UINT32 ANTLR3_INLINE char_position_in_line(pWikiTextLexer ctx)
> {
>     ANTLR3_UINT32 pos = ctx->pLexer->getCharPositionInLine(ctx->pLexer);
>     return pos == -1 ? 0 : pos;
> }
>
> But I just discovered that this still won't work consistently on the
> first line of the input. This is because after scanning the first
> character on the first line the column number is bumped up by only 1,
> from -1 to 0, so the helper reports column 0 for both the first and
> the second character. This means that the column numbering is off by
> one for every character in the first line; all the other lines are
> correct.
>
> To illustrate, given input like the following:
>
> foobar
> foobar
>
> On the first line "f" will be at column -1, "o" will be at column 0,
> and so on for columns 1, 2, 3, 4.
>
> On the second line "f" will be at column 0, "o" at column 1, and so
> on for columns 2, 3, 4, 5.
>
> As this is inconsistent it could catch people out and so looks like a
> bug to me.
>
> SOLUTION:
>
> So to work around this bug the helper method now needs to be updated
> to look at the line number first in order to figure out whether
> adjusting the column number is necessary or not. Alternatively, the
> following patch can be applied to the runtime (antlr3inputstream.c):
>
> 166c166
> < input->charPositionInLine = -1;
> ---
> > input->charPositionInLine = 0;
>
> Cheers,
> Wincent
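
For reference, the line-aware workaround described in the quoted message
could look roughly like the sketch below. It is not taken from the runtime
or from Wincent's lexer: adjusted_column() is a hypothetical name, the
function takes the line and raw column as plain arguments so that it
compiles without the ANTLR headers, and it assumes that line numbers start
at 1 and that only the first line is numbered from -1.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the ANTLR 3 C runtime).
 *
 * line        line number of the current character, assumed to start at 1
 * raw_column  column as reported by the runtime, e.g. the value returned
 *             by getCharPositionInLine()
 *
 * On the first line the runtime counts from -1 instead of 0, so every
 * column there is one too small. All other lines are already 0-based. */
int32_t adjusted_column(uint32_t line, int32_t raw_column)
{
    return (line == 1) ? raw_column + 1 : raw_column;
}
```

In a lexer predicate the two arguments would come from the lexer itself
(the column from getCharPositionInLine() as in the helper quoted above, the
line from whatever line accessor the lexer provides). Once the one-line
patch to antlr3inputstream.c is applied, an adjustment like this would have
to be removed again, otherwise the first line ends up off by one in the
other direction.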