[antlr-interest] Bug in C runtime column handling

Jim Idle jimi at temporal-wave.com
Mon Jul 2 09:09:12 PDT 2007


It was -1 in the Java runtime at some point and I think Ter changed it. I
have not had time to work out the implications of this, because I think
the input stream code tests for -1 in places. I will get to it.

Jim



> -----Original Message-----
> From: Wincent Colaiuta [mailto:win at wincent.com]
> Sent: Monday, July 02, 2007 6:21 AM
> To: ANTLR Interest
> Cc: Jim Idle
> Subject: Bug in C runtime column handling
> 
> I posted about this issue previously
> (<http://www.antlr.org/pipermail/antlr-interest/2007-June/021331.html>),
> but while unit testing a parser built with the ANTLR 3 C target I
> discovered that the behaviour is a little more inconsistent than I
> had initially thought. I'm CC'ing you directly on this one, Jim, as
> I don't know if you saw my earlier post on the topic.
> 
> BACKGROUND:
> 
> In lexer rules it can be handy to use a predicate to match certain
> tokens only when they appear in a certain column (often the first
> column). The C runtime provides a getCharPositionInLine() function
> that can be used to access the current column information.
> Unfortunately, as described in my previous post, the value returned
> by this function will be -1 at the start of the input, unlike the
> Java runtime where it is 0. This is because in the antlr3InputReset()
> function in antlr3inputstream.c the charPositionInLine is explicitly
> set to -1.
> 
> As noted earlier, I worked around this by using a helper method in my
> lexer:
> 
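>    /* The runtime reports -1 at the start of input; pos is unsigned
>       here, so compare against the converted value and map it to 0. */
>    ANTLR3_UINT32 ANTLR3_INLINE char_position_in_line(pWikiTextLexer ctx)
>    {
>        ANTLR3_UINT32 pos = ctx->pLexer->getCharPositionInLine(ctx->pLexer);
>        return pos == (ANTLR3_UINT32)-1 ? 0 : pos;
>    }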
> 
> But I just discovered that this still won't work consistently on the
> first line of the input. Because the column starts at -1, scanning
> the first character only bumps it up to 0, so every character on the
> first line is reported one column lower than it should be; all the
> other lines are correct.
> 
> To illustrate, given input like the following:
> 
> foobar
> foobar
> 
> On the first line "f" will be at column -1, "o" will be at column 0,
> and so on for columns 1, 2, 3, 4.
> 
> On the second line "f" will be at column 0, "o" at column 1, and so
> on for columns 2, 3, 4, 5.
> 
> As this is inconsistent, it could catch people out, so it looks like
> a bug to me.
> 
> SOLUTION:
> 
> So to work around this bug, the helper method now needs to check the
> line number first to decide whether the column needs adjusting.
> Alternatively, the following patch can be applied to the runtime
> (antlr3inputstream.c):
> 
> 166c166
> <     input->charPositionInLine = -1;
> ---
> >     input->charPositionInLine = 0;
> 
> Cheers,
> Wincent


