[antlr-interest] Bug in C runtime column handling

Wincent Colaiuta win at wincent.com
Mon Jul 2 06:21:07 PDT 2007


I posted about this issue previously
(<http://www.antlr.org/pipermail/antlr-interest/2007-June/021331.html>),
but while unit testing a parser built with the ANTLR 3 C target I
discovered that the behaviour is more inconsistent than I had
initially thought. CC'ing you directly on this one, Jim, as I don't
know whether you saw my post on the topic last time.

BACKGROUND:

In lexer rules it can be handy to use a predicate to match certain  
tokens only when they appear in a certain column (often the first  
column). The C runtime provides a getCharPositionInLine() function  
that can be used to access the current column information.  
Unfortunately, as described in my previous post, the value returned  
by this function will be -1 at the start of the input, unlike the  
Java runtime where it is 0. This is because in the antlr3InputReset()  
function in antlr3inputstream.c the charPositionInLine is explicitly  
set to -1.

As noted earlier, I worked around this by using a helper method in my  
lexer:

   ANTLR3_UINT32 ANTLR3_INLINE char_position_in_line(pWikiTextLexer ctx)
   {
       ANTLR3_UINT32 pos = ctx->pLexer->getCharPositionInLine(ctx->pLexer);
       return pos == -1 ? 0 : pos;
   }

But I just discovered that this still won't work consistently on the
first line of the input. After the first character on the first line
is scanned, the column number is incremented by just one, from -1 to
0, so through the helper the second character reports column 0 as
well. The raw column numbering is therefore off by one for every
character on the first line; all the other lines are correct.

To illustrate, given input like the following:

foobar
foobar

On the first line "f" will be at column -1, "o" will be at column 0,  
and so on for columns 1, 2, 3, 4.

On the second line "f" will be at column 0, "o" at column 1, and so  
on for columns 2, 3, 4, 5.

As this is inconsistent it could catch people out and so looks like a  
bug to me.
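
The numbering above can be reproduced with a small stand-alone model of the column bookkeeping. This is only a sketch, not the runtime code: MockInput and the mock_* functions are hypothetical names, and the consume logic (increment the column for each character, reset it to 0 after a newline) is an assumption based on the behaviour described in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the input stream; NOT the real ANTLR3 struct. */
typedef struct {
    const char *data;
    size_t      next;                /* index of the next character to consume */
    int         line;                /* 1-based line number */
    int         charPositionInLine;  /* column counter */
} MockInput;

/* Mirrors the reset behaviour described above: the column starts at -1. */
void mock_reset(MockInput *in, const char *data)
{
    assert(data != NULL);
    in->data = data;
    in->next = 0;
    in->line = 1;
    in->charPositionInLine = -1;
}

/* Consume one character: bump the column, and after a newline start the
 * next line at column 0 (assumed behaviour). */
void mock_consume(MockInput *in)
{
    char c = in->data[in->next];
    if (c == '\0') {
        return;  /* end of input: nothing to consume */
    }
    in->next++;
    in->charPositionInLine++;
    if (c == '\n') {
        in->line++;
        in->charPositionInLine = 0;
    }
}

/* Column reported when the character at `index` is about to be matched. */
int mock_column_at(const char *data, size_t index)
{
    MockInput in;
    mock_reset(&in, data);
    while (in.next < index && in.data[in.next] != '\0') {
        mock_consume(&in);
    }
    return in.charPositionInLine;
}
```

With the input "foobar\nfoobar\n", mock_column_at() gives -1 for the first "f" (index 0) and 0 for the following "o", but 0 for the second "f" (index 7) and 1 for the "o" after it.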

SOLUTION:

So to work around this bug the helper method now needs to be updated  
to look at the line number first in order to figure out whether  
adjusting the column number is necessary or not. Alternatively, the  
following patch can be applied to the runtime (antlr3inputstream.c):

166c166
<     input->charPositionInLine = -1;
---
>     input->charPositionInLine = 0;
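
If patching the runtime is not an option, the line-aware helper could look roughly like the following. This is only a sketch: normalized_column() is a hypothetical name, it takes plain ints rather than a lexer context so that it stands alone, and it assumes that lines are numbered from 1 and that only the first line is shifted by one, as described above. In a real lexer the two arguments would come from getCharPositionInLine() and from whatever line accessor the lexer exposes.

```c
#include <assert.h>

/*
 * Hypothetical helper: turn the raw column into a consistent 0-based column.
 * Assumption: lines are 1-based and only line 1 starts counting at -1.
 */
int normalized_column(int line, int raw_column)
{
    assert(line >= 1);
    /* On the first line the raw value is one too low; other lines are fine. */
    return line == 1 ? raw_column + 1 : raw_column;
}
```

A first-column predicate would then test normalized_column(line, raw) == 0, which holds for the first character of every line, including the first. If the runtime patch is applied, this adjustment must be dropped again, otherwise the first line ends up off by one in the other direction.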

Cheers,
Wincent


