[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.

Wed Aug 13 13:48:35 PDT 2008

[Ok, I'm mostly responding to Kay here, but I had to do it indirectly since I didn't get the original message.]

At 08:22 14/08/2008, Foust wrote:
 >>Kay Röpke wrote:
 >> I'm just saying that adding a column and the tab-width handling
 >> doesn't make that much sense, because it's not something you
 >> generally need. If you do need it, it's almost trivial to add.

You need it to produce any kind of useful error 
message when the input file contains tabs.  I 
guess you could work around this by 
pre-converting all tabs to spaces before passing 
it to ANTLR, but that's effectively a whole 
'nother lexing step, which seems like a 
waste.  And the error message would *still* be 
misleading, since it reports the zero-based 
character offset as if it were a one-based column number.
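
To make the distinction concrete, here is a rough sketch of the conversion I have in mind. The class and method names (Columns, displayColumn) and the tabWidth parameter are my own inventions for illustration, not anything in the ANTLR runtime; it assumes fixed-width tab stops and takes the zero-based offset that charPosInLine gives you.

```java
final class Columns {
    /**
     * Converts a zero-based character offset within a line into a
     * one-based display column, expanding tabs to the next tab stop.
     * Hypothetical helper; assumes constant-width tab stops.
     */
    static int displayColumn(String line, int charPosInLine, int tabWidth) {
        int column = 0; // zero-based display column reached so far
        int end = Math.min(charPosInLine, line.length());
        for (int i = 0; i < end; i++) {
            if (line.charAt(i) == '\t') {
                // Advance to the next multiple of tabWidth.
                column += tabWidth - (column % tabWidth);
            } else {
                column++;
            }
        }
        return column + 1; // one-based, for human-facing messages
    }
}
```

With a tab width of 8, an offset of 1 on a line that starts with a tab comes out as column 9 rather than 1, which is what an editor would show.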

 >> If I talk about column 1, then yes, I mean the first character.
 >> I'm human after all.
 >> But when I see charPosInLine, I think index (in c-speak).

That's fine, if you're dealing with the object 
model.  But often you're not -- the token 
attribute, for example, is simply called 
'$X.position', which could be read either 
way.  And the error messages simply dump the 
charPosInLine *as if it were a column*.  _That_ 
is what I object to, not the zero-based-ness of 
the charPosInLine (I agree that this makes the most sense).

 >> Note: I'm not talking about solving the tab problem, but
 >> displaying a short portion of the input (whether charstream
 >> or tokenstream) with an indicator where the offending
 >> char/token was. That should make it easy to find the error,
 >> even if we can't provide column-accurate position
 >> info out of the box.

While I think this is an excellent idea... how 
exactly are you going to position the indicator 
if you don't know the column position?  You can't 
rely on outputting tabs for positioning because 
the tabs in the input stream and the tabs on the 
console/output stream may not have the same width.
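
One way around the tab mismatch, sketched below, is to expand the tabs yourself before echoing the line, and pad the indicator with spaces only. Note that this needs exactly the tab-aware column computation under discussion. ErrorIndicator and render are hypothetical names, and tabWidth is an assumed setting for the input file.

```java
final class ErrorIndicator {
    /**
     * Returns the source line with tabs expanded to spaces, followed by
     * a second line holding a '^' under the offending character.
     * Hypothetical helper; assumes constant-width tab stops.
     */
    static String render(String line, int charPosInLine, int tabWidth) {
        StringBuilder expanded = new StringBuilder();
        int caretColumn = -1;
        for (int i = 0; i < line.length(); i++) {
            if (i == charPosInLine) {
                caretColumn = expanded.length(); // display column of the error
            }
            char c = line.charAt(i);
            if (c == '\t') {
                int pad = tabWidth - (expanded.length() % tabWidth);
                for (int k = 0; k < pad; k++) {
                    expanded.append(' ');
                }
            } else {
                expanded.append(c);
            }
        }
        if (caretColumn < 0) {
            caretColumn = expanded.length(); // offset at or past end of line
        }
        StringBuilder marker = new StringBuilder();
        for (int k = 0; k < caretColumn; k++) {
            marker.append(' '); // spaces only, so the output width is predictable
        }
        marker.append('^');
        return expanded + "\n" + marker;
    }
}
```

Because both lines contain only spaces and printable characters, the caret lines up regardless of how the console renders tabs.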

And I *still* haven't heard a convincing argument 
for why column tracking can't be implemented 
correctly out of the box, at least for input 
sources that use constant-spacing tabs (which is 
probably at least 90% of cases).  The extra 
per-token overhead seems trivial and it'd be much 
simpler to track the column as it's parsed rather 
than after the fact.
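
As a sketch of how little bookkeeping this takes, here is a standalone tracker that updates both the raw offset and the tab-expanded column as each character is consumed. It is not ANTLR code: ColumnTracker and its methods are hypothetical, and it assumes a fixed tab width and '\n' line endings.

```java
final class ColumnTracker {
    private final int tabWidth;
    private int line = 1;          // one-based line number
    private int charPosInLine = 0; // zero-based character offset in the line
    private int column = 0;        // zero-based display column, tabs expanded

    ColumnTracker(int tabWidth) {
        this.tabWidth = tabWidth;
    }

    /** Updates the position after one character has been consumed. */
    void consume(char c) {
        if (c == '\n') {
            line++;
            charPosInLine = 0;
            column = 0;
        } else {
            charPosInLine++;
            if (c == '\t') {
                column += tabWidth - (column % tabWidth); // jump to next tab stop
            } else {
                column++;
            }
        }
    }

    int line() { return line; }

    int charPosInLine() { return charPosInLine; }

    /** One-based display column of the next character to be consumed. */
    int column() { return column + 1; }
}
```

The extra cost is one int per stream plus a comparison per character; a token would only need to copy the column value at its start.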

 >Yes. You're right. Cut to the chase and just give the offending
 >input, rather than make the user go search for it.

You still need to give line/column information, 
so that IDEs can jump straight to the location of 
the error themselves.  (I'm assuming here that 
the IDE is separate from ANTLR and can't access 
its internal structures -- and most IDEs expect 
errors to have a line:column format.)
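
For illustration, something along these lines would produce the usual file:line:column shape. The Diagnostics class, the format method, and the exact message layout are assumptions on my part; it takes a one-based line and a zero-based offset, and reports the offset as a one-based column (tab expansion left out for brevity).

```java
final class Diagnostics {
    /**
     * Formats an error in a "file:line:column: message" layout.
     * Hypothetical helper: line is assumed one-based, charPosInLine
     * zero-based, so the column printed is charPosInLine + 1.
     */
    static String format(String file, int line, int charPosInLine, String message) {
        int column = charPosInLine + 1; // convert offset to one-based column
        return String.format("%s:%d:%d: %s", file, line, column, message);
    }
}
```

For example, Diagnostics.format("T.g", 3, 0, "mismatched character") yields "T.g:3:1: mismatched character".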