[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.
Gavin Lambert
antlr at mirality.co.nz
Wed Aug 13 13:48:35 PDT 2008
[Ok, I'm mostly responding to Kay here, but I had
to do it indirectly since I didn't get the original message.]
At 08:22 14/08/2008, Foust wrote:
>>Kay Röpke wrote:
>> I'm just saying that adding a column and the tab-width handling
>> doesn't make that much sense, because it's not something you
>> generally need. If you do need it, it's almost trivial to add.
You need it to produce any kind of useful error
message when the input file contains tabs. I
guess you could work around this by
pre-converting all tabs to spaces before passing
it to ANTLR, but that's effectively a whole
'nother lexing step, which seems like a
waste. And the error message would *still* be
misleading, since it reports the zero-based
character offset as if it were a one-based column number.
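For illustration, here is a minimal sketch of the conversion involved, written in plain Python rather than against ANTLR's API. The function name, its parameters, and the default tab width of 8 are assumptions made for this example; it simply walks the text before the offending character and advances to the next tab stop on each tab.

```python
def visual_column(line_text, char_pos_in_line, tab_width=8):
    """Return the one-based display column for a zero-based character offset.

    Assumes fixed-width tab stops every `tab_width` columns.
    """
    col = 0  # zero-based display column reached so far
    for ch in line_text[:char_pos_in_line]:
        if ch == "\t":
            # Jump to the next tab stop.
            col += tab_width - (col % tab_width)
        else:
            col += 1
    return col + 1  # report as a one-based column
```

With a line that starts with one tab, the character at offset 1 would be reported at column 9 rather than "1".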
>> If I talk about column 1, then yes, I mean the first character.
>> I'm human after all.
>> But when I see charPosInLine, I think index (in c-speak).
That's fine, if you're dealing with the object
model. But often you're not -- the token
attribute, for example, is simply called
'$X.position', which could be read either
way. And the error messages simply dump the
charPosInLine *as if it were a column*. _That_
is what I object to, not the zero-based-ness of
the charPosInLine (I agree that this makes the most sense).
>> Note: I'm not talking about solving the tab problem, but
>> displaying a short portion of the input (whether charstream
>> or tokenstream) with an indicator where the offending
>> char/token was. That should make it easy to find the error,
>> even if we can't provide column-accurate position
>> info out of the box.
While I think this is an excellent idea... how
exactly are you going to position the indicator
if you don't know the column position? You can't
rely on outputting tabs for positioning because
the tabs in the input stream and the tabs on the
console/output stream may not have the same width.
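One way around that, sketched here in plain Python under the assumption that the input's tab width is known and constant, is to expand the tabs yourself and position the indicator with spaces only. The function name and the caret style are made up for the example; `str.expandtabs` is the standard Python string method.

```python
def show_error(line_text, char_pos_in_line, tab_width=8):
    """Return the offending line plus a caret line, using spaces only.

    `line_text` is assumed to be a single line without its trailing newline,
    and `char_pos_in_line` a zero-based character offset into it.
    """
    expanded = line_text.expandtabs(tab_width)
    # The expanded length of the prefix is the zero-based display column.
    col = len(line_text[:char_pos_in_line].expandtabs(tab_width))
    return expanded + "\n" + " " * col + "^"
```

Because the output contains no tab characters, the caret lines up regardless of how the console renders tabs. It still depends on knowing the column, which is the point.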
And I *still* haven't heard a convincing argument
for why column tracking can't be implemented
correctly out of the box, at least for input
sources that use constant-spacing tabs (which is
probably at least 90% of cases). The extra
per-token overhead seems trivial and it'd be much
simpler to track the column as it's parsed rather
than after the fact.
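As a rough sketch of what tracking it during the scan could look like (plain Python, not ANTLR's actual character stream; the class and attribute names are hypothetical, and constant-spacing tabs are assumed), the extra state is one integer updated per character:

```python
class PositionTracker:
    """Tracks line, zero-based offset, and one-based display column."""

    def __init__(self, tab_width=8):
        self.tab_width = tab_width
        self.line = 1              # one-based line number
        self.char_pos_in_line = 0  # zero-based character offset in the line
        self.column = 1            # one-based display column

    def consume(self, ch):
        """Update the position after consuming a single character."""
        if ch == "\n":
            self.line += 1
            self.char_pos_in_line = 0
            self.column = 1
        elif ch == "\t":
            self.char_pos_in_line += 1
            # Advance to the column just past the next tab stop.
            self.column += self.tab_width - (self.column - 1) % self.tab_width
        else:
            self.char_pos_in_line += 1
            self.column += 1
```

A token would then just copy `column` at the moment its first character is reached, alongside the offset it already records.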
>Yes. You're right. Cut to the chase and just give the offending
>input, rather than make the user go search for it.
You still need to give line/column information,
so that IDEs can jump straight to the location of
the error themselves. (I'm assuming here that
the IDE is separate from ANTLR and can't access
its internal structures -- and most IDEs expect
errors to have a line:column format.)
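For example, a message in the common `file:line:column: message` shape could be produced along these lines. This is only a sketch: the function name and the grammar file name `T.g` are invented for illustration, and it converts the zero-based offset to a one-based column without accounting for tabs.

```python
def format_error(path, line, char_pos_in_line, message):
    """Format an error with a one-based column, e.g. for an IDE to parse."""
    return "%s:%d:%d: %s" % (path, line, char_pos_in_line + 1, message)


# Hypothetical usage: the first character of line 3 is reported as column 1.
print(format_error("T.g", 3, 0, "mismatched character"))
```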