[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.
Gavin Lambert
antlr at mirality.co.nz
Wed Aug 13 13:19:07 PDT 2008
At 06:45 14/08/2008, Loring Craymer wrote:
>As far as tabs go, if it matters, it makes more sense to track tab
>count and position in line; the user can do that by having a
>TAB : '\t' { tabs++; } ;
>rule (or something similar; I don't use ANTLR 3 action syntax) and
>supporting a column() method that looks like
>int column() {
> return charPositionInLine - tabs + tabs * tabsize;
>}
>to the AST node type.
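No, that wouldn't work. A tab advances to the next tab stop, so its
width depends on the column it starts in, not just on how many tabs
there are.
With a tab size of 4 characters (columns counted from 1):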
TAB TAB SPACE X => X at column 10
TAB SPACE TAB X => X at column 9
SPACE TAB SPACE TAB X => X at column 9
The order matters; the *only* way to work this out after the fact
is to examine each individual character between the start of line
and the desired character position. You can only do this if you
know the absolute position of the start of the line in the
character stream (or you can at least seek back to where it would
have been from some other known point) -- and if you still have
access to the character stream!
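To make the per-character scan concrete, here is a minimal sketch in Java. It is not ANTLR API: the class and method names (VisualColumn, columnOf, countedColumn) are made up for illustration, and it assumes you already have the text of the line as a String plus the 0-based character offset the lexer reported. countedColumn is the tab-counting formula from the quoted message, shifted to count from 1 so the two can be compared.

```java
// Hypothetical helper, not part of ANTLR: names are illustrative only.
final class VisualColumn {
    private VisualColumn() {}

    /**
     * Walks every character before charPositionInLine (0-based offset
     * into line) and returns the 1-based display column, advancing each
     * tab to the next multiple of tabSize.
     */
    static int columnOf(String line, int charPositionInLine, int tabSize) {
        if (tabSize <= 0) {
            throw new IllegalArgumentException("tabSize must be positive");
        }
        if (charPositionInLine < 0 || charPositionInLine > line.length()) {
            throw new IndexOutOfBoundsException("offset outside line");
        }
        int col = 0; // 0-based display column reached so far
        for (int i = 0; i < charPositionInLine; i++) {
            if (line.charAt(i) == '\t') {
                col += tabSize - (col % tabSize); // jump to next tab stop
            } else {
                col++;
            }
        }
        return col + 1; // report columns counted from 1
    }

    /**
     * The order-blind formula from the quoted message (count the tabs,
     * treat each as tabSize wide), shifted to a 1-based column.
     */
    static int countedColumn(String line, int charPositionInLine, int tabSize) {
        int tabs = 0;
        for (int i = 0; i < charPositionInLine; i++) {
            if (line.charAt(i) == '\t') {
                tabs++;
            }
        }
        return charPositionInLine - tabs + tabs * tabSize + 1;
    }
}
```

With a tab size of 4, columnOf("\t \tX", 3, 4) gives 9, while countedColumn gives 10 for the same input, because it cannot see that the second tab only had to cover three columns. The two agree only when every tab happens to start on a tab stop.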
That's not a big deal for the lexer, but by the time you're in the
parser or tree parser you can't always get that information any
more. It seems ludicrous to me that this information is not
available when it's so critical to reporting decent error messages
to the user. (Not even the default error handler can get it right
at present.)
You *could* hack this up by extending the token type and adding
some special handling to the WS rule for tabs, yes. (Similar in a
way to how v2 required you to tell it where the newlines
were.) But this just seems like such a universally useful thing
that it really belongs as standard.