[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.

Gavin Lambert antlr at mirality.co.nz
Wed Aug 13 13:19:07 PDT 2008


At 06:45 14/08/2008, Loring Craymer wrote:
 >As far as tabs go, if it matters, it makes more sense to track tab
 >count and position in line; the user can do that by having a
 >TAB : '\t'  { tabs++; } ;
 >rule (or something similar; I don't use ANTLR 3 action syntax) and
 >supporting a column() method that looks like
 >int column() {
 >     return charPositionInLine - tabs + tabs * tabsize;
 >}
 >to the AST node type.

No, that wouldn't work.

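With a tab size of 4 characters (counting columns from 1):
   TAB TAB SPACE X => X at column 10
   TAB SPACE TAB X => X at column 9
   SPACE TAB SPACE TAB X => X at column 9

The first two lines have the same tab count and the same 
charPositionInLine for X, so the formula above gives the same 
answer for both, yet X ends up in a different column.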

The order matters; the *only* way to work this out after the fact 
is to examine each individual character between the start of line 
and the desired character position.  You can only do this if you 
know the absolute position of the start of the line in the 
character stream (or you can at least seek back to where it would 
have been from some other known point) -- and if you still have 
access to the character stream!
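
As a rough sketch of the scan described above (in Java; TabColumnDemo, 
naiveColumn and visualColumn are names made up for this illustration, 
not ANTLR API), the following compares the quoted formula against a 
character-by-character walk.  It assumes tab stops every tabSize 
columns, a 0-based charPositionInLine, and 1-based display columns.

```java
final class TabColumnDemo {
    // The formula from the quoted message: every tab is counted as a full
    // tabSize columns. charPositionInLine is 0-based; the result is 1-based.
    static int naiveColumn(int charPositionInLine, int tabs, int tabSize) {
        return charPositionInLine - tabs + tabs * tabSize + 1;
    }

    // Walk each character from the start of the line up to (but not
    // including) charPositionInLine, jumping to the next tab stop on a tab.
    static int visualColumn(String line, int charPositionInLine, int tabSize) {
        int col = 0; // 0-based visual column
        for (int i = 0; i < charPositionInLine; i++) {
            if (line.charAt(i) == '\t') {
                col += tabSize - (col % tabSize);
            } else {
                col++;
            }
        }
        return col + 1; // convert to a 1-based column
    }

    public static void main(String[] args) {
        // The three sample lines from the message; each contains two tabs.
        String[] lines = { "\t\t X", "\t \tX", " \t \tX" };
        for (String line : lines) {
            int pos = line.indexOf('X');
            System.out.println("naive=" + naiveColumn(pos, 2, 4)
                    + " actual=" + visualColumn(line, pos, 4));
        }
    }
}
```

With these inputs the two methods agree only on the first line 
(10 and 10); for the other two the formula reports 10 and 11 where 
the scan finds column 9 both times.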

That's not a big deal for the lexer, but by the time you're in the 
parser or tree parser you can't always get that information any 
more.  It seems ludicrous to me that this information is not 
available when it's so critical to reporting decent error messages 
to the user.  (Not even the default error handler can get it right 
at present.)

You *could* hack this up by extending the token type and adding 
some special handling to the WS rule for tabs, yes.  (Similar in a 
way to how v2 required you to tell it where the newlines were.)  
But this just seems like such a universally useful thing that it 
really ought to be standard.
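
For what it's worth, that workaround might look roughly like the 
following.  This is plain Java rather than real ANTLR code: 
ColumnToken and ColumnTracker are hypothetical stand-ins for an 
extended token type and the lexer's per-character bookkeeping, and 
the toy tokenizer only exists to drive them.  The idea is to record 
the visual column while every character is still being seen, so that 
later stages never need the character stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type that carries its tab-expanded column.
final class ColumnToken {
    final String text;
    final int line;         // 1-based
    final int visualColumn; // 1-based, tabs expanded

    ColumnToken(String text, int line, int visualColumn) {
        this.text = text;
        this.line = line;
        this.visualColumn = visualColumn;
    }
}

// Hypothetical stand-in for the position tracking a lexer would do.
final class ColumnTracker {
    private final int tabSize;
    private int line = 1;
    private int col = 0; // 0-based visual column of the next character

    ColumnTracker(int tabSize) {
        this.tabSize = tabSize;
    }

    // Update the position for one consumed character.
    void consume(char c) {
        if (c == '\n') {
            line++;
            col = 0;
        } else if (c == '\t') {
            col += tabSize - (col % tabSize); // jump to the next tab stop
        } else {
            col++;
        }
    }

    int line() { return line; }
    int column() { return col + 1; }

    // Toy tokenizer: each run of non-whitespace characters becomes a token
    // stamped with the position at which it started.
    static List<ColumnToken> tokenize(String input, int tabSize) {
        ColumnTracker t = new ColumnTracker(tabSize);
        List<ColumnToken> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) {
                t.consume(input.charAt(i++));
                continue;
            }
            int start = i, startLine = t.line(), startCol = t.column();
            while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                t.consume(input.charAt(i++));
            }
            out.add(new ColumnToken(input.substring(start, i), startLine, startCol));
        }
        return out;
    }
}
```

Because the column is stamped on the token when it is created, an 
error reporter in the parser or tree parser could read it straight 
off the token, at the cost of fixing the tab size at lexing time.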


