[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.

Wed Aug 13 13:59:01 PDT 2008

Hi!

On Aug 13, 2008, at 10:22 PM, Foust wrote:

>>> But most users probably think that column #1 means the first
>>> character, not the 2nd.
>>
>> If I talk about column 1, then yes, I mean the first character. I'm
>> human after all.
>> But when I see charPosInLine, I think index (in c-speak).
>
> Yes. Whereas vertical tabs are no longer used, the Antlr "line"
> attribute is 1-based, but the horizontal coordinate, "charPosInLine",
> is 0-based (for reasons you've described in detail). Maybe it would
> have been clearer with a name like "charIndex".

Yeah, maybe that's a better name for it. I guess we have to live with
it now, but I don't think it's that crucial.
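
For anyone who wants human-friendly columns in their own error messages, the conversion is a one-liner. This is only a sketch: format_location is a hypothetical helper, not part of ANTLR, and it assumes the line number is already 1-based while the character position is a 0-based index, as discussed above.

```python
def format_location(line, char_pos_in_line):
    """Format a position for humans.

    Assumes `line` is 1-based and `char_pos_in_line` is a 0-based index,
    so only the column needs shifting by one.
    """
    return f"line {line}, column {char_pos_in_line + 1}"


print(format_location(3, 0))
```

The stored value stays an index; only the message shown to the user is shifted.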

> Nevertheless, the question seems to be one of whether it is
> worthwhile to handle tabs as a special case, and I hear you voting,
> "no."

Right. For handling tabs, I think it's just not worth the effort,
because essentially what we are talking about here is _expanding_ tabs
to spaces, something I wouldn't get into.
As Gavin pointed out, it's not as simple as counting the tabs,
multiplying that number by the tab width and subtracting the number of
tabs; the actual column depends on the order of spaces and tabs.
And since we are not in the business of creating an editor, but a
parser generator, I think we should not touch that subject at all.
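
To make the ordering point concrete, here is a small sketch comparing the "count the tabs" formula with a proper walk over tab stops. Both function names and the default tab width of 8 are assumptions for illustration; nothing here is ANTLR code.

```python
def naive_column(line, char_index, tab_width=8):
    """The formula that does NOT work in general: add (tab_width - 1)
    extra columns for every tab before the character."""
    tabs = line[:char_index].count("\t")
    return char_index + tabs * (tab_width - 1)


def visual_column(line, char_index, tab_width=8):
    """0-based on-screen column of line[char_index], advancing to the
    next tab stop whenever a tab is encountered."""
    col = 0
    for ch in line[:char_index]:
        if ch == "\t":
            col += tab_width - (col % tab_width)  # jump to next tab stop
        else:
            col += 1
    return col


# Same characters before 'x' (one space, one tab), different order:
print(visual_column(" \tx", 2), naive_column(" \tx", 2))
print(visual_column("\t x", 2), naive_column("\t x", 2))
```

With a space before the tab, the tab swallows the space and 'x' lands on column 8; with the tab first, 'x' lands on column 9. The naive formula reports 9 in both cases, and in both cases the character index of 'x' is simply 2.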

In fact, I strongly believe tabs to be supremely evil and they should  
be first up against the wall when the revolution comes ;)

Seriously though, ANTLR correctly reports the _character_ position
(disregarding the 0 vs 1 debate for the moment), because a \t is _one_
character. When you are dealing with text in any UI library I've seen,
tabs are represented as one character in the underlying text storage,
so that you don't have to deal with how tabs affect the layout on
screen. It's up to other layers to figure out the actual layout. We
should do likewise.

I already see the next guy writing a syntax highlighter coming along
and complaining about ANTLR expanding tabs to spaces, so that for input
like "\tID" we report the start index of token ID as being 8 (or 9 if
someone insists on charPosInLine being 1-based), assuming that the
"standard tab width" is 8. If written in sloppy C that could easily
crash his application, and in any other language it would at least
cause an exception of some sort.
That's the fundamental reason I'm so strongly against handling tabs in  
any special way.
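
A quick sketch of that failure mode, using the "\tID" input from above. token_text is a hypothetical stand-in for whatever a highlighter does with the reported position, and the value 8 assumes tab stops every 8 columns.

```python
line = "\tID"  # three characters: tab, 'I', 'D'

char_index = line.index("ID")  # position in the underlying string
expanded_column = 8            # where "ID" would sit on screen (assumed tab width 8)


def token_text(text, start, length):
    """Fetch a token character by character, as a highlighter might."""
    return "".join(text[i] for i in range(start, start + length))


print(repr(token_text(line, char_index, 2)))

try:
    token_text(line, expanded_column, 2)
except IndexError as exc:
    print("expanded column is not a valid index:", exc)
```

The character index addresses the text storage directly; the expanded column points past the end of a three-character string.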

The grammar author is of course free to generate special whitespace
tokens for different kinds of whitespace in case he needs to somehow
disambiguate them later on.
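
As a rough illustration of the idea (a hand-written toy lexer, not ANTLR-generated code and not ANTLR's API), tabs and spaces can be emitted as separate token types, each carrying its 0-based start index, so a later layer can compute whatever layout it likes. The token names TAB, SPACE and ID are made up for this sketch.

```python
import re

# One named group per token type; only one alternative matches at a time.
TOKEN_RE = re.compile(r"(?P<TAB>\t+)|(?P<SPACE> +)|(?P<ID>[A-Za-z_]\w*)")


def tokenize(line):
    """Return (type, text, start_index) triples; start_index is 0-based
    and counts characters, so a tab counts as one."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character {line[pos]!r} at index {pos}")
        tokens.append((m.lastgroup, m.group(), m.start()))
        pos = m.end()
    return tokens


print(tokenize("\t  ID"))
```

Whoever consumes these tokens can then decide how wide a TAB is, without the lexer having an opinion about it.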

>> ANTLRWorks helps here, but sometimes I want to see it in the actual
>> output. Shouldn't be hard to add in any case.
>
> AntlrWorks has its issues. It's difficult to rely on it, unless it
> is being actively supported. (Are bugs being actively addressed in
> AntlrWorks?)

Yes, although Jean is on vacation, I hear ;)
If there is anything not working, please write an email to the list
and someone will enter it into JIRA
( http://www.antlr.org:8888/browse/AW ).
I have some local changes regarding improved composite grammar  
support, which we will sort out when he is back, for example.

> I agree with you that more descriptive error messages are needed and
> would probably solve most issues without resorting to character
> counting, anyway.

Yep. I think a different style of reporting and maybe some ANTLRWorks  
improvements in that area could help with these issues.

cheers,
-k
-- 
Kay Röpke
http://classdump.org/