[antlr-interest] ANTLR 3.0.1: invalid character column in a mismatch character error message.

Wed Aug 13 02:22:46 PDT 2008

Hi!

On Aug 13, 2008, at 10:26 AM, Gavin Lambert wrote:

> At 10:51 13/08/2008, Jim Idle wrote:
>> Once you start adding all these traces then you find the lexers  
>> generate 3 tokens a minute. The base information is all there and I  
>> think FAQ #1 just needs to be: "Why you need your own error message  
>> printing function."
>
> Hardly.  One or two extra ints containing information that's  
> basically already known at token generation time?  I doubt that'd  
> leave a noticeable dent.  (Well, ok, I guess the stream position  
> might have to be a longlong, or fpos_t, or whatever.  Still.)

It adds up, as simple as that. The more you store, the greater your  
memory footprint is, the more pages the parser has to touch, and the  
slower it gets. Especially if you are parsing huge input it makes a  
noticeable difference (and in most managed target languages the  
footprint of an int field is not just 4 or 8 bytes, it's much larger).

> And even if you do implement your own error handling function -- why  
> force it to do all the work of scanning the characters on the line  
> looking for and expanding tabs just to get a column number, when the  
> lexer already had to pass those same tabs in order to generate the  
> error in the first place?

If you absolutely need the column information in terms of expanded  
tabs, then just create more than one whitespace token: WS_SPACE,  
WS_TAB, WS_NEWLINE, and check the hidden channel when expanding the  
column. That way you can easily adapt it to the tab width you (or the  
user) want to have.
I think the runtime should be minimal, because it's much easier to add  
functionality than to remove it - most ANTLR users are not keen on  
modifying their ANTLR version.
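
Either way, the expansion itself is only a few lines. Here is a minimal  
sketch in Java, assuming you have the text of the offending line and a  
0-based character offset into it (which is how I'd expect the runtime's  
char position to be counted, but check your target). The names  
TabColumns and expandedColumn are made up for illustration; they are  
not part of the ANTLR runtime.

```java
// Hypothetical helper, not part of any ANTLR runtime.
final class TabColumns {
    private TabColumns() {}

    /**
     * Converts a 0-based character offset within a line into a 0-based
     * display column, advancing each tab to the next multiple of tabWidth.
     */
    static int expandedColumn(String line, int charPositionInLine, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive");
        }
        // Clamp so an offset past the end of the line cannot overrun the string.
        int end = Math.max(0, Math.min(charPositionInLine, line.length()));
        int column = 0;
        for (int i = 0; i < end; i++) {
            if (line.charAt(i) == '\t') {
                // Jump to the next tab stop.
                column = (column / tabWidth + 1) * tabWidth;
            } else {
                column++;
            }
        }
        return column;
    }
}
```

You would call something like TabColumns.expandedColumn(lineText,  
charPos, 8) from your own error message printing function, with the  
tab width taken from whatever your user has configured.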

> At minimum there should be a function in the runtime you can call to  
> do this for you.  I don't see why each driver program needs to re- 
> invent the wheel.

I disagree here, for the above reasons. There's a good way to  
implement it today and it doesn't require much code at all. It's even  
nicely contained so you can reuse it for other projects.
Re-inventing: No one stops you from writing it once (for your  
preferred runtime, I guess) and then putting it on the wiki or wherever.  
If it comes with a liberal/open license there's no need to re-invent  
anything. :)

cheers,
-k
-- 
Kay Röpke
http://classdump.org/