[antlr-interest] C target: unhelpful error messages from the default error handler in trivial cases

Wed Jul 20 19:53:53 PDT 2011

The standard error handler can only do its best as a generic handler. When
your grammar is crap, and you feed it crap, then guess what?

Also, your questions are not "with all due respect" at all; you don't
understand what is going on, but would rather blame the generic error
handler than your lack of knowledge (which you will improve if you offer a
little more respect). The recovery mechanisms are the same for C as for Java.

However, if you spend some more time reading, you will know to use real
tokens rather than inline 'int' and 'float'. You would also have read the
long article on error recovery techniques in the Wiki, and would then know
why you are dropping out of the loop. Read the C code a bit and you will see
where the "missing" and "<invalid>" come from, and you will say "ahhhhh".
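To make the "<missing <invalid>>" text less mysterious, here is a minimal standalone sketch (not the runtime's code). It assumes only what the output in the quoted message shows: the token conjured during recovery is named "<missing " + expected token name + ">", and slot 0 of the token-name table is "<invalid>". The table contents and `build_missing_text()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token-name table for illustration only; ANTLR generates the
   real one per grammar. Assumed: slot 0 is "<invalid>" (as printed in the
   quoted output) and NAME is type 4 (the quoted 'B' token shows type<4>). */
static const char *const demoTokenNames[] = {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "NAME"
};
#define DEMO_TOKEN_COUNT (sizeof demoTokenNames / sizeof demoTokenNames[0])

/* Compose the text of a token conjured up during recovery:
   "<missing " + name of the expected type + ">".
   With no single expected type recorded (type 0) this yields
   "<missing <invalid>>", the string seen in the report. */
static void build_missing_text(unsigned expecting, char *out, size_t outSize)
{
    const char *name;

    assert(out != NULL && outSize > 0);
    name = (expecting < DEMO_TOKEN_COUNT) ? demoTokenNames[expecting]
                                          : "<unknown>";
    snprintf(out, outSize, "<missing %s>", name);
}
```

With a `char buf[64]`, `build_missing_text(0, buf, sizeof buf)` leaves `<missing <invalid>>` in `buf`, and passing 4 leaves `<missing NAME>`. A rule such as `type: 'int' | 'float';` offers a choice of tokens, so there is no single type to put in that slot.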

I love people spouting off about how bad things are, when they have made
no effort to look into the details. The type of exception is generated by
the ANTLR analysis and not the C runtime. Or perhaps you have spent all
the effort you can?

In short then, I cannot know how you want to report errors, so there are a
bunch of examples of finding out information. But the type of exception
depends on how you construct your grammar.
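Because the handler has to cope with whichever exception the generated code raises, one option is a reporter that switches on the exception type and never indexes the token-name table with 0. The following is a minimal sketch under stated assumptions. The codes 1 and 10 are the numbers printed as "error 1" and "error 10" in the quoted output. The `demo_ex` struct, the enum and `demo_report()` are hypothetical simplifications and are not the C runtime's exception structure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Codes taken from the "error 1" / "error 10" lines in the quoted output. */
enum demo_ex_type {
    DEMO_RECOGNITION_EX   = 1,
    DEMO_MISSING_TOKEN_EX = 10
};

/* Hypothetical, simplified exception record. */
struct demo_ex {
    int         type;      /* which exception the generated code raised  */
    unsigned    expecting; /* expected token type; assumed 0 if none     */
    unsigned    line;      /* line of the offending token                */
    unsigned    column;    /* position of the offending token in the line */
    const char *text;      /* text of the offending token, may be NULL   */
};

/* Format one message into out, using the grammar's token-name table. */
static void demo_report(const struct demo_ex *ex,
                        const char *const *tokenNames, size_t nTokens,
                        char *out, size_t outSize)
{
    /* Only trust "expecting" when it names a real entry other than slot 0. */
    const char *expected = (ex->expecting > 0 && ex->expecting < nTokens)
                               ? tokenNames[ex->expecting] : NULL;
    const char *got = (ex->text != NULL) ? ex->text : "<no text>";

    assert(out != NULL && outSize > 0);
    switch (ex->type) {
    case DEMO_MISSING_TOKEN_EX:
        if (expected != NULL)
            snprintf(out, outSize, "%u:%u: missing %s",
                     ex->line, ex->column, expected);
        else /* no single expected token: report what was found instead */
            snprintf(out, outSize, "%u:%u: unexpected '%s'",
                     ex->line, ex->column, got);
        break;
    case DEMO_RECOGNITION_EX: /* generic "syntax error..." case */
    default:
        snprintf(out, outSize, "%u:%u: syntax error at '%s'",
                 ex->line, ex->column, got);
        break;
    }
}
```

The design choice is to treat an expected type of 0 as "no single expectation". The message then describes the offending input, so `<invalid>` is never printed.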

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Vlad
> Sent: Wednesday, July 20, 2011 6:50 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] C target: unhelpful error messages from the default error handler in trivial cases
>
> Greetings,
>
> Like many new ANTLR users, apparently, I've borrowed the implementation
> of the default displayRecognitionError() as the basis for my own version.
> Somewhat unfortunately, this version generates unhelpful/random errors
> in rather trivial cases. Here is a full example:
>
> grammar testerrors;
>
> options
> {
>     language='C';
> }
>
> NAME    :   ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;
> WS      :   ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
>
> parse:
>     decl ( options { greedy = true; }: ',' decl )* ','? EOF
>     ;
>
> decl:
>     NAME ':' type
>     ;
>
> type:
>     'int' | 'float'
>     ;
>
> Feeding "A : badtype" into parse() results in:
>
> -memory-(1)  : error 10 : Unexpected token, at offset 3
>     near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:3]
>      : Missing <invalid>
>
> What puzzles me is where the <invalid> comes from. It would seem easy
> to compute that either an 'int' or a 'float' token was expected. In the
> stock error handler, this comes from tokenNames[ex->expecting], evaluated
> with ex->expecting equal to 0. What change to the default implementation
> is necessary to make this work correctly?
>
> Similarly, attempting to parse "A :" results in:
>
> -unknown source-(1)  : error 10 : Unexpected token, at offset -1
>     near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:1]
>      : Missing <invalid>
>
> Note how the source became "unknown" and the offset became -1. In the
> default handler this is determined by "streamName" as follows:
>
> if (ex->streamName == NULL) {
>     if (((pANTLR3_COMMON_TOKEN)(ex->token))->type == ANTLR3_TOKEN_EOF) {
>         ANTLR3_FPRINTF(stderr, "-end of input-(");
>     } else {
>         ANTLR3_FPRINTF(stderr, "-unknown source-(");
>     }
> } else {
>     ftext = ex->streamName->to8(ex->streamName);
>     ANTLR3_FPRINTF(stderr, "%s(", ftext->chars);
> }
>
> and it is frankly unexpected that a slightly different match error type
> should have this impact, since the error type does not affect the
> branches taken here at all (that happens later in the function). Anyone
> trying to take this function as a blueprint for their own handler would
> conclude that ex->streamName is NULL in one case but not the other, and
> that it is set somewhere *outside* of displayRecognitionError(): the
> problem of fixing the default implementation begins to feel like it might
> snowball into patching the runtime itself.
>
> As the last example, trying to parse "A B" results in:
>
> -memory-(1)  : error 1 : Unexpected token, at offset 1
>     near [Index: 2 (Start: 15787098-Stop: 15787098) ='B', type<4> Line: 1 LinePos:1]
>      : syntax error...
>
> The start/stop indices are bogus, i.e. some uninitialized variables --
> on repeated parses they change randomly.
>
> My second question follows. Good error handling is a big selling point
> of ANTLR, but with all due respect it hardly seems so for the C target.
> Is there documentation available for all the context relevant to handling
> the main mismatch error conditions? I have scanned everything in the
> available examples download and there are no examples of customizing
> the error handler that I can find. Alternatively, could someone share a
> workable version of their own that might be a good learning example?
>
> Thank you,
> Vlad
>
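The quoted handler prints "-unknown source-" whenever `ex->streamName` is NULL and the offending token is not EOF. A custom handler can avoid depending on whether that field happens to be set by letting the application supply a name it already knows. The following is a minimal sketch. `demo_source_label()` and its `fallback` parameter are hypothetical and are not part of the C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Choose the label printed before "(line)" in an error message.
   streamName: name attached to the exception; may be NULL, as in the
               quoted handler.
   atEof:      nonzero if the offending token is end of input.
   fallback:   a name the application already knows, such as the file it
               opened (hypothetical parameter, not part of the runtime). */
static const char *demo_source_label(const char *streamName, int atEof,
                                     const char *fallback)
{
    if (streamName != NULL && streamName[0] != '\0')
        return streamName;
    if (fallback != NULL)
        return fallback;
    /* The same two labels the quoted default handler falls back to. */
    return atEof ? "-end of input-" : "-unknown source-";
}
```

For example, `demo_source_label(NULL, 0, "input.txt")` returns `"input.txt"` rather than `-unknown source-`, whichever code path left `streamName` unset.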