[antlr-interest] C target: unhelpful error messages from the default error handler in trivial cases

Vlad vlad at demoninsight.com
Wed Jul 20 18:49:39 PDT 2011


Greetings,

Like apparently many new ANTLR users, I've borrowed the implementation of
the default displayRecognitionError() as the basis for my own version.
Unfortunately, it produces unhelpful, seemingly random error messages in
rather trivial cases. Here is a full example:

grammar testerrors;

options
{
    language='C';
}

NAME    :   ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+ ;
WS      :   ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;

parse:
    decl ( options { greedy = true; }: ',' decl )* ','? EOF
    ;

decl:
    NAME ':' type
    ;

type:
    'int' | 'float'
    ;

Feeding "A : badtype" into parse() results in:

-memory-(1)  : error 10 : Unexpected token, at offset 3
    near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:3]
     : Missing <invalid>

What puzzles me is where the <invalid> comes from. It would seem easy to
compute that either an 'int' or a 'float' token was expected. In the stock
error handler the string comes from tokenNames[ex->expecting], evaluated
with ex->expecting equal to 0. What change to the default implementation is
necessary to make this work correctly?
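
As a minimal sketch of the kind of guard this lookup seems to need, the
following wraps the table access so that index 0 or an out-of-range index
yields a readable fallback. It is self-contained and does not call the ANTLR
runtime: safe_token_name() is a hypothetical helper, and demo_token_names
only imitates the shape of a generated tokenNames table (the entries and
their order are illustrative). It assumes that an expecting value of 0 means
no single expected token was recorded.

```c
#include <stddef.h>

/* Illustrative stand-in for a generated tokenNames table.
 * Only the "<invalid>" entry at index 0 is taken from the post;
 * the remaining entries and their order are assumptions. */
const char *const demo_token_names[] = {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>",
    "NAME", "WS", "','", "':'", "'int'", "'float'"
};

#define DEMO_TOKEN_COUNT (sizeof demo_token_names / sizeof demo_token_names[0])

/* Hypothetical helper (not part of the ANTLR runtime): returns a printable
 * name for an expected token type, or a fallback string when the table is
 * missing, the index is 0 or negative, or the index is past the table end. */
const char *safe_token_name(const char *const *names, size_t count, long expecting)
{
    if (names == NULL || expecting <= 0 || (size_t)expecting >= count) {
        return "<no single expected token>";
    }
    return names[expecting];
}
```

In a custom handler, a call such as safe_token_name(tokenNames, count,
ex->expecting) would replace the direct tokenNames[ex->expecting] lookup;
how to obtain the table size is left open here.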

Similarly, attempting to parse "A :" results in:

-unknown source-(1)  : error 10 : Unexpected token, at offset -1
    near [Index: 0 (Start: 0-Stop: 0) ='<missing <invalid>>', type<0> Line: 1 LinePos:1]
     : Missing <invalid>

Note how the source became "unknown" and the offset became -1. In the
default handler this is determined by "streamName" as follows:

    if (ex->streamName == NULL)
    {
        if (((pANTLR3_COMMON_TOKEN)(ex->token))->type == ANTLR3_TOKEN_EOF)
        {
            ANTLR3_FPRINTF(stderr, "-end of input-(");
        }
        else
        {
            ANTLR3_FPRINTF(stderr, "-unknown source-(");
        }
    }
    else
    {
        ftext = ex->streamName->to8(ex->streamName);
        ANTLR3_FPRINTF(stderr, "%s(", ftext->chars);
    }

Frankly, it is unexpected that a slightly different kind of match error
should have this effect, since the error type does not influence the
branches taken here at all (it is only examined later in the function).
Anyone trying to take this function as a blueprint for their own handler
would conclude that ex->streamName is NULL in one case but not the other,
and that it is set somewhere *outside* of displayRecognitionError(). The
problem of fixing the default implementation begins to feel like it might
snowball into patching the runtime itself.
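
To make the dependency explicit, the branch quoted above can be isolated
into a small function whose result depends on only two inputs. This is a
self-contained sketch rather than runtime code: source_label() and its
parameters are hypothetical, with stream_name standing in for the 8-bit
text of ex->streamName and token_is_eof for the comparison against
ANTLR3_TOKEN_EOF.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the quoted branch of the default handler.
 * Returns the label printed before the "(" in the error message. */
const char *source_label(const char *stream_name, int token_is_eof)
{
    if (stream_name != NULL) {
        /* A stream name was recorded on the exception: print it as-is. */
        return stream_name;
    }
    /* No stream name: the label depends only on whether the token is EOF. */
    return token_is_eof ? "-end of input-" : "-unknown source-";
}
```

Read this way, the "-unknown source-" label seen for "A :" implies that,
at the time of the call, the stream name was NULL and the token type was
not EOF, whatever the exception type was.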

As the last example, trying to parse "A B" results in:

-memory-(1)  : error 1 : Unexpected token, at offset 1
    near [Index: 2 (Start: 15787098-Stop: 15787098) ='B', type<4> Line: 1 LinePos:1]
     : syntax error...

The start/stop indices are bogus: they look like uninitialized variables,
and they change randomly on repeated parses.

My second question follows. Good error handling is a big selling point of
ANTLR, but with all due respect it hardly seems so for the C target. Is
there documentation covering the context needed to handle the main
mismatch error conditions? I have gone through everything in the examples
download and cannot find any example of customizing the error handler.
Alternatively, could someone share a working version of their own that
might serve as a good learning example?

Thank you,
Vlad

