[antlr-interest] 'Dude' error in v3.4 and possible bugs explained [was: on "crap" grammars]

Vlad vlad at demoninsight.com
Thu Jul 21 11:45:37 PDT 2011


Previously I was on the 3.2 runtime. It occurred to me to try 3.4, released a day
ago, so I've switched to the 3.4-beta4 runtime as well. Using one of
the testerrors.g grammars with non-inlined int/float tokens and a parser
generated by antlr-3.4-complete.jar, I now get this on the input string "name : bad":

<string>(1)  : error 4 : Unexpected token, at offset 6
    near [Index: 4 (Start: 31458399-Stop: 31458401) ='bad', type<6> Line: 1 LinePos:6]
     : unexpected input...
  expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
I could not work out what I was expecting, like so many of us these days!

(Getting this far required switching from antlr3NewAsciiStringInPlaceStream()
to antlr3StringStreamNew(), as Jim posted here:
http://groups.google.com/group/il-antlr-interest/browse_thread/thread/981a79239e352c89
As mentioned in that thread, the last argument must not be NULL, otherwise
the call segfaults.)

So, this is better because at least the offending token is identified
correctly. The expected set is still not identified correctly
(the 'Dude' part) because the generated error path for the 'type'
non-terminal always sets the exception's expectingSet to NULL:

        {
            if ( ((LA(1) >= AT_FLOAT_) && (LA(1) <= AT_INT_)) )
            {
                CONSUME();
                PERRORRECOVERY=ANTLR3_FALSE;
            }
            else
            {
                CONSTRUCTEX();
                EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
                EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
                EXCEPTION->expectingSet = NULL; // <--- ????

                goto ruletypeEx;
            }


        }

I might be called names again, but I'd say this error handling does not look
correct: the rule knows exactly which token set it expects right here,
yet it ignores that information when filling in the exception. What's the
point of indicating ANTLR3_MISMATCHED_SET_NAME if the set itself is always
set to NULL?

Examining the generated parser code, I in fact see what appears to be a
correct set that would be FOLLOW(':'): it has bits set for AT_FLOAT_ and
AT_INT_ and is FOLLOWPUSH()ed before the rule is entered.
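
To illustrate the idea in isolation, here is a minimal sketch of a range match
that records the acceptable set when it fails. It is not ANTLR's generated code
or runtime API: struct mismatch, range_set(), match_range() and the token
numbers are all hypothetical stand-ins, and a plain 64-bit word plays the role
of the bitset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token types; real values come from the generated parser. */
enum { TOK_AT_FLOAT = 4, TOK_AT_INT = 5, TOK_NAME = 6 };

/* Hypothetical stand-in for the runtime's exception record. */
struct mismatch {
    int      failed;     /* non-zero if the match failed              */
    int      found;      /* token type that was actually seen         */
    uint64_t expecting;  /* one bit per acceptable type, 0 = unknown  */
};

/* Build a set containing every token type in [lo, hi]. */
static uint64_t range_set(int lo, int hi)
{
    uint64_t set = 0;
    assert(lo >= 0 && hi < 64 && lo <= hi);
    for (int t = lo; t <= hi; t++)
        set |= (uint64_t)1 << t;
    return set;
}

/* Match one token in [lo, hi]. On failure, record what was acceptable
 * instead of leaving the expected set empty. Returns 1 on success. */
static int match_range(int la1, int lo, int hi, struct mismatch *ex)
{
    ex->found = la1;
    if (la1 >= lo && la1 <= hi) {
        ex->failed    = 0;
        ex->expecting = 0;
        return 1;
    }
    ex->failed    = 1;
    ex->expecting = range_set(lo, hi);
    return 0;
}
```

Calling match_range(TOK_NAME, TOK_AT_FLOAT, TOK_AT_INT, &ex) fails and leaves
bits 4 and 5 set in ex.expecting, which is exactly the information an error
reporter needs to name the two acceptable tokens.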

By manually editing the parser code so that EXCEPTION->expectingSet
points to this FOLLOW set, I get rid of the 'Dude' message but hit another
bug, this one in displayRecognitionError(), which prints the wrong two token names:

<string>(1)  : error 4 : Unexpected token, at offset 6
    near [Index: 4 (Start: 13845599-Stop: 13845601) ='bad', type<6> Line: 1 LinePos:6]
     : unexpected input...
  expected one of : <EOR>, <DOWN>

Looking at the stock displayRecognitionError() code, it is clear that the
loop over the set bits is not correct (the TODO is right). Fixing it by
adding errBits->isMember(errBits, bit):

for (bit = 1; bit < numbits && count < 8 && count < size; bit++)
{
    // TODO: This doesn;t look right - should be asking if the bit is set!!
    //
    // <--- ???? was missing bitset member check
    if  (errBits->isMember(errBits, bit) && tokenNames[bit])
    {
        ANTLR3_FPRINTF(stderr, "%s%s", count > 0 ? ", " : "", tokenNames[bit]);
        count++;
    }
}

finally gets me the error message that makes sense:

<string>(1)  : error 4 : Unexpected token, at offset 6
    near [Index: 4 (Start: 30442591-Stop: 30442593) ='bad', type<6> Line: 1 LinePos:6]
     : unexpected input...
  expected one of : AT_FLOAT_, AT_INT_
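
The difference between the two loops can be reproduced outside ANTLR. The sketch
below is a stand-alone model, not the runtime's code: list_expected(), the
64-bit word used as a bitset, and the token name table (built-in names in slots
1 to 3, AT_FLOAT_ = 4, AT_INT_ = 5) are assumptions made for illustration.
With check_bits set to 0 it behaves like the stock loop and simply prints the
first names in the table; with check_bits set to 1 it only prints names whose
bit is in the set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed name table: slot 0 is a placeholder, 1..3 are built-in names,
 * user tokens start at 4. The real table comes from the generated parser. */
static const char *const token_names[] = {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "AT_FLOAT_", "AT_INT_", "NAME"
};
enum { NUM_NAMES = sizeof token_names / sizeof token_names[0] };

static int is_member(uint64_t set, int bit)
{
    return (int)((set >> bit) & 1u);
}

/* Number of bits set in the word (the "size" of the set). */
static int count_members(uint64_t set)
{
    int n = 0;
    for (; set != 0; set &= set - 1)
        n++;
    return n;
}

/* Write up to 8 comma-separated names into buf and return how many were
 * written. check_bits == 0 models the stock loop (no membership test);
 * check_bits != 0 models the loop with the membership test added. */
static int list_expected(uint64_t set, int check_bits, char *buf, size_t buflen)
{
    int    count = 0;
    int    size  = count_members(set);
    size_t used  = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (int bit = 1; bit < NUM_NAMES && count < 8 && count < size; bit++) {
        if (check_bits && !is_member(set, bit))
            continue;                      /* bit not in the expected set */
        if (token_names[bit] == NULL)
            continue;                      /* no printable name */

        int n = snprintf(buf + used, buflen - used, "%s%s",
                         count > 0 ? ", " : "", token_names[bit]);
        if (n < 0 || (size_t)n >= buflen - used)
            break;                         /* stop when the buffer is full */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

For a set with bits 4 and 5, list_expected(set, 0, ...) yields the names in
slots 1 and 2, because the unchecked loop stops as soon as it has printed as
many names as the set has members, while list_expected(set, 1, ...) yields the
names in slots 4 and 5.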


"Crap" grammars, did I hear somebody say? Hmm, I don't think so...

