[antlr-interest] 'Dude' error in v3.4 and possible bugs explained [was: on "crap" grammars]

Jim Idle jimi at temporal-wave.com
Thu Jul 21 12:37:39 PDT 2011


This was changed because the tool no longer generates those sets.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Justin Murray
> Sent: Thursday, July 21, 2011 12:28 PM
> To: Vlad
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] 'Dude' error in v3.4 and possible bugs
> explained [was: on "crap" grammars]
>
> I think that Vlad may be onto something here. From what I can tell from
> my generated parser, this only affects ANTLR3_MISMATCHED_SET_EXCEPTION
> type exceptions. My grammar has several hundred parser rules, but only
> in 4 cases is an ANTLR3_MISMATCHED_SET_EXCEPTION generated. In all 4
> cases, the expectingSet is being set to NULL, and in no other cases is
> expectingSet being set to NULL. I agree that this would be improved if
> changed as Vlad described.
>
> It just so happens that the way I implemented my exception handling, I
> treat ANTLR3_MISMATCHED_SET_EXCEPTION the same as
> ANTLR3_RECOGNITION_EXCEPTION, and don't bother to display the
> expectingSet, so I never would have discovered this problem.
>
> Since I recently figured out how the C template works, I decided to
> take a peek. I found the following in
> antlr-3.4-complete-no-antlrv2.jar/org/antlr/codegen/templates/C/C.stg:
>
> <if(PARSER)>
> EXCEPTION->expectingSet = NULL;
> <! use following code to make it recover inline;
> EXCEPTION->expectingSet = &FOLLOW_set_in_<ruleName><elementIndex>;
> !>
> <endif>
>
> So it appears that this was done explicitly at some point. You could
> edit C.stg to uncomment the code above, and I imagine that it will
> generate the correct follow set pointer. Perhaps Jim knows why it is
> done this way? It may be avoiding some other problems, so I don't know
> how safe a change this would be.
>
> - Justin
>
> On 7/21/2011 2:45 PM, Vlad wrote:
>
> 	Previously I was on the 3.2 runtime. It occurred to me to try 3.4,
> released a day ago. To this end I've switched to the 3.4-beta4 runtime as
> well. Using one of the testerrors.g grammars with non-inlined int/float
> tokens and a parser generated by antlr-3.4-complete.jar, I now get on
> input string "name : bad":
>
> 	<string>(1)  : error 4 : Unexpected token, at offset 6
> 	    near [Index: 4 (Start: 31458399-Stop: 31458401) ='bad', type<6> Line: 1 LinePos:6]
> 	     : unexpected input...
> 	  expected one of : Actually dude, we didn't seem to be expecting anything here, or at least
> 	I could not work out what I was expecting, like so many of us these days!
>
> 	(This required switching from antlr3NewAsciiStringInPlaceStream() to
> antlr3StringStreamNew(), as posted by Jim here:
> http://groups.google.com/group/il-antlr-interest/browse_thread/thread/981a79239e352c89
> As mentioned within that thread, the last argument must not be NULL, to
> avoid a segfault.)
>
> 	So, this is better because at least the offending token is
> identified correctly. The reason the expected set is still not
> identified correctly (the 'Dude' part) is because the generated error
> path for the 'type' non-terminal always sets the exception's
> expectingSet to NULL:
>
> 	        {
> 	            if ( ((LA(1) >= AT_FLOAT_) && (LA(1) <= AT_INT_)) )
> 	            {
> 	                CONSUME();
> 	                PERRORRECOVERY=ANTLR3_FALSE;
> 	            }
> 	            else
> 	            {
> 	                CONSTRUCTEX();
> 	                EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
> 	                EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
> 	                EXCEPTION->expectingSet = NULL; // <--- ????
>
> 	                goto ruletypeEx;
> 	            }
>
>
> 	        }
>
> 	I might be called names again, but I'd say this error handling
> does not look correct: the rule knows exactly what token set it
> expects right here, but then ignores that information when
> generating the exception info (what's the point of indicating
> ANTLR3_MISMATCHED_SET_NAME if that set is always set to NULL?).
>
> 	Examining the generated parser code, I in fact see what appears to
> be a correct set that would be FOLLOW(':'): it has bits set for
> AT_FLOAT_ and AT_INT_ and is FOLLOWPUSH()ed before the rule is entered.
>
> 	By manually doctoring the parser code to set
> EXCEPTION->expectingSet to point to this FOLLOW set, I get rid of the 'Dude'
> message but hit on another bug in displayRecognitionError() that prints
> the wrong two token names:
>
> 	<string>(1)  : error 4 : Unexpected token, at offset 6
> 	    near [Index: 4 (Start: 13845599-Stop: 13845601) ='bad', type<6> Line: 1 LinePos:6]
> 	     : unexpected input...
> 	  expected one of : <EOR>, <DOWN>
>
> 	Looking at the stock displayRecognitionError() code, it is clear
> that the loop over the set bits is not correct (the TODO is right).
> Fixing it by adding errBits->isMember(errBits, bit):
>
> 	for (bit = 1; bit < numbits && count < 8 && count < size; bit++)
> 	{
> 	    // TODO: This doesn't look right - should be asking if the bit is set!!
> 	    //
> 	    if  (errBits->isMember(errBits, bit) && tokenNames[bit]) // <--- ???? was missing bitset member check
> 	    {
> 	        ANTLR3_FPRINTF(stderr, "%s%s", count > 0 ? ", " : "", tokenNames[bit]);
> 	        count++;
> 	    }
> 	}
>
> 	finally gets me the error message that makes sense:
>
> 	<string>(1)  : error 4 : Unexpected token, at offset 6
> 	    near [Index: 4 (Start: 30442591-Stop: 30442593) ='bad', type<6> Line: 1 LinePos:6]
> 	     : unexpected input...
> 	  expected one of : AT_FLOAT_, AT_INT_
>
>
> 	"Crap" grammars, I hear somebody said? Hmm, I don't think so...

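For readers following the thread, the effect of the missing membership check in the quoted loop can be reproduced without the ANTLR runtime. What follows is a minimal stand-alone sketch, not the runtime's code: a plain 64-bit mask stands in for the runtime bitset, and format_expected(), is_member(), set_size(), the token table, and the token values 4 and 5 for AT_FLOAT_ and AT_INT_ are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical token table laid out like a generated tokenNames array:
 * slots 0..3 are reserved names, user tokens start at 4.
 * The values of AT_FLOAT_ and AT_INT_ and the name "ID" are assumptions. */
enum { AT_FLOAT_ = 4, AT_INT_ = 5, NUM_TOKENS = 7 };

static const char *const token_names[NUM_TOKENS] = {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "AT_FLOAT_", "AT_INT_", "ID"
};

/* Stand-in for the bitset membership test: one bit per token type. */
static int is_member(uint64_t set, unsigned bit)
{
    return bit < 64 && ((set >> bit) & 1u) != 0;
}

/* Stand-in for the bitset size: the number of bits that are set. */
static unsigned set_size(uint64_t set)
{
    unsigned n = 0;
    for (; set != 0; set >>= 1)
        n += (unsigned)(set & 1u);
    return n;
}

/* Writes up to 8 comma-separated token names into out and returns how
 * many were written. With check_membership == 0 the loop behaves like
 * the unfixed loop in the thread: it takes the first `size` names in
 * table order, whether or not their bits are set. */
static unsigned format_expected(uint64_t set, int check_membership,
                                char *out, size_t out_len)
{
    unsigned size = set_size(set);
    unsigned count = 0;
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (unsigned bit = 1; bit < NUM_TOKENS && count < 8 && count < size; bit++) {
        if (check_membership && !is_member(set, bit))
            continue; /* the check the thread adds */

        int n = snprintf(out + used, out_len - used, "%s%s",
                         count > 0 ? ", " : "", token_names[bit]);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0'; /* drop a partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With a set built as (1ULL << AT_FLOAT_) | (1ULL << AT_INT_), calling format_expected() with check_membership set to 0 fills the buffer with "<EOR>, <DOWN>", and with check_membership set to 1 it fills it with "AT_FLOAT_, AT_INT_", mirroring the two error messages quoted above. The loop is bounded by the table length here rather than by the bitset's bit count, purely so the sketch never indexes past its own table.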