[antlr-interest] error handling v3 style

Sun Dec 11 10:32:42 PST 2005

For v3, I'm going to allow the usual exception specification at the
end of rules:

a : A B
   | C D
   ;
exception [label]
catch [exceptionType exceptionVariable]
   { action }
catch ...
catch ...
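
To make the semantics concrete, here is a minimal hand-written sketch
(in Python, not actual ANTLR output) of how a rule-level handler could
wrap the body of rule a. The Parser class, RecognitionException, la(),
match(), and the errors list are all hypothetical names invented for
this illustration; the handler "action" here simply records the
message.

```python
class RecognitionException(Exception):
    """Hypothetical stand-in for a parser's recognition error."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0
        self.errors = []  # filled in by the handler action below

    def la(self):
        # Current lookahead token type, or "EOF" past the end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def match(self, ttype):
        if self.la() != ttype:
            raise RecognitionException(f"expected {ttype}, found {self.la()}")
        self.pos += 1

    def rule_a(self):
        # a : A B | C D ;  with one handler attached to the whole rule
        try:
            if self.la() == "A":
                self.match("A")
                self.match("B")
            elif self.la() == "C":
                self.match("C")
                self.match("D")
            else:
                raise RecognitionException(
                    f"no viable alternative at {self.la()}")
        except RecognitionException as e:
            # Plays the role of: catch [RecognitionException e] { action }
            self.errors.append(str(e))
```

Each additional catch clause in the grammar would presumably become
another except branch in this picture.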

For control freaks, the code generation templates can be altered
trivially (even from within the grammar file).  Now, wouldn't it be
interesting if we had "error productions", sort of like the ones yacc
tries to fake?  The idea is to provide error alts that match common
ungrammatical sentences:

a : A B
    | C D
    / B A {error("don't you mean A B?"); recover();}
    / A {error("don't you want a B with that?"); recover();}
    ;

(I've randomly used / to mean an error alt, but we probably want
something better and more obvious.)  This means that if neither of
the first two alts matches, the parser rewinds and tries to match one
of the last two, with full backtracking turned on because the error
productions will often be highly ambiguous.
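
The intended control flow can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions: tokens
are plain strings, each alt is a fixed token sequence, and
"backtracking" just means retrying from the same start position.
match_seq, rule_a, NORMAL_ALTS, and ERROR_ALTS are hypothetical names,
and the recover() step is left out.

```python
class RecognitionException(Exception):
    """Hypothetical stand-in for a parser's recognition error."""


def match_seq(tokens, start, seq):
    # Match a fixed token sequence at `start`; return the new position.
    end = start + len(seq)
    if list(tokens[start:end]) != list(seq):
        raise RecognitionException(f"expected {' '.join(seq)}")
    return end


NORMAL_ALTS = [("A", "B"), ("C", "D")]
ERROR_ALTS = [
    (("B", "A"), "don't you mean A B?"),
    (("A",), "don't you want a B with that?"),
]


def rule_a(tokens, start=0):
    """Return (new_position, error_message_or_None)."""
    # First try the ordinary alternatives in order.
    for seq in NORMAL_ALTS:
        try:
            return match_seq(tokens, start, seq), None
        except RecognitionException:
            pass  # rewind: `start` is unchanged, so just try the next alt
    # Only when all of them fail, try the error alternatives.
    for seq, message in ERROR_ALTS:
        try:
            return match_seq(tokens, start, seq), message
        except RecognitionException:
            pass
    raise RecognitionException("no alternative matched, not even an error alt")
```

For instance, rule_a(["B", "A"]) consumes both tokens and reports
"don't you mean A B?", while rule_a(["A", "B"]) succeeds with no
message.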

Now, that only describes what the erroneous input looks like; you
still have to do the recovery step manually.  Should we allow you to
specify the recovery language?  This would be an interesting feature
that lets you recover with a grammar fragment rather than an action.
For example, you might want to skip until you see the outermost '}'
of a method.  You could do this with:

method
	: type ID ...
	;
	exception
		catch[RecognitionException e]
			( {level>0}? ('}' {level--;} | .) )*

So instead of an action, you provide a grammar fragment (here a tough  
one with context-sensitive matching).
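
Written out by hand, the skipping loop that such a fragment describes
might look roughly like the Python sketch below. The function name and
the string-token representation are hypothetical. It also assumes that
level holds the number of unclosed '{' at the point of the error and
that nested '{' tokens bump the counter, which the fragment above does
not spell out.

```python
def skip_to_method_end(tokens, pos, level):
    """Consume tokens until the '}' that closes the outermost open brace.

    `level` is the number of unclosed '{' at the error point.
    Returns the position of the first token after the skipped region.
    """
    while pos < len(tokens) and level > 0:  # the {level>0}? predicate
        tok = tokens[pos]
        if tok == "{":
            level += 1  # assumption: nested blocks are tracked
        elif tok == "}":
            level -= 1  # the '}' {level--;} alternative
        pos += 1        # every token is consumed, like the '.' wildcard
    return pos
```

With tokens ["x", "{", "y", "}", "}", "int", "g"] and level 1, the
loop stops right after the second '}', so parsing could resume at
"int".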

Do we need a combination of matching error sequences and then  
sophisticated error recovery strategies?

Is that interesting to any of you folks out there building systems?

Does anybody use the paraphrase feature from v2?

ID
options {
   paraphrase = "an identifier";
}
   : ('a'..'z'|'A'..'Z'|'_')
     ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
   ;

With this option, error messages say "an identifier" instead of ID.
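
In effect this is a lookup from token name to a friendlier phrase at
the point where the message is built. A tiny Python sketch of the
idea follows; PARAPHRASES, describe, and expecting_message are
hypothetical names, and the message wording is made up for the
example rather than copied from v2.

```python
# Token types that have a paraphrase; everything else prints as its name.
PARAPHRASES = {"ID": "an identifier"}


def describe(token_type):
    # Fall back to the raw token name when no paraphrase was given.
    return PARAPHRASES.get(token_type, token_type)


def expecting_message(expected, found):
    return f"expecting {describe(expected)}, found {describe(found)}"
```

So expecting_message("ID", "SEMI") reads "expecting an identifier,
found SEMI" here, since only ID has a paraphrase.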

Ter