[antlr-interest] Error recovery contortion

Terence Parr parrt at cs.usfca.edu
Tue Dec 7 11:06:29 PST 2004



On Dec 2, 2004, at 12:01 PM, Paul J. Lucas wrote:
> 	Some background: the language I'm parsing is XQuery
> 	<http://www.w3.org/TR/xquery/> that, among other annoyances, is
> 	keyword-free.  This makes recovery much harder because the lexer
> 	is stateful.
>
> 	As a first pass, I want to recover from syntax errors inside
> 	function declarations only.  I can't simply use ANTLR's default
> 	error-recovery mechanism because I have to sync to a known
> 	token and reset the lexer's state.  (ANTLR's default mechanism
> 	syncs to one of the tokens in the follow set.)  Function
> 	declarations in XQuery end with a ';' so, upon error, I throw
> 	away all tokens until I get to that.  (I will hopefully be able
> 	to improve this in the future, but for now, it's good
> 	enough.)

Hi Paul,

Well, "sync to semicolon" is a normal thing to do 
(consumeUntil(SEMI)).  Resetting the lexical state might be a hassle, 
alas, though the parser can easily call a method on the lexer to do it.
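
To make that concrete, here is a minimal, self-contained Java sketch of 
"consume until SEMI, then reset the lexer".  It does not use the ANTLR 
runtime: SketchLexer, SyncParser, peek() and resetState() are 
hypothetical names made up for illustration, and tokens are plain 
strings.  The consumeUntil() here only mimics the idea of stopping with 
the sync token as the next lookahead.

```java
import java.util.List;

/** Hypothetical stand-in for a stateful lexer; not an ANTLR class. */
class SketchLexer {
    enum State { DEFAULT, IN_EXPR }

    State state = State.DEFAULT;

    /** The "method call to the lexer": return to a known state. */
    void resetState() {
        state = State.DEFAULT;
    }
}

/** Hypothetical parser over a token list, just enough to show the resync. */
class SyncParser {
    static final String SEMI = ";";
    static final String EOF = "<EOF>";

    private final List<String> tokens;
    private final SketchLexer lexer;
    private int pos = 0;

    SyncParser(List<String> tokens, SketchLexer lexer) {
        this.tokens = tokens;
        this.lexer = lexer;
    }

    /** Current lookahead token, or EOF when the input is exhausted. */
    String peek() {
        return pos < tokens.size() ? tokens.get(pos) : EOF;
    }

    void consume() {
        if (pos < tokens.size()) {
            pos++;
        }
    }

    /** Discard tokens until the lookahead is the given token (or EOF). */
    void consumeUntil(String token) {
        while (!peek().equals(EOF) && !peek().equals(token)) {
            consume();
        }
    }

    /** Sync to the semicolon, then put the lexer back into a known state. */
    void recover() {
        consumeUntil(SEMI);
        lexer.resetState();
    }
}
```

In a real parser the awkward part is that any tokens already buffered 
as lookahead were lexed in the old state, which is the hassle mentioned 
above; this sketch sidesteps that by using a pre-lexed list.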

> 	Setting defaultErrorHandler=false makes this work fine for
> 	syntax errors inside function declarations.  I have something
> 	like this:
>
> 		functionDeclBody
> 		    : enclosedExpr
> 		    ;
> 		    exception
> 		    catch [ RecognitionException re ] {
> 		        ## = #([ERROR,"ERROR"]);
> 			recover( re );
> 		    }
>
> 	where recover() is my own, working recovery function.  Hence,
> 	if an exception is thrown during enclosedExpr, it will be
> 	caught and recovered from and the generated AST is just fine.
> 	So far, so good.

Yep. :)

> 	But, if there's a syntax error *outside* a function
> 	declaration, the generated AST is trashed.

Oh, right, because the error makes the invoking rule exit without 
collecting the trees associated with the functions.

>  Another requirement
> 	is that I keep the generated AST up to the point of the error
> 	outside a function declaration.  As I've mentioned previously,
> 	the reason the AST gets trashed is because when an exception is
> 	thrown and there's no recovery in place, the AST isn't stitched
> 	together because it's done only upon successful function
> 	*return*: stack unwinding upon an exception bypasses normal
> 	function returns.

Yeah, I never worried about this problem before.  Nasty.
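
A small Java sketch of why the tree disappears, under the assumption 
that a rule hands its subtree to the caller only on a normal return. 
Nothing here is ANTLR-generated code: AstStitchSketch, SyntaxError and 
the string "trees" are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of tree stitching; none of these names come from ANTLR. */
class AstStitchSketch {
    static class SyntaxError extends Exception {
        SyntaxError(String message) {
            super(message);
        }
    }

    /** Child rule: returns a subtree, or throws on bad input (marked with '!'). */
    static String func(String text) throws SyntaxError {
        if (text.startsWith("!")) {
            throw new SyntaxError("bad function: " + text);
        }
        return "FUNC(" + text + ")";
    }

    /** No handler: the exception unwinds past the return, so children are lost. */
    static List<String> progWithoutHandler(List<String> input) throws SyntaxError {
        List<String> children = new ArrayList<>();
        for (String text : input) {
            children.add(func(text));
        }
        return children; // the only place the tree is handed to the caller
    }

    /** Handler inside the rule: the rule still returns, so the partial tree survives. */
    static List<String> progWithHandler(List<String> input) {
        List<String> children = new ArrayList<>();
        try {
            for (String text : input) {
                children.add(func(text));
            }
        } catch (SyntaxError e) {
            children.add("ERROR"); // mark where parsing stopped
        }
        return children;
    }
}
```

With the input f, g, !oops, h the first variant gives the caller 
nothing at all, while the second returns the two good subtrees followed 
by an ERROR marker.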

> 	OK, so I tried setting defaultErrorHandler=true.  This makes
> 	the generated AST be fine for errors outside of function
> 	declarations, but now the problem is that ANTLR recovers all by
> 	itself while doing enclosedExpr and functionDeclBody above is
> 	never given the opportunity to catch the exception and do the
> 	correct recovery.  Hence, this breaks my recovery mechanism.

Are you sure?  The following code generates precisely the same output 
for func() with and without the "defaultErrorHandler=false;".

class T extends Parser;

options {
    buildAST=true;
    defaultErrorHandler=true;
}

prog : START func (COMMA func)* STOP ;

func : ID STUFF SEMI ;
    exception
    catch [ RecognitionException re ] {
        ## = #([ERROR,"ERROR"]);
        recover( re );
    }

Could you add an exception handler to each rule?  Labor-intensive, but 
precise, right?

> 	Sigh...

You sound like John Mitchell ;)

> 	So I looked at the ANTLR-generated Java code: it calls
> 	reportError() during its own error recovery.  So what I need to
> 	do is continue to allow it to recover as normal (so my AST is
> 	preserved) *except* when the current call stack contains
> 	functionDeclBody, i.e., if reportError() is called "through"
> 	functionDeclBody, do my own recovery instead.  OK, so set a flag
> 	in my parser:
>
> 		functionDeclBody
> 		{
> 		    m_recoverable = true;
> 		}
> 		    : enclosedExpr
> 		        {
> 		            m_recoverable = false;
> 		        }
> 		    ;
> 		    exception
> 		    catch [ RecognitionException re ] {
> 		        ## = #([ERROR,"ERROR"]);
> 			recover( re );
> 		    }
>
> 	and override reportError() like:
>
> 		public void reportError( RecognitionException re ) {
> 		    final boolean recoverable = m_recoverable;
> 		    m_recoverable = false;
> 		    if ( recoverable )
> 		       throw new ANTLR_WorkaroundException( re );
>
> 		    // ... other recovery not relevant to this post ...
> 		}

An interesting approach, though I'm not sure you need it yet.

> 	i.e., if I'm doing my own recovery, I want any exception caught
> 	by ANTLR's recovery mechanism to be rethrown so the stack
> 	unwinds back up to functionDeclBody.  One slight problem:
> 	reportError() isn't declared to throw any exception.  Hence, I
> 	created the ANTLR_WorkaroundException class that extends
> 	RuntimeException to work around this annoyance.

Only an annoyance, mein herr, because you want reportError to do more 
than it was meant to ;)
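
For anyone following along, the control flow of that flag-plus-wrapper 
trick can be sketched in plain Java without the ANTLR runtime.  All 
names below (SketchRecognitionException, WorkaroundException, 
FlagParser) are hypothetical stand-ins, and having functionDeclBody 
catch the unchecked wrapper directly is an assumption of this sketch, 
not something shown in the original grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ANTLR's checked RecognitionException. */
class SketchRecognitionException extends Exception {
    SketchRecognitionException(String message) {
        super(message);
    }
}

/** Unchecked wrapper, playing the role of ANTLR_WorkaroundException. */
class WorkaroundException extends RuntimeException {
    WorkaroundException(SketchRecognitionException cause) {
        super(cause);
    }
}

/** Hypothetical hand-written parser mimicking the generated control flow. */
class FlagParser {
    private boolean recoverable = false;
    final List<String> log = new ArrayList<>();

    /** Declares no checked exceptions, like reportError(). */
    void reportError(SketchRecognitionException re) {
        boolean wasRecoverable = recoverable;
        recoverable = false;
        if (wasRecoverable) {
            throw new WorkaroundException(re); // unwind to functionDeclBody
        }
        log.add("reported: " + re.getMessage());
    }

    /** Inner rule with a default-style handler: report, then resync. */
    void enclosedExpr(boolean syntaxError) {
        try {
            if (syntaxError) {
                throw new SketchRecognitionException("unexpected token");
            }
            log.add("parsed enclosedExpr");
        } catch (SketchRecognitionException re) {
            reportError(re);
            log.add("default recovery");
        }
    }

    /** Outer rule: sets the flag and catches the wrapper (an assumption of this sketch). */
    void functionDeclBody(boolean syntaxError) {
        recoverable = true;
        try {
            enclosedExpr(syntaxError);
            recoverable = false;
        } catch (WorkaroundException we) {
            log.add("custom recovery: " + we.getCause().getMessage());
        }
    }
}
```

Calling functionDeclBody(true) skips the inner rule's default recovery 
and ends up in the outer handler, whereas calling enclosedExpr(true) on 
its own reports the error and recovers in place.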

> 	OK, I'm pretty sure this all works, but it requires a lot of
> 	programming contortion, more than should be necessary.

Agreed.

> 	A suggestion is to change the default exception-handling code
> 	emitted to something like:
>
> 		catch ( RecognitionException ex ) {
> 		    reportError( ex );
> 		    recover( ex, _someTokenSet );
> 		}
>
> 	where recover() is a new method in Parser.java that, by
> 	default, is:
>
> 		void recover( RecognitionException ex, BitSet set )
> 		    throws TokenStreamException
> 		{
> 		    consume();
> 		    consumeUntil( set );
> 		}
>
> 	This will allow a user to override what recovery does without
> 	having to use the hack of stuffing such code into reportError()
> 	(where it doesn't conceptually belong).

Yes, this is reasonable.  We should change:

		catch (RecognitionException ex) {
			reportError(ex);
			consume();
			consumeUntil(_tokenSet_0);
		}

to
		catch (RecognitionException ex) {
			reportError(ex);
			recover(ex, _tokenSet_0);
		}

for 2.7.5.  Anybody got a problem with this?  I would just be factoring 
out the code into a method.
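
Roughly, the factored-out hook and a user override might look like the 
following self-contained Java sketch.  It is not the ANTLR runtime: 
HookParser, SemiSyncParser, SyntaxError and the use of Set<String> in 
place of BitSet are all assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a generated parser; not ANTLR's Parser class. */
class HookParser {
    static final String EOF = "<EOF>";

    static class SyntaxError extends Exception {
        SyntaxError(String message) {
            super(message);
        }
    }

    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    HookParser(List<String> tokens) {
        this.tokens = tokens;
    }

    String peek() {
        return pos < tokens.size() ? tokens.get(pos) : EOF;
    }

    void consume() {
        if (pos < tokens.size()) {
            pos++;
        }
    }

    /** Discard tokens until the lookahead is in the set (or EOF). */
    void consumeUntil(Set<String> set) {
        while (!peek().equals(EOF) && !set.contains(peek())) {
            consume();
        }
    }

    void reportError(SyntaxError ex) {
        errors.add(ex.getMessage());
    }

    /** The proposed hook; the default keeps the consume-then-resync behavior. */
    void recover(SyntaxError ex, Set<String> followSet) {
        consume();
        consumeUntil(followSet);
    }

    /** Stand-in for a generated rule that expects a single "ID" token. */
    void rule(Set<String> followSet) {
        try {
            if (!peek().equals("ID")) {
                throw new SyntaxError("expected ID, found " + peek());
            }
            consume();
        } catch (SyntaxError ex) {
            reportError(ex);
            recover(ex, followSet);
        }
    }
}

/** A user override: ignore the follow set and always sync to ';'. */
class SemiSyncParser extends HookParser {
    SemiSyncParser(List<String> tokens) {
        super(tokens);
    }

    @Override
    void recover(SyntaxError ex, Set<String> followSet) {
        consumeUntil(Collections.singleton(";"));
    }
}
```

The point of the design is that reportError() goes back to only 
reporting, and anyone who wants different resynchronization overrides 
one method instead of touching every rule.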

Note that for 3.0, somebody suggested that we have a template with all 
the appropriate goodies that people can use like a macro, though a 
method is similar.  The template approach is nicer in some sense 
because it could access local variables/parameters from the rule 
method.

Anyway, my $0.02.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!





 