[antlr-interest] Error recovery contortion
Terence Parr
parrt at cs.usfca.edu
Tue Dec 7 11:06:29 PST 2004
On Dec 2, 2004, at 12:01 PM, Paul J. Lucas wrote:
> Some background: the language I'm parsing is XQuery
> <http://www.w3.org/TR/xquery/> that, among other annoyances, is
> keyword-free. This makes recovery much harder because the lexer
> is stateful.
>
> As a first pass, I want to recover from syntax errors inside
> function declarations only. I can't simply use ANTLR's default
> error-recovery mechanism because I have to sync to a known
> token and reset the lexer's state. (ANTLR's default mechanism
> syncs to one of the tokens in the follow set.) Function
> declarations in XQuery end with a ';' so, upon error, I throw
> away all tokens until I get to that. (I will hopefully be able
> to improve this in the future, but for now, it's good
> enough.)
Hi Paul,
Well, that "sync to semicolon" is a normal thing to do
(consumeUntil(SEMI)); alas, resetting lexical state might be a hassle,
though a method call to the lexer can easily be initiated.
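As a rough illustration of "sync to semicolon, then reset the lexer", here is a minimal, self-contained sketch. It does not use ANTLR's classes: ToyLexer, its Mode enum, resetState(), and SyncToSemi are all hypothetical names invented for the example, and tokens are plain strings. Unlike ANTLR's consumeUntil(SEMI), which stops in front of the semicolon, this sketch assumes the semicolon itself should be discarded too.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Toy stand-in for a stateful lexer (hypothetical, not ANTLR's API). */
class ToyLexer {
    enum Mode { DEFAULT, INSIDE_EXPR }

    Mode mode = Mode.DEFAULT;

    /** Puts the lexer back into the state expected at the start of a declaration. */
    void resetState() {
        mode = Mode.DEFAULT;
    }
}

class SyncToSemi {
    /**
     * Discards tokens up to and including the next ";", then resets the lexer.
     * Returns the number of tokens that were thrown away.
     */
    static int recover(Iterator<String> tokens, ToyLexer lexer) {
        int skipped = 0;
        while (tokens.hasNext()) {
            skipped++;
            if (tokens.next().equals(";")) {
                break;
            }
        }
        lexer.resetState();
        return skipped;
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer();
        lexer.mode = ToyLexer.Mode.INSIDE_EXPR; // pretend the error hit mid-expression
        Iterator<String> tokens = Arrays.asList("{", "1", "+", ";", "declare").iterator();
        int skipped = recover(tokens, lexer);
        System.out.println("skipped " + skipped + ", lexer mode " + lexer.mode);
    }
}
```

In a real parser the reset would be a call from the parser's recovery code into the lexer object it was constructed with.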
> Setting defaultErrorHandler=false makes this work fine for
> syntax errors inside function declarations. I have something
> like this:
>
>     functionDeclBody
>         : enclosedExpr
>         ;
>         exception
>         catch [ RecognitionException re ] {
>             ## = #([ERROR,"ERROR"]);
>             recover( re );
>         }
>
> where recover() is my own, working recovery function. Hence,
> if an exception is thrown during enclosedExpr, it will be
> caught and recovered from and the generated AST is just fine.
> So far, so good.
Yep. :)
> But, if there's a syntax error *outside* a function
> declaration, the generated AST is trashed.
Oh, right, because the error makes the invoking rule exit without
collecting the tree associated with the functions.
> Another requirement
> is that I keep the generated AST up to the point of the error
> outside a function declaration. As I've mentioned previously,
> the reason the AST gets trashed is because when an exception is
> thrown and there's no recovery in place, the AST isn't stitched
> together because it's done only upon successful function
> *return*: stack unwinding upon an exception bypasses normal
> function returns.
Yeah, I never worried about this problem before. Nasty.
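The effect can be reproduced without ANTLR at all. The sketch below is a toy model, not generated parser code: StitchDemo, SyntaxError, and the "!" marker for a bad token are assumptions made up for the example. It shows that a rule which hands its children back only on a normal return loses them when the exception escapes, and keeps them when the exception is handled inside the rule.

```java
import java.util.ArrayList;
import java.util.List;

class StitchDemo {
    /** Hypothetical stand-in for a recognition error. */
    static class SyntaxError extends RuntimeException {
        SyntaxError(String msg) {
            super(msg);
        }
    }

    /**
     * Toy "prog" rule: every item is a function name, "!" simulates a syntax error.
     * The collected children reach the caller only through the return statement.
     */
    static List<String> prog(List<String> items, boolean handleInsideRule) {
        List<String> children = new ArrayList<>();
        try {
            for (String item : items) {
                if (item.equals("!")) {
                    throw new SyntaxError("unexpected token");
                }
                children.add(item);
            }
        } catch (SyntaxError e) {
            if (!handleInsideRule) {
                throw e; // unwinds past the return below, so the caller gets nothing
            }
            // handled here: fall through and return what was collected so far
        }
        return children;
    }

    public static void main(String[] args) {
        List<String> input = List.of("f", "g", "!", "h");
        System.out.println("handled in rule: " + prog(input, true));
        try {
            prog(input, false);
        } catch (SyntaxError e) {
            System.out.println("escaped the rule: no tree returned");
        }
    }
}
```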
> OK, so I tried setting defaultErrorHandler=true. This makes
> the generated AST be fine for errors outside of function
> declarations, but now the problem is that ANTLR recovers all by
> itself while doing enclosedExpr and functionDeclBody above is
> never given the opportunity to catch the exception and do the
> correct recovery. Hence, this breaks my recovery mechanism.
Are you sure? The following code generates precisely the same output
for func() with and without the "defaultErrorHandler=false;".
class T extends Parser;
options {
    buildAST=true;
    defaultErrorHandler=true;
}

prog : START func (COMMA func)* STOP ;

func : ID STUFF SEMI ;
    exception
    catch [ RecognitionException re ] {
        ## = #([ERROR,"ERROR"]);
        recover( re );
    }
Could you add an exception handler to each rule? Labor-intensive, but
precise, right?
> Sigh...
You sound like John Mitchell ;)
> So I looked at the ANTLR-generated Java code: it calls
> reportError() during its own error recovery. So what I need to
> do is continue to allow it to recover as normal (so my AST is
> preserved) *except* when the current call stack contains
> functionDeclBody, i.e., if reportError() is called "through"
> functionDeclBody, do my own recovery instead. OK, so set a flag
> in my parser:
>
>     functionDeclBody
>         {
>             m_recoverable = true;
>         }
>         : enclosedExpr
>         {
>             m_recoverable = false;
>         }
>         ;
>         exception
>         catch [ RecognitionException re ] {
>             ## = #([ERROR,"ERROR"]);
>             recover( re );
>         }
>
> and override reportError() like:
>
>     public void reportError( RecognitionException re ) {
>         final boolean recoverable = m_recoverable;
>         m_recoverable = false;
>         if ( recoverable )
>             throw new ANTLR_WorkaroundException( re );
>
>         // ... other recovery not relevant to this post ...
>     }
An interesting approach, though I'm not sure you need it yet.
> i.e., if I'm doing my own recovery, I want any exception caught
> by ANTLR's recovery mechanism to be rethrown so the stack
> unwinds back up to functionDeclBody. One slight problem:
> reportError() isn't declared to throw any exception. Hence, I
> created the ANTLR_WorkaroundException class that extends
> RuntimeException to work around this annoyance.
Only an annoyance, mein herr, because you want reportError to do more
than it was meant to ;)
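For readers who want to see the control flow of that workaround in isolation, here is a self-contained sketch that assumes nothing about ANTLR's runtime. WorkaroundDemo, WorkaroundException, and the log field are hypothetical; enclosedExpr() stands in for a generated rule whose default handler calls reportError(), and functionDeclBody() stands in for the rule that wants to do its own recovery.

```java
class WorkaroundDemo {
    /** Unchecked wrapper, so a method with no throws clause can still propagate the error. */
    static class WorkaroundException extends RuntimeException {
        WorkaroundException(Exception cause) {
            super(cause);
        }
    }

    boolean recoverable = false;
    final StringBuilder log = new StringBuilder();

    /** Stand-in for the overridden reportError(): it declares no checked exceptions. */
    void reportError(Exception e) {
        boolean wasRecoverable = recoverable;
        recoverable = false;
        if (wasRecoverable) {
            throw new WorkaroundException(e); // unwind to functionDeclBody()
        }
        log.append("default:").append(e.getMessage()).append(';');
    }

    /** Stand-in for a generated rule that uses the default error handler. */
    void enclosedExpr(boolean fail) {
        try {
            if (fail) {
                throw new Exception("bad expr");
            }
        } catch (Exception e) {
            reportError(e);
        }
    }

    /** Stand-in for the rule that performs custom recovery. */
    void functionDeclBody(boolean fail) {
        recoverable = true;
        try {
            enclosedExpr(fail);
        } catch (WorkaroundException w) {
            log.append("custom:").append(w.getCause().getMessage()).append(';');
        } finally {
            recoverable = false;
        }
    }

    public static void main(String[] args) {
        WorkaroundDemo demo = new WorkaroundDemo();
        demo.functionDeclBody(true); // error inside a function declaration
        demo.enclosedExpr(true);     // same error elsewhere
        System.out.println(demo.log);
    }
}
```

Note that the handler in functionDeclBody() has to catch the unchecked wrapper, not only RecognitionException, for the rethrow to land there.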
> OK, I'm pretty sure this all works, but it requires a lot of
> programming contortion, more than should be necessary.
Agreed.
> A suggestion is to change the default exception-handling code
> emitted to something like:
>
>     catch ( RecognitionException ex ) {
>         reportError( ex );
>         recover( ex, _someTokenSet );
>     }
>
> where recover() is a new method in Parser.java that, by
> default, is:
>
>     void recover( RecognitionException ex, BitSet set )
>         throws TokenStreamException
>     {
>         consume();
>         consumeUntil( set );
>     }
>
> This will allow a user to override what recovery does without
> having to use the hack of stuffing such code into reportError()
> (where it doesn't conceptually belong).
Yes, this is reasonable. We should change:
    catch (RecognitionException ex) {
        reportError(ex);
        consume();
        consumeUntil(_tokenSet_0);
    }
to
    catch (RecognitionException ex) {
        reportError(ex);
        recover(ex, _tokenSet_0);
    }
This would be for 2.7.5. Anybody got a problem with this? I would just be
factoring out the code into a method.
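To make the idea concrete, here is a small stand-alone model of such a hook. It is only a sketch under stated assumptions: HookParser and SemiSyncParser are invented classes, tokens are strings, and a Set<String> plays the role of the follow-set BitSet. None of this is the actual ANTLR runtime. The base class factors the default "consume one token, then consume until the follow set" recovery into recover(), and the subclass overrides it to sync past the next semicolon instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Set;

/** Toy parser base class with an overridable recovery hook (hypothetical). */
class HookParser {
    final Deque<String> input;
    int errorCount = 0;

    HookParser(String... tokens) {
        input = new ArrayDeque<>(Arrays.asList(tokens));
    }

    void consume() {
        input.poll();
    }

    void consumeUntil(Set<String> set) {
        while (!input.isEmpty() && !set.contains(input.peek())) {
            consume();
        }
    }

    void reportError(Exception ex) {
        errorCount++;
    }

    /** Default recovery: drop the offending token, then sync to the follow set. */
    void recover(Exception ex, Set<String> follow) {
        consume();
        consumeUntil(follow);
    }

    /** Stand-in for a generated rule whose match always fails. */
    void rule(Set<String> follow) {
        try {
            throw new Exception("mismatched token");
        } catch (Exception ex) {
            reportError(ex);
            recover(ex, follow);
        }
    }
}

/** Custom recovery: ignore the follow set and skip past the next ";". */
class SemiSyncParser extends HookParser {
    SemiSyncParser(String... tokens) {
        super(tokens);
    }

    @Override
    void recover(Exception ex, Set<String> follow) {
        consumeUntil(Collections.singleton(";"));
        consume(); // also discard the ";" itself
    }
}
```

With this shape, reportError() only reports, and anything specific to a language's recovery strategy lives in the overridden recover().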
Note that for 3.0, somebody suggested that we have a template with all
the appropriate goodies that people can use like a macro, though a
method is similar. The template approach is nicer in some sense
because it could access local variables/parameters from the rule
method.
Anyway, my $0.02.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!