[antlr-interest] Disabling recovery during parsing

Thu Nov 8 14:42:51 PST 2007

The entire error recovery and reporting systems are not appropriately 
documented, and your frustration is well understood.

Both the lexer and parser handle error recovery similarly, but with 
different method calls.

The lexer will call
   recover(re)
before it throws a RecognitionException, which consumes the unexpected 
character. The top
   nextToken()
method has a hard-coded exception handler that catches the 
RecognitionException, reports the error, and calls recover(re) again (a 
bug in my opinion).

To change how the lexer handles recovery, override
   recover(...)
To change how the exception is handled, override
   nextToken()

One important, and also undocumented, note: CommonTokenStream will lex 
the *entire* stream into tokens on the first token fetch. This means the 
lexer will process and display all errors before the parser processes 
the first token. So much for context.

When the parser fails to match a token, it calls
   mismatch(...)
which creates the exception object and then calls
   recoverFromMismatchedToken(...)
This method looks ahead at the next token and if it matches, reports the 
error, skips the unexpected token, and returns a successful match. If it 
does not match, it calls
   recoverFromMismatchedElement(...)
This method tests if the unexpected token could follow the expected 
token. If so, it will report the error, and return a successful match 
(acting as if the missing token were found). If not, the exception 
object is finally thrown.

If you have not created your own rule exception handler in the grammar, 
or configured the default exception handlers with @rulecatch {}, then 
the exception will be caught by the default rule exception handler, 
which will call
   reportError(re)
   recover(input, re)

The recover(...) method in the parser will try to consume tokens until 
one is found that allows it to resynchronize and continue parsing the 
rest of the tokens.

To change how recovery is handled in the parser, override
   recoverFromMismatchedToken(...)
   recoverFromMismatchedElement(...)
to change those strategies, or override
   mismatch(...)
to change the whole before-exception response.
To change how recovery is done in response to the RecoveryException, 
configure or provide a different either default or per-rule exception 
handler in the grammar, or override
   reportError(re)
   recover(input, re)
in the parser.

I had to spelunk the source to find all of this, since these questions 
never get answered. I has worked for me so far.

I hope that helps.
-- Curtis

Foolish Ewe wrote:
> Our experiences with ANTLR has been generally good, but we could use a bit of help here. We have a fairly simple language which we read in one command at a time. A command is passed in as a buffer with an ascii NULL ('\0') terminator character, which we treat as a "sentinel" token to try to better detect when an invalid suffix might be in the look ahead. Since we have an interactive scripting tool, and all commands are terminated by newlines, we have a "wrapper" which invokes the parser and catches exceptions due to syntax errors. However, for some suffix errors, the parser appears to capture the exception internally, and generates a warning message, rather than passing the exception out. So although the diagnostic messages which might look something like (the null terminator is not displayed):
> 
> XXXX YYYYYY ZZZZ index-10;ge
> BR.recoverFromMismatchedToken line 1:25 mismatched input ';ge expecting ENDOFCOMMAND
> 
> where XXXX, YYYYYY and ZZZZ are all valid keywords in the language, and the lexer has a production:
> // Lexer Rules
> ENDOFCOMMAND : '\000'; // A terminal NULL Character
> 
> The ;ge suffix should not be there, and is correctly flagged as invalid, but the error recovery is more "forgiving" than we want in our application, we would prefer the parser to throw a RecognitionException which we handle in the wrapper for consistency reasons.