[antlr-interest] Disabling recovery during parsing
Curtis Clauson
NOSPAM at TheSnakePitDev.com
Thu Nov 8 14:42:51 PST 2007
The error recovery and reporting systems are not adequately
documented, and your frustration is well understood.
Both the lexer and parser handle error recovery similarly, but with
different method calls.
The lexer calls
recover(re)
to consume the unexpected character, and then throws a
RecognitionException. The top-level
nextToken()
method has a hard-coded exception handler that catches the
RecognitionException, reports the error, and calls recover(re) again,
skipping a second character (a bug, in my opinion).
To change how the lexer handles recovery, override
recover(...)
To change how the exception is handled, override
nextToken()
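The lexer path above can be modeled in a few lines. The following is a toy Java sketch, not the ANTLR runtime: LexError, ToyLexer, StrictLexer and the single-digit token rule mTOKEN() are hypothetical, and only the method names recover(), reportError() and nextToken() follow the description. StrictLexer disables recovery by overriding both recover() and nextToken().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the lexer recovery path; not the ANTLR runtime.
// LexError, ToyLexer, StrictLexer and mTOKEN() are hypothetical names.
class LexError extends RuntimeException {
    final int index; // position of the offending character

    LexError(int index, char c) {
        super("unexpected character '" + c + "' at " + index);
        this.index = index;
    }
}

class ToyLexer {
    protected final String input;
    protected int pos = 0;
    final List<String> errors = new ArrayList<>();

    ToyLexer(String input) { this.input = input; }

    // Hypothetical token rule: a token is a single digit.
    protected String mTOKEN() {
        char c = input.charAt(pos);
        if (!Character.isDigit(c)) {
            LexError e = new LexError(pos, c);
            recover(e); // first consume, before the throw
            throw e;
        }
        pos++;
        return String.valueOf(c);
    }

    // Recovery: skip one character.
    protected void recover(LexError e) { pos++; }

    protected void reportError(LexError e) { errors.add(e.getMessage()); }

    // Returns the next token, or null at end of input.
    String nextToken() {
        while (pos < input.length()) {
            try {
                return mTOKEN();
            } catch (LexError e) {
                reportError(e);
                recover(e); // second consume: one more character is skipped
            }
        }
        return null;
    }

    List<String> tokenize() {
        List<String> out = new ArrayList<>();
        for (String t = nextToken(); t != null; t = nextToken()) {
            out.add(t);
        }
        return out;
    }
}

// Recovery disabled: recover() consumes nothing and nextToken() lets the
// error propagate to the caller instead of reporting it.
class StrictLexer extends ToyLexer {
    StrictLexer(String input) { super(input); }

    @Override
    protected void recover(LexError e) { /* no-op */ }

    @Override
    String nextToken() {
        return pos < input.length() ? mTOKEN() : null;
    }
}
```

On the input "1x2", ToyLexer returns only the token "1": the recover() call in mTOKEN() skips 'x' and the second call in nextToken() skips '2'. StrictLexer instead throws a LexError at index 1. LexError is unchecked to keep the sketch short; in the real runtime RecognitionException is a checked exception that nextToken() does not declare, so an override that propagates it would presumably need to wrap it.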
One important, and also undocumented, note: CommonTokenStream will lex
the *entire* stream into tokens on the first token fetch. This means the
lexer will process and display all errors before the parser processes
the first token. So much for context.
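A minimal sketch of that eager buffering, assuming a token source that returns null at end of input. EagerTokenStream is a hypothetical stand-in for CommonTokenStream; its LT() and consume() methods are named after the token stream's lookahead and consume operations but work on plain strings instead of Token objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of a token stream that lexes everything on first use.
class EagerTokenStream {
    private final Supplier<String> source; // returns null at end of input
    private final List<String> tokens = new ArrayList<>();
    private boolean filled = false;
    private int p = 0; // index of the current token

    EagerTokenStream(Supplier<String> source) { this.source = source; }

    // Drains the whole source in one go, so any lexer error fires here.
    private void fillBuffer() {
        for (String t = source.get(); t != null; t = source.get()) {
            tokens.add(t);
        }
        filled = true;
    }

    // Lookahead: k = 1 is the current token; "EOF" past the end.
    String LT(int k) {
        if (!filled) fillBuffer();
        int i = p + k - 1;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    void consume() {
        if (!filled) fillBuffer();
        if (p < tokens.size()) p++;
    }
}
```

Nothing is lexed until the parser asks for its first token; that single LT(1) call then pulls every token, and so triggers every lexer error, at once.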
When the parser fails to match a token, it calls
mismatch(...)
which creates the exception object and then calls
recoverFromMismatchedToken(...)
This method looks at the token after the unexpected one. If that token
is the expected one, it reports the error, skips the unexpected token,
and returns a successful match. If not, it calls
recoverFromMismatchedElement(...)
This method tests whether the unexpected token could follow the expected
token. If so, it reports the error and returns a successful match
(acting as if the missing token had been found). If not, the exception
object is finally thrown.
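The in-line recovery chain can be sketched as follows. This is a simplified model, not BaseRecognizer itself: token types are strings, the follow set is a plain Set<String>, and MismatchError and ToyParser are hypothetical names. In this model, recoverFromMismatchedElement() returns false when it cannot recover and its caller throws the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical exception type standing in for a mismatched-token error.
class MismatchError extends Exception {
    MismatchError(String expected, String found) {
        super("mismatched input '" + found + "' expecting " + expected);
    }
}

// Toy parser modeling mismatch -> recoverFromMismatchedToken
// -> recoverFromMismatchedElement.
class ToyParser {
    private final List<String> tokens;
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    String LA(int k) {
        int i = p + k - 1;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    void consume() { if (p < tokens.size()) p++; }

    void reportError(Exception e) { errors.add(e.getMessage()); }

    // follow: token types that may legally come after ttype.
    void match(String ttype, Set<String> follow) throws MismatchError {
        if (LA(1).equals(ttype)) { consume(); return; }
        mismatch(ttype, follow);
    }

    protected void mismatch(String ttype, Set<String> follow) throws MismatchError {
        MismatchError e = new MismatchError(ttype, LA(1));
        recoverFromMismatchedToken(e, ttype, follow);
    }

    protected void recoverFromMismatchedToken(MismatchError e, String ttype,
                                              Set<String> follow) throws MismatchError {
        if (LA(2).equals(ttype)) { // extra token: delete it
            reportError(e);
            consume();             // skip the unexpected token
            consume();             // consume ttype as if it had matched
            return;
        }
        if (!recoverFromMismatchedElement(e, follow)) throw e;
    }

    protected boolean recoverFromMismatchedElement(MismatchError e, Set<String> follow) {
        if (follow.contains(LA(1))) { // missing token: pretend it was there
            reportError(e);
            return true;
        }
        return false;
    }
}
```

With tokens ID SEMI END, matching END deletes the extra SEMI; with ID alone, the missing END is "inserted" because EOF may follow it; only when neither test passes does match() throw.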
If you have not created your own rule exception handler in the grammar,
or configured the default exception handlers with @rulecatch {}, then
the exception will be caught by the default rule exception handler,
which will call
reportError(re)
recover(input, re)
The recover(...) method in the parser will try to consume tokens until
one is found that allows it to resynchronize and continue parsing the
rest of the tokens.
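The rule-level path, where the default handler swallows the exception, looks roughly like this in a toy model. ScriptParser, RecError, the grammar in the comments and the hard-coded resynchronization set FOLLOW_COMMAND are all hypothetical (the real runtime derives that set from the grammar), and match() here throws immediately so that only the rule handler's recovery is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical recognition error for this sketch.
class RecError extends Exception {
    RecError(String expected, String found) {
        super("mismatched input '" + found + "' expecting " + expected);
    }
}

class ScriptParser {
    // Hypothetical FOLLOW(command) for: script : command* EOF ;
    private static final Set<String> FOLLOW_COMMAND =
        new HashSet<>(Arrays.asList("ID", "EOF"));

    private final List<String> tokens;
    private int p = 0;
    final List<String> errors = new ArrayList<>();
    int commandsParsed = 0;

    ScriptParser(List<String> tokens) { this.tokens = tokens; }

    String LA(int k) {
        int i = p + k - 1;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    void consume() { if (p < tokens.size()) p++; }

    // No in-line recovery here: a mismatch throws straight away.
    void match(String ttype) throws RecError {
        if (!LA(1).equals(ttype)) throw new RecError(ttype, LA(1));
        consume();
    }

    void reportError(RecError e) { errors.add(e.getMessage()); }

    // Skip tokens until one that can resume parsing (or EOF) is current.
    void recover(Set<String> resyncSet) {
        while (!LA(1).equals("EOF") && !resyncSet.contains(LA(1))) consume();
    }

    // command : ID END ;  (catch clause shaped like the default rule handler)
    void command() {
        try {
            match("ID");
            match("END");
            commandsParsed++;
        } catch (RecError re) {
            reportError(re);
            recover(FOLLOW_COMMAND);
        }
    }

    // script : command* EOF ;
    void script() {
        while (!LA(1).equals("EOF")) command();
    }
}
```

For the tokens ID X Y END ID END, the first command fails at X, recover() skips X, Y and END, and parsing resumes at the second command. The caller of script() never sees an exception, which is why a wrapper around the parser cannot catch it.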
To change how recovery is handled in the parser, override
recoverFromMismatchedToken(...)
recoverFromMismatchedElement(...)
to change those strategies, or override
mismatch(...)
to change the whole response that precedes the exception.
To change how recovery is done in response to the RecognitionException,
replace the default rule exception handler with @rulecatch, provide
per-rule exception handlers in the grammar, or override
reportError(re)
recover(input, re)
in the parser.
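Applied to the question quoted below, a parser that reports syntax errors to its caller needs both kinds of override. A hedged sketch in the same toy style: CommandParser, StrictCommandParser, SyntaxError and ruleCatch() are hypothetical, ruleCatch() stands in for the catch clause the grammar generates, and CommandParser folds the single-token deletion of recoverFromMismatchedToken() directly into mismatch() to keep it short. In a real grammar, the mismatch() override would presumably go in @members and the rethrow in @rulecatch, and the exact method signatures depend on the ANTLR version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical syntax error type for this sketch.
class SyntaxError extends Exception {
    SyntaxError(String expected, String found) {
        super("mismatched input '" + found + "' expecting " + expected);
    }
}

// Forgiving base: single-token deletion on mismatch, resync in the rule handler.
class CommandParser {
    protected final List<String> tokens;
    protected int p = 0;
    final List<String> errors = new ArrayList<>();

    CommandParser(List<String> tokens) { this.tokens = tokens; }

    String LA(int k) {
        int i = p + k - 1;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    void consume() { if (p < tokens.size()) p++; }

    void match(String ttype) throws SyntaxError {
        if (LA(1).equals(ttype)) consume();
        else mismatch(ttype);
    }

    protected void mismatch(String ttype) throws SyntaxError {
        SyntaxError e = new SyntaxError(ttype, LA(1));
        if (LA(2).equals(ttype)) { // delete one extra token and carry on
            errors.add(e.getMessage());
            consume();
            consume();
            return;
        }
        throw e;
    }

    // Stands in for the catch clause of the default rule handler.
    protected void ruleCatch(SyntaxError e) throws SyntaxError {
        errors.add(e.getMessage());
        while (!LA(1).equals("EOF")) consume(); // crude resync: skip to EOF
    }

    // command : ID ID ENDOFCOMMAND ;
    void command() throws SyntaxError {
        try {
            match("ID");
            match("ID");
            match("ENDOFCOMMAND");
        } catch (SyntaxError e) {
            ruleCatch(e);
        }
    }
}

// Strict variant: no in-line recovery, and the rule handler rethrows.
class StrictCommandParser extends CommandParser {
    StrictCommandParser(List<String> tokens) { super(tokens); }

    @Override
    protected void mismatch(String ttype) throws SyntaxError {
        throw new SyntaxError(ttype, LA(1));
    }

    @Override
    protected void ruleCatch(SyntaxError e) throws SyntaxError {
        throw e; // like @rulecatch { catch (RecognitionException e) { throw e; } }
    }
}
```

With the tokens ID ID SEMI ENDOFCOMMAND, CommandParser deletes SEMI and only logs an error, while StrictCommandParser throws a SyntaxError out of command(). In this model, overriding only mismatch() would not be enough, because the base ruleCatch() would still swallow the exception.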
I had to spelunk the source to find all of this, since these questions
never get answered. It has worked for me so far.
I hope that helps.
-- Curtis
Foolish Ewe wrote:
> Our experience with ANTLR has been generally good, but we could use a bit of help here. We have a fairly simple language which we read in one command at a time. A command is passed in as a buffer with an ASCII NULL ('\0') terminator character, which we treat as a "sentinel" token to better detect when an invalid suffix might be in the lookahead. Since we have an interactive scripting tool and all commands are terminated by newlines, we have a "wrapper" which invokes the parser and catches exceptions due to syntax errors. However, for some suffix errors, the parser appears to catch the exception internally and generate a warning message rather than passing the exception out. The diagnostic messages look something like this (the null terminator is not displayed):
>
> XXXX YYYYYY ZZZZ index-10;ge
> BR.recoverFromMismatchedToken line 1:25 mismatched input ';ge expecting ENDOFCOMMAND
>
> where XXXX, YYYYYY and ZZZZ are all valid keywords in the language, and the lexer has a production:
> // Lexer Rules
> ENDOFCOMMAND : '\000'; // A terminal NULL Character
>
> The ;ge suffix should not be there and is correctly flagged as invalid, but the error recovery is more "forgiving" than we want in our application: for consistency, we would prefer the parser to throw a RecognitionException, which we handle in the wrapper.