[antlr-interest] Improving mismatched token recovery on first token of sub-rule

Balazs Javor bjavor at gmail.com
Wed Oct 28 08:56:34 PDT 2009


This is my first adventure with ANTLR, and I'm currently trying to wrap
my head around how to improve error recovery for various cases so that
the affected "unparsed error area" is as small as possible. (I'm
trying to develop a parser for an IDE, where the text goes through all
sorts of stages of invalidity...)

I've run into the following problem:

Here's a simplified example document:

<document>
 <list>
   <item>value</item>
   tem>value</item>
   <item>value</item>
 </list>
</document>

As you can see, there is an error on the second "item" line.

Relevant rules from the grammar file may look like this (slightly
simplified):

DOCUMENT_START : '<document>';
DOCUMENT_END : '</document>';
LIST_START : '<list>';
LIST_END : '</list>';
ITEM_START : '<item>';
ITEM_END : '</item>';
VALUE: ('a'..'z')+;

document: DOCUMENT_START list DOCUMENT_END;
list: LIST_START item+ LIST_END;
item: ITEM_START VALUE ITEM_END;

Now here's my problem:

Because the second ITEM_START token is malformed, the lexer simply
omits it.
The parser then throws a MismatchedTokenException in list() and
recovers after LIST_END.
Note that since we have not yet entered the sub-rule item(), the
recovery replaces the entire LIST part of the resulting AST with
an error node. This is because the exception is thrown in list(),
which causes it to exit the loop that is responsible for looking for
additional ITEMs.

What I would like to happen, though, is for it to "skip" just that one
ITEM line and continue parsing the rest of the ITEMs, producing an AST
with a LIST node that has two (instead of three) ITEM nodes...
Unfortunately, even if I specify a custom exception handler for the
"list" rule that consumes tokens only until just after the next
ITEM_END, the parser still cannot resume the original loop from the
list() handler...
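
To make the behaviour I'm after concrete, here is a minimal hand-written
Java sketch. It is not ANTLR-generated code and does not use the ANTLR
API. The class ListRecoverySketch, the enum T and the hard-coded token
stream are hypothetical names made up for illustration, and I'm assuming
the lexer delivers just VALUE ITEM_END for the broken line. The point is
only that the resynchronisation happens inside the item loop, so the
loop can carry on afterwards:

```java
import java.util.Arrays;
import java.util.List;

// Hand-written illustration only; all names here are hypothetical.
class ListRecoverySketch {
    // Token types mirroring the lexer rules in the grammar above.
    enum T { DOCUMENT_START, DOCUMENT_END, LIST_START, LIST_END,
             ITEM_START, ITEM_END, VALUE, EOF }

    private final List<T> tokens;
    private int pos = 0;
    int skippedItems = 0;   // number of broken item lines that were skipped

    ListRecoverySketch(List<T> tokens) { this.tokens = tokens; }

    // Look at the current token without consuming it.
    private T la() { return pos < tokens.size() ? tokens.get(pos) : T.EOF; }

    private void match(T expected) {
        if (la() != expected) {
            throw new IllegalStateException(
                "expected " + expected + " but found " + la());
        }
        pos++;
    }

    // document: DOCUMENT_START list DOCUMENT_END; returns the item count.
    int parseDocument() {
        match(T.DOCUMENT_START);
        int items = parseList();
        match(T.DOCUMENT_END);
        return items;
    }

    // list: LIST_START item+ LIST_END, resynchronising inside the loop.
    int parseList() {
        match(T.LIST_START);
        int items = 0;
        while (la() != T.LIST_END && la() != T.EOF) {
            if (la() == T.ITEM_START) {
                // item: ITEM_START VALUE ITEM_END
                // (errors after ITEM_START still throw in this sketch)
                match(T.ITEM_START);
                match(T.VALUE);
                match(T.ITEM_END);
                items++;
            } else {
                // First token of the item is wrong: skip to just after the
                // next ITEM_END, but never past LIST_END, then keep looping.
                while (la() != T.ITEM_END && la() != T.LIST_END
                        && la() != T.EOF) {
                    pos++;
                }
                if (la() == T.ITEM_END) {
                    pos++;
                }
                skippedItems++;
            }
        }
        match(T.LIST_END);
        return items;
    }

    public static void main(String[] args) {
        // Assumed token stream for the example document; the second item
        // has lost its ITEM_START.
        List<T> input = Arrays.asList(
            T.DOCUMENT_START, T.LIST_START,
            T.ITEM_START, T.VALUE, T.ITEM_END,
            T.VALUE, T.ITEM_END,
            T.ITEM_START, T.VALUE, T.ITEM_END,
            T.LIST_END, T.DOCUMENT_END);
        ListRecoverySketch p = new ListRecoverySketch(input);
        int items = p.parseDocument();
        // prints "items=2 skipped=1"
        System.out.println("items=" + items + " skipped=" + p.skippedItems);
    }
}
```

In other words, the LIST survives with two ITEMs and only the broken
line is lost. What I can't figure out is how to get the same effect out
of the generated list() method.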

Sorry if I'm not expressing myself clearly enough...

Are there any solutions for this type of situation? Or am I missing
something very basic here?

Many thanks for any suggestions in advance!

Balazs