[antlr-interest] How to skip to end of line on error?

Sun Nov 29 10:54:52 PST 2009

This is something I need to write a wiki article on, as it comes up a lot and the solutions are not obvious. Basically, you need to prevent the parsing loop from dropping all the way out of the current rule when it finds an error (in your case, within the article rule). You will also find this much easier if, rather than trying to accommodate files without a terminating NL, you just always add an NL to the incoming input; then you will not need the trailing article NL? and can simply have (article NL)* EOF.
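Appending the NL can happen before the text ever reaches the lexer. The following is only a sketch, assuming the Java target and that the whole file has already been read into a String; the helper name ensureTrailingNewline is hypothetical.

```java
public class TrailingNewline {
    // Hypothetical helper: make sure the text handed to the lexer ends with a
    // newline, so every record, including the last one, is terminated by NL.
    static String ensureTrailingNewline(String text) {
        // Leave empty input alone: (article NL)* EOF already accepts it, while a
        // lone "\n" would be an NL with no article in front of it.
        if (text.isEmpty() || text.endsWith("\n")) {
            return text;
        }
        return text + "\n";
    }

    public static void main(String[] args) {
        String last = "Name. \"Title,\" Periodical, 2005, v41(3,Oct), 217-240.";
        System.out.println(ensureTrailingNewline(last).endsWith("\n")); // true
        // A real driver would then pass the result to the lexer, for example via
        // new ANTLRStringStream(ensureTrailingNewline(fileContents)).
    }
}
```

Since the NL lexer rule is '\r'? '\n', a file that already ends in "\r\n" is left unchanged as well.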

So, when an error occurs in the article rule, the parser will drop out of that rule but may not resynchronize, so you want to force the resync to the NL when the article rule returns. This is fairly simple, but it requires quite a bit of inside knowledge of how ANTLR behaves. What you need to do is create a rule with only an epsilon (empty) alternative and invoke it directly before the article call and, more importantly, directly after it:

articleList
    : reSync (article reSync NL)* EOF // Assuming that this is where EOF should be
    ;

Next, in your reSync rule, you want to resync to the follow set that will now be on the stack, which is the same as the FIRST set of whatever follows reSync in the current production (because reSync itself matches nothing). For the reSync that comes after article, we know the follow set will only be NL, so you could hard-code that, but this is a generally useful technique to know, so let's use it generically. If you don't really understand this, don't worry too much; you can just copy the code and the empty rule and it will work:

reSync
@init
{
    syncToFirstSet(); // Consume tokens until LA(1) is in the follow set at the top of the follow-set stack
}
    : // Deliberately match nothing, but will be invoked anyway
    ;

Then in your superClass (best) or @members, implement the syncToFirstSet method:

    // BitSet and Token here are org.antlr.runtime.BitSet and org.antlr.runtime.Token
    protected void syncToFirstSet ()
    {
        // Compute the follow set that is in context wherever we are in the
        // rule chain/stack
        //
        BitSet follow = state.following[state._fsp]; //computeContextSensitiveRuleFOLLOW();
        syncToFirstSet (follow);
    }

    protected void syncToFirstSet (BitSet follow)
    {
        int mark = -1;
        try {
            mark = input.mark();
            // Consume all tokens in the stream until we find a member of the follow
            // set, which means the next production should be guaranteed to be happy.
            //
            while (! follow.member(input.LA(1)) ) {
                if  (input.LA(1) == Token.EOF) {
                    // Looks like we didn't find anything at all that can help us here
                    // so we need to rewind to where we were and let normal error handling
                    // bail out. rewind(mark) also releases the mark, so clear it here.
                    //
                    input.rewind(mark);
                    mark = -1;
                    return;
                }
                input.consume();
            }
        } catch (Exception e) {
            // Just ignore any errors here, we will just let the recognizer
            // try to resync as normal - something must be very screwed.
            //
        }
        finally {
            // Always release the mark we took (unless rewind already did)
            //
            if  (mark != -1) {
                input.release(mark);
            }
        }
    }
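To see the control flow of syncToFirstSet(BitSet) without a generated parser, here is a self-contained sketch of the same loop. It is an illustration under assumptions, not ANTLR code: ToyTokenStream is a hypothetical stand-in for the token stream, the token type numbers are made up, and java.util.BitSet replaces org.antlr.runtime.BitSet (so member() becomes a small helper).

```java
import java.util.BitSet;
import java.util.List;

public class FollowSetSync {
    // Hypothetical token types; ANTLR 3 itself uses -1 for Token.EOF.
    static final int EOF = -1;
    static final int NL = 1;
    static final int WORD = 2;
    static final int DIGIT = 3;

    // Hypothetical stand-in for an ANTLR token stream, exposing only the calls
    // the resync loop needs: LA(1), consume(), mark() and rewind(marker).
    static final class ToyTokenStream {
        final List<Integer> types;
        int p = 0;

        ToyTokenStream(List<Integer> types) { this.types = types; }

        int LA(int i) {
            int idx = p + i - 1;
            return idx < types.size() ? types.get(idx) : EOF;
        }

        void consume() { if (p < types.size()) p++; }

        int mark() { return p; }

        void rewind(int marker) { p = marker; }
    }

    // java.util.BitSet rejects negative indexes, so EOF (-1) is never a member.
    static boolean member(BitSet set, int tokenType) {
        return tokenType >= 0 && set.get(tokenType);
    }

    // Same control flow as syncToFirstSet(BitSet follow), minus the exception
    // handling and mark release, which this toy stream does not need.
    static void syncToFirstSet(ToyTokenStream input, BitSet follow) {
        int mark = input.mark();
        while (!member(follow, input.LA(1))) {
            if (input.LA(1) == EOF) {
                // Nothing useful before EOF: put the stream back untouched.
                input.rewind(mark);
                return;
            }
            input.consume();
        }
    }

    public static void main(String[] args) {
        BitSet follow = new BitSet();
        follow.set(NL); // after "article reSync" only NL can follow

        ToyTokenStream s = new ToyTokenStream(List.of(WORD, DIGIT, WORD, NL, WORD));
        syncToFirstSet(s, follow);
        System.out.println(s.p); // 3: positioned on the NL

        ToyTokenStream noNewline = new ToyTokenStream(List.of(WORD, DIGIT));
        syncToFirstSet(noNewline, follow);
        System.out.println(noNewline.p); // 0: rewound because EOF came first
    }
}
```

The rewind on EOF is what keeps the technique safe: if no member of the follow set exists, the stream is left exactly where it was and the normal ANTLR error handling gets its chance.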

And that's it. Every time you mention reSync in a rule, it will resync the input to a member of the current follow set, which is the FIRST set of whatever follows reSync in the current production. You will therefore not drop out of the parsing loop but will re-enter your article rule on the next line. The first invocation is just in case there is junk before the first article starts (depending on how this rule is invoked, you may also need to resync before the articleList rule).
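A rough, self-contained simulation of the whole articleList : reSync (article reSync NL)* EOF loop may help show why a bad line no longer stops the parse. This is a hand-written sketch, not what ANTLR generates: the token types are hypothetical, the toy article rule only accepts WORD DIGIT, the first reSync uses just {WORD} as the article start set (the real follow set would also allow EOF), and the loop is entered whenever LA(1) is not EOF.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class ResyncLoopDemo {
    // Hypothetical token types; JUNK stands for any token an article cannot use here.
    static final int EOF = -1, NL = 1, WORD = 2, DIGIT = 3, JUNK = 4;

    final List<Integer> tokens;
    int p = 0;
    int articles = 0;
    final List<String> errors = new ArrayList<>();

    ResyncLoopDemo(List<Integer> tokens) { this.tokens = tokens; }

    int la1() { return p < tokens.size() ? tokens.get(p) : EOF; }

    void consume() { if (p < tokens.size()) p++; }

    // Toy stand-in for the article rule: WORD DIGIT. On a mismatch it records an
    // error and returns with the rest of the line still unconsumed.
    void article() {
        for (int expected : new int[] {WORD, DIGIT}) {
            if (la1() != expected) {
                errors.add("mismatch at token " + p);
                return;
            }
            consume();
        }
        articles++;
    }

    // The empty reSync rule: skip tokens until la1() is in the follow set;
    // if EOF arrives first, rewind and let the caller report the problem.
    void reSync(BitSet follow) {
        int mark = p;
        while (!(la1() >= 0 && follow.get(la1()))) {
            if (la1() == EOF) {
                p = mark;
                return;
            }
            consume();
        }
    }

    // articleList : reSync (article reSync NL)* EOF ;
    void articleList() {
        BitSet articleStart = new BitSet();
        articleStart.set(WORD);
        BitSet newline = new BitSet();
        newline.set(NL);

        reSync(articleStart);            // skip junk before the first article
        while (la1() != EOF) {
            article();
            reSync(newline);             // force the resync to NL after article returns
            if (la1() != NL) {           // only possible if the last line has no NL
                errors.add("missing NL at token " + p);
                return;
            }
            consume();                   // match NL
        }
    }

    public static void main(String[] args) {
        // Good line, bad line (WORD where DIGIT was expected, then JUNK), good line.
        ResyncLoopDemo parser = new ResyncLoopDemo(List.of(
                WORD, DIGIT, NL,
                WORD, WORD, JUNK, NL,
                WORD, DIGIT, NL));
        parser.articleList();
        System.out.println(parser.articles + " articles, errors: " + parser.errors);
        // prints "2 articles, errors: [mismatch at token 4]"
    }
}
```

The "missing NL" branch is only reachable when the last line has no terminating NL, which is one more reason to append the NL to the input up front.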

I will make a wiki article of this soon as it is commonly required and not particularly obvious.

Jim

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Rick Schumeyer
Sent: Saturday, November 28, 2009 5:20 PM
To: antlr-interest
Subject: [antlr-interest] How to skip to end of line on error?

I've read the section on error reporting and recovery from "the book" but still can't figure out what may be a simple problem.

I want to parse a file that consists of bibliographic entries.  Each entry is on one line (so each record ends with \n).

If a record does not match, I just want to print an error message, skip to the end of the line, and start again with the next record.

If I understand chapter 10 correctly, then '\n' should be in the resynchronization set, and the parser will consume tokens until it finds one.

This isn't happening.  Once I get an error, the parser never recovers.  I get a bunch of NoViableAlt exceptions.  I'm hoping someone can explain what I'm doing wrong.

Here is a sample input file.  The 1st and 3rd lines are ok, the 2nd line is an error.

Name. "Title," Periodical, 2005, v41(3,Oct), 217-240.
Name. "Title," Periodical, 2005, v41(3,Oct), Article 2.
Name. "Title," Periodical, 2005, v41(3,Oct), 217-240.

Here is the grammar:

grammar Periodical;

article_list 
    :    (article NL)* article NL?
    ;

article
    :    a=authors PERIOD SPACE QUOTE t=title COMMA QUOTE SPACE j=journal COMMA SPACE y=year COMMA SPACE v=volume COMMA SPACE p=pages PERIOD SPACE*
    ;

authors    :    (~QUOTE)+;

title    :    (~QUOTE)+;

journal    :    (LETTER|SPACE|COMMA|DASH)+;

volume    :    (LETTER|DIGIT)+
    |    (LETTER|DIGIT)+ '(' (LETTER|DIGIT|SLASH|COMMA)+ ')' 
    ;

year    :    DIGIT DIGIT DIGIT DIGIT;

pages    :    DIGIT+ DASH DIGIT+;

PERIOD    :    '.';
QUOTE    :    '"';
COMMA    :    ',';
SPACE    :    ' ';
DIGIT    :    '0'..'9';
LETTER  :    ('a'..'z')|('A'..'Z');
DASH    :    '-';
SLASH    :    '/';
NL    :    '\r'? '\n';
