[antlr-interest] Continue parsing after an error

Gordon Tyler Gordon.Tyler at quest.com
Wed Jun 30 10:48:39 PDT 2010


I'm not very familiar with ANTLR's error recovery mechanisms, but I suspect that the generated code for the 'expressions' rule checks whether the next token can start an 'expression' before it calls into the 'expression' rule. When it doesn't find one, as in your second case, it exits back out into the root rule, which then checks whether the next token is EOF and fails.

But this is just speculation. Hopefully one of the more experienced ANTLRers can give you a better answer.
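
To make that guess concrete, here is a small stand-alone Java simulation of what I think the loop for 'expression*' does. It is a hand-written sketch, not ANTLR's generated code: the class and method names are made up, one character stands for one token, and it assumes that every match inside 'expression' consumes exactly one token (which is what the overridden recoverFromMismatchedToken() in the grammar below arranges on a mismatch).

```java
/**
 * Hand-written simulation, NOT ANTLR's generated code. Assumption: one
 * character stands for one token, and every match consumes exactly one
 * token (on a mismatch, the overridden recoverFromMismatchedToken()
 * does the consuming).
 */
final class StartOfRuleSketch {

    // expression : '[' 'a' 'x' ']' has four tokens.
    private static final int EXPRESSION_LENGTH = 4;

    /** Simulates `expression*`; returns {expression nodes built, stop position}. */
    static int[] parseExpressions(String input) {
        int pos = 0;
        int nodes = 0;
        // Loop decision: enter `expression` only when the lookahead is '['.
        // Any other token exits the loop and returns to `root`.
        while (pos < input.length() && input.charAt(pos) == '[') {
            // Inside the rule, a wrong token is consumed and replaced by an
            // error token, so the position advances by one per expected token.
            for (int i = 0; i < EXPRESSION_LENGTH && pos < input.length(); i++) {
                pos++;
            }
            nodes++;
        }
        return new int[] {nodes, pos};
    }

    static int countExpressions(String input) {
        return parseExpressions(input)[0];
    }

    /** True when `root` would find EOF right after `expressions`. */
    static boolean stopsAtEof(String input) {
        return parseExpressions(input)[1] == input.length();
    }

    public static void main(String[] args) {
        System.out.println(countExpressions("[ax][kx][ax]")); // error mid-rule
        System.out.println(countExpressions("[ax]sax][ax]")); // error at rule start
    }
}
```

Under these assumptions the first input yields three expression nodes and stops at EOF, while the second yields one node and stops at the 's', which is where 'root' would then fail to match EOF.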

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
Sent: June 30, 2010 1:35 PM
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Continue parsing after an error

Hi Gordon,

Thanks for the prompt response.
Adding OTHER as an alternative was what I tried to do in the beginning. 
Unfortunately my use case is a bit more complex. I have worked out a 
better example below.
In this example, the input string [ax][kx][ax] is wrong (k is not 
allowed), but the parser still builds the full AST, so it recovers from 
the error: it generates three expression nodes, the second of which 
contains an ErrorCommonToken, as per recoverFromMismatchedToken().
The string [ax]sax][ax], on the other hand, generates only the first part 
of the tree, up to the error: it generates only one expression node.

I do not understand why I get this different behavior - the parser 
recovers if the error happens in the middle of a rule, but not if the 
error is at the beginning of a rule.

Is this a problem in my grammar, or is it just the way ANTLR works?

Thanks,
Luchesar

================
grammar StartOfARuleFailTest;

options { output=AST; ASTLabelType=CommonTree; }

tokens { ROOT_TOKEN; ERROR_TOKEN; EXPRESSIONS; EXPRESSION; }

@members {
    // On a mismatched token, consume the offending token and return an
    // error token of the expected type so that the parse can continue.
    @Override
    protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
            throws RecognitionException {
        MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
        input.consume();
        return createErrorToken(ex, ttype);
    }

    public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
        ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
        errorCommonToken.setType(ttype);
        return errorCommonToken;
    }
}

root : expressions  EOF -> ^(ROOT_TOKEN expressions) ;
expressions  : expression* -> ^(EXPRESSIONS expression*) ;
expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');

OTHER   : . ;
================


Gordon Tyler wrote:
> The grammar you have defined says, roughly:
>
> Parse any number of '[' or ']' until you reach EOF.
>
> It does not describe what to do if something other than '[' or ']' is found before EOF.
>
> You have defined a token, OTHER, to match the other stuff, but your parse rules do not reference OTHER. Perhaps something like this would work:
>
> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
> Sent: June 30, 2010 10:10 AM
> To: antlr-interest at antlr.org
> Cc: Valerio Malenchino
> Subject: [antlr-interest] Continue parsing after an error
>
> Dear ANTLR enthusiasts,
>
> I am struggling with a problem. The parser jumps to the end of the file 
> from the middle of the document.
>
> The setup is as follows:
>     * I have two alternatives followed by EOF
>     * at parse time, in the middle of the document, the next token cannot 
> match the start of either alternative
>
> This terminates the parse, because the parser jumps to the end of the file.
>
> A simple grammar that illustrates the problem is
>
> ===============
> tokens {ROOT_TOKEN;}
> root
>     : expressions EOF -> ^(ROOT_TOKEN expressions) ;
> expressions : ('[' | ']')* ;
> OTHER   : . ;
> ===============
>
> If I then try parsing "[[][]]sdsdf[]][]][", the parsing will stop at the 
> first "s" and will try to recover as if EOF were the next token.
> Looking at the generated parser, it seems that if there is no viable 
> alternative in the top rule (in this case "root"), the parser behaves 
> as if it had reached EOF and skips the rest of the tokens.
>
> The resulting AST contains only the children up to the first illegal 
> token, "s".
>
> I cannot see where my mistake is. It looks like the parser should not do 
> that. Can you suggest a workaround for the problem?
>
> Thanks in advance,
> Luchesar
>   

-- 

Luchesar Cekov
Software Engineer
+44 (0) 207 239 4949
*Ontology Systems*
www.ontology.com <http://www.ontology.com/>

	

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

