[antlr-interest] Continue parsing after an error

Thu Jul 1 05:49:12 PDT 2010

On Wed, 30 Jun 2010 10:48:39 -0700
  Gordon Tyler <Gordon.Tyler at quest.com> wrote:
> I'm not very familiar with ANTLR's error recovery mechanisms, but I
> suspect that the generated code for the 'expressions' rule looks for
> a character that it recognizes as the start of an 'expression' rule
> before it calls into the 'expression' rule and when it doesn't find
> one in the second case, it exits out into the root rule, which then
> checks if the next token is EOF and fails.

Please read the article on the wiki entitled "Custom error recovery" - 
this will give you all the information you need.

Jim
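
The two behaviours discussed in this thread can be reproduced without
ANTLR. What follows is a minimal sketch, not ANTLR-generated code and
not the code from the wiki article. It is a hand-written Java stand-in
for the expression* loop, assuming that every input character is one
token. The class ResyncSketch, the resync flag, and
syncToExpressionStart() are hypothetical names introduced only for
illustration. A mismatch inside expression is handled by consuming the
bad token, in the spirit of the overridden recoverFromMismatchedToken()
quoted below. A bad token in front of expression makes the loop test
fail, so the loop exits unless junk is skipped first.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the thread's grammar; NOT ANTLR-generated code.
//   expressions : expression* ;
//   expression  : '[' 'a' 'x' ']' ;
final class ResyncSketch {
    private final String input;   // each character plays the role of one token
    private final boolean resync; // hypothetical switch: skip junk before each loop test
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    ResyncSketch(String input, boolean resync) {
        this.input = input;
        this.resync = resync;
    }

    private boolean atEnd() {
        return pos >= input.length();
    }

    // Mid-rule recovery: report the mismatch, consume the offending token, carry on.
    private void match(char expected) {
        if (atEnd()) {
            errors.add("expected '" + expected + "' but found EOF");
            return;
        }
        if (input.charAt(pos) != expected) {
            errors.add("expected '" + expected + "' at " + pos
                    + " but found '" + input.charAt(pos) + "'");
        }
        pos++;
    }

    private void expression() {
        match('[');
        match('a');
        match('x');
        match(']');
    }

    // Skip tokens until one that can start an expression, or until EOF.
    private void syncToExpressionStart() {
        int start = pos;
        while (!atEnd() && input.charAt(pos) != '[') {
            pos++;
        }
        if (pos > start) {
            errors.add("skipped '" + input.substring(start, pos) + "' at " + start);
        }
    }

    // Returns the number of expression nodes built before the loop exits.
    int expressions() {
        int count = 0;
        while (true) {
            if (resync) {
                syncToExpressionStart();
            }
            // Loop test: only enter the rule when the lookahead can start it.
            if (atEnd() || input.charAt(pos) != '[') {
                break;
            }
            expression();
            count++;
        }
        return count;
    }

    static int countExpressions(String input, boolean resync) {
        return new ResyncSketch(input, resync).expressions();
    }

    public static void main(String[] args) {
        System.out.println(countExpressions("[ax][kx][ax]", false)); // error inside a rule
        System.out.println(countExpressions("[ax]sax][ax]", false)); // error at the start of a rule
        System.out.println(countExpressions("[ax]sax][ax]", true));  // same input, with resync
    }
}
```

Run as written, this prints 3, 1 and 2. The first two numbers match the
node counts reported in the thread. The third shows the effect of
resynchronising before each loop test: the junk "sax]" is skipped and
the final [ax] is still parsed.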

> 
> But this is just speculation. Hopefully one of the more experienced
> ANTLRers can give you a better answer.
> 
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
> Sent: June 30, 2010 1:35 PM
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Continue parsing after an error
> 
> Hi Gordon,
> 
> Thanks for the prompt response.
> Adding OTHER as an alternative was what I tried to do in the
> beginning. Unfortunately, my use case is a bit more complex. I have
> worked out a better example below.
> 
> In this example, the input string [ax][kx][ax] is wrong (k is not
> allowed), but the grammar builds the full AST, so it recovers from
> the error: it generates three expression nodes, the second of which
> contains an ErrorCommonToken, as per recoverFromMismatchedToken().
> The string [ax]sax][ax], on the other hand, generates only the first
> part of the tree, up to the error: it generates only one expression
> node.
> 
> I do not understand why I get this different behavior: the parser
> recovers if the error happens in the middle of a rule, but not if the
> error is at the beginning of a rule.
> 
> Is this a problem in my grammar, or is it just the way ANTLR works?
> 
> Thanks,
> Luchesar
> 
> ================
> grammar StartOfARuleFailTest;
> 
> options {    output=AST;    ASTLabelType=CommonTree; }
> 
> tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }
> 
> @members {
>    // On a mismatched token, consume it and return an error token
>    // carrying the expected type, so the rule can carry on.
>    @Override
>    protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
>            throws RecognitionException {
>        MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
>        input.consume();
>        return createErrorToken(ex, ttype);
>    }
> 
>    // Wraps the offending token and gives it the expected token type.
>    public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
>        ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
>        errorCommonToken.setType(ttype);
>        return errorCommonToken;
>    }
> }
> 
> root : expressions  EOF -> ^(ROOT_TOKEN expressions) ;
> expressions  : expression* -> ^(EXPRESSIONS expression*) ;
> expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');
> 
> OTHER   : . ;
> ================
> 
> 
> Gordon Tyler wrote:
>> The grammar you have defined says, roughly:
>>
>> Parse any number of '[' or ']' until you reach EOF.
>>
>> It does not describe what to do if something other than '[' or ']'
>> is found before EOF.
>>
>> You have defined a token, OTHER, to match the other stuff, but your
>> parser rules do not reference OTHER. Perhaps something like this
>> would work:
>>
>> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>>
>>
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>> Sent: June 30, 2010 10:10 AM
>> To: antlr-interest at antlr.org
>> Cc: Valerio Malenchino
>> Subject: [antlr-interest] Continue parsing after an error
>>
>> Dear ANTLR enthusiasts,
>>
>> I am struggling with a problem. The parser jumps to the end of file
>> from the middle of the document.
>>
>> The setup is as follows:
>>     * I have two alternatives followed by EOF
>>     * at parse time, in the middle of the document, the next token
>>       cannot match the start of either alternative
>>
>> This leads to parsing termination because the parser jumps to the
>> end of file.
>>
>> A simple grammar that illustrates the problem is
>>
>> ===============
>> tokens {ROOT_TOKEN;}
>> root
>>     : expressions EOF -> ^(ROOT_TOKEN expressions) ;
>> expressions : ('[' | ']')* ;
>> OTHER   : . ;
>> ===============
>>
>> If I then try parsing "[[][]]sdsdf[]][]][", the parsing stops at the
>> first "s" and tries to recover as if EOF were the next token.
>> Looking at the generated parser, it seems that if there is no viable
>> alternative in the top rule (in this case "root"), the parser behaves
>> as if it had reached EOF and skips the rest of the tokens.
>>
>> The resulting AST contains only children up to the first illegal
>> token, "s".
>>
>> I cannot see where my mistake is. It looks like the parser should
>> not do that. Can you suggest a workaround for the problem?
>>
>> Thanks in advance,
>> Luchesar
>>   
> 
> -- 
> 
> Luchesar Cekov
> Software Engineer
> +44 (0) 207 239 4949
> *Ontology Systems*
> www.ontology.com <http://www.ontology.com/>
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
>http://www.antlr.org/mailman/options/antlr-interest/your-email-address