[antlr-interest] Continue parsing after an error

Luchesar Cekov luchesar.cekov at ontology-partners.com
Thu Jul 1 09:05:18 PDT 2010


Hi Jim,

You were right! The Custom Syntax Error Recovery article at
http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
explains how to fix the problem.

After applying it, I have managed to get the parser going even after an
error in a rule.
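The article itself is the authoritative source for the fix. As a rough illustration of the general idea only, here is a hypothetical Java model. It is not ANTLR-generated code, and the class and method names are made up. It uses one character per token and '\0' for EOF. It models the `expressions : expression*` loop from the grammar quoted below and resynchronizes before each loop decision, discarding tokens until the next one can either start an `expression` ('[') or legitimately follow the loop (EOF). That resync-at-loop-entry approach is an assumption made for this sketch, not a quotation from the article.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model, not ANTLR-generated code, of
 *   expressions : expression* ;
 *   expression  : '[' 'a' 'x' ']' ;
 * with one character per token and '\0' standing in for EOF.
 * The loop resynchronizes before every decision instead of exiting early.
 */
public class LoopResyncSketch {
    private static final char EOF = '\0';
    private final String input;
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    LoopResyncSketch(String input) { this.input = input; }

    private char la() { return p < input.length() ? input.charAt(p) : EOF; }

    private static String name(char c) { return c == EOF ? "<EOF>" : "'" + c + "'"; }

    // On mismatch, record an error and consume the wrong token anyway,
    // similar in spirit to the recoverFromMismatchedToken override quoted below.
    private void match(char expected) {
        if (la() != expected) {
            errors.add("expected " + name(expected) + " at " + p + ", found " + name(la()));
        }
        if (la() != EOF) p++;
    }

    // Discard tokens until LA(1) can start an expression ('[')
    // or can follow the loop (EOF).
    private void syncLoop() {
        while (la() != '[' && la() != EOF) {
            errors.add("skipped " + name(la()) + " at " + p);
            p++;
        }
    }

    // expressions : expression* ; returns how many expressions were parsed.
    int expressions() {
        int count = 0;
        syncLoop();                 // resync before the first loop decision
        while (la() == '[') {       // loop decision: can an expression start here?
            match('['); match('a'); match('x'); match(']');
            count++;
            syncLoop();             // resync before every later decision
        }
        return count;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[ax][kx][ax]", "[ax]sax][ax]"}) {
            LoopResyncSketch parser = new LoopResyncSketch(s);
            int n = parser.expressions();
            System.out.println(s + " -> " + n + " expression(s), errors: " + parser.errors);
        }
    }
}
```

In this model, [ax][kx][ax] still yields three expressions. [ax]sax][ax] yields two expressions after skipping s, a, x and ]. A loop without the resync step would stop after the first one. The resync set is deliberately FIRST(expression) plus what may follow the loop, so the loop only ends on a token at which it can legitimately stop.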

Many thanks! I was beginning to think I wouldn't be able to solve this
one by standard means.

Best regards,
Luchesar

Jim Idle wrote:
> On Wed, 30 Jun 2010 10:48:39 -0700
>   Gordon Tyler <Gordon.Tyler at quest.com> wrote:
>   
>> I'm not very familiar with ANTLR's error recovery mechanisms, but I
>> suspect that the generated code for the 'expressions' rule looks for
>> a token that it recognizes as the start of an 'expression' before it
>> calls into the 'expression' rule. When it doesn't find one in the
>> second case, it exits back to the root rule, which then checks whether
>> the next token is EOF and fails.
>>     
>
> Please read the article on the wiki entitled "Custom error recovery" - 
> this will give you all the information you need.
>
> Jim
>
>   
>> But this is just speculation. Hopefully one of the more experienced 
>> ANTLRers can give you a better answer.
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org 
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>> Sent: June 30, 2010 1:35 PM
>> Cc: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Continue parsing after an error
>>
>> Hi Gordon,
>>
>> Thanks for the prompt response.
>> Adding OTHER as an alternative was what I tried first.
>> Unfortunately, my use case is a bit more complex, so I have worked out a
>> better example below.
>> In this example, the input string [ax][kx][ax] is wrong (k is not
>> allowed), but the parser still builds the full AST, so it recovers from
>> the error: it generates three expression nodes, the second of which
>> contains an ErrorCommonToken created by recoverFromMismatchedToken().
>> The string [ax]sax][ax], on the other hand, generates only the first part
>> of the tree, up to the error: it generates only one expression node.
>>
>> I do not understand why I get this different behavior: the parser
>> recovers if the error happens in the middle of a rule, but not if the
>> error is at the beginning of a rule.
>>
>> Is this a problem in my grammar, or is it just the way ANTLR works?
>>
>> Thanks,
>> Luchesar
>>
>> ================
>> grammar StartOfARuleFailTest;
>>
>> options {    output=AST;    ASTLabelType=CommonTree; }
>>
>> tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }
>>
>> @members {
>>    // ErrorCommonToken is not shown here; presumably a custom
>>    // CommonToken subclass (it is not part of the ANTLR 3 runtime).
>>    @Override
>>    protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
>>            throws RecognitionException {
>>        // Consume the offending token and return an error token in its place
>>        MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
>>        input.consume();
>>        return createErrorToken(ex, ttype);
>>    }
>>
>>    public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
>>        // Wrap the token that caused the error and give it the expected type
>>        ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
>>        errorCommonToken.setType(ttype);
>>        return errorCommonToken;
>>    }
>> }
>>
>> root : expressions  EOF -> ^(ROOT_TOKEN expressions) ;
>> expressions  : expression* -> ^(EXPRESSIONS expression*) ;
>> expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');
>>
>> OTHER   : . ;
>> ================
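A hypothetical Java model may help show why the two inputs behave differently. It is not ANTLR's generated parser, and it is simplified to one character per token with '\0' for EOF. Here match() imitates the recoverFromMismatchedToken override above by consuming a wrong token in place, so an error inside `expression` is absorbed. The `expression*` loop, however, only checks whether the next token can start an `expression`. For 's' it cannot, so the loop exits without reporting anything, and root then expects EOF. In this model, nothing after that point ever consumes the remaining tokens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not ANTLR-generated code) of
 *   root        : expressions EOF ;
 *   expressions : expression* ;
 *   expression  : '[' 'a' 'x' ']' ;
 * with one character per token and '\0' standing in for EOF.
 */
public class LoopExitSketch {
    private static final char EOF = '\0';
    private final String input;
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    LoopExitSketch(String input) { this.input = input; }

    private char la() { return p < input.length() ? input.charAt(p) : EOF; }

    private static String name(char c) { return c == EOF ? "<EOF>" : "'" + c + "'"; }

    // On mismatch, record an error and consume the wrong token anyway,
    // similar in spirit to the recoverFromMismatchedToken override above.
    private void match(char expected) {
        if (la() != expected) {
            errors.add("expected " + name(expected) + " at " + p + ", found " + name(la()));
        }
        if (la() != EOF) p++;
    }

    // root : expressions EOF ; returns the number of expressions parsed.
    int root() {
        int count = 0;
        // Loop decision: only '[' can start an expression, so any other
        // token makes the loop exit immediately, without reporting an error.
        while (la() == '[') {
            match('['); match('a'); match('x'); match(']');
            count++;
        }
        match(EOF);   // root now insists on EOF; the remaining tokens are never parsed
        return count;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[ax][kx][ax]", "[ax]sax][ax]"}) {
            LoopExitSketch parser = new LoopExitSketch(s);
            int n = parser.root();
            System.out.println(s + " -> " + n + " expression(s), errors: " + parser.errors);
        }
    }
}
```

In this model, [ax][kx][ax] yields three expressions with one error (the k). [ax]sax][ax] yields a single expression, and its only error is reported at the EOF check in root, not inside the loop.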
>>
>>
>> Gordon Tyler wrote:
>>     
>>> The grammar you have defined says, roughly:
>>>
>>> Parse any number of '[' or ']' until you reach EOF.
>>>
>>> It does not describe what to do if something other than '[' or ']'
>>> is found before EOF.
>>>
>>> You have defined a token, OTHER, to match the other stuff, but your 
>>> parse rules do not reference OTHER. Perhaps something like this would 
>>> work:
>>>
>>> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: antlr-interest-bounces at antlr.org 
>>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>>> Sent: June 30, 2010 10:10 AM
>>> To: antlr-interest at antlr.org
>>> Cc: Valerio Malenchino
>>> Subject: [antlr-interest] Continue parsing after an error
>>>
>>> Dear ANTLR enthusiasts,
>>>
>>> I am struggling with a problem: the parser jumps to the end of file
>>> from the middle of the document.
>>>
>>> The setup is as follows:
>>>     * I have two alternatives followed by EOF
>>>     * during parsing, in the middle of the document, the next token
>>>       cannot match the start of either alternative
>>>
>>> This terminates parsing, because the parser jumps to the end of file.
>>>
>>> A simple grammar that illustrates the problem is:
>>>
>>> ===============
>>> tokens {ROOT_TOKEN;}
>>> root
>>>     : expressions EOF -> ^(ROOT_TOKEN expressions) ;
>>> expressions : ('[' | ']')* ;
>>> OTHER   : . ;
>>> ===============
>>>
>>> If I then try parsing "[[][]]sdsdf[]][]][", parsing stops at the
>>> first "s" and the parser tries to recover as if EOF were the next token.
>>> Looking at the generated parser, it seems that if there is no viable
>>> alternative in the top rule, in this case "root", the parser behaves
>>> as if it had reached EOF and skips the rest of the tokens.
>>>
>>> The result AST will contain only children up until the first illegal 
>>> token "s".
>>>
>>> I cannot see where my mistake is; it seems the parser should not do
>>> that. Can you suggest a workaround for the problem?
>>>
>>> Thanks in advance,
>>> Luchesar
>>>   
>>>       
>> -- 
>>
>> Luchesar Cekov
>> Software Engineer
>> +44 (0) 207 239 4949
>> *Ontology Systems*
>> www.ontology.com <http://www.ontology.com/>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: 
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
>>     
>
>
>
>   
