[antlr-interest] Continue parsing after an error
Luchesar Cekov
luchesar.cekov at ontology-partners.com
Thu Jul 1 09:05:18 PDT 2010
Hi Jim,
You were right!!! The Custom Syntax Error Recovery page
http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
explains how to fix the problem.
After applying it, I managed to get the parser to keep going even after an
error in a rule.
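
As a rough illustration of the difference, here is a minimal hand-written
sketch. It is not the code from the wiki page and not what ANTLR generates;
the class name ResyncSketch, the parse method and the resync flag are made up
for this example, and it works on plain characters rather than ANTLR tokens.
It assumes the same toy language as the grammar quoted below (groups of
"[ax]"), handles a mismatch inside a group by consuming the offending
character, and optionally skips input that cannot start a group before each
loop iteration.

```java
import java.util.ArrayList;
import java.util.List;

final class ResyncSketch {
    private static final String EXPECTED = "[ax]";

    /**
     * Recognizes zero or more "[ax]" groups and returns how many groups were
     * produced, including groups that contained a mismatched character.
     * When resync is true, input that cannot start a group is skipped before
     * each loop iteration instead of ending the loop.
     */
    static int parse(String input, boolean resync, List<String> errors) {
        int pos = 0;
        int groups = 0;
        while (true) {
            if (resync) {
                // Resynchronize: drop characters until one can start a group.
                int start = pos;
                while (pos < input.length() && input.charAt(pos) != '[') {
                    pos++;
                }
                if (pos > start) {
                    errors.add("skipped \"" + input.substring(start, pos)
                            + "\" at " + start);
                }
            }
            // Loop decision: without resync, anything other than '[' ends
            // the loop, and the rest of the input is never looked at.
            if (pos >= input.length() || input.charAt(pos) != '[') {
                break;
            }
            // Inside a group: on a mismatch, record it, consume the
            // offending character and carry on with the next expected one.
            for (int i = 0; i < EXPECTED.length(); i++) {
                if (pos < input.length()
                        && input.charAt(pos) == EXPECTED.charAt(i)) {
                    pos++;
                } else {
                    errors.add("mismatch at " + pos);
                    if (pos < input.length()) {
                        pos++;
                    }
                }
            }
            groups++;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(parse("[ax][kx][ax]", false, errors)); // 3
        System.out.println(parse("[ax]sax][ax]", false, errors)); // 1
        System.out.println(parse("[ax]sax][ax]", true, errors));  // 2
    }
}
```

The second call stops after the first group because the loop decision sees
"s", which cannot start a group; the third call skips "sax]" and goes on to
recognize the final group.
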
Many thanks!!! I was beginning to think I wouldn't be able to solve this
one via standard means.
Best regards,
Luchesar
Jim Idle wrote:
> On Wed, 30 Jun 2010 10:48:39 -0700
> Gordon Tyler <Gordon.Tyler at quest.com> wrote:
>
>> I'm not very familiar with ANTLR's error recovery mechanisms, but I
>> suspect that the generated code for the 'expressions' rule looks for
>> a character that it recognizes as the start of an 'expression' rule
>> before it calls into the 'expression' rule and when it doesn't find
>> one in the second case, it exits out into the root rule, which then
>> checks if the next token is EOF and fails.
>>
>
> Please read the article on the wiki entitled "Custom error recovery" -
> this will give you all the information you need.
>
> Jim
>
>
>> But this is just speculation. Hopefully one of the more experienced
>> ANTLRers can give you a better answer.
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>> Sent: June 30, 2010 1:35 PM
>> Cc: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Continue parsing after an error
>>
>> Hi Gordon,
>>
>> Thanks for the prompt response.
>> Adding OTHER as an alternative was what I tried to do in the
>> beginning.
>> Unfortunately my use case is a bit more complex. I have worked out a
>> better example below.
>> In this example, the input string [ax][kx][ax] is wrong (k is not
>> allowed) but the grammar builds the full AST, so it recovers from
>> the error: it generates three expression nodes, the second of
>> which contains an ErrorCommonToken, as produced by
>> recoverFromMismatchedToken().
>> The string [ax]sax][ax], on the other hand, generates only the first
>> part of the tree, up to the error: it generates only one expression
>> node.
>>
>> I do not understand why I get this different behavior: the parser
>> recovers if the error happens in the middle of a rule, but not if
>> the error is at the beginning of a rule.
>>
>> Is this a problem in my grammar, or is it just the way ANTLR works?
>>
>> Thanks,
>> Luchesar
>>
>> ================
>> grammar StartOfARuleFailTest;
>>
>> options { output=AST; ASTLabelType=CommonTree; }
>>
>> tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }
>>
>> @members {
>>     // Consume the mismatched token and return an error token in its place.
>>     @Override
>>     protected Object recoverFromMismatchedToken(IntStream input,
>>             int ttype, BitSet follow) throws RecognitionException {
>>         MismatchedTokenException ex =
>>             new MismatchedTokenException(ttype, input);
>>         input.consume();
>>         return createErrorToken(ex, ttype);
>>     }
>>
>>     public static ErrorCommonToken createErrorToken(
>>             RecognitionException ex, int ttype) {
>>         ErrorCommonToken errorCommonToken =
>>             new ErrorCommonToken(ex.token);
>>         errorCommonToken.setType(ttype);
>>
>>         return errorCommonToken;
>>     }
>> }
>>
>> root : expressions EOF -> ^(ROOT_TOKEN expressions) ;
>> expressions : expression* -> ^(EXPRESSIONS expression*) ;
>> expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');
>>
>> OTHER : . ;
>> ================
>>
>>
>> Gordon Tyler wrote:
>>
>>> The grammar you have defined says, roughly:
>>>
>>> Parse any number of '[' or ']' until you reach EOF.
>>>
>>> It does not describe what to do if something other than '[' or ']'
>>> is found before EOF.
>>>
>>> You have defined a token, OTHER, to match the other stuff, but your
>>> parse rules do not reference OTHER. Perhaps something like this would
>>> work:
>>>
>>> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: antlr-interest-bounces at antlr.org
>>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>>> Sent: June 30, 2010 10:10 AM
>>> To: antlr-interest at antlr.org
>>> Cc: Valerio Malenchino
>>> Subject: [antlr-interest] Continue parsing after an error
>>>
>>> Dear ANTLR enthusiasts,
>>>
>>> I am struggling with a problem. The parser jumps to the end of file
>>> from the middle of the document.
>>>
>>> The setup is as follows:
>>> * I have two alternatives followed by EOF
>>> * at parse time, in the middle of the document, the next token
>>> cannot match the start of either alternative
>>>
>>> This leads to parsing termination because the parser jumps to the
>>> EndOfFile.
>>>
>>> A simple grammar that illustrates the problem is:
>>>
>>> ===============
>>> tokens {ROOT_TOKEN;}
>>> root
>>> : expressions EOF -> ^(ROOT_TOKEN expressions) ;
>>> expressions : ('[' | ']')* ;
>>> OTHER : . ;
>>> ===============
>>>
>>> If I then try parsing "[[][]]sdsdf[]][]][", the parsing will stop at
>>> the first "s" and will try to recover as if EOF were the next token.
>>> Looking at the generated parser, it seems that if there is no viable
>>> alternative in the top rule (in this case "root"), the parser will
>>> behave as if it had reached EOF and will skip the rest of the tokens.
>>>
>>> The resulting AST will contain only children up until the first illegal
>>> token "s".
>>>
>>> I cannot see where my mistake is. It looks like the parser should
>>> not do that. Can you suggest a workaround for the problem?
>>>
>>> Thanks in advance,
>>> Luchesar
>>>
>>>
>> --
>>
>> Luchesar Cekov
>> Software Engineer
>> +44 (0) 207 239 4949
>> *Ontology Systems*
>> www.ontology.com <http://www.ontology.com/>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
--
Luchesar Cekov
Software Engineer
+44 (0) 207 239 4949
*Ontology Systems*
www.ontology.com <http://www.ontology.com/>