[antlr-interest] Continue parsing after an error
Jim Idle
jimi at temporal-wave.com
Thu Jul 1 05:49:12 PDT 2010
On Wed, 30 Jun 2010 10:48:39 -0700
Gordon Tyler <Gordon.Tyler at quest.com> wrote:
> I'm not very familiar with ANTLR's error recovery mechanisms, but I
> suspect that the generated code for the 'expressions' rule looks for
> a character that it recognizes as the start of an 'expression' rule
> before it calls into the 'expression' rule. When it doesn't find
> one in the second case, it exits out into the root rule, which then
> checks if the next token is EOF and fails.
Please read the article on the wiki entitled "Custom error recovery" -
this will give you all the information you need.
Jim
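
For anyone comparing the two inputs discussed in this thread, here is a minimal hand-written sketch in Java. It is not ANTLR-generated code and it is not taken from the wiki article; it only models the behavior described in the quoted messages, treating each character as a token. The class name ResyncSketch, the method countExpressions and the resync flag are all hypothetical. With resync off, the loop for expression* just exits when the next character cannot start an expression, and the rest of the input is abandoned. With resync on, the parser skips ahead to the next '[' (or EOF) and keeps going.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of:  root : expressions EOF ;
//                         expressions : expression* ;
//                         expression : '[' 'a' 'x' ']' ;
// Every name here is hypothetical; this is not what ANTLR generates.
final class ResyncSketch {
    private static final char EOF = '\0';

    private final String input;
    private int pos = 0;
    private final List<String> errors = new ArrayList<>();

    private ResyncSketch(String input) {
        this.input = input;
    }

    // One character of lookahead; EOF once the input is exhausted.
    private char la() {
        return pos < input.length() ? input.charAt(pos) : EOF;
    }

    // Mid-rule mismatch: record it, consume the offending character and
    // carry on, similar in spirit to the overridden recoverFromMismatchedToken.
    private void match(char expected) {
        if (la() != expected) {
            errors.add("position " + pos + ": expected '" + expected + "'");
        }
        if (la() != EOF) {
            pos++;
        }
    }

    private void expression() {
        match('[');
        match('a');
        match('x');
        match(']');
    }

    // Returns how many expression subtrees a parse would produce.
    static int countExpressions(String text, boolean resync) {
        ResyncSketch p = new ResyncSketch(text);
        int count = 0;
        while (true) {
            while (p.la() == '[') {   // expression*
                p.expression();
                count++;
            }
            if (p.la() == EOF) {      // root : expressions EOF
                break;
            }
            p.errors.add("position " + p.pos + ": cannot start an expression");
            if (!resync) {
                break;                // give up on the rest of the input
            }
            while (p.la() != '[' && p.la() != EOF) {
                p.pos++;              // skip to where an expression can start
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countExpressions("[ax][kx][ax]", false));
        System.out.println(countExpressions("[ax]sax][ax]", false));
        System.out.println(countExpressions("[ax]sax][ax]", true));
    }
}
```

In this model, [ax][kx][ax] yields three expressions either way, because the bad 'k' is consumed inside the rule. [ax]sax][ax] yields one expression without resynchronization and two with it, since "sax]" is skipped before the final [ax] is parsed. How to get equivalent behavior out of a real ANTLR parser is what the wiki article covers.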
>
> But this is just speculation. Hopefully one of the more experienced
> ANTLRers can give you a better answer.
>
> -----Original Message-----
>From: antlr-interest-bounces at antlr.org
>[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
> Sent: June 30, 2010 1:35 PM
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Continue parsing after an error
>
> Hi Gordon,
>
> Thanks for the prompt response.
> Adding OTHER as an alternative was what I tried to do in the
> beginning. Unfortunately, my use case is a bit more complex. I have
> worked out a better example below.
>
> In this example, the input string [ax][kx][ax] is wrong (k is not
> allowed), but the parser recovers from the error and builds the full
> AST: it generates three expression nodes, the second of which
> contains an ErrorCommonToken, as per recoverFromMismatchedToken().
> The string [ax]sax][ax], on the other hand, generates only the first
> part of the tree, up to the error: it generates only one expression
> node.
>
> I do not understand why I get this different behavior: the parser
> recovers if the error happens in the middle of a rule, but not if the
> error is at the beginning of a rule.
>
> Is this a problem in my grammar, or is it just the way ANTLR works?
>
> Thanks,
> Luchesar
>
> ================
> grammar StartOfARuleFailTest;
>
> options { output=AST; ASTLabelType=CommonTree; }
>
> tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }
>
> @members {
>     @Override
>     protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
>             throws RecognitionException {
>         MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
>         input.consume();
>         return createErrorToken(ex, ttype);
>     }
>
>     public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
>         ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
>         errorCommonToken.setType(ttype);
>         return errorCommonToken;
>     }
> }
>
> root : expressions EOF -> ^(ROOT_TOKEN expressions) ;
> expressions : expression* -> ^(EXPRESSIONS expression*) ;
> expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');
>
> OTHER : . ;
> ================
>
>
> Gordon Tyler wrote:
>> The grammar you have defined says, roughly:
>>
>> Parse any number of '[' or ']' until you reach EOF.
>>
>> It does not describe what to do if something other than '[' or ']'
>> is found before EOF is found.
>>
>> You have defined a token, OTHER, to match the other stuff, but your
>> parse rules do not reference OTHER. Perhaps something like this would
>> work:
>>
>> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>>
>>
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>>[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
>> Sent: June 30, 2010 10:10 AM
>> To: antlr-interest at antlr.org
>> Cc: Valerio Malenchino
>> Subject: [antlr-interest] Continue parsing after an error
>>
>> Dear ANTLR enthusiasts,
>>
>> I am struggling with a problem. The parser jumps to the end of file
>> from the middle of the document.
>>
>> The setup is as follows:
>> * I have two alternatives followed by EOF
>> * at parse time, in the middle of the document, the next token
>> cannot match the start of either alternative
>>
>> This leads to parsing termination because the parser jumps to the
>> end of file.
>>
>> A simple grammar that illustrates the problem is
>>
>> ===============
>> tokens {ROOT_TOKEN;}
>> root
>> : expressions EOF -> ^(ROOT_TOKEN expressions) ;
>> expressions : ('[' | ']')* ;
>> OTHER : . ;
>> ===============
>>
>> If I then try parsing "[[][]]sdsdf[]][]][", the parsing will stop at
>> the first "s" and will try to recover as if EOF were the next token.
>> Looking at the generated parser, it seems that if there is no viable
>> alternative in the top rule (in this case "root"), the parser will
>> behave as if it had reached EOF and will skip the rest of the tokens.
>>
>> The resulting AST will contain only children up until the first
>> illegal token "s".
>>
>> I cannot see where my mistake is. It looks like the parser should
>> not do that. Can you suggest a workaround for the problem?
>>
>> Thanks in advance,
>> Luchesar
>>
>
> --
>
> Luchesar Cekov
> Software Engineer
> +44 (0) 207 239 4949
> *Ontology Systems*
> www.ontology.com <http://www.ontology.com/>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
>http://www.antlr.org/mailman/options/antlr-interest/your-email-address