[antlr-interest] Continue parsing after an error
Luchesar Cekov
luchesar.cekov at ontology-partners.com
Wed Jun 30 10:35:23 PDT 2010
Hi Gordon,
Thanks for the prompt response.
Adding OTHER as an alternative was what I tried to do in the beginning.
Unfortunately my use case is a bit more complex. I have worked out a
better example below.
In this example, the input string [ax][kx][ax] is wrong (k is not
allowed), but the grammar still builds the full AST, so it recovers from
the error: it generates three expression nodes, the second of which
contains an ErrorCommonToken as per recoverFromMismatchedToken().
The string [ax]sax][ax], on the other hand, generates only the first part
of the tree, up to the error: it generates only one expression node.
I do not understand why I get this different behavior - the parser
recovers if the error happens in the middle of a rule, but not if the
error is at the beginning of a rule.
Is this a problem in my grammar, or is it just the way ANTLR works?
Thanks,
Luchesar
================
grammar StartOfARuleFailTest;
options { output=AST; ASTLabelType=CommonTree; }
tokens { ROOT_TOKEN;ERROR_TOKEN;EXPRESSIONS;EXPRESSION; }
@members {
    @Override
    protected Object recoverFromMismatchedToken(IntStream input, int ttype, BitSet follow)
            throws RecognitionException {
        MismatchedTokenException ex = new MismatchedTokenException(ttype, input);
        input.consume();
        return createErrorToken(ex, ttype);
    }

    public static ErrorCommonToken createErrorToken(RecognitionException ex, int ttype) {
        ErrorCommonToken errorCommonToken = new ErrorCommonToken(ex.token);
        errorCommonToken.setType(ttype);
        return errorCommonToken;
    }
}
root : expressions EOF -> ^(ROOT_TOKEN expressions) ;
expressions : expression* -> ^(EXPRESSIONS expression*) ;
expression : '[' 'a' 'x' ']' -> ^(EXPRESSION '[' 'a' 'x' ']');
OTHER : . ;
================
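One possible reading of the difference, offered as a sketch rather than a confirmed diagnosis: if the parser enters the expression* loop only while the lookahead token is '[', then a stray token at the start of an expression just ends the loop without raising an error, and the mismatch is noticed only when root tries to match EOF. The hand-written Java model below imitates that control flow for the grammar above. It is not ANTLR-generated code. The class StartOfRuleModel, its fields, and the optional resync step (skipping tokens until the next '[') are hypothetical names introduced for illustration, and each character stands for one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the grammar's control flow. This is NOT ANTLR-generated
// code; the class and every name in it are hypothetical. Each char is one token.
class StartOfRuleModel {
    static final char EOF = '\0';          // stand-in for the EOF token

    final List<String> errors = new ArrayList<>();
    int expressionNodes = 0;

    private final String input;
    private final boolean resync;          // assumption: optional skip-to-'[' step
    private int pos = 0;

    StartOfRuleModel(String input, boolean resync) {
        this.input = input;
        this.resync = resync;
    }

    private char la() {                    // one token of lookahead
        return pos < input.length() ? input.charAt(pos) : EOF;
    }

    // Models match() combined with the overridden recoverFromMismatchedToken():
    // a mismatched token is reported, consumed, and treated as the expected one.
    private void match(char expected) {
        if (la() != expected) {
            errors.add("mismatched token at index " + pos);
        }
        if (pos < input.length()) {
            pos++;
        }
    }

    // expression : '[' 'a' 'x' ']'
    private void expression() {
        match('[');
        match('a');
        match('x');
        match(']');
        expressionNodes++;
    }

    // expressions : expression*
    private void expressions() {
        while (true) {
            if (resync) {
                // Hypothetical workaround: drop tokens that cannot start an expression.
                while (la() != '[' && la() != EOF) {
                    errors.add("skipped token at index " + pos);
                    pos++;
                }
            }
            if (la() != '[') {
                return;                    // leaving the loop is a prediction, not an error
            }
            expression();
        }
    }

    // root : expressions EOF
    void root() {
        expressions();
        match(EOF);                        // a stray token is only noticed here
    }

    static int count(String input, boolean resync) {
        StartOfRuleModel p = new StartOfRuleModel(input, resync);
        p.root();
        return p.expressionNodes;
    }

    public static void main(String[] args) {
        System.out.println(count("[ax][kx][ax]", false)); // 3
        System.out.println(count("[ax]sax][ax]", false)); // 1
        System.out.println(count("[ax]sax][ax]", true));  // 2
    }
}
```

In this model the 'k' is absorbed inside expression(), where match() is already committed to the rule, while the 's' makes the loop exit and is then consumed by the EOF match in root(), so the rest of the input is never visited. With the resync flag set, the model skips "sax]" and still picks up the final [ax]. Whether the same idea carries over to the real grammar (for example, as an extra alternative that consumes every token other than '[') is an assumption that would need checking against the generated parser.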
Gordon Tyler wrote:
> The grammar you have defined says, roughly:
>
> Parse any number of '[' or ']' until you reach EOF.
>
> It does not describe what to do if something other than '[' or ']' is found before EOF is found.
>
> You have defined a token, OTHER, to match the other stuff, but your parse rules do not reference OTHER. Perhaps something like this would work:
>
> root : (expressions | OTHER)* EOF -> ^(ROOT_TOKEN expressions) ;
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Luchesar Cekov
> Sent: June 30, 2010 10:10 AM
> To: antlr-interest at antlr.org
> Cc: Valerio Malenchino
> Subject: [antlr-interest] Continue parsing after an error
>
> Dear ANTLR enthusiasts,
>
> I am struggling with a problem. The parser jumps to the end of file from
> the middle of the document.
>
> The setup is as follows:
> * I have two alternatives followed by EOF
> * at parse time, in the middle of the document, the next token cannot
> match the start of either alternative
>
> This leads to parsing termination because the parser jumps to the EndOfFile.
>
> A simple grammar that illustrates the problem is:
>
> ===============
> tokens {ROOT_TOKEN;}
> root
> : expressions EOF -> ^(ROOT_TOKEN expressions) ;
> expressions : ('[' | ']')* ;
> OTHER : . ;
> ===============
>
> If I then try parsing "[[][]]sdsdf[]][]][", the parsing will stop at the
> first "s" and will try to recover as if EOF were the next token.
> Looking at the generated parser, it seems that if there is no viable
> alternative in the top rule (in this case "root"), the parser will behave
> as if it had reached EOF and will skip the rest of the tokens.
>
> The resulting AST will contain only children up to the first illegal
> token "s".
>
> I cannot see where my mistake is. It looks like the parser should not do
> that. Can you suggest a workaround for the problem?
>
> Thanks in advance,
> Luchesar
>
--
Luchesar Cekov
Software Engineer
+44 (0) 207 239 4949
*Ontology Systems*
www.ontology.com <http://www.ontology.com/>