[antlr-interest] Parsing erroneous input

Andreas Heck aheck at gmx.de
Sat May 15 17:37:37 PDT 2010


Hi Jim,

Thanks for this great article :)

After thinking about the options and reading some of the C target's code,
I came to the conclusion that I would prefer to write an exception
handler for each rule where I really need error handling.

There are only a few situations where I want to accept specific kinds
of erroneous input, so this would be manageable and give me full
control.

I just added such an exception handler to one rule of my grammar, and
now I want to handle a special case where I need to insert a semicolon
token to fix the syntax. I know how to detect this case in the handler
and how to create a token object, but how do I insert the token and
reapply the production? I guess I have to insert the token into the
input stream, rewind the stream with REWIND(retval.start), call the
production function again and return whatever it returns?

The biggest question for me is how do I insert a new token into the
input stream?

In antlr3baserecognizer.c, missing tokens are inserted by passing them
back up the call stack and ultimately returning them from match(). But
it doesn't look like I could do something like that. Is it even
possible to change the input stream, or is my only option to override
something like recoverFromMismatchedToken() and handle all my special
cases in there?


Best regards,

Andreas

Am Freitag, den 14.05.2010, 08:56 -0700 schrieb Jim Idle:
> You have to be careful how you implement your grammar rules such that you can recover sensibly from errors. Generally you build a tree or partial tree and then analyze that. You may also need to specifically code for some potential missing elements, but again, you have to be careful not to introduce ambiguities that break the normal grammar.
> 
> For hints on how to code rules that recover well from errors (especially in loops), see:
> 
> http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
> 
> Jim
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andreas Heck
> > Sent: Friday, May 14, 2010 6:59 AM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] Parsing erroneous input
> > 
> > Hello everybody,
> > 
> > I have a parser based on the C target that I want to use to check for
> > correct syntax in one program, but I also want to use it to parse
> > erroneous input for autocompletion in another program.
> > 
> > If I parse input where a semicolon is missing on one line, the
> > parser recovers: the following statements are turned into correct AST
> > nodes, but the code from the line that caused the error is
> > represented by two "Tree Node Error" nodes in the AST.
> > 
> > Unfortunately you can't count on perfectly valid input if you want to
> > provide some form of autocompletion.
> > 
> > What is the best approach to parse erroneous input? Do I have to create
> > a second grammar that also accepts input with common errors like a
> > missing semicolon?
> > 
> > Or is there a better way where I can just use the parser that only
> > accepts correct input? Maybe I could somehow get the line of code that
> > caused the error and use handwritten code for common error cases to
> > extract the information I need? Or maybe there is a way to get the raw
> > tokens that caused the problem from ANTLR, or to make it put the best
> > partial derivation it can create into the AST?
> > 
> > Are there any other approaches?
> > 
> > 
> > Best regards,
> > 
> > Andreas
> > 
> > 
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address



