[antlr-interest] Parsing erroneous input

Fri May 14 06:59:06 PDT 2010

Hello everybody,

I have a parser based on the ANTLR C target. In one program I want to use
it to check for correct syntax, but in another program I also want to use
it to parse erroneous input for autocompletion.

If I parse input where a semicolon is missing on one line, the parser
recovers: the following statements are turned into correct AST nodes, but
the code from the line that caused the error is represented by two
"Tree Node Error" nodes in the AST.

Unfortunately, you can't count on perfectly valid input if you want to
provide any form of autocompletion.

What is the best approach to parse erroneous input? Do I have to create
a second grammar that also accepts input with common errors like a
missing semicolon?

Or is there a better way that lets me reuse the parser that only accepts
correct input? Maybe I could somehow get the line of code that caused the
error and use handwritten code for common error cases to extract the
information I need? Or maybe there is a way to get the raw tokens that
caused the problem from ANTLR, or to make it put the best partial
derivation it can build into the AST?
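A minimal sketch of the "handwritten code for common error cases" idea,
assuming the tokens of the input can be obtained as a flat array that
records each token's line number. The Token struct, the TOK_* constants
and both functions are hypothetical illustrations; they are not part of
the ANTLR C runtime API, which has its own token structures.

```c
#include <stddef.h>

/* Hypothetical, simplified token record (NOT the ANTLR C runtime's type). */
typedef struct {
    int type;          /* token type, one of the TOK_* values below */
    unsigned line;     /* 1-based source line the token starts on */
    const char *text;  /* token text */
} Token;

/* Hypothetical token types for illustration only. */
enum { TOK_ID = 1, TOK_SEMI = 2, TOK_ASSIGN = 3, TOK_INT = 4 };

/* Collect pointers to all tokens lying on `line` into `out` (at most `cap`
   entries). Returns how many tokens were collected. */
size_t tokens_on_line(const Token *toks, size_t n, unsigned line,
                      const Token **out, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < cap; i++) {
        if (toks[i].line == line)
            out[count++] = &toks[i];
    }
    return count;
}

/* Handwritten check for one common error case: a non-empty statement line
   whose last token is not a semicolon. */
int missing_semicolon(const Token **line_toks, size_t count)
{
    return count > 0 && line_toks[count - 1]->type != TOK_SEMI;
}
```

Given the line number reported with the syntax error, the tokens on that
line could be fed to a handful of such checks, while the rest of the input
is still handled by the unmodified grammar.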

Are there any other approaches?

Best regards,

Andreas