[antlr-interest] Fwd: Better Error Reporting in ANTLR

Thu Apr 17 10:29:19 PDT 2008

Decided to forward this, since I just realised I had accidentally left the
list off when I sent it.

Adam

---------- Forwarded message ----------
From: Adam Connelly <adam.rpconnelly at googlemail.com>
Date: 13 Apr 2008 03:22
Subject: Re: [antlr-interest] Better Error Reporting in ANTLR
To: siemsen at ucar.edu

I think I've found a better way to do error handling in ANTLR, but I'm not
sure whether Terence would like it, since there are probably reasons for the
way it's currently handled.

I came up with this with the help of a colleague after getting annoyed with
stuff like SEMI being output instead of ';' in the error text.  The main
idea is that every parser holds an instance of an IErrorHandler interface
and delegates all error handling to it.  There is also an ITokenInfoSource
interface that lets the IErrorHandler retrieve information about tokens.
This moves the reporting of errors out of the parser and lets you mix and
match parsers, error handlers and info sources.

The main parts of the system are:
Parser <- holds an instance of IErrorHandler and calls
    IErrorHandler.ReportError() to report errors
IErrorHandler <- does the actual reporting of errors
ITokenInfoSource <- gives access to various pieces of information about a
    token
ITokenInfo <- information about a single token
ErrorHandlerService <- singleton that allows handlers to be registered by
    name
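
For anyone who finds code easier than prose, here is a minimal Java sketch of
how those pieces could fit together.  Only the interface names come from the
proposal above; the method names (reportError, getTokenInfo, getDescription),
the SketchParser stand-in for a generated parser and the helper classes are
all hypothetical, and none of this is ANTLR's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Information about one token type (method names are assumptions).
interface ITokenInfo {
    int getType();
    String getName();         // e.g. "SEMI"
    String getDescription();  // e.g. "';'", the human-readable "error text"
}

// Gives the error handler access to token information.
interface ITokenInfoSource {
    ITokenInfo getTokenInfo(int tokenType);
}

// Receives every error the parser wants to report.
interface IErrorHandler {
    void reportError(int line, int charPositionInLine, String message);
}

class SimpleTokenInfo implements ITokenInfo {
    private final int type;
    private final String name;
    private final String description;

    SimpleTokenInfo(int type, String name, String description) {
        this.type = type;
        this.name = name;
        this.description = description;
    }

    public int getType() { return type; }
    public String getName() { return name; }
    public String getDescription() { return description; }
}

// Token information kept in a map keyed by token type.
class MapTokenInfoSource implements ITokenInfoSource {
    private final Map<Integer, ITokenInfo> infos = new HashMap<>();

    void add(ITokenInfo info) { infos.put(info.getType(), info); }

    public ITokenInfo getTokenInfo(int tokenType) { return infos.get(tokenType); }
}

// Handler that collects "(line:pos): message" strings instead of printing them.
class CollectingErrorHandler implements IErrorHandler {
    final List<String> messages = new ArrayList<>();

    public void reportError(int line, int charPositionInLine, String message) {
        messages.add("(" + line + ":" + charPositionInLine + "): " + message);
    }
}

// Hypothetical stand-in for a generated parser: it never formats or prints
// errors itself, it only delegates to whichever handler it was given.
class SketchParser {
    private IErrorHandler errorHandler;
    private final ITokenInfoSource tokenInfoSource;

    SketchParser(IErrorHandler errorHandler, ITokenInfoSource tokenInfoSource) {
        this.errorHandler = errorHandler;
        this.tokenInfoSource = tokenInfoSource;
    }

    void setErrorHandler(IErrorHandler errorHandler) { this.errorHandler = errorHandler; }

    // Where generated code would report a mismatched token.
    void reportMismatchedToken(int expectedType, int line, int charPositionInLine) {
        ITokenInfo info = tokenInfoSource.getTokenInfo(expectedType);
        String expected = info != null ? info.getDescription() : "token type " + expectedType;
        errorHandler.reportError(line, charPositionInLine, "expected " + expected);
    }
}
```

With a SEMI entry described as ';', a mismatch reported at 3:17 comes out as
"(3:17): expected ';'" rather than mentioning SEMI, and swapping in a
different handler via setErrorHandler() needs no change to the parser.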

I think this system would be better than the current one for a few
reasons, including:

1. The ability to change how errors are handled without altering the
parser, even at run time (for example, switching between reporting errors
to the console and to a logging system)
2. Easier multilingual support (e.g. you could write error handlers for
different languages)
3. The use of ITokenInfo would allow more human-understandable error
reporting, and would support having multiple lexers for the same parser,
each with different error information (back to the multilingual thing?)
4. A number of default error handlers could be packaged with ANTLR, for
example built-in support for well-known logging frameworks.
5. The exception logic for error reporting moves out of the parser, which
makes the grammar easier to understand.
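
Point 1 depends on the ErrorHandlerService registry.  A minimal sketch,
assuming the service simply maps names to handlers and the parser looks its
handler up by name on each report; the register/lookup methods, the two
handler classes and SketchParser are hypothetical.  IErrorHandler is
re-declared here so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Re-declared so this sketch is self-contained.
interface IErrorHandler {
    void reportError(int line, int charPositionInLine, String message);
}

// Singleton registry of handlers by name (method names are assumptions).
final class ErrorHandlerService {
    private static final ErrorHandlerService INSTANCE = new ErrorHandlerService();
    private final Map<String, IErrorHandler> handlers = new ConcurrentHashMap<>();

    private ErrorHandlerService() {}

    static ErrorHandlerService getInstance() { return INSTANCE; }

    void register(String name, IErrorHandler handler) { handlers.put(name, handler); }

    IErrorHandler lookup(String name) {
        IErrorHandler handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("no error handler registered as '" + name + "'");
        }
        return handler;
    }
}

// Writes errors to standard error, like the println approach in the thread.
class ConsoleErrorHandler implements IErrorHandler {
    public void reportError(int line, int charPositionInLine, String message) {
        System.err.println("(" + line + ":" + charPositionInLine + "): " + message);
    }
}

// Stands in for a logging system by keeping the messages in memory.
class InMemoryLogHandler implements IErrorHandler {
    final List<String> entries = new ArrayList<>();

    public void reportError(int line, int charPositionInLine, String message) {
        entries.add("line " + line + ", column " + charPositionInLine + ": " + message);
    }
}

// Hypothetical parser stand-in: the handler is looked up by name on every
// report, so changing the name switches handlers at run time.
class SketchParser {
    private String handlerName;

    SketchParser(String handlerName) { this.handlerName = handlerName; }

    void useHandler(String handlerName) { this.handlerName = handlerName; }

    void report(int line, int charPositionInLine, String message) {
        ErrorHandlerService.getInstance().lookup(handlerName)
                .reportError(line, charPositionInLine, message);
    }
}
```

A handler per language (point 2) would just be another registration, e.g. a
French handler registered under "fr" that words its messages differently.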

Along with altering the parser class, it would be good to extend the ANTLR
grammar syntax so that a description (the "error text") can be associated
with each lexer rule.  That would stop the token name being output in
situations where a textual description would be more appropriate.

The TokenInfos would all be generated at the same time as the parser, from
a slightly modified token file (one that just has more information in it).
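
ANTLR 3 already writes a .tokens file of NAME=type lines.  The format below,
NAME<TAB>type<TAB>description with the description optional, is purely an
assumption made up for illustration, as are the TokenInfo and
TokenFileInfoSource classes.  The sketch just shows how such a file could back
the token descriptions, falling back to the token name when a rule has no
error text.  In practice the contents would be read from the generated file.

```java
import java.util.HashMap;
import java.util.Map;

// One entry from the (hypothetical) extended token file.
final class TokenInfo {
    final int type;
    final String name;
    final String description;

    TokenInfo(int type, String name, String description) {
        this.type = type;
        this.name = name;
        this.description = description;
    }
}

// Parses lines of the assumed form NAME<TAB>type[<TAB>description].
final class TokenFileInfoSource {
    private final Map<Integer, TokenInfo> byType = new HashMap<>();

    TokenFileInfoSource(String fileContents) {
        for (String line : fileContents.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = line.split("\t", 3);
            if (parts.length < 2) {
                throw new IllegalArgumentException("malformed token line: " + line);
            }
            String name = parts[0];
            int type = Integer.parseInt(parts[1].trim());
            // No description given: fall back to the plain token name.
            String description = parts.length == 3 ? parts[2] : name;
            byType.put(type, new TokenInfo(type, name, description));
        }
    }

    // Text to use for a token type in an error message.
    String describe(int type) {
        TokenInfo info = byType.get(type);
        return info != null ? info.description : "<unknown token type " + type + ">";
    }
}
```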

I've attached a PNG with a class diagram if anyone's interested.  I wouldn't
take it too seriously, since I think I've changed some things since I drew
it, but it should explain what I'm proposing better than my ramblings (I
don't think I've done it justice).

Any thoughts?

Cheers,
Adam

On 10/04/2008, siemsen at ucar.edu <siemsen at ucar.edu> wrote:
>
>  I also report semantic errors with System.err.println().  Line and
> character information is in the token.  In your ifStatement rule, try this:
> if (!($expression.value instanceof Boolean)) {
>     System.err.println("(" + $expression.token.getLine() + ":" +
>                        $expression.token.getCharPositionInLine() + "): " +
>                        "if expression '" + $expression.text +
>                        "' does not evaluate to a boolean");
>     return;
> }
>
> I would also like to do this the "right" way, with an exception.  When I
> tried, the exception logic made the grammar harder to understand, so I
> stuck with System.err.println().  I hope someone can suggest a better way.
> -- Pete
>
>
> On Apr 10, 2008, at 9:06 AM, Robert Stehwien wrote:
>
> I have a simple grammar where there can be semantic errors.  Right now
> I'm using System.err.println() to report those errors.  What I'd like
> to do is create and throw an exception that takes just an error string
> and have it report the same line and character information that ANTLR's
> own error messages include.  Any good suggestions on how I can do that?
>
> Here are examples of the errors in my grammar:
> --------------------
> ifStatement     : ^(IF expression s+=.+)
>   {
>     if (!($expression.value instanceof Boolean)) {
>       System.err.println("if expression '" + $expression.text +
>                          "' does not evaluate to a boolean");
>       return;
>     }
>     if (((Boolean)$expression.value).booleanValue()) {
>       runStatement((CommonTree)$s.get(0));
>     }
>     else if ($s.size() > 1) {
>       runStatement((CommonTree)$s.get(1));
>     }
>   }
>   ;
> --------------------
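> @header {
>     import java.math.BigInteger;
>     import java.util.HashMap;
>     import java.util.Map;
> }
>
> @members {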
>     private Map<String, Object> variables = new HashMap<String, Object>();
>
>     private void defineInt(String name) {
>         if (variables.containsKey(name)) {
>             System.err.println("variable '" + name + "' already defined");
>         }
>         variables.put(name, BigInteger.ZERO);
>     }
> }
> --------------------
>
> Thanks,
> Robert
>
>
>
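
On the exception question Robert and Pete both raise: one option is an
unchecked exception that carries the line and column of the offending token.
A minimal sketch, assuming a hypothetical SemanticException class (not part of
the ANTLR runtime) and a Variables class standing in for Robert's @members
code.  In the grammar, the position would come from the token, e.g. via
getLine() and getCharPositionInLine() as in Pete's reply; here it is passed in
explicitly so the example runs without ANTLR.  Unlike the println version,
defineInt stops before overwriting the existing value.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Unchecked exception carrying the position of the offending token.
class SemanticException extends RuntimeException {
    private final int line;
    private final int charPositionInLine;

    SemanticException(int line, int charPositionInLine, String message) {
        super(message);
        this.line = line;
        this.charPositionInLine = charPositionInLine;
    }

    int getLine() { return line; }

    int getCharPositionInLine() { return charPositionInLine; }

    // Same "(line:pos): message" layout as the println in Pete's reply.
    String formatted() {
        return "(" + line + ":" + charPositionInLine + "): " + getMessage();
    }
}

// Robert's defineInt, throwing instead of printing.
class Variables {
    private final Map<String, Object> variables = new HashMap<>();

    void defineInt(String name, int line, int charPositionInLine) {
        if (variables.containsKey(name)) {
            throw new SemanticException(line, charPositionInLine,
                    "variable '" + name + "' already defined");
        }
        variables.put(name, BigInteger.ZERO);
    }
}
```

Because it extends RuntimeException rather than RecognitionException, it
should pass through the generated catch blocks for RecognitionException, so a
single catch (SemanticException e) around the call that starts the tree walker
could print e.formatted() without cluttering the grammar actions.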
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ErrorHandler.png
Type: image/png
Size: 6770 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20080417/e95bf579/attachment-0001.png