[antlr-interest] How to feedback to users the string expected on MismatchedTokenException

Mon Jun 18 04:27:17 PDT 2007

> Date: Wed, 13 Jun 2007 09:57:15 +0200
> From: Benjamin Niemann <pink at odahoda.de>
> Subject: Re: [antlr-interest] How to feedback to users the string expected on MismatchedTokenException
> To: antlr-interest at antlr.org
> Message-ID: <f4o80r$sq6$1 at sea.gmane.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Jonathan Thomas wrote:
> 
> > In previous versions of Antlr you could put in 'paraphrase' option to
> > spit out whatever you liked as the error message for that token. On
> > this page:
> > http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
> > down the bottom it mysteriously says there is something similar, but
> > you need the book. I'm still waiting for my book to arrive ... :-)
> 
> The book only describes paraphrasing for rules (up to the page I am at
> now - but I have finished the error recovery chapter just yesterday).
> 
> To elaborate my suggestion a bit more:
> 
> getErrorMessage() takes the tokenNames array as an argument, so you
> could override it with a method that calls
> BaseRecognizer.getErrorMessage() with a custom array.
> I'd suggest filling this custom array from a mapping, because token
> types may jump around when the grammar is modified.
> A rough example in Python syntax (still too early for me to switch my
> brain into - a very limited - Java mode ;) )
> 
> # this clones the original array
> myTokenNames = TParser.tokenNames[:]
> 
> # a mapping of token types to their new names
> overrides = {
>   PLUS: 'plus sign',
>   DOLLAR: 'much money',
>   # ...
> }
> 
> # changes the names of the token types mentioned in overrides
> for ttype, name in overrides.items():
>     myTokenNames[ttype] = name
> 
> 
> And your getErrorMessage() looks like this (if you'd do it in Python):
> 
> def getErrorMessage(self, exc, tokenNames):
>     return BaseRecognizer.getErrorMessage(self, exc, myTokenNames)
> 
> 
> 
> 
> --
> Benjamin Niemann
> Email: pink at odahoda dot de
> WWW: http://pink.odahoda.de/
> 
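
The suggestion quoted above can be tried without the ANTLR runtime. What follows is only a minimal sketch in Python, assuming a stand-in BaseRecognizer whose getErrorMessage() does nothing but format "mismatched input ... expecting ...". The stub class, the MismatchedToken tuple, the SelectParser name and the token type numbers are all hypothetical; they only illustrate how swapping the names array changes the message.

```python
from collections import namedtuple

# Hypothetical stand-in for a mismatched-token error: the offending text
# and the token type the parser expected.
MismatchedToken = namedtuple("MismatchedToken", ["found", "expecting"])


class BaseRecognizer:
    """Stub, not the ANTLR runtime class: formats a message from tokenNames."""

    def getErrorMessage(self, exc, tokenNames):
        return "mismatched input %r expecting %s" % (
            exc.found, tokenNames[exc.expecting])


# Assumed token types for the example grammar (illustrative values).
SELECT, SEMI, WS = 4, 5, 6
tokenNames = ["<invalid>", "<EOR>", "<DOWN>", "<UP>", "SELECT", "SEMI", "WS"]

# Clone the generated names and override only the entries users should see.
overrides = {SELECT: '"select"', SEMI: '";"'}
myTokenNames = tokenNames[:]
for ttype, name in overrides.items():
    myTokenNames[ttype] = name


class SelectParser(BaseRecognizer):
    def getErrorMessage(self, exc, tokenNames):
        # Ignore the generated array and pass the user-facing one instead.
        return BaseRecognizer.getErrorMessage(self, exc, myTokenNames)


message = SelectParser().getErrorMessage(MismatchedToken(";", SELECT), tokenNames)
print(message)
```

The generated array stays untouched, so internal diagnostics can still use the symbolic names while users see the overridden ones.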

If I understood you correctly, you suggest adding an implementation that
resolves the internal token type into the token string. This implies
that you have to maintain the mapping in two places: in the tokens
section of the grammar and in the host-language implementation. Let me
give an example with this simple grammar:

	grammar select;
	options { output = AST;}
	tokens {
	  SELECT = 'select';
	}
	statement:
	  SELECT SEMI! EOF
	;

	SEMI: ';' ;    
	WS : (' '|'\n') {$channel=HIDDEN;} ;

If the input to this parser is "SELECT;", you will get:

	line 1:0 no viable alternative at character 'S'
	line 1:1 no viable alternative at character 'E'
	line 1:2 no viable alternative at character 'L'
	line 1:3 no viable alternative at character 'E'
	line 1:4 no viable alternative at character 'C'
	line 1:5 no viable alternative at character 'T'
	line 1:6 mismatched input ';' expecting SELECT

The 'expecting SELECT' is confusing in this context; I would like to
report 'expecting "select"' instead.
A language may consist of many tokens, and it is cumbersome to maintain
a separate mapping in the source code. As far as I can see, the original
token string 'select' is _not_ available in the generated Java code
after the grammar is processed. The generated lexer also creates
exceptions such as:

	NoViableAltException nvae =
	    new NoViableAltException("1:1: Tokens : ( SELECT | SEMI | WS);",
	                             1, 0, input);

where the symbolic name 'SELECT' is used. Such error reporting may mean
something to whoever wrote the parser and lexer definition, but it is
completely useless to those who provide input according to the defined
vocabulary.

The generated "select.tokens" file contains the mappings and can be used
to resolve tokens in case of errors, but I do not feel that this
solution is elegant enough. 
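
As a rough illustration of that approach, the sketch below (in Python, to match the example quoted earlier) reads the contents of a .tokens file and maps each token name to its literal. It assumes the file holds one NAME=type or 'literal'=type entry per line. The sample content, the type numbers and the helper name literals_by_name are made up for this example.

```python
def literals_by_name(tokens_text):
    """Map token names to their quoted literals, e.g. SELECT -> 'select'."""
    names = {}      # token type -> symbolic name
    literals = {}   # token type -> literal as written in the grammar
    for line in tokens_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last '=' so a literal such as '=' is handled too.
        key, _, value = line.rpartition("=")
        ttype = int(value)
        if key.startswith("'"):
            literals[ttype] = key
        else:
            names[ttype] = key
    return {names[t]: lit for t, lit in literals.items() if t in names}


# Hypothetical contents of select.tokens for the grammar above.
sample = "SELECT=4\nSEMI=5\nWS=6\n'select'=4\n"

mapping = literals_by_name(sample)
error = "mismatched input ';' expecting SELECT"
# Replace the symbolic name after "expecting" with its literal, if known.
head, sep, expected = error.rpartition(" expecting ")
friendly = head + sep + mapping.get(expected, expected)
print(friendly)
```

Tokens without a literal, such as WS, keep their symbolic name, so a separate mapping would still be needed for those.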

A possible solution would be to let users provide their own definition
of the protected "String vocabFilePattern" in
org.antlr.codegen.CodeGenerator.java, which could then generate a static
Java class that can resolve all tokens.

Even better would be to improve the error reporting itself, so that
ANTLR is easier to use when building language formatters, interactive
syntax checkers and context-sensitive help.

BR
Silvester Pozarnik