[antlr-interest] How to feedback to users the string expected on MismatchedTokenException

Mon Jun 18 11:16:29 PDT 2007

Silvester Pozarnik wrote:

>> Jonathan Thomas wrote:
>> 
>> > In previous versions of Antlr you could put in a 'paraphrase' option to
>> > spit out whatever you liked as the error message for that token. On
>> > this page:
>> > http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
>> > down the bottom it mysteriously says there is something similar, but
>> > you need the book. I'm still waiting for my book to arrive ... :-)
>> 
>> The book only describes paraphrasing for rules (up to the page I am at
>> now - but I have finished the error recovery chapter just yesterday).
>> 
>> To elaborate my suggestion a bit more:
>> 
>> getErrorMessage() takes the tokenNames array as an argument, so you could
>> override it with a method that calls BaseRecognizer.getErrorMessage() with
>> a custom array.
>> I'd suggest filling this custom array from a mapping, because token types
>> may jump around when the grammar is modified.
>> A rough example in Python syntax (still too early for me to switch my
>> brain into - a very limited - Java mode ;) )
>> 
>> # this clones the original array
>> myTokenNames = TParser.tokenNames[:]
>> 
>> # a mapping of token types to their new names
>> overrides = {
>>   PLUS: 'plus sign',
>>   DOLLAR: 'much money',
>>   ...
>> }
>> 
>> # change the names of the token types listed in overrides
>> for ttype, name in overrides.items():
>>     myTokenNames[ttype] = name
>> 
>> 
>> And your getErrorMessage() looks like (if you'd do it in Python):
>> 
>> def getErrorMessage(self, exc, tokenNames):
>>     return BaseRecognizer.getErrorMessage(self, exc, myTokenNames)
> 
> If I understood you right, you suggest adding an implementation which
> resolves the internal token type into the token string. This implies
> that you have to maintain such a mapping in two places: in the tokens
> section and in the host language implementation. Let me give an
> example with this simple grammar:
> 
> 
> grammar select;
> options { output = AST;}
> tokens {
> SELECT = 'select';
> }
> statement:
> SELECT SEMI! EOF
> ;
> 
> SEMI: ';' ;
> WS : (' '|'\n') {$channel=HIDDEN;} ;
> 
> 
> If the input to such a parser is "SELECT;" you will get:
> 
> line 1:0 no viable alternative at character 'S'
> line 1:1 no viable alternative at character 'E'
> line 1:2 no viable alternative at character 'L'
> line 1:3 no viable alternative at character 'E'
> line 1:4 no viable alternative at character 'C'
> line 1:5 no viable alternative at character 'T'
> line 1:6 mismatched input ';' expecting SELECT
> 
> The 'expecting SELECT' is confusing in this context and I would like to
> respond with 'expecting "select"'.
> In some cases the language may consist of lots of tokens and it's
> cumbersome to manage a separate mapping in the source code. As far as I
> can see, the original token string 'select' is _not_ available in the
> generated Java code after the grammar is processed. The generated lexer
> also constructs exceptions such as:
> 
> NoViableAltException nvae =
> new NoViableAltException("1:1: Tokens : ( SELECT | SEMI | WS);",
> 1, 0, input);
>
> where the 'SELECT' is used. Such error reporting may mean something
> to the guy that wrote the parser & lexer definition, but is completely
> useless for those who provide the input according to the defined
> vocabulary.

NoViableAltExceptions are especially tricky. In simple cases you could just
report a set of expected tokens or short token sequences for each
alternative. But once fancy stuff like LL(*) or predicates are involved,
things get complicated. I don't think there's a general way for ANTLR to
construct better error messages yet. What would be needed is a 'paraphrase'
option as in V2, preferably for rules and subrules. decisionNumber and
stateNumber from the exception ('1' and '0' above) may then somehow be used
to fetch the appropriate paraphrases.
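
A minimal sketch of what such a lookup could look like, staying in Python.
Nothing like this exists in ANTLR itself as far as this thread shows: the
PARAPHRASES table, the StandInNoViableAlt class and describe() are all made up
for illustration, and the stand-in only mimics the three values passed to the
NoViableAltException constructor in the generated lexer code quoted above.

```python
# Hypothetical paraphrase table keyed by (decisionNumber, stateNumber).
# The entry mirrors the lexer decision "1:1: Tokens : ( SELECT | SEMI | WS);".
PARAPHRASES = {
    (1, 0): 'the keyword "select", a ";" or whitespace',
}


class StandInNoViableAlt(Exception):
    """Stand-in for NoViableAltException; it only carries the three fields used here."""

    def __init__(self, description, decision_number, state_number):
        Exception.__init__(self, description)
        self.description = description
        self.decision_number = decision_number
        self.state_number = state_number


def describe(exc):
    """Prefer a hand-written paraphrase, else fall back to the raw decision text."""
    paraphrase = PARAPHRASES.get((exc.decision_number, exc.state_number))
    if paraphrase is None:
        return "no viable alternative in decision: %s" % exc.description
    return "expected %s" % paraphrase


known = StandInNoViableAlt("1:1: Tokens : ( SELECT | SEMI | WS);", 1, 0)
unknown = StandInNoViableAlt("2:1: some other decision", 2, 3)
print(describe(known))
print(describe(unknown))
```

A table like this is only as stable as the numbers it is keyed on. If decision
and state numbers shift when the grammar is regenerated, which seems likely,
the keys would have to be maintained alongside the grammar, and that is why
real support in the tool would be preferable.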

> The generated "select.tokens" file contains the mappings and can be used
> to resolve tokens in case of errors, but I do not feel that this
> solution is elegant enough.
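
Not elegant, but it would at least avoid maintaining the mapping twice: the
display names can be derived from the .tokens file once at startup. A rough
sketch, assuming the file holds one NAME=type or 'literal'=type entry per
line. SAMPLE_TOKENS, load_display_names() and the token_names list are made up
for illustration and are not taken from the actual generated files.

```python
# Assumed .tokens layout: one NAME=type or 'literal'=type entry per line.
SAMPLE_TOKENS = """\
SELECT=4
SEMI=5
WS=6
'select'=4
"""


def load_display_names(tokens_text, token_names):
    """Return a copy of token_names with the literal text substituted where known."""
    display = list(token_names)
    for line in tokens_text.splitlines():
        # Split on the last '=' so that a literal such as '=' still parses.
        left, sep, right = line.strip().rpartition("=")
        if not sep or not right.isdigit():
            continue
        ttype = int(right)
        # Only quoted entries carry the text the user actually types.
        is_literal = len(left) >= 2 and left.startswith("'") and left.endswith("'")
        if is_literal and ttype < len(display):
            display[ttype] = '"%s"' % left[1:-1]
    return display


# Hypothetical tokenNames array, indexed by token type.
token_names = ["<invalid>", "<EOR>", "<DOWN>", "<UP>", "SELECT", "SEMI", "WS"]
display_names = load_display_names(SAMPLE_TOKENS, token_names)
print(display_names[4])
```

The resulting list could then be passed to getErrorMessage() in place of the
hand-written overrides, so the message would read 'expecting "select"' for
every token that has a literal in the tokens section. Tokens defined only by
lexer rules (SEMI here) keep their rule names.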
> 
> A possible solution could be to allow users to provide their own
> definition of the protected "String vocabFilePattern" in
> org.antlr.codegen.CodeGenerator.java, which could then generate a static
> Java class that can resolve all tokens.
> 
> Even better would be to do a better job on error reporting, so that antlr
> is easier to use when building language formatters, interactive syntax
> checkers and context sensitive help.

There's certainly room for improvement :)

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/