[antlr-interest] How to feedback to users the string expected on MismatchedTokenException
Benjamin Niemann
pink at odahoda.de
Mon Jun 18 11:16:29 PDT 2007
Silvester Pozarnik wrote:
>> Jonathan Thomas wrote:
>>
>> > In previous versions of Antlr you could put in 'paraphrase' option
>> > to spit out whatever you liked as the error message for that token.
>> > On this page:
>> > http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
>> > down the bottom it mysteriously says there is something similar, but
>> > you need the book. I'm still waiting for my book to arrive ... :-)
>>
>> The book only describes paraphrasing for rules (up to the page I am at
>> now - but I have finished the error recovery chapter just yesterday).
>>
>> To elaborate on my suggestion a bit more:
>>
>> getErrorMessage() takes the tokenNames array as an argument, so you
>> could override it with a method that calls
>> BaseRecognizer.getErrorMessage() with a custom array.
>> I'd suggest filling this custom array from a mapping, because token
>> types may jump around when the grammar is modified.
>> A rough example in Python syntax (still too early for me to switch my
>> brain into - a very limited - Java mode ;) )
>>
>> # this clones the original array
>> myTokenNames = TParser.tokenNames[:]
>>
>> # a mapping of token types to their new names
>> overrides = {
>>     PLUS: 'plus sign',
>>     DOLLAR: 'much money',
>>     ...
>> }
>>
>> # change the names of the token types mentioned in overrides
>> for ttype, name in overrides.items():
>>     myTokenNames[ttype] = name
>>
>>
>> And your getErrorMessage() looks like this (if you'd do it in Python):
>>
>> def getErrorMessage(self, exc, tokenNames):
>>     # ignore the passed-in tokenNames and use the customized copy
>>     return BaseRecognizer.getErrorMessage(self, exc, myTokenNames)
>
> If I understood you right, you suggest adding an implementation which
> resolves the internal token type into the token string. This implies
> that you have to maintain such a mapping in two places: in the tokens
> section and in the host language implementation. Let me give an
> example with this simple grammar:
>
>
> grammar select;
> options { output = AST; }
> tokens {
>     SELECT = 'select';
> }
> statement:
>     SELECT SEMI! EOF
>     ;
>
> SEMI: ';' ;
> WS : (' '|'\n') {$channel=HIDDEN;} ;
>
>
> If the input to such a parser is "SELECT;" you will get:
>
> line 1:0 no viable alternative at character 'S'
> line 1:1 no viable alternative at character 'E'
> line 1:2 no viable alternative at character 'L'
> line 1:3 no viable alternative at character 'E'
> line 1:4 no viable alternative at character 'C'
> line 1:5 no viable alternative at character 'T'
> line 1:6 mismatched input ';' expecting SELECT
>
> The 'expecting SELECT' is confusing in this context and I would like to
> respond with 'expecting "select"'.
> In some cases the language may consist of lots of tokens and it's
> cumbersome to manage a separate mapping in the source code. As far as I
> can see, the original token string 'select' is _not_ available in the
> generated Java code after the grammar is processed. The generated lexer
> also operates with exceptions such as:
>
> NoViableAltException nvae =
>     new NoViableAltException("1:1: Tokens : ( SELECT | SEMI | WS);",
>                              1, 0, input);
>
> where the 'SELECT' is used. Such error reporting may mean something
> to the guy that wrote the parser & lexer definition, but is completely
> useless for those who provide the input according to the defined
> vocabulary.
NoViableAltExceptions are especially tricky. In simple cases you could just
report a set of expected tokens or short token sequences for each
alternative. But once fancy stuff like LL(*) or predicates are involved,
things get complicated. I don't think there's a general way for ANTLR to
construct better error messages yet. What would be needed is a 'paraphrase'
option as in V2, preferably for rules and subrules. decisionNumber and
stateNumber from the exception ('1' and '0' above) may then somehow be used
to fetch the appropriate paraphrases.
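
To make the idea a bit more concrete, here is a very rough Python sketch of such a lookup. Everything in it is hypothetical: the PARAPHRASES table, the describe() helper and the FakeNoViableAlt stand-in class are invented for illustration and are not part of ANTLR. Only the field names decisionNumber and stateNumber are taken from the exception above. I would also expect decision numbers to shift when the grammar is edited, so a hand-written table like this would be fragile.

```python
# Hypothetical table: decision number -> description of what was expected
# at that decision. The wording is made up for the "select" grammar above.
PARAPHRASES = {
    1: '"select", ";" or whitespace',
}


class FakeNoViableAlt(Exception):
    """Stand-in carrying the two fields discussed above; not the ANTLR class."""

    def __init__(self, decisionNumber, stateNumber):
        Exception.__init__(self, "no viable alternative")
        self.decisionNumber = decisionNumber
        self.stateNumber = stateNumber


def describe(exc, paraphrases, fallback="valid input"):
    """Build a user-facing message from the decision number of exc."""
    return "expecting " + paraphrases.get(exc.decisionNumber, fallback)


message = describe(FakeNoViableAlt(1, 0), PARAPHRASES)
print(message)
```

This only uses decisionNumber. A finer-grained table could be keyed on the (decisionNumber, stateNumber) pair instead.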
> The generated "select.tokens" file contains the mappings and can be used
> to resolve tokens in case of errors, but I do not feel that this
> solution is elegant enough.
>
> A possible solution could be to allow users to provide their own
> definition for the protected "String vocabFilePattern" in
> org.antlr.codegen.CodeGenerator.java, which could then generate a static
> Java class that can resolve all tokens.
>
> Even better would be to do a better job on error reporting so that ANTLR
> is easier to use when building language formatters, interactive syntax
> checkers and context-sensitive help.
There's certainly room for improvement :)
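
Until then, the select.tokens file you mention could at least spare you the second hand-maintained mapping. Below is a minimal Python sketch, assuming the file holds NAME=type lines plus 'literal'=type lines for tokens defined with a literal, and that tokenNames is indexed by token type. The sample text, the token type numbers and the helper names load_literal_names() and build_display_names() are made up for illustration.

```python
import re

# Sample text in the style of a ".tokens" file for the grammar quoted
# above; the numeric token types are assumed, not taken from a real run.
SAMPLE_TOKENS = """\
SELECT=4
SEMI=5
WS=6
'select'=4
"""

# Assumed layout of the generated tokenNames array (indexed by token type).
TOKEN_NAMES = ["<invalid>", "<EOR>", "<DOWN>", "<UP>", "SELECT", "SEMI", "WS"]


def load_literal_names(text):
    """Return {token type: quoted literal} for every 'literal'=type line."""
    literals = {}
    for line in text.splitlines():
        match = re.match(r"^'(.*)'=(\d+)$", line.strip())
        if match:
            literals[int(match.group(2))] = '"%s"' % match.group(1)
    return literals


def build_display_names(token_names, literals):
    """Copy token_names, replacing entries that have a known literal."""
    names = list(token_names)
    for ttype, literal in literals.items():
        if 0 <= ttype < len(names):
            names[ttype] = literal
    return names


display_names = build_display_names(TOKEN_NAMES, load_literal_names(SAMPLE_TOKENS))
print(display_names[4])
```

The resulting array could then be handed to getErrorMessage() in place of the generated tokenNames, as in my earlier example. Tokens that are defined only by lexer rules (SEMI, WS) keep their rule names and would still need manual overrides.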
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/