[antlr-interest] MismatchedTokenException and how to find errors in ANTLRWorks

Thomas Brandon tbrandonau at gmail.com
Tue Feb 12 21:53:42 PST 2008


On Feb 13, 2008 3:37 PM, Guntis Ozols <guntiso at latnet.lv> wrote:
>
> > > Sure it is confusing. I fell into this 'xy' trap, too.
> > > If COLON is defined and STAR is defined, then creating
> > > another token for ':*' in a *parser* rule is confusing.
> > > It should be fixed in antlr.
> > > At least, antlr should emit 'anonymous token' warning[s].
> > >
> > > I prefer literals because of simplicity / readability.
> > > The tool needs to be patched, not users.
>
>
> > If I understand your "problem" I don't think this can be fixed. If you
> > type ':*', how can ANTLR know you don't want ':*' as a single token?
>
> Because:
>  - there is only one grammar file for both lexer and parser
>  - ':*' is mentioned in a parser rule only
>  - there are lexer rules in that same file for ':' and '*'
>  so lexer rules can be combined automatically to support parser rules
I don't really think ANTLR should guess that. That could result in
very weird behaviour when a literal you use in a parser happens to be
the combination of two lexer rules. Also, a literal in the parser
should refer to a single token, not multiple tokens. ':*' and ':' '*'
should refer to different things.
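
To make the difference concrete, here is a minimal sketch of a longest-match tokenizer. It is a simplified model, not ANTLR's generated lexer, and the token names (COLON, STAR, T_COLON_STAR) are hypothetical. It assumes that a ':*' literal in a parser rule adds a third token alongside ':' and '*', which is the behaviour discussed in this thread.

```python
# Simplified longest-match tokenizer; an illustration only, NOT ANTLR's lexer.
def tokenize(text, literals):
    """Split text into token names, always taking the longest matching literal."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        # Try longer literals first so ':*' wins over ':' when both exist.
        for lit in sorted(literals, key=len, reverse=True):
            if text.startswith(lit, pos):
                tokens.append(literals[lit])
                pos += len(lit)
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens


# Tokens from explicit lexer rules only (hypothetical names).
explicit = {":": "COLON", "*": "STAR"}
# The same set plus the implicit token assumed to be created by ':*' in a parser rule.
with_implicit = {**explicit, ":*": "T_COLON_STAR"}

print(tokenize(":*", explicit))       # ['COLON', 'STAR']
print(tokenize(":*", with_implicit))  # ['T_COLON_STAR']
```

Under these assumptions, a parser rule written as COLON STAR never sees its two tokens once the ':*' token exists, which is how a mismatched token error can arise.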
>
>
>
> > It is perfectly valid to have ':', '*' and ':*' tokens, so this is
> > what ANTLR produces.
> > And emitting a warning just because you use a perfectly valid (if easily
> > misusable) construct doesn't seem right.
> > Unfortunately I don't think this is a case where ANTLR can protect
> > you from yourself.
> > Perhaps literals could be allowed in the parser but only to refer to tokens
> > defined in the lexer, generating an error otherwise. That should alleviate
> > many (if not all) of the issues while still allowing the stylistic choice
> > some seem to want.
> >
> > Tom.
>
> Emitting a warning doesn't seem right, generating an error is better?
> I feel OK with that. Although I actually think that allowing literals
> in the parser makes it easier to create some very simple grammars.
> But magically creating tokens by default really is a bad choice.
> Perhaps an option to explicitly allow creation of anonymous tokens
> is the best solution.
>
> Guntis
My point was that as long as this is valid syntax it seems very
peculiar to give a warning (or error) just because that sort of syntax
is often misunderstood or misused.
I guess you could have an option and otherwise only allow literals
that were specified in the lexer. Though I'm not even sure I think
either of those is worth the effort (especially the option).
There should be (and probably is) introductory documentation that
details such things. The confusion results from a basic misunderstanding
of the separation of lexing and parsing, which crops up here and in other
places. It's something you just have to learn, and once you do, I think
the current arrangement is natural.
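
For what it's worth, the check discussed above (only allow parser literals that a lexer rule already defines) could look roughly like the sketch below. This is an assumption about how such a check might work, not anything ANTLR does today; the function name and the error message are made up for illustration.

```python
# Hypothetical check: report parser literals that no lexer rule defines.
def undefined_literals(lexer_literals, parser_literals):
    """Return parser literals missing from the lexer, in order of first use."""
    missing = []
    for lit in parser_literals:
        if lit not in lexer_literals and lit not in missing:
            missing.append(lit)
    return missing


lexer_literals = {":", "*"}               # literals matched by explicit lexer rules
parser_literals = [":", ":*", "*", ":*"]  # literals as they appear in parser rules

for lit in undefined_literals(lexer_literals, parser_literals):
    print(f"error: literal '{lit}' is not defined by any lexer rule")
```

With a check like this, ':*' would be rejected instead of silently becoming a new token, while ':' and '*' would still be usable as literals in parser rules.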

Tom.

