[antlr-interest] Problem with MismatchedTokenException

Kirby Bohling kirby.bohling at gmail.com
Sun Apr 10 00:21:39 PDT 2011


I'm not much of an expert, and I'm not really sure what you are
attempting to accomplish.  So it is really hard to advise you on how
to resolve the problem in the cleanest way.  Hence my comment about
parroting common advice.

One item about ANTLR to remember is that the lexer runs from start to
finish before the parser does anything.  Even if that isn't always
technically true, it is effectively true.  There is no backtracking
from the parser to get the lexer to "retry" and realize that the token
given is a NAME vs. a VALUE.  You can run the lexer without a parser,
I'd highly recommend you do so, and feed the lexer every kind of input
you can think of and make sure you know exactly what the lexer will
return before doing much of anything else.

My standard advice to everyone is that I got a lot more traction once
I ruthlessly enforced two rules:

1. Always separate the lexer from the parser rules.  It really helps
me keep in mind, that the lexer runs, then the parser runs.  The two
really aren't connected, and there is absolutely no feedback loop
between the two.

2. Never use an inline token.  The example in yours is the '=' sign
(that creates a token because it is part of a parse rule, that isn't
explicitly part of your lexer, so debugging problems can be harder,
because there is a rule you can't see while reading the lexer source).

While the book does this all the time, it really only works out in
trivial grammars from what I can tell.  So while it is great while
quickly hacking together a calculator where all the work is done in
the parser, it is harder to do if you are planning on writing a more
complex example that involves multiple passes.

I know why the ANTLR and the book always show the combined grammars
and why they use the inline tokens, but until you grok what is really
going on, they seem to cause more headaches than they solve.  If
you're the guy that wrote the tool, and can see the generated code,
I'm sure it's really handy.  For most mere mortals, it just screws you
up (at least it did me, I had to sent ANTLR down for a year and come
back to it before I could get over that mental hurdle).

Finally, every time you have a problem, run the lexer, and print out
the token stream.  Make really, really sure, you are getting exactly
the tokens you think you should be in exactly the order you think you
should be.  I've spent a ton of time tracking down a parser problem,
only to realize the lexer was not doing what I expected it to.  Once
you are sure the lexer is doing exactly what you expect, then move on
to validating the parser.

Anyways, those are my lessons I learned the hardway.  I'd recommend
reading the list archives, several folks post pretty sound advise.
Jim Idle especially so.  Read the FAQ entries, as they cover a lot of
useful ground.  Then go look at all the grammars that are out there,
and try and identify a grammar that is close the language you want to
parse.  So if you want to parse a C-like language, study the C parser,
and maybe the Java parsers.  Figuring out how to Lex C/C++ comments,
or Java strings yourself is reasonable difficult when you are first
starting out.  Go look at the grammars of languages you understand and
are similar to get clues about how to structure core pieces.  If you
screw up the core pieces, you'll find yourself patching up corner
cases everywhere.  When you are doing that it is generally a clue that
you've just used the wrong approach.

Best of luck,

Kirby

On Sun, Apr 10, 2011 at 1:43 AM, Kazuki <kazuki.rebirth at hotmail.fr> wrote:
> If I understood well, the lexer will check for all tokens (NAME and VALUE)
> even if I tell him to look only for VALUE. That look like weird ^^'
>
> I'm a little confused about tokens... Your answer is very clear : I have not
> to do such low level validation because, with the context, I can tell
> exactly where the error is in the tree parser.
>
> At this point of view, I should make an only token that match all other
> tokens :
> TOKEN : ('a'..'z'|'A'..'Z'|'0'..'9')+
>
> What can of tokens (expressions, conditions, etc...) I can use to correctly
> distinguish them ?
>
> --
> View this message in context: http://antlr.1301665.n2.nabble.com/Problem-with-MismatchedTokenException-tp6257670p6258339.html
> Sent from the ANTLR mailing list archive at Nabble.com.
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list