[antlr-interest] MismatchedTokenException and how to find errors in ANTLRWorks

Tue Feb 12 16:03:24 PST 2008

Matt Benson wrote:
> --- Gavin Lambert <antlr at mirality.co.nz> wrote:
>
>   
>> At 08:39 13/02/2008, Matt Benson wrote:
>>  >Seriously, Jim:  what's the likelihood of your (or
>>  >anyone's) being able to sum up the reason for this
>>  >fairly succinctly?  What is the #1 gotcha about using
>>  >literals vs. tokens that ANTLR3 beginners overlook?
>>
>> I gave my answer to that question yesterday (though more
>> briefly).  As I see it, there are two big drawbacks to using
>> literal strings in the parser:
>>
>> 1. The generated code becomes filled with references to "T32" and
>> "T63" etc. rather than anything meaningful, making it harder to
>> understand.  (And also complicating things if you want to generate
>> an AST later on.)
>>
>> 2. It's too easy to forget that each quoted string produces a
>> lexer rule (which may conflict with other rules), making it harder
>> to find the source of ambiguity problems.  In addition, "'x' 'y'"
>> is not the same as "'xy'" in the parser, though it is in the
>> lexer; this can cause confusion.
>>
>>
>>     
>
> Thanks, Gavin.  Now flip the coin:  what, if anything,
> is _good_ about literals in the parser outside of a
> rapid prototyping context?
>
> -Matt
>
>   
Anyone coming from the ANTLR2 world will remember another problem with
literals. If a literal had the same text as a token, the literal overrode
the token. Any parser rule that used the token would then get a
hard-to-debug mismatched token exception. ANTLR3 solves the problem by
giving both forms the same token id, thereby making them synonyms. So, if
you live in both worlds, it might be best to avoid literals.
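
As a minimal sketch of the synonym behaviour, assuming an ANTLR 3
combined grammar (the grammar name and rules here are made up for
illustration, not taken from anyone's real grammar):

```antlr
grammar Synonyms;   // hypothetical grammar name

// BEGIN is referenced by name and 'end' as a literal. Because each
// keyword lexer rule below matches exactly one literal, ANTLR 3 should
// give the literal 'end' the same token type as END.
block : BEGIN ID* 'end' ;

BEGIN : 'begin' ;
END   : 'end' ;
ID    : ('a'..'z')+ ;
WS    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

If I remember ANTLR2 correctly, the equivalent grammar there is where
the literal and the named token could end up as different token types.
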
Another drawback of literals is that there is no error checking.
Literals are always "right", even if misspelled: a misspelled literal
simply becomes a new token. This can become an issue if the literal is
used more than once in the grammar.
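
A small sketch of how this can go wrong, again assuming an ANTLR 3
combined grammar with made-up rule names:

```antlr
grammar Typo;   // hypothetical grammar name

prog : (decl | call)+ ;

decl : 'function' ID ;
// Misspelled on purpose: the tool accepts this without complaint and
// generates a second, unrelated keyword token for 'fucntion'.
call : 'fucntion' ID ;

ID   : ('a'..'z')+ ;
WS   : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With a named token such as FUNCTION, I would expect a misspelled
reference (say FUNCTOIN) to be reported by the tool as a token with no
corresponding lexer rule, so the mistake shows up when the grammar is
processed rather than at parse time.
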

On the plus side, literals spare you from having to invent a unique,
meaningful name for each keyword or symbol.
I find that literals make the grammar more readable. The syntax diagrams
of ANTLRWorks make you really appreciate literals. When switching a
grammar from ANTLR2 to ANTLR3 I switched all of the keywords to literals.
In my testing, I have found that named tokens defined as literals need to
appear before the more general lexer rules (the regular expressions) that
would also match them. Literals used directly in the parser seem to get
this ordering automatically.
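
For anyone who prefers named tokens, here is a sketch of the ordering I
mean, assuming an ANTLR 3 combined grammar (all names are illustrative):

```antlr
grammar Ordering;   // hypothetical grammar name

prog  : BEGIN ID+ END ;

// Keyword rules come first. The input "begin" is matched by both BEGIN
// and ID; as far as I can tell, the rule listed first wins.
BEGIN : 'begin' ;
END   : 'end' ;

// The general rule comes after the keywords it overlaps with.
ID    : ('a'..'z')+ ;
WS    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

If ID were listed above BEGIN and END, I would expect the keywords to be
lexed as ID, and the parser to report a mismatched token.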