[antlr-interest] MismatchedTokenException

Marcin Rzeźnicki marcin.rzeznicki at gmail.com
Thu Dec 17 09:07:39 PST 2009


On Thu, Dec 17, 2009 at 1:37 AM, David-Sarah Hopwood
<david-sarah at jacaranda.org> wrote:
> Marcin Rzeźnicki wrote:
>> 2009/12/14 Marcin Rzeźnicki <marcin.rzeznicki at gmail.com>:
>>> 2009/12/13 Jim Idle <jimi at temporal-wave.com>:
>>>> This usually means that your lexer token numbers are out of sync with your
>>>> parser tokens. Regen in correct order and make sure all tokens have been
>>>> declared.
>>>>
>>> Umm, what if I work with combined grammar? And some of literals are 'inlined'?
>>
>> I think I know what has been causing this problem but I am scratching
>> my head. It seems that ANTLR lexer is, well, a strange beast.
>> I have a rule, say
>> CLASS
>>   :
>>   'class'
>>   ;
>>
>> and below
>>
>> IDENTIFIER
>>   :
>>   {Character.isJavaIdentifierStart(input.LA(1))}?=> . (
>> {Character.isJavaIdentifierPart(input.LA(1))}?=> . )*
>>   ;
>>
>> (the latter rule has been questioned here, but bear with me a while, I
>> need it to explain my case)
>>
>> Now, upon seeing input 'class' ANTLR matches IDENTIFIER because of
>> this gating predicate. Well, 'class' would have been a valid
>> identifier, of course but shouldn't it try to match 'class' based on
>> rules precedence?
>
> This seems to be an idiosyncrasy of how ANTLR lexers treat gated semantic
> predicates. Although . can match the 'c' in 'class', it appears that ANTLR
> doesn't recognize that because of the predicate. That is the reason for the
> additional complexity in the rules that I posted earlier:
>

I wonder. It seems to know that it can match either CLASS or
IDENTIFIER at the point of seeing 'c' in a fresh state. The problem
lies, I think, in the fact that it ignores the second guard,
isJavaIdentifierPart. My conclusion after debugging the lexer is that
it behaves like this:
1: I see 'c', so this can be a CLASS - good - move on.
2: I see 'l', so this can still be a CLASS; otherwise I would assume
that I am an ID.
3: ...
4: Now I might be a CLASS, so I look beyond it: if (
((LA35_411>='\u0000' && LA35_411<='\uFFFF')) &&
((Character.isJavaIdentifierStart(input.LA(1))))) (I do not completely
understand why it checks this here; it should have checked
isJavaIdentifierPart instead.)
5: From the above check I conclude that this is an ID.

Steps 4 and 5 might be a little unclear. I think that an input rewind
has taken place somewhere, which would explain ANTLR's conclusion.
Possibly that is the cause of the error. I'll investigate further.
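
For comparison, the behaviour expected from rule precedence can be
sketched by hand in plain Java. This is only an illustration under
stated assumptions, not ANTLR-generated code and not a description of
how ANTLR resolves the decision: the class name
KeywordFirstLexerSketch and the method tokenTypeAt are made up for
this example. It scans the longest run allowed by the two guards
(isJavaIdentifierStart for the first character, isJavaIdentifierPart
for the rest) and only then decides between CLASS and IDENTIFIER.

```java
public class KeywordFirstLexerSketch {

    /**
     * Hypothetical helper: returns "CLASS", "IDENTIFIER", or null when no
     * identifier-like token starts at the given offset.
     */
    public static String tokenTypeAt(String input, int start) {
        // First guard: the token must begin with an identifier-start character.
        if (start >= input.length()
                || !Character.isJavaIdentifierStart(input.charAt(start))) {
            return null;
        }
        // Second guard: consume identifier-part characters (longest match).
        int end = start + 1;
        while (end < input.length()
                && Character.isJavaIdentifierPart(input.charAt(end))) {
            end++;
        }
        String text = input.substring(start, end);
        // Precedence: an exact keyword match wins over IDENTIFIER.
        return "class".equals(text) ? "CLASS" : "IDENTIFIER";
    }

    public static void main(String[] args) {
        System.out.println(tokenTypeAt("class", 0));    // CLASS
        System.out.println(tokenTypeAt("classes", 0));  // IDENTIFIER
        System.out.println(tokenTypeAt("class1 x", 0)); // IDENTIFIER
    }
}
```

Note that "class1" comes out as an IDENTIFIER only because the second
guard is isJavaIdentifierPart: '1' is a valid identifier part but not
a valid identifier start, which is why the choice of guard in step 4
matters.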
Thank you for an interesting idea.


-- 
Greetings
Marcin Rzeźnicki

