[antlr-interest] Tokens and literals: how to avoid conflicts?

Johannes Luber jaluber at gmx.de
Tue Jul 15 08:41:39 PDT 2008


Gioele Barabucci wrote:
> Jim Idle wrote:
>> Once again, I really cannot stress enough that new users
>> should not use the inline 'quote' rules in the parser. They really send
>> you down the wrong streets until you are completely familiar with the
>> parser/lexer process. I look at your grammar and see the obvious
>> problems, but I just don't see how new users would.
> Could you please point me to guides or tutorials about the ANTLR lexer and
> the correct "style" I should use to write token rules? I could not find
> anything on the net.
> 
>> ID    : 'ID' ;
>> IDENT : ('a'..'z' | 'A'..'Z')+ ;
>> HASH  : '#'   // Many things prefix with HASH, differentiate them here
>>                (  (FIX)=>FIX  { $type = FIX; }
>>                   | (IMP)=>IMP { $type = IMP; }
>>                   | // Neither keyword, sometimes HASH is just HASH and not pounds
>>                )
>>             ;
>>
>> Now, in the parser, use the token names:
>>
>> stmt: ID S idName S (IMP|FIX) EOF ;
>> idName : HASH IDENT;
> 
> Thank you for this solution: I'll use it in many similar cases I have in
> my grammar.
> 
> Sadly, this solution solves the problem only where there is a precise char
> that one can use to discriminate. What about this example where the text of
> a keyword can be used in other rules:
> 
> stmt: ID S simple_name S ('#IMP'|'#FIX') EOF;
> 
> simple_name: NAME;
> ID: 'id';
> NAME: ('a'..'z')+ ;
> S: (' '|'\n')+ ;
> 
> This grammar will recognise 'id ix #FIX' but will fail on 'id id #FIX' with
> the usual MismatchedTokenException. The keyword 'id' cannot be recognized
> as a NAME token.
> 
> Is there a way to tell ANTLR "look for the characters 'id' only when in the
> ID token, in all the other cases classify it as NAME (or whatever fits
> it)"?
> 
> This happens quite often in my grammar (obviously this is just a simple
> test-case for my problems): I have many keywords that lose their special
> meaning once they are not in a certain position.
> 

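It is not possible (yet?) to do context-dependent lexing: the lexer
tokenizes the input without knowing which token the parser expects next.
The solution is to add ID as a possible alternative in the parser rules
for those special positions.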
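
Applied to the test grammar quoted above, that could look like the following. This is only a minimal sketch in ANTLR 3 syntax: the grammar name KeywordAsName is made up for illustration, and turning the inline '#IMP' and '#FIX' literals into named lexer rules IMP and FIX is my own assumption, not part of the original grammar.

```antlr
grammar KeywordAsName;   // hypothetical grammar name, for illustration only

stmt        : ID S simple_name S (IMP | FIX) EOF ;

// The lexer always turns the text 'id' into an ID token, so the parser
// has to accept ID wherever a plain name is allowed.
simple_name : NAME | ID ;

ID   : 'id' ;
IMP  : '#IMP' ;          // assumed lexer rule replacing the inline literal
FIX  : '#FIX' ;          // assumed lexer rule replacing the inline literal
NAME : ('a'..'z')+ ;
S    : (' ' | '\n')+ ;
```

With this change, 'id id #FIX' should parse: the second 'id' is still lexed as ID, but simple_name now accepts it. If many keywords behave this way, every one of them has to be listed as an alternative in such a rule.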

Johannes

