[antlr-interest] Too many uses for escape character giving me lexer troubles.

John B. Brodie jbb at acm.org
Thu Mar 15 08:29:43 PDT 2007


>> On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
>>>>> atom        :  WORD | IDENTIFIER ;
>>>>> WORD        :  ( ('a'..'z') | ( '\\' '{' ) )+ ;
>>>>> IDENTIFIER    :   '\\' ('a'..'z')+ ;
>>>>>
>>>>> The key is that the backslash gets used for two purposes: as a real
>>>>> escape character (to escape '{' in a WORD) and as the beginning of an
>>>>> IDENTIFIER.
>>   I would simply not do that at lexer level.
>> 
>>   What would the following give ?:
>> 
>>     atom : word | identifier;
>>     word : ( LOWCASE | BACKSLASH OBRACE )+;
>>     identifier : BACKSLASH LOWCASE+ ;
>>     BACKSLASH : '\\';
>>     OBRACE : '{';
>>     LOWCASE : 'a'..'z';
>
>I've thought about this solution, but I haven't tried it yet.  I'm
>probably inclined to go this way just so that I can move forward (if for
>no other reason).  However, there's a part of me that's intrigued.
>

Pardon me for butting in... I have not been following this discussion, so
maybe this suggestion is completely bogus. But how about (untested):

atom        :  WORD | IDENTIFIER ;
WORD        :  ('a'..'z') WORD_TAIL ;
IDENTIFIER  :   '\\' ( ( '{' WORD_TAIL { $type=WORD; } )
                     | ('a'..'z')+
                     ) ;
fragment
WORD_TAIL   :  ( ('a'..'z') | ( '\\' '{' ) )* ;

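Basically, this just left-factors the handling of the initial backslash
character: every token that starts with a backslash is matched by IDENTIFIER,
and the action retypes it as a WORD when the next character is '{'.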
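
To make the intended tokenization concrete, here is a minimal hand-written
sketch in Python that mimics the left-factored rules. It is not ANTLR output
and has not been checked against ANTLR itself. The names tokenize and
word_tail are hypothetical, skipping spaces between tokens is my own
assumption (the grammar has no whitespace rule), and the tail is treated as
zero-or-more so that a single letter or a lone \{ still lexes as a WORD, as
in the original WORD rule.

```python
# Hand-written simulation of the left-factored lexer rules (a sketch only).
WORD, IDENTIFIER = "WORD", "IDENTIFIER"


def is_lower(c):
    return "a" <= c <= "z"


def word_tail(s, i):
    """Consume ( 'a'..'z' | '\\' '{' )* from index i; return the new index."""
    while i < len(s):
        if is_lower(s[i]):
            i += 1
        elif s.startswith("\\{", i):
            i += 2
        else:
            break
    return i


def tokenize(s):
    """Return a list of (type, text) pairs for the input string."""
    tokens, i = [], 0
    while i < len(s):
        if s[i] == " ":  # assumption: spaces separate tokens and are skipped
            i += 1
            continue
        start = i
        if is_lower(s[i]):
            # WORD : ('a'..'z') WORD_TAIL
            i = word_tail(s, i + 1)
            kind = WORD
        elif s.startswith("\\{", i):
            # IDENTIFIER alternative 1: '\\' '{' WORD_TAIL, retyped as WORD
            i = word_tail(s, i + 2)
            kind = WORD
        elif s[i] == "\\" and i + 1 < len(s) and is_lower(s[i + 1]):
            # IDENTIFIER alternative 2: '\\' ('a'..'z')+
            i += 1
            while i < len(s) and is_lower(s[i]):
                i += 1
            kind = IDENTIFIER
        else:
            raise ValueError("no viable token at index %d" % i)
        tokens.append((kind, s[start:i]))
    return tokens
```

For example, tokenize(r"a\{b \foo") yields a WORD for a\{b followed by an
IDENTIFIER for \foo. The single decision on the character after the
backslash is the whole point of the left-factoring.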

Hope this helps
   -jbb

