[antlr-interest] Too many uses for escape character giving me lexer troubles.
John B. Brodie
jbb at acm.org
Thu Mar 15 08:29:43 PDT 2007
>> On Wed, 14 Mar 2007 21:37:07 -0400, Jeremy D. Frens wrote:
>>>>> atom : WORD | IDENTIFIER ;
>>>>> WORD : ( ('a'..'z') | ( '\\' '{' ) )+ ;
>>>>> IDENTIFIER : '\\' ('a'..'z')+ ;
>>>>>
>>>>> The key is that the backslash gets used for two purposes: as a real
>>>>> escape character (to escape '{' in a WORD) and as the beginning of an
>>>>> IDENTIFIER.
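To make the conflict concrete, here is a small Python sketch (not ANTLR code) that transcribes the two lexer rules above into regular expressions. The names WORD, IDENTIFIER and candidates() are illustrative assumptions. The sketch reports which rules can match at the start of an input:

```python
import re

# Illustrative regex transcriptions of the two quoted lexer rules.
WORD = re.compile(r"(?:[a-z]|\\\{)+")   # WORD : ( ('a'..'z') | ( '\\' '{' ) )+ ;
IDENTIFIER = re.compile(r"\\[a-z]+")    # IDENTIFIER : '\\' ('a'..'z')+ ;


def candidates(text: str) -> list[str]:
    """Return the names of the rules that can match at the start of text."""
    names = []
    if WORD.match(text):
        names.append("WORD")
    if IDENTIFIER.match(text):
        names.append("IDENTIFIER")
    return names


if __name__ == "__main__":
    # Both backslash inputs share their first character; only the second differs.
    for sample in [r"\{abc", r"\abc", "abc"]:
        print(sample, candidates(sample))
```

For `\{abc` only WORD matches, and for `\abc` only IDENTIFIER matches. Because both inputs begin with the same backslash, one character of lookahead is not enough to choose a rule.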
>> I would simply not do that at the lexer level.
>>
>> What would the following give?
>>
>> atom : word | identifier;
>> word : ( LOWCASE | BACKSLASH OBRACE )+;
>> identifier : BACKSLASH LOWCASE+ ;
>> BACKSLASH : '\\';
>> OBRACE : '{';
>> LOWCASE : 'a'..'z';
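A rough Python sketch of this parser-level alternative follows. It assumes a hand-written recursive-descent parser in place of the one ANTLR would generate. The lexer emits one token per character, and the parser chooses between word and identifier by looking two tokens ahead whenever it sees a BACKSLASH. The atoms() loop over a whole input is a hypothetical addition for demonstration; the quoted grammar only defines a single atom.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    kind: str  # "BACKSLASH", "OBRACE" or "LOWCASE"
    text: str


def tokenize(src: str) -> list[Token]:
    """Single-character lexer mirroring BACKSLASH, OBRACE and LOWCASE."""
    tokens = []
    for ch in src:
        if ch == "\\":
            tokens.append(Token("BACKSLASH", ch))
        elif ch == "{":
            tokens.append(Token("OBRACE", ch))
        elif "a" <= ch <= "z":
            tokens.append(Token("LOWCASE", ch))
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    """Recursive-descent version of the quoted parser rules (two tokens of lookahead)."""

    def __init__(self, tokens: list[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def la(self, i: int) -> Optional[str]:
        """Kind of the i-th upcoming token (1-based), or None past the end."""
        j = self.pos + i - 1
        return self.tokens[j].kind if j < len(self.tokens) else None

    def consume(self) -> str:
        text = self.tokens[self.pos].text
        self.pos += 1
        return text

    def starts_word(self) -> bool:
        # word alternatives: LOWCASE, or BACKSLASH followed by OBRACE
        return self.la(1) == "LOWCASE" or (
            self.la(1) == "BACKSLASH" and self.la(2) == "OBRACE")

    def word(self) -> tuple[str, str]:
        # word : ( LOWCASE | BACKSLASH OBRACE )+ ;
        text = ""
        while self.starts_word():
            if self.la(1) == "LOWCASE":
                text += self.consume()
            else:
                text += self.consume() + self.consume()  # BACKSLASH OBRACE
        return ("word", text)

    def identifier(self) -> tuple[str, str]:
        # identifier : BACKSLASH LOWCASE+ ;
        text = self.consume()  # BACKSLASH
        if self.la(1) != "LOWCASE":
            raise SyntaxError("expected a letter after '\\'")
        while self.la(1) == "LOWCASE":
            text += self.consume()
        return ("identifier", text)

    def atom(self) -> tuple[str, str]:
        # atom : word | identifier ; decided on LA(1) and LA(2)
        if self.starts_word():
            return self.word()
        if self.la(1) == "BACKSLASH":
            return self.identifier()
        raise SyntaxError("expected a word or an identifier")

    def atoms(self) -> list[tuple[str, str]]:
        result = []
        while self.pos < len(self.tokens):
            result.append(self.atom())
        return result
```

For example, `Parser(tokenize(r"abc\def\{x")).atoms()` yields a word `abc`, an identifier `\def` and a word `\{x`. The cost of this design is one token per letter; the benefit is that the backslash ambiguity is resolved with ordinary parser lookahead instead of inside the lexer.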
>
>I've thought about this solution, but I haven't tried it yet. I'm
>probably inclined to go this way just so that I can move forward (if for
>no other reason). However, there's a part of me that's intrigued.
>
Pardon me for butting in; I have not been following this discussion, so
maybe this suggestion is completely bogus. But how about this (untested):
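atom : WORD | IDENTIFIER ;
WORD : ('a'..'z') WORD_TAIL ;
IDENTIFIER : '\\' ( ( '{' WORD_TAIL { $type=WORD; } )  // escaped brace: really a WORD
                  | ('a'..'z')+                          // plain identifier
                  ) ;
fragment
// zero or more, so that "a" and "\{" on their own are still complete WORDs
WORD_TAIL : ( ('a'..'z') | ( '\\' '{' ) )* ;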
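Basically, this is just left-factoring the handling of the initial backslash
character: IDENTIFIER consumes the '\\' once and then looks at the next
character. A '{' means an escaped brace, so the rest is lexed as a WORD tail
and the token is retyped as WORD; a letter means a real IDENTIFIER.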
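The same left-factoring can be sketched as a hand-written Python scanner. This is an illustration under stated assumptions, not the code ANTLR would generate, and the function names are hypothetical. The backslash is consumed once, and the character after it decides between an escaped-brace WORD and an IDENTIFIER. The sketch treats WORD_TAIL as zero-or-more, so that a one-letter word such as `a`, or a bare `\{`, still forms a complete WORD, as under the original WORD rule.

```python
from typing import NamedTuple


class Token(NamedTuple):
    type: str  # "WORD" or "IDENTIFIER"
    text: str


def is_lower(ch: str) -> bool:
    return "a" <= ch <= "z"


def word_tail(src: str, i: int) -> int:
    """Scan ( ('a'..'z') | ( '\\' '{' ) )* from index i; return the index after it."""
    while i < len(src):
        if is_lower(src[i]):
            i += 1
        elif src[i] == "\\" and i + 1 < len(src) and src[i + 1] == "{":
            i += 2
        else:
            break
    return i


def lex(src: str) -> list[Token]:
    tokens = []
    i = 0
    while i < len(src):
        start = i
        if is_lower(src[i]):
            # WORD : ('a'..'z') WORD_TAIL ;
            i = word_tail(src, i + 1)
            tokens.append(Token("WORD", src[start:i]))
        elif src[i] == "\\":
            # The backslash is consumed exactly once; the next character decides.
            i += 1
            if i < len(src) and src[i] == "{":
                # '\\' '{' WORD_TAIL, retyped as WORD (like $type=WORD;)
                i = word_tail(src, i + 1)
                tokens.append(Token("WORD", src[start:i]))
            elif i < len(src) and is_lower(src[i]):
                # '\\' ('a'..'z')+
                while i < len(src) and is_lower(src[i]):
                    i += 1
                tokens.append(Token("IDENTIFIER", src[start:i]))
            else:
                raise SyntaxError(f"unexpected input after '\\' at offset {start}")
        else:
            raise SyntaxError(f"unexpected character {src[i]!r} at offset {i}")
    return tokens
```

For instance, `lex(r"abc\def\{x")` produces WORD `abc`, IDENTIFIER `\def` and WORD `\{x`; no backtracking is needed because the decision is made right after the shared backslash.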
Hope this helps
-jbb