[antlr-interest] case sensitivity for ANTLR v3 lexers

Tue May 16 11:27:32 PDT 2006

On May 16, 2006, at 10:58 AM, Terence Parr wrote:

>
> On May 16, 2006, at 10:50 AM, Martin Probst wrote:
>
>>> Soon we will need case insensitive lexing for v3.  I am hoping to  
>>> leave the input stream stuff alone and just subclass Lexer as  
>>> CaseInsensitiveLexer, which overrides match()
>>> methods.  Then alter code gen for char set matching (because it's  
>>> generated inline).
>>>
>>> The tokens would have the unmolested input chars.
>>>
>>> Does this sound right?
>>
>> No idea, but did you think about internationalization issues? I  
>> mean, in European languages there is a clear, defined concept of  
>> upper case and lower case. However I think there are some Asian  
>> languages etc where this is not exactly true, and  
>> java.lang.String#equalsIgnoreCase() doesn't get it right as far as  
>> I know. Maybe provide an overridable (ouch) method of some kind?
>
> If I override match(char c) so that it uses Character.toUpperCase()  
> or whatever, it should be ok I think.

We should also probably let people set the locale for the  
uppercasing, right?
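
Roughly what I have in mind, as a sketch only. CaseFoldingMatcher and its method names are made up for illustration and are not part of the v3 runtime; a real version would presumably live in the Lexer subclass's match() overrides. Character.toUpperCase() ignores the locale, so the locale-aware variant goes through String.toUpperCase(Locale) instead:

```java
import java.util.Locale;

// Hypothetical helper, not part of the ANTLR runtime: it only illustrates
// the two comparison strategies discussed in this thread.
final class CaseFoldingMatcher {
    private final Locale locale;

    CaseFoldingMatcher(Locale locale) {
        this.locale = locale;
    }

    // Locale-independent: Character.toUpperCase() applies the default
    // Unicode mapping regardless of any locale setting.
    static boolean matchesIgnoreCase(int expected, int actual) {
        return Character.toUpperCase(expected) == Character.toUpperCase(actual);
    }

    // Locale-sensitive: String.toUpperCase(Locale) applies language-specific
    // rules, e.g. Turkish maps 'i' to dotted capital I (U+0130), not 'I'.
    boolean matches(int expected, int actual) {
        String e = new String(Character.toChars(expected)).toUpperCase(locale);
        String a = new String(Character.toChars(actual)).toUpperCase(locale);
        return e.equals(a);
    }
}
```

With Locale.ROOT, matches('i', 'I') is true; with a Turkish locale (Locale.forLanguageTag("tr")) it is false, which is exactly the kind of difference a settable locale would expose. The tokens would still carry the unmolested input chars either way, since only the comparison is folded.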

Ter