[antlr-interest] case sensitivity for ANTLR v3 lexers

Terence Parr parrt at cs.usfca.edu
Tue May 16 10:58:07 PDT 2006


On May 16, 2006, at 10:50 AM, Martin Probst wrote:

>> Soon we will need case insensitive lexing for v3.  I am hoping to  
>> leave the input stream stuff alone and just subclass Lexer as  
>> CaseInsensitiveLexer, which overrides match()
>> methods.  Then alter code gen for char set matching (because it's  
>> generated inline).
>>
>> The tokens would have the unmolested input chars.
>>
>> Does this sound right?
>
> No idea, but did you think about internationalization issues? I  
> mean, in European languages there is a clear, defined concept of  
> upper case and lower case. However, I think there are some Asian  
> languages etc. where this is not exactly true, and  
> java.lang.String#equalsIgnoreCase() doesn't get it right as far as  
> I know. Maybe provide an overridable (ouch) method of some kind?

If I override match(char c) so that it uses Character.toUpperCase()  
or whatever, it should be ok I think.
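
A rough sketch of that idea, for illustration only. SimpleLexer and
matchLiteral() are hypothetical stand-ins, not the actual v3 runtime
classes; the only point is that match(char c) compares folded characters
while the token text keeps the original input chars. Note that
Character.toUpperCase(char) is a one-to-one, locale-independent mapping,
so it will not handle cases such as the German sharp s, which uppercases
to two characters.

```java
public class CaseInsensitiveSketch {

    /** Hypothetical minimal lexer base; stands in for the real Lexer class. */
    static class SimpleLexer {
        protected final String input;
        protected int pos = 0;

        SimpleLexer(String input) { this.input = input; }

        /** Consume the next char only if it equals c exactly. */
        boolean match(char c) {
            if (pos < input.length() && input.charAt(pos) == c) {
                pos++;
                return true;
            }
            return false;
        }

        /** Match a literal; return the text as it appeared in the input, or null. */
        String matchLiteral(String literal) {
            int start = pos;
            for (int i = 0; i < literal.length(); i++) {
                if (!match(literal.charAt(i))) {
                    pos = start; // rewind on failure
                    return null;
                }
            }
            return input.substring(start, pos);
        }
    }

    /** Overrides match(char) so the comparison ignores case. */
    static class CaseInsensitiveLexer extends SimpleLexer {
        CaseInsensitiveLexer(String input) { super(input); }

        @Override
        boolean match(char c) {
            if (pos < input.length()
                    && Character.toUpperCase(input.charAt(pos)) == Character.toUpperCase(c)) {
                pos++; // the input itself is never modified
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        String text = new CaseInsensitiveLexer("SeLeCt x").matchLiteral("select");
        System.out.println(text); // prints "SeLeCt"
    }
}
```

The input stream is left alone here, so the token text comes back
unmolested. Anyone needing different folding rules could override
match(char c) again in a further subclass.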

Ter


