[antlr-interest] case sensitivity for ANTLR v3 lexers
Terence Parr
parrt at cs.usfca.edu
Tue May 16 11:27:32 PDT 2006
On May 16, 2006, at 10:58 AM, Terence Parr wrote:
>
> On May 16, 2006, at 10:50 AM, Martin Probst wrote:
>
>>> Soon we will need case insensitive lexing for v3. I am hoping to
>>> leave the input stream stuff alone and just subclass Lexer as
>>> CaseInsensitiveLexer, which overrides match()
>>> methods. Then alter code gen for char set matching (because it's
>>> generated inline).
>>>
>>> The tokens would have the unmolested input chars.
>>>
>>> Does this sound right?
>>
>> No idea, but did you think about internationalization issues? I
>> mean, in European languages there is a clear, defined concept of
>> upper case and lower case. However, I think there are some Asian
>> languages etc where this is not exactly true, and
>> java.lang.String#equalsIgnoreCase() doesn't get it right as far as
>> I know. Maybe provide an overridable (ouch) method of some kind?
>
> If I override match(char c) so that it uses Character.toUpperCase()
> or whatever, it should be ok I think.
We should probably also let people set the locale used for the
uppercasing, right?
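
A rough sketch of what a locale-aware comparison could look like, in
plain Java with no ANTLR dependency. The class and method names
(LocaleCaseMatcher, matchesIgnoreCase) are made up for illustration and
are not part of the v3 runtime. It goes through String.toUpperCase(Locale)
because Character.toUpperCase(char) takes no locale. The input chars are
only compared, never rewritten, so tokens would still carry the original
text.

```java
import java.util.Locale;

// Hypothetical helper, not an ANTLR API: compares one input char against
// the char a lexer rule expects, upper-casing both in the given locale.
final class LocaleCaseMatcher {
    private LocaleCaseMatcher() {}

    static boolean matchesIgnoreCase(char input, char expected, Locale locale) {
        // String.toUpperCase(Locale) applies locale-specific mappings and may
        // return more than one char, so compare the resulting strings.
        String a = String.valueOf(input).toUpperCase(locale);
        String b = String.valueOf(expected).toUpperCase(locale);
        return a.equals(b);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(matchesIgnoreCase('i', 'I', Locale.ROOT));
        System.out.println(matchesIgnoreCase('i', 'I', turkish));
    }
}
```

With Locale.ROOT, 'i' and 'I' compare equal. With a Turkish locale they
do not, because 'i' uppercases to the dotted capital U+0130 there, which
is the kind of case Martin is worried about.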
Ter