[antlr-interest] ANTLR C# runtime fix, please review

sarkar_soumen sarkar_soumen at yahoo.com
Thu Oct 2 17:16:42 PDT 2003


ANTLR C# runtime support,

I have a bug fix/enhancement to report for the ANTLR C# runtime class
in antlr272/antlr/CharScanner.cs.

I have developed a parser for a custom markup language (like XML) on
the .NET platform, and it is being used internationally. I therefore
had to make sure the parser is usable under every culture setting on
the .NET platform.

This fix is related to the globalization aspect of the ANTLR C# runtime.

Please review the fix below (and accept it if you like).

Thanks,
Soumen Sarkar.

Background
==========
Case is a normative concept. However, case mapping is NOT a normative
concept: the mapping can differ from one culture to another. Therefore,
a product like the ANTLR C# runtime should NEVER call toLower() or
toUpper() without being EXPLICIT about the culture. Please refer to
chapter 5 of the Unicode 4.0 specification:

Implementation Guidelines, 5.18 Case Mappings
http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf

For example, the Turkish culture (tr-TR) has an asymmetric mapping of
the vowel i/I: uppercase I lowercases to dotless ı (U+0131), not to i.
As a result, ANTLR generates a lexical error when caseSensitive=false
and the current culture is Turkish.
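
To make the difference concrete, here is a small stand-alone sketch (not
part of the ANTLR runtime) that lowercases 'I' under tr-TR and under the
invariant culture and prints the resulting code points. The class and
method names are made up for illustration, and it assumes the tr-TR
culture data is available on the machine.

```csharp
using System;
using System.Globalization;

// Illustration only: CaseMappingDemo and its methods are hypothetical names.
public static class CaseMappingDemo
{
    // Lowercases c using the casing rules of the named culture.
    public static char LowerWith(char c, string cultureName)
    {
        return Char.ToLower(c, new CultureInfo(cultureName));
    }

    // Lowercases c using the culture-independent (invariant) rules.
    public static char LowerInvariant(char c)
    {
        return Char.ToLower(c, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Print code points rather than glyphs so the console encoding
        // does not hide the difference between i and dotless ı.
        Console.WriteLine("tr-TR:     U+{0:X4}", (int)LowerWith('I', "tr-TR"));
        Console.WriteLine("invariant: U+{0:X4}", (int)LowerInvariant('I'));
    }
}
```

If the two lines print different code points, any lexer rule written
against lowercase 'i' will stop matching input that contains 'I'.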
 
Fix
===
Change the CharScanner class as follows:

Before fix:

// Override this method to get more specific case handling
public virtual char toLower(int c)
{
    return Char.ToLower(Convert.ToChar(c));
}

After fix:

// Override this method to get more specific case handling
public virtual char toLower(int c)
{
    return Char.ToLower(Convert.ToChar(c),
        System.Globalization.CultureInfo.InvariantCulture);
}

An argument can be made that end users should override the toLower()
method. However, I feel that passing

System.Globalization.CultureInfo.InvariantCulture

to the case-mapping call inside toLower() is the better solution.
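
For anyone who cannot patch the runtime, the override route could look
roughly like the sketch below. ScannerBase is only a hypothetical
stand-in for antlr.CharScanner so that the example compiles on its own;
in a real project the subclass would derive from the generated lexer,
and InvariantScanner is likewise an invented name.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical stand-in for antlr.CharScanner (unpatched behaviour).
public class ScannerBase
{
    public virtual char toLower(int c)
    {
        // Char.ToLower(char) uses the current culture's casing rules.
        return Char.ToLower(Convert.ToChar(c));
    }
}

// Hypothetical subclass that pins case mapping to the invariant culture.
public class InvariantScanner : ScannerBase
{
    public override char toLower(int c)
    {
        return Char.ToLower(Convert.ToChar(c), CultureInfo.InvariantCulture);
    }
}

public static class OverrideDemo
{
    public static void Main()
    {
        // Simulate a user running under the Turkish culture
        // (assumes tr-TR culture data is installed).
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

        Console.WriteLine("base:     U+{0:X4}", (int)new ScannerBase().toLower('I'));
        Console.WriteLine("override: U+{0:X4}", (int)new InvariantScanner().toLower('I'));
    }
}
```

The drawback is that every lexer author has to remember to do this,
which is why fixing it once in CharScanner seems preferable.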