This post originated from an RSS feed registered with .NET Buzz
by Scott Hanselman.
Original Post: Internationalization/I18n: Char.IsDigit() matches more than just "0" through "9"
Feed Title: Scott Hanselman's ComputerZen.com
Feed URL: http://radio-weblogs.com/0106747/rss.xml
Feed Description: Scott Hanselman's ComputerZen.com is a .NET/WebServices/XML Weblog. I offer details of obscurities (internals of ASP.NET, WebServices, XML, etc) and best practices from real world scenarios.
Raymond Chen just gave me a "duh" moment by pointing out the obvious-only-if-you-think-about-it.
Char.IsDigit() doesn't mean 'IsZeroToNineInEnglish'; it means 'is in the
decimal range of 0 to 9', and darn it if there aren't other ways (other
than 0,1,2,3,4,5,6,7,8,9) to express them! :)
So let's run an experiment.
class Program
{
    public static void Main(string[] args)
    {
        // Does \d match a string made of Arabic digits?
        System.Console.WriteLine(
            System.Text.RegularExpressions.Regex.Match(
                "\x0661\x0662\x0663", // "١٢٣"
                "^\\d+$").Success);

        // Does Char.IsDigit() accept a single Arabic digit?
        System.Console.WriteLine(
            System.Char.IsDigit('\x0661'));
    }
}
The characters in the string are Arabic digits, but they are still digits, as
evidenced by the program output:
True
True
Uh-oh. Do you have this bug in your parameter validation? (More examples..)
If you use a pattern like @"^\d+$" to validate that you receive only digits,
and then later use System.Int32.Parse() to parse it, then I can hand
you some Arabic digits and sit back and watch the fireworks. The Arabic digits will
pass your validation expression, but when you get around to using it, boom, you throw
a System.FormatException and die. [Raymond Chen]
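One way to close that gap, sketched here on the assumption that your application only wants the ASCII digits 0 through 9: validate with an explicit [0-9] class instead of \d, and parse with Int32.TryParse() so a bad value comes back as false rather than an exception. The names DigitValidation and TryParseAsciiDigits are made up for this illustration; they are not from Raymond's post.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class DigitValidation
{
    // [0-9] matches only U+0030..U+0039; \d would match any Unicode decimal digit.
    static readonly Regex AsciiDigits = new Regex("^[0-9]+$");

    public static bool TryParseAsciiDigits(string input, out int value)
    {
        value = 0;
        if (input == null || !AsciiDigits.IsMatch(input))
            return false;

        // NumberStyles.None allows digits only (no sign, whitespace, or separators).
        // TryParse returns false on overflow or bad input instead of throwing.
        return int.TryParse(input, NumberStyles.None,
                            CultureInfo.InvariantCulture, out value);
    }
}

class Demo
{
    static void Main()
    {
        int n;
        Console.WriteLine(DigitValidation.TryParseAsciiDigits("123", out n));                // True
        Console.WriteLine(DigitValidation.TryParseAsciiDigits("\u0661\u0662\u0663", out n)); // False
    }
}
```

The point of the design is that the validator and the parser agree on what a digit is, so nothing that passes the first step can blow up in the second.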
Arabic speakers (مرحبًا, كيف حالك ؟ and forgive me, I haven't studied Arabic since
college): how do you handle numeric validation in JavaScript AND
guarantee that the JavaScript you use on the client side is semantically
equivalent to the server-side code?
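A partial answer from the .NET side, offered as a sketch rather than a full solution: in JavaScript regular expressions \d matches only 0 through 9, and .NET's RegexOptions.ECMAScript switches \d to that same ASCII-only meaning. If the same pattern runs on both sides, turning that option on is one way to keep the two in step. The class name EcmaScriptDigits is just for this example, and this only covers the case where you want to reject non-ASCII digits, not accept them.

```csharp
using System;
using System.Text.RegularExpressions;

class EcmaScriptDigits
{
    static void Main()
    {
        string arabicDigits = "\u0661\u0662\u0663"; // "١٢٣"
        string pattern = @"^\d+$";

        // Default .NET behavior: \d matches any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(arabicDigits, pattern));                           // True

        // ECMAScript-compliant behavior: \d is equivalent to [0-9].
        Console.WriteLine(Regex.IsMatch(arabicDigits, pattern, RegexOptions.ECMAScript));  // False
        Console.WriteLine(Regex.IsMatch("123", pattern, RegexOptions.ECMAScript));         // True
    }
}
```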
Either way, my friends, read, grok, and be enlightened. Muy interesante.