This post originated from an RSS feed registered with Java Buzz
by Bill de hÓra.
Original Post: Free your mind
Feed Title: Bill de hÓra
Feed URL: http://www.dehora.net/journal/atom.xml
Feed Description: FD85 1117 1888 1681 7689 B5DF E696 885C 20D8 21F8
"Still, I believe that forcing programmers to consider encoding issues whenever they have to store some text is a very useful exercise, since otherwise - this is important - foreign language users may be completely unable to use your application. What is to you simply a question-mark or box where you expected to see an "é" * is, to billions of users the world over, a page full of binary puke where they expected to see a letter they just typed. Consider the other things that data - regular python 'str' objects - might represent. Image data, for example. If there were a culture of programmers that expected image data to always be unpacked 32-bit RGBA byte sequences, it would be very difficult to get the Internet off the ground; image formats like PNG and JPEG have to be decoded before they are useful image data, and it is very difficult to set a 'system default image format' and have them all magically decoded and encoded properly. If we did have sys.defaultimageformat, or sys.defaultaudiocodec, we'd end up with an upsetting amount of multi-color snow and shrieking noise on our computers." - Glyph Lefkowitz The comparison of text encodings to image encodings is a good one. One of the hard thing about explaining or understanding this stuff is that you have to stop believing your eyes when you see text on a computer screen - with a computer text is an illusion. * ed note: I had to hand convert this to UTF-8, since cut and paste would have resulted in binary puke...