The Artima Developer Community
Python Buzz Forum
Encoding of non-ascii characters in URLs

Phillip Pearson

Posts: 1083
Nickname: myelin
Registered: Aug, 2003

Phillip Pearson is a Python hacker from New Zealand
Encoding of non-ascii characters in URLs Posted: May 24, 2006 12:43 AM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Encoding of non-ascii characters in URLs
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange
Today I've been subjecting the PeopleAggregator API implementation to the 'Sam Ruby Iñtërnâtiônàlizætiøn test'. It went in and out just fine through XML-RPC, but the REST methods caused a bit more trouble. All sorted out now, but...

It turns out that Firefox, at least on my dev machine, encodes URLs as ISO-8859-1 (or perhaps Windows-1252), whereas Internet Explorer encodes them as UTF-8. I was trying to use PHP's mb_convert_encoding function to convert this, but it was just ignoring any non-ASCII chars.
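To make the difference concrete, here is a minimal Python sketch of how the same text percent-encodes under each charset. The helper name `percent_encode` is made up for illustration, and this only reproduces the two byte forms described above; it does not model what any particular browser version actually sends.

```python
from urllib.parse import quote

def percent_encode(text: str, charset: str) -> str:
    """Percent-encode text after converting it to bytes in the given charset.

    Hypothetical helper for illustration; not part of PeopleAggregator.
    """
    # safe="" forces every reserved character to be escaped as well.
    return quote(text, safe="", encoding=charset)

sample = "Iñtërnâtiônàlizætiøn"

# UTF-8 turns each accented letter into two bytes, so two %XX escapes.
utf8_form = percent_encode(sample, "utf-8")

# ISO-8859-1 uses one byte per accented letter, so a single %XX escape.
latin1_form = percent_encode(sample, "iso-8859-1")

print(utf8_form)
print(latin1_form)
```

For a single character, "ñ" becomes `%C3%B1` in UTF-8 and `%F1` in ISO-8859-1, so the server sees two different byte strings for the same input.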

The interesting thing about non-ASCII chars in URLs and POSTDATA is that the browsers don't seem to send any indication of the charset used. Whether the content is UTF-8 or ISO-8859-1, all I get is "Content-Type: application/x-www-form-urlencoded". It would be nice to have "; charset=UTF-8" at the end, but it doesn't seem like I'm that lucky!
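A small Python sketch of checking for that parameter, assuming the raw Content-Type header value is available as a string. The function name `declared_charset` is hypothetical; it leans on the standard library's header parsing rather than anything PeopleAggregator does.

```python
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param returns None when the parameter is absent.
    return msg.get_param("charset")

# What the browsers were observed to send: no charset at all.
print(declared_charset("application/x-www-form-urlencoded"))

# What would have been nice to receive.
print(declared_charset("application/x-www-form-urlencoded; charset=UTF-8"))
```

When the result is None, the server has to guess the encoding from the bytes themselves.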

As a result of this, I've reduced the scope - PeopleAggregator will support UTF-8 and ISO-8859-1, with UTF-8 strongly preferred.

For Frontier's benefit, it will handle XML-RPC requests that pretend to be UTF-8 but are actually ISO-8859-1.
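One way to get this "UTF-8 preferred, ISO-8859-1 tolerated" behaviour is to try decoding as UTF-8 first and fall back only if the bytes are not valid UTF-8. The sketch below is in Python rather than the PHP the real implementation uses, and `decode_param` is a hypothetical name; it illustrates the general fallback technique, not the actual PeopleAggregator code.

```python
from urllib.parse import unquote_to_bytes

def decode_param(raw: str) -> str:
    """Decode a percent-encoded value, preferring UTF-8 over ISO-8859-1.

    Hypothetical helper; a sketch of the fallback idea only.
    """
    data = unquote_to_bytes(raw)
    try:
        # Strict decoding: invalid UTF-8 raises instead of being dropped.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value maps to a character in ISO-8859-1,
        # so this fallback cannot fail.
        return data.decode("iso-8859-1")

print(decode_param("%C3%B1"))  # UTF-8 bytes for "ñ"
print(decode_param("%F1"))     # ISO-8859-1 byte for "ñ"
```

The same check works on a request body that is labelled UTF-8 but is really ISO-8859-1. The guess is not perfect: a few ISO-8859-1 byte sequences happen to be valid UTF-8 and will be read as UTF-8, which fits with UTF-8 being the strongly preferred input.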
