This post originated from an RSS feed registered with Python Buzz
by Richard Jones.
Original Post: PyPI has had some much-needed attention
Feed Title: Richard's stuff
Feed URL: http://mechanicalcat.net/cgi-bin/log?flav=rss
Feed Description: Stuff I'm interested in
PyPI uses the very cool SQLite engine to store the database of package
information. SQLite is cool because it's so simple to use and
self-contained. Unfortunately, it's not built for many concurrent users:
it locks the whole database file rather than individual rows, so one slow
request can make everyone else wait. This caused PyPI some problems
because ... well, PyPI is much more popular than I'd anticipated :)
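To see what "locked" means in practice, here's a minimal sketch using Python's standard `sqlite3` module. It is not PyPI's code; the function name, file name and `packages` table are made up for illustration. One connection holds an exclusive write transaction while a second connection, told not to wait (`timeout=0`), tries to read. In SQLite's default rollback-journal mode the reader fails straight away:

```python
import os
import sqlite3
import tempfile


def demonstrate_lock(path):
    """Return the error a reader gets while another connection holds a write lock."""
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    reader = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS packages (name TEXT)")
    # Take an exclusive lock on the whole database file and keep it open.
    writer.execute("BEGIN EXCLUSIVE")
    writer.execute("INSERT INTO packages VALUES ('example')")
    try:
        # timeout=0 means the reader gives up immediately instead of waiting.
        reader.execute("SELECT count(*) FROM packages").fetchone()
        result = "no error"
    except sqlite3.OperationalError as exc:
        result = str(exc)
    finally:
        writer.execute("COMMIT")
        writer.close()
        reader.close()
    return result


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "pypi-demo.db")
    print(demonstrate_lock(db_path))  # prints: database is locked
```

With the module's default `timeout` of 5 seconds the reader would wait and retry before giving up, which doesn't help much when the lock may be held for up to 30 seconds.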
After some brief analysis, I found:
- The RSS feed is hit about once every 30 seconds on average
- Other PyPI pages are hit at around the same rate
- About one in three of those other hits goes to the browse code, and
  the browse code was slow, taking up to 30 seconds to complete a
  request
Of course, these are all averages, so during peak times (i.e. lunchtime
in the US ;) the rates are higher. The combination of many requests and
slow code results in users seeing "sorry, the database is locked".
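Taking the average figures above at face value, a quick back-of-the-envelope calculation shows how bad the worst case gets. This is a sketch that assumes evenly spread requests, and the function and parameter names are just for illustration. A browse request arrives about every 90 seconds and may hold the database for 30 of them, and on average two more requests turn up during each one:

```python
def load_estimate(feed_interval_s, other_interval_s, browse_every_n, browse_duration_s):
    """Back-of-the-envelope load figures, assuming evenly spread requests."""
    # Every n-th "other" hit is a browse request.
    browse_interval_s = other_interval_s * browse_every_n
    # Worst-case fraction of wall-clock time spent inside a browse request.
    busy_fraction = browse_duration_s / browse_interval_s
    # Requests of any kind arriving per second (feed + other pages).
    total_rate = 1 / feed_interval_s + 1 / other_interval_s
    # Expected arrivals while one slow browse request holds the database.
    arrivals_during_browse = total_rate * browse_duration_s
    return browse_interval_s, busy_fraction, arrivals_during_browse


if __name__ == "__main__":
    interval, busy, waiting = load_estimate(30, 30, 3, 30)
    print(f"browse every {interval} s, database busy {busy:.0%}, "
          f"{waiting:.1f} requests arrive per browse")
    # prints: browse every 90 s, database busy 33%, 2.0 requests arrive per browse
```

At peak times both intervals shrink, so the busy fraction and the number of requests queued behind each browse grow with them.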
To remedy this, I've:
- Cached the RSS feed, so it only rarely has to hit the database
- Significantly improved the speed (and, while I was at it, the accuracy)
  of the browsing code
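A time-based cache like the one described for the RSS feed can be sketched in a few lines. This is not how PyPI actually does it. It assumes a CGI-style setup where each request is a fresh process, so the cached feed lives in a file. The names `cached_feed` and `render`, and the five-minute `max_age_s`, are hypothetical:

```python
import os
import tempfile
import time
from typing import Callable


def cached_feed(cache_path: str, render: Callable[[], str], max_age_s: float = 300.0) -> str:
    """Return the RSS document, rebuilding it only when the cached copy is stale."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age_s:
            with open(cache_path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and rebuild
    body = render()  # hypothetical renderer: the only step that queries the database
    # Write to a temporary file, then rename it into place, so concurrent
    # readers never see a half-written feed.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp_path, cache_path)
    return body
```

Only a request that finds the cached file stale or missing touches the database. Every other feed hit is served from disk. If two requests find it stale at the same moment, both rebuild it. That is harmless here because `os.replace` is atomic on POSIX systems.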