This post originated from an RSS feed registered with Python Buzz
by Carlos de la Guardia.
Original Post: Python Package Index Greatest Hits
Feed Title: I blog therefore I am
Feed URL: http://blog.delaguardia.com.mx/feed.atom
Feed Description: A space to put my thoughts into writing.
I decided to create a wiki page about Zope 3's most useful libraries, so I began to look into how to find out which ones are the most popular. Since the Zope 3 community encourages registration of libraries on the Python Package Index, that's where I began my search.
One quantitative way to define 'popular' is by measuring the number of downloads of a library. Presumably, popular libraries are downloaded more often. PyPI keeps track of download counts, so I thought that could be good enough to start my list.
Well, PyPI has an XML-RPC API, but the number of downloads is not available in search results (or at least it isn't documented). To further complicate matters, package owners can hide old releases, which then don't show up in the results either. That is a problem, because when you release a new version of a package and hide all the old ones, the download page for the new release shows zero downloads, with no way of knowing which other releases were ever made.
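For reference, the XML-RPC calls involved can be sketched in a few lines. This is only a sketch, written with modern Python 3 module names (`xmlrpc.client` rather than Python 2's `xmlrpclib`), and note that PyPI has since retired the `search` method, so the call mirrors the historical API rather than anything guaranteed to work today:

```python
import xmlrpc.client


def fetch_package_names(server_url='https://pypi.org/pypi'):
    """Sketch of the XML-RPC lookup: list package names matching a spec.

    PyPI later deprecated its XML-RPC 'search' method, so this mirrors
    the historical API rather than a currently supported endpoint.
    """
    server = xmlrpc.client.ServerProxy(server_url)
    # An empty spec with the 'and' operator matched the whole catalog.
    results = server.search({}, 'and')  # list of {'name': ..., 'version': ...} dicts
    return [entry['name'] for entry in results]
```

The same proxy object also exposes `package_releases(name)`, which is what the script below uses to enumerate a package's visible releases.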
The first problem is easily solved with a little screen-scraping; the second is harder to solve (I didn't really try), and basically means that any results I get by using the API come with a huge question mark attached.
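To illustrate the screen-scraping half, here is a minimal, self-contained sketch using only the standard library's `html.parser` (the actual script uses BeautifulSoup). The HTML fragment is made up, but shaped like the old release-page download table the script walks, where the fifth cell of each data row is assumed to hold the download count:

```python
from html.parser import HTMLParser


class DownloadTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_td = True
            self._row.append('')

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def count_downloads(html):
    """Sum the download counts found in the fifth column of each row."""
    parser = DownloadTableParser()
    parser.feed(html)
    total = 0
    for row in parser.rows:
        if len(row) >= 5 and row[4].isdigit():
            total += int(row[4])
    return total


# A made-up fragment shaped like the old PyPI release-page table.
SAMPLE = """
<table>
<tr><th>File</th><th>Type</th><th>Py Version</th><th>Uploaded on</th><th>Downloads</th></tr>
<tr><td>foo-1.0.tar.gz</td><td>Source</td><td>any</td><td>2008-01-01</td><td>120</td></tr>
<tr><td>foo-1.0.egg</td><td>Egg</td><td>2.4</td><td>2008-01-01</td><td>80</td></tr>
</table>
"""
```

Running `count_downloads(SAMPLE)` on this fragment sums the two file counts; the real pages of course vary, which is exactly why scraping is fragile.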
For my purposes, though, the inexact results can be tolerated: I'm only looking for some of Zope 3's most popular libraries for a documentation page, not trying to create any kind of definitive list.
Anyway, I wrote a quick script and decided to test it first on the whole catalog, so without further ado, here's the list of PyPI's 50 greatest hits:
I already explained this, but let me point it out one more time: the top packages on this list are surely the ones that don't hide their old versions. Also, keep in mind that many packages have their own download locations and don't use PyPI for this.
```python
import xmlrpclib
import urllib2
from urllib import quote
from BeautifulSoup import BeautifulSoup

server = xmlrpclib.Server('http://pypi.python.org/pypi')

# An empty spec with the 'and' operator matches the whole catalog.
spec = {}
operator = 'and'

packages = [package['name'] for package in server.search(spec, operator)]

downloaded = []
downloaded_names = []

for package in packages:
    downloads = 0
    package_releases = server.package_releases(package)
    for release in package_releases:
        package_url = 'http://pypi.python.org/pypi/%s/%s' % (quote(package), release)
        try:
            text = urllib2.urlopen(package_url).read()
        except urllib2.HTTPError:
            continue
        soup = BeautifulSoup(text)
        # The fifth column of each data row in the download table
        # holds the download count for that file.
        for row in soup.findAll('tr')[1:-1]:
            columns = row.findAll('td')
            if len(columns) >= 5:
                downloads = downloads + int(columns[4].string)
    if package not in downloaded_names:
        downloaded_names.append(package)
        downloaded.append({'package': package, 'times': downloads})

# Rank by download count and keep the 50 most downloaded packages.
top = sorted(downloaded, key=lambda item: item['times'], reverse=True)[:50]

print "\nMost downloaded packages for spec %s\n" % str(spec)
for package in top:
    print "%8d -->" % package['times'],
    print package['package']
```
That's it for now. Next time I will give my attention to Zope 3's libraries (though you can see quite a few of them on the general Python list above). We'll see how that goes.