This post originated from an RSS feed registered with Python Buzz
by Carlos de la Guardia.
Original Post: Python Package Index Greatest Hits
Feed Title: I blog therefore I am
Feed URL: http://blog.delaguardia.com.mx/feed.atom
Feed Description: A space to put my thoughts into writing.
I decided to create a wiki page about Zope 3's most useful libraries, so I began to look into how to find out which ones are the most popular. Since the Zope 3 community encourages registration of libraries on the Python Package Index, that's where I began my search.
One quantitative way to define 'popular' is by measuring the number of downloads of a library. Presumably, popular libraries are downloaded more often. PyPI keeps track of download counts, so I thought that could be good enough to start my list.
Well, PyPI has an XML-RPC API, but the number of downloads is not available in search results (or at least it isn't documented). To further complicate matters, package owners can hide old releases, which then don't show up in the results either. That is a problem, because when you release a new version of a package and hide all the old ones, the download page for the new release shows zero downloads, with no way of knowing which other releases were ever made.
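For reference, the XML-RPC calls involved can be sketched in a few lines. This is only a sketch, written with modern Python 3 module names (`xmlrpc.client` rather than Python 2's `xmlrpclib`), and note that PyPI has since retired the `search` method, so the call mirrors the historical API rather than anything guaranteed to work today:

```python
import xmlrpc.client


def fetch_package_names(server_url='https://pypi.org/pypi'):
    """Sketch of the XML-RPC lookup: list package names matching a spec.

    PyPI later deprecated its XML-RPC 'search' method, so this mirrors
    the historical API rather than a currently supported endpoint.
    """
    server = xmlrpc.client.ServerProxy(server_url)
    # An empty spec with the 'and' operator matched the whole catalog.
    results = server.search({}, 'and')  # list of {'name': ..., 'version': ...} dicts
    return [entry['name'] for entry in results]
```

The same proxy object also exposes `package_releases(name)`, which is what the script below uses to enumerate a package's visible releases.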
The first problem is easily solved with a little screen-scraping; the second is harder to solve (I didn't really try), and basically means that any results I get by using the API come with a huge question mark attached.
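To illustrate the screen-scraping half, here is a minimal, self-contained sketch using only the standard library's `html.parser` (the actual script uses BeautifulSoup). The HTML fragment is made up, but shaped like the old release-page download table the script walks, where the fifth cell of each data row is assumed to hold the download count:

```python
from html.parser import HTMLParser


class DownloadTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_td = True
            self._row.append('')

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def count_downloads(html):
    """Sum the download counts found in the fifth column of each row."""
    parser = DownloadTableParser()
    parser.feed(html)
    total = 0
    for row in parser.rows:
        if len(row) >= 5 and row[4].isdigit():
            total += int(row[4])
    return total


# A made-up fragment shaped like the old PyPI release-page table.
SAMPLE = """
<table>
<tr><th>File</th><th>Type</th><th>Py Version</th><th>Uploaded on</th><th>Downloads</th></tr>
<tr><td>foo-1.0.tar.gz</td><td>Source</td><td>any</td><td>2008-01-01</td><td>120</td></tr>
<tr><td>foo-1.0.egg</td><td>Egg</td><td>2.4</td><td>2008-01-01</td><td>80</td></tr>
</table>
"""
```

Running `count_downloads(SAMPLE)` on this fragment sums the two file counts; the real pages of course vary, which is exactly why scraping is fragile.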
For my purposes, though, the inexact results can be tolerated: I'm only looking for some of Zope 3's most popular libraries for a documentation page, not trying to create any kind of definitive list.
Anyway, I wrote a quick script and decided to test it first on the whole catalog, so without further ado, here's the list of PyPI's 50 greatest hits:
I already explained this, but let me point it out one more time: the top packages on this list are surely the ones that don't hide their old versions. Also, keep in mind that many packages have their own download locations and don't use PyPI for this.
```python
import xmlrpclib
import urllib2
from urllib import quote
from BeautifulSoup import BeautifulSoup

server = xmlrpclib.Server('http://pypi.python.org/pypi')

# An empty spec with the 'and' operator matches the whole catalog.
spec = {}
operator = 'and'

packages = [package['name'] for package in server.search(spec, operator)]

downloaded = []
downloaded_names = []

for package in packages:
    downloads = 0
    package_releases = server.package_releases(package)
    for release in package_releases:
        package_url = 'http://pypi.python.org/pypi/%s/%s' % (quote(package), release)
        try:
            text = urllib2.urlopen(package_url).read()
        except urllib2.HTTPError:
            continue
        soup = BeautifulSoup(text)
        # The fifth column of each data row in the download table
        # holds the download count for that file.
        for row in soup.findAll('tr')[1:-1]:
            columns = row.findAll('td')
            if len(columns) >= 5:
                downloads = downloads + int(columns[4].string)
    if package not in downloaded_names:
        downloaded_names.append(package)
        downloaded.append({'package': package, 'times': downloads})

# Rank by download count and keep the 50 most downloaded packages.
top = sorted(downloaded, key=lambda item: item['times'], reverse=True)[:50]

print "\nMost downloaded packages for spec %s\n" % str(spec)
for package in top:
    print "%8d -->" % package['times'],
    print package['package']
```
That's it for now. Next time I will give my attention to Zope 3's libraries (though you can see quite a few of them on the general Python list above). We'll see how that goes.