The Artima Developer Community
Python Buzz Forum
Python Package Index Greatest Hits

Carlos de la Guardia

Carlos de la Guardia is an independent web developer in Mexico
Python Package Index Greatest Hits Posted: Sep 17, 2007 1:11 AM

This post originated from an RSS feed registered with Python Buzz by Carlos de la Guardia.
Original Post: Python Package Index Greatest Hits
Feed Title: I blog therefore I am
Feed URL: http://blog.delaguardia.com.mx/feed.atom
Feed Description: A space to put my thoughts into writing.

I decided to create a wiki page about Zope 3's most useful libraries, so I began looking for a way to find out which ones are the most popular. Since the Zope 3 community encourages registering libraries on the Python Package Index (PyPI), that's where I began my search.

One quantitative way to define 'popular' is by measuring the number of downloads of a library. Presumably, popular libraries will be downloaded more often. The PyPI keeps track of downloads, so I thought that could be good enough to start my list.

Well, PyPI has an XML-RPC API, but the number of downloads is not available in search results (or at least it is not documented). To further complicate matters, package owners can hide old releases, and hidden releases do not show up in the results either. That is a problem, because when you release a new version of a package and hide all the old ones, the download page for the new release will show zero downloads, with no way of knowing which other releases have been made.
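To make the hidden-release problem concrete, here is a minimal sketch in Python 3. The release records, the download numbers and the "hidden" field are all made up for illustration; they are not what the PyPI API returns.

```python
# Hypothetical per-release records; the counts and the "hidden" flag
# are invented for illustration only.
releases = [
    {"version": "1.0", "downloads": 900, "hidden": True},
    {"version": "1.1", "downloads": 450, "hidden": True},
    {"version": "2.0", "downloads": 0, "hidden": False},
]


def visible_total(releases):
    """Sum downloads over the releases a client can still see."""
    return sum(r["downloads"] for r in releases if not r["hidden"])


def true_total(releases):
    """Sum downloads over every release, hidden or not."""
    return sum(r["downloads"] for r in releases)


print(visible_total(releases), true_total(releases))
```

A package that has just published 2.0 and hidden everything else would be counted with the first number, even though the second is closer to its real popularity.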

The first problem can be easily solved by doing a little screen-scraping; the second problem is harder to solve (I really didn't try), and basically means that any results I get by using the API have a huge question mark attached.
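The screen-scraping step boils down to reading one column out of an HTML table. Here is a rough sketch of that idea in Python 3 using only the standard library's html.parser. The sample HTML and its column layout are assumptions for the example; the only detail taken from my script is that the download count sits in the fifth cell of each row.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def count_downloads(html, column=4):
    """Add up the numeric values found in the given cell of each row."""
    parser = TableRows()
    parser.feed(html)
    total = 0
    for cells in parser.rows:
        # Rows that are too short or non-numeric (headers, footers) are skipped.
        if len(cells) > column and cells[column].isdigit():
            total += int(cells[column])
    return total


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<table>
<tr><th>File</th><th>Type</th><th>Py Version</th><th>Size</th><th># downloads</th></tr>
<tr><td>pkg-1.0.tar.gz</td><td>Source</td><td></td><td>10KB</td><td>120</td></tr>
<tr><td>pkg-1.0-py2.5.egg</td><td>Egg</td><td>2.5</td><td>12KB</td><td>30</td></tr>
<tr><td colspan="5">footer</td></tr>
</table>
"""

print(count_downloads(SAMPLE))
```

Checking that the cell is numeric before converting it is a bit more defensive than slicing off the first and last rows, at the cost of silently ignoring anything unexpected.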

For my purposes, though, inexact results can be tolerated. I'm only looking for some of Zope 3's most popular libraries for a documentation page; I'm not trying to create any kind of definitive list.

Anyway, I wrote a quick script and decided to test it first using the whole catalog, so without further ado, here's the list of PyPI's 50 greatest hits:

34261  zc.buildout
28431  simplejson
22887  FormEncode
20852  Pylons
18509  lxml
16160  ConfigObj
14835  Routes
12770  MyghtyUtils
11279  Myghty
10147  zope.interface
 9994  PasteDeploy
 8539  TurboCheetah
 8456  setuptools
 8455  zc.recipe.egg
 6839  zope.testing
 6352  kid
 5937  Cheetah
 5614  Mako
 5584  TurboJson
 5435  DecoratorTools
 5405  roundup
 5327  fpconst
 5315  4Suite-XML
 5214  altgraph
 4757  modulegraph
 4591  macholib
 4309  SQLObject
 3908  zc.recipe.testrunner
 3821  SQLAlchemy
 3793  wsgiref
 3441  ZSI
 3398  pytz
 3386  ZODB3
 3244  zc.recipe.filestorage
 3128  Pygments
 3116  textile
 3092  Elixir
 2891  zope.deferredimport
 2875  WSGIUtils
 2753  py2app
 2731  AuthKit
 2408  buildutils
 2233  bdist_mpkg
 2192  zope.proxy
 2175  MySQL-python
 2156  readline
 2143  memojito
 2011  zope.component
 1987  zc.recipe.zope3instance
 1954  zope.exceptions

I already explained this, but let me point out one more time that the top packages on this list are surely the ones that don't hide their old versions. Also, keep in mind that many packages have their own download locations and don't use PyPI for this.

For those interested, here's the code that generated this list (I used the BeautifulSoup screen scraping library):

    import xmlrpclib
    import urllib2
    from urllib import quote
    from BeautifulSoup import BeautifulSoup

    server = xmlrpclib.Server('http://pypi.python.org/pypi')

    # An empty spec searches the whole catalog.
    spec = {}
    operator = 'and'

    packages = [package['name'] for package in server.search(spec, operator)]

    downloaded = []
    downloaded_names = []

    for package in packages:
        downloads = 0
        package_releases = server.package_releases(package)
        for release in package_releases:
            try:
                package_url = 'http://pypi.python.org/pypi/%s/%s' % (quote(package), release)
            except KeyError:
                # quote() can raise KeyError on non-ASCII names; skip those.
                continue
            try:
                text = urllib2.urlopen(package_url).read()
            except urllib2.HTTPError:
                continue
            soup = BeautifulSoup(text)
            # Skip the first and last rows of the page's table.
            for row in soup.findAll('tr')[1:-1]:
                columns = row.findAll('td')
                # The download count is in the fifth cell (index 4).
                if len(columns) > 4:
                    downloads = downloads + int(columns[4].string)
        # Record each package once, with its total over all visible releases.
        if not package in downloaded_names:
            downloaded_names.append(package)
            downloaded.append({'package': package, 'times': downloads})

    # Sort by download count, highest first.
    top = sorted(downloaded, lambda x, y: cmp(y['times'], x['times']))

    print " Most downloaded packages for spec %s " % str(spec)
    for package in top:
        print "%8d -->" % package['times'],
        print package['package']
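
The script above is Python 2: the cmp-style comparison passed to sorted() and the print statement do not exist in Python 3. For anyone adapting it, the ranking step might look something like this; the three sample counts are copied from the top of the list, and the rank() helper is just a name chosen for this sketch.

```python
# Sample counts copied from the top of the list above.
downloaded = [
    {"package": "FormEncode", "times": 22887},
    {"package": "zc.buildout", "times": 34261},
    {"package": "simplejson", "times": 28431},
]


def rank(records):
    """Return the records sorted by download count, highest first."""
    return sorted(records, key=lambda r: r["times"], reverse=True)


for record in rank(downloaded):
    print("%8d --> %s" % (record["times"], record["package"]))
```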

That's it for now. Next time I will give my attention to Zope 3's libraries (though you can see quite a few of them on the general Python list above). We'll see how that goes.

