One of my main objectives at Pycon 2008 was to hear about experiences regarding existing tools for parallelizing Python programs, and also to find out more about mathematical programming with Python.
I have a client who runs very compute-intensive and long-running simulations and we've been working on parallelizing the process using Python. In my explorations, I followed the common path of writing tools myself before discovering that others had already solved the problem (however, I now have the advantage of understanding the issues much better).
There were no official talks on parallelism at the conference, other than one about using Amazon EC2, so I set up an open-spaces discussion, and a second one happened later. (There were also a couple of talks involving Stackless Python, but that is a coroutine system that doesn't run on multiple processors.)
Although there seems to be a fair amount of exploration in this arena, the consensus appears to be that the two current practical contenders are Parallel Python and Processing. (Please correct me if I got this wrong or missed something; I was doing more discussing than taking notes during the open spaces).
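The Processing package discussed in that session was later folded into the standard library as multiprocessing (in Python 2.6), so here's a minimal sketch of the fork/worker-pool style it offers; the simulation function is my own made-up stand-in:

```python
from multiprocessing import Pool

def run_step(seed):
    # Stand-in for one expensive, independent simulation step
    # (a real simulation would do far more work per call).
    return sum(i * seed for i in range(1000))

if __name__ == "__main__":
    with Pool() as pool:                        # one worker process per core
        results = pool.map(run_step, range(8))  # farm the steps out in parallel
    print(len(results))
```

Because the work runs in separate processes rather than threads, the GIL (discussed below) is not an obstacle; the trade-off is that arguments and results must be picklable.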
Another interesting possibility involves IPython, which I finally started to play with in earnest at the conference. This is a cross between the Python interpreter prompt and an IDE. What's very nice is that IPython does command completion and produces help. Basically it's a terrific way to explore Python and libraries. If you have setuptools installed (just run ez_setup.py), you can say easy_install ipython at a command prompt to install it, then run ipython. Try importing a library and then using '?' and command completion to see what IPython does for you.
IPython1 is the next version of IPython, and claims to have "a powerful architecture for parallel computing," but it is apparently still under development and no one in the open spaces session had used it for serious development.
The upcoming Jython 2.5 (first alpha soon, final sometime before the end of the year, though summer sounded like it could be a "maybe") doesn't have a global interpreter lock (GIL), so Jython can take advantage of the JVM's true threading. Removing the GIL may produce side effects that we won't notice right away, but the Jython team assures me these can be fixed as they are discovered.
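To make the GIL point concrete, here's a small sketch (the workload is a made-up stand-in): the two CPU-bound threads below are serialized by the GIL under CPython, but on a GIL-free Jython they could genuinely run on separate cores, with no change to the code:

```python
import threading

def crunch(n, results, idx):
    # CPU-bound loop: CPython's GIL lets only one such thread execute
    # Python bytecode at a time; a GIL-free runtime like Jython can
    # schedule these threads on separate cores simultaneously.
    total = 0
    for i in range(n):
        total += i
    results[idx] = total

results = [None, None]
threads = [threading.Thread(target=crunch, args=(100_000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[0] == results[1])  # True: both threads did the same work
```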
When asked, Jim Baker got that faraway look in his eyes and said that yes, Stackless Jython could probably be implemented, but it wasn't exactly clear what that would really mean.
IronPython apparently also doesn't require a GIL. That doesn't help my particular situation, though, because my client's cluster runs on Linux machines.
A number of people said they had very good experiences with Pyro, which is a Python distributed object system. This might also have possibilities for certain types of parallel solutions.
I came early to Pycon to take the NumPy and SciPy tutorials, given by Travis Oliphant and Eric Jones of Enthought, a company which supports and makes its living teaching and consulting about these open-source libraries.
You can install NumPy with easy_install numpy.
NumPy is basically about "arrays" where the term "array" includes multiple-dimension matrices as well. So you can, for example, do the classic "invert a matrix and multiply it by itself to produce the identity matrix" trick:
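For instance, a minimal version of that trick with a hand-picked 2x2 matrix:

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
a_inv = np.linalg.inv(a)       # invert the matrix
print(a.dot(a_inv))            # multiply back: the identity matrix
                               # (up to floating-point rounding)
```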
NumPy does a lot more (even Fourier transforms), but the core is this very efficient array mechanism, which is much more compact than arrays in Java, so you can make them huge without worrying about running out of memory. There is also support for handing these data structures to other, non-Python routines, which is where SciPy comes in.
SciPy has all kinds of high end math functions, many of which use the long-optimized C and Fortran routines directly. For example, here's how to produce a Bessel function (the solution to the vibrating drumhead problem):
import numpy as np
from scipy import special
x = np.r_[0:100:0.1]   # sample points from 0 to 100 in steps of 0.1
j0x = special.j0(x)    # Bessel function of the first kind, order zero
I'm sorry I missed your Open Spaces session, we only noticed the card on the board after it was over and really wanted to check it out.
"I have a client who runs very compute-intensive and long-running simulations and we've been working on parallelizing the process using Python."
This is something we run into a lot and have been trying to tackle using a number of different approaches. You want the simplicity of numpy syntax, but the ability to distribute the computation across multiple cores/machines.
The takeaway for me is that using an MPI approach with numpy works, but requires stretching your brain a bit more than should be necessary for simple problems. For coarse-grained parallelism, I'm leaning more towards IPython1 and/or Hadoop now and will be writing up some tutorials/case studies for those approaches on my blog.
IPython1 will work well right now for running parallel simulations using a master/worker model... the project recently switched to Bazaar, but you can take a look at these monte carlo pricing examples on the old svn repo:
...though the learning curve is a little steep. But it's kind of hard to overstate how capable and powerful Twisted is when it comes to writing client/server/RPC sorts of apps, especially prototypes. Unlike other frameworks, I've never had any mixed feelings about the considerable time I've poured into learning it, simply because Twisted has enabled me to write apps that I couldn't have contemplated writing without it.
There are a few projects out there using Python in conjunction with MPI to build distributed simulation codes. One interesting project is pyMPI: http://pympi.sourceforge.net/ I used it with VTK (http://vtk.org) to visualize large datasets in parallel on clusters. The ParaView (http://paraview.org) Python bindings support client/server based computation where the server can run on a cluster over MPI. We ran paraview-python on clusters as well as supercomputers (BlueGene and Cray Xt3). Of course, most of these are thin Python wrappers around Fortran/C/C++ code. It is possible to build distributed numpy algorithms using pyMPI but I have little experience doing that.
I've used these on a couple of projects with good results. They also take advantage of existing high-performance scientific libraries when available, as well as vector instructions.
http://code.google.com/p/pystream/ is a module that wraps the nVidia CUDA library for high-performance computing. CUDA offloads a lot of vector operations to the GPU in the video card, and nVidia is even deploying special cards just for compute work. It's pretty interesting, and I'll be experimenting with it directly over the next few weeks with some DSP analysis.