
Computing Thoughts
Math and the Current State of Coarse-Grained Parallelism in Python
by Bruce Eckel
March 18, 2008
Summary
One of my main objectives at PyCon 2008 was to hear about experiences with existing tools for parallelizing Python programs, and to find out more about mathematical programming with Python.

I have a client who runs very compute-intensive, long-running simulations, and we've been working on parallelizing the process using Python. In my explorations, I followed the common path of writing tools myself before discovering that others had already solved the problem (though I now have the advantage of understanding the issues much better).

There were no official talks on parallelism at the conference, other than one about using Amazon EC2, so I set up an open-spaces discussion, and a second one happened later. (There were also a couple of talks involving Stackless Python, but that is a coroutine system that doesn't run on multiple processors.)

Although there seems to be a fair amount of exploration in this arena, the consensus appears to be that the two current practical contenders are Parallel Python and Processing. (Please correct me if I got this wrong or missed something; I was doing more discussing than taking notes during the open spaces).
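
To give a flavor of what "practical" means here, below is a minimal sketch (my own, not from the discussion) of farming out a CPU-bound function with the processing package's Pool; that API is what later became the standard-library multiprocessing module, and the simulate function is just a placeholder, not my client's code:

from processing import Pool  # the third-party 'processing' package

def simulate(n):
    # Stand-in for a compute-intensive job
    total = 0
    for i in xrange(n):
        total += i * i
    return total

if __name__ == '__main__':
    pool = Pool(processes=4)                    # one worker process per core
    results = pool.map(simulate, [10**6] * 8)   # distribute the jobs across the workers
    print results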

Another interesting possibility involves IPython, which I finally started to play with in earnest at the conference. It's a cross between the Python interpreter prompt and an IDE. What's very nice is that IPython does command completion and gives you help on demand; basically, it's a terrific way to explore Python and its libraries. If you have setuptools installed (run ez_setup.py if you don't), you can type easy_install ipython at a command prompt to install it, then run ipython. Try importing a library and then using '?' and command completion to see what IPython does for you.
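
A quick session looks something like this (the exact output varies by version, so treat it as a sketch):

$ easy_install ipython
$ ipython
In [1]: import os

In [2]: os.path.join?        # '?' shows the signature and docstring

In [3]: os.path.<TAB>        # tab completion lists everything in os.path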

IPython1 is the next version of IPython, and claims to have "a powerful architecture for parallel computing," but it is apparently still under development and no one in the open spaces session had used it for serious development.

The upcoming Jython 2.5 (first alpha soon, final sometime before the end of the year, though summer sounded like it could be a "maybe") doesn't have a global interpreter lock (GIL), so you can use Jython to take advantage of the JVM's true threading. The absence of the GIL could produce side effects that we may not notice right away, but the Jython team assures me these can be fixed as they are discovered.
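
As an illustration (mine, not from the Jython team), the standard threading module is all you need to express this; on CPython the GIL keeps these CPU-bound workers from running simultaneously, while under Jython the same code can keep multiple cores busy:

from threading import Thread

def burn(n):
    # CPU-bound busy work
    total = 0
    for i in xrange(n):
        total += i * i

threads = [Thread(target=burn, args=(5 * 10**6,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print "done"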

When asked, Jim Baker got that faraway look in his eyes and said that yes, Stackless Jython could probably be implemented, but it wasn't exactly clear what that would really mean.

IronPython apparently doesn't require a GIL either. This doesn't help my particular situation, though, because the cluster runs on Linux machines.

A number of people said they had very good experiences with Pyro, which is a Python distributed object system. This might also have possibilities for certain types of parallel solutions.

NumPy and SciPy

I came early to PyCon to take the NumPy and SciPy tutorials, given by Travis Oliphant and Eric Jones of Enthought, a company that supports these open-source libraries and makes its living teaching and consulting about them.

You can install NumPy with easy_install numpy.

NumPy is basically about "arrays," where the term "array" includes multi-dimensional matrices as well. So you can, for example, do the classic trick of inverting a matrix and multiplying it by the original to produce the identity matrix:

>>> from numpy import *
>>> A = mat([[1,2,4], [2,5,3], [7,8,9]])
>>> A.I # Matrix inversion
matrix([[-0.42857143, -0.28571429,  0.28571429],
        [-0.06122449,  0.3877551 , -0.10204082],
        [ 0.3877551 , -0.12244898, -0.02040816]])
>>> A * A.I # Matrix multiplication
matrix([[  1.00000000e+00,   5.55111512e-17,   1.38777878e-17],
        [  5.55111512e-17,   1.00000000e+00,   3.12250226e-17],
        [  3.33066907e-16,   1.52655666e-16,   1.00000000e+00]])

(The tiny off-diagonal values in that last result are floating-point round-off; it is effectively the identity matrix.)

NumPy does a lot more (even Fourier transforms), but the core is this very efficient array mechanism, which is much more compact than arrays in Java, so you can make them huge without worrying about running out of memory. There is also support for handing these data structures to other, non-Python routines, which is where SciPy comes in.
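
As a small taste of those extras (my own example, not from the tutorial), here's a round trip through NumPy's FFT support: transform a 5 Hz sine wave and recover it with the inverse transform:

from numpy import r_, sin, pi, allclose
from numpy.fft import fft, ifft

t = r_[0:1:0.001]                  # 1000 sample points over one second
signal = sin(2 * pi * 5 * t)       # a 5 Hz sine wave
spectrum = fft(signal)             # forward transform
recovered = ifft(spectrum).real    # inverse transform; imaginary parts are round-off
print allclose(signal, recovered)  # True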

SciPy has all kinds of high-end math functions, many of which use long-optimized C and Fortran routines directly. For example, here's how to produce a Bessel function (which appears in the solution to the vibrating-drumhead problem):

from numpy import r_
from scipy import special

x = r_[0:100:0.1]     # sample points from 0 to 100 in steps of 0.1 (r_ comes from numpy, not scipy.special)
j0x = special.j0(x)   # Bessel function of the first kind, order zero
print j0x

About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.

This weblog entry is Copyright © 2008 Bruce Eckel. All rights reserved.
