Continuing the discussion of the GIL and parallel programming in Python.
In the comments to his posting, Guido van Rossum said:
But I've got a feeling that Bruce isn't thinking of this scenario when he asks for actors (which I remember him bringing up in 2001-2003, so at least he's consistent :-). Unfortunately I can't quite think what problem area he wants to address. There are many different ways one can use multiple CPUs to make a given algorithm faster, but it depends a lot on the algorithm how you have to code it to benefit. E.g. I believe that in the numpy world, GIL removal is pretty much a non-issue: all their heavy lifting is done by C, C++ or Fortran code, which can easily benefit from multiple CPUs by using special vectorizing operations or by creating OS-level threads that aren't constrained by the GIL (since they don't touch Python objects, only arrays of numbers).
My mistake is in introducing too many concepts at once.
First, I want to be able to easily use multiple CPUs to solve parallelization problems, without leaving Python. I don't know that I explicitly asked for GIL removal, and if I did, I didn't mean to. I don't want to specify the solution, just say what the problem is.
From what I understand about the GIL, removing it would be a huge task, and I've probably moved over to the camp of saying "don't remove it," especially after seeing Parallel Python (pp). Introducing true threads to the language would probably cause more problems than it solves, especially because it would introduce subtle programming problems that the GIL now prevents. Basically, it keeps you from cutting yourself by preventing a lot of collisions that you would get with true threading.
Now, I've heard all the arguments before, how we really need true shared memory and all that. But every time I drill into such arguments it turns out that the person has learned threading and that's their entire world view -- that's all they can imagine. So that's what they want, and the concern about not having threads is superstition. The same kind of superstition that makes them believe they can write bug-free concurrent programs.
Before you hit the 'reply' button, let's just assume that you're one of the elite who can actually do this -- we're talking about all those OTHER programmers out there who aren't as smart as you, and whose code you'll have to fix if they aren't put into a straitjacket so they can't do any damage.
If I can ever get Brian Goetz to do it, or to give me the list, I'll write about how we got where we are in thinking that we simply must have threads.
So before you hit the 'reply' button, imagine this scenario: you have a computer with 64 or 128 cores, and you want to use those cores. If you allow those OTHER programmers to write with threads, they're going to muck it up; even the best ones will have little race conditions they can't find. But with 64 or 128 or 1024 cores, those race conditions will show up as bugs for the end user and you're going to have to either figure out how to fix it, or give up.
Now, wouldn't you rather have a system where people can blindly write concurrent programs and not worry about guarding that shared memory? If you have all those cores, do you REALLY need to worry about performance so much that you have to do that dangerous memory sharing? Why not let the OS protect all those OTHER programmers from the problem of shared memory -- let the OS guarantee that there isn't any, and eliminate the problem?
You're going to say you really, really need shared memory threads. But you won't be able to keep those OTHER programmers from messing it up. Someday you'll thank me.
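To make the process-based style concrete, here is a minimal sketch using the standard library's `multiprocessing` module as a stand-in for pp (it is not pp's API). The function names `is_prime`, `count_primes`, and `parallel_count_primes` are hypothetical, chosen only for illustration. Each worker is a separate OS process, so nothing is shared; work goes in as arguments and comes back as return values.

```python
from multiprocessing import Pool


def is_prime(n):
    """Trial-division primality test (deliberately CPU-bound)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def parallel_count_primes(limit, workers=4):
    """Split [0, limit) into chunks and count primes in separate processes.

    Hypothetical helper: every chunk is handled by a worker process with
    its own interpreter and its own GIL. No memory is shared; the chunk
    bounds are copied to the worker and the count is copied back.
    """
    step = max(1, limit // workers)
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print(parallel_count_primes(100_000))
```

There are no locks anywhere in this code, because there is nothing to guard. The price is the cost of copying arguments and results between processes, which is the trade the argument above is willing to make.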
Asking for Agents is definitely icing on the cake, but I think that it's icing that will make us much more productive with all those cores. Agents make it easier to think about concurrency problems because they're an object-oriented way to think about concurrent programming. Again, Agents will make it easier for those OTHER programmers to write correct concurrent programs.
Introducing Agents, as shown in Scala, can be done with a library as long as the language has the basic support built in (well, there are even Agent packages for Java, but the right language support can make the use of Agents much more pleasant than you typically expect in Java). So it's not as big a risk as it would be to ask for complete support for Agents directly in the language.
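As a rough illustration of the idea, and not of Scala's library or of any existing Python package, an Agent can be approximated today with a process that owns its state and a queue that serves as its mailbox. The names `counter_agent` and `run_demo`, and the `("add", n)` / `("report", None)` / `("stop", None)` message format, are assumptions made up for this sketch.

```python
import multiprocessing as mp


def counter_agent(mailbox, replies):
    """Agent body: owns `total` privately and handles one message at a time."""
    total = 0
    while True:
        kind, value = mailbox.get()  # blocks until a message arrives
        if kind == "add":
            total += value
        elif kind == "report":
            replies.put(total)  # answer by sending a message back
        elif kind == "stop":
            break


def run_demo():
    """Start the agent, send it a few messages, and return its report."""
    mailbox, replies = mp.Queue(), mp.Queue()
    agent = mp.Process(target=counter_agent, args=(mailbox, replies))
    agent.start()

    for n in (1, 2, 3):
        mailbox.put(("add", n))
    mailbox.put(("report", None))
    result = replies.get(timeout=10)

    mailbox.put(("stop", None))
    agent.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

The only way to touch `total` is to send the agent a message, so the caller never needs a lock. What language or library support would add is the pleasant part: hiding the queues and the dispatch loop behind something that looks like an ordinary object.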
So to summarize, I want support for multiple cores, but not true threading -- I want process support. And it looks like pp or a similar design will accomplish that. And secondly, I'd like to raise the level of abstraction for concurrent programming via Agent support.
The first is essential, but the second would be Pythonic.
Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.