First, I bring up these issues because I want my favorite language to be better, not because I am criticizing Guido or his process. I've worked with a number of languages in some depth and the job done by Guido and the top-level committers is better than any I've seen -- and I've personally spent time with these folks. What's amazing about Python is that it is not a one-person operation; there's a real team that is making decisions, and Guido pushes down those decisions so that he doesn't make them personally if he doesn't have to (which, I think, would be too exhausting, for one thing).
Guido's continuing drumbeat of "contribute, contribute" is absolutely essential. And although contributing PEPs and code is contribution in the most real sense -- which I hope to do myself someday -- I think that creating conversations is also an important contribution.
Some of the requests I made are certainly unreasonable or impractical, but I think we need to be careful about issues that become superstition: "we talked about that once and it didn't work so that topic is done," "maybe that would be better but it's too much trouble" or (my favorite) "that couldn't possibly be efficient enough to be justified."
And the other issues I can live with, but when it comes to concurrency, the world is changing and we can't bury our heads in the sand about it, especially considering that multiple CPUs could obliterate the "Python is slow" argument.
In particular (this is to everyone, not just Guido), be careful when assuming that threads are the right solution. We came to threads through a series of steps, like the temperature being turned up on a frog in a pan of water. People assume that you "must have threads to do concurrency properly." But threads are fraught with problems and notoriously difficult -- some experts even say impossible -- to get right (hey, the GIL might be your friend a lot more than you know).

Yes, with processes you don't get everything you get with threads, but you can use multicores and multiple machines right now and write robust code because the OS is protecting you by not allowing you to share memory. That's a good thing!

As far as overhead goes, I hope that we may start to see cleverer solutions -- in the same vein as ctypes solves its problem optimally -- as the issues become clearer. But for now, pretend that any expensive problem can be distributed to as many CPUs as you want, because from that perspective Python begins to look like the most effective language on all counts: if you can write it quickly and you can use distribution to make it run fast enough, then Python becomes the cheapest solution.
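To make the distribution argument concrete, here is a minimal sketch using the stdlib multiprocessing module (which arrived in Python 2.6, after this discussion). The task, chunk sizes, and worker count are all illustrative choices of mine, not anything from the thread:

```python
# Sketch: distributing a CPU-bound task across processes instead of threads.
# Each worker runs in its own interpreter, so the GIL never serializes
# the computation, and no memory is shared between workers.
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately expensive)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split one big range into chunks, one per worker process.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)  # → 9592 (primes below 100,000)
```

The point of the sketch is the shape, not the arithmetic: the problem is cut into independent pieces, each piece goes to its own OS process, and the results are folded back together, with the OS keeping the workers from stepping on each other's memory.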
>...multiple CPUs could obliterate the "Python is slow" argument.
You seem to be implying that the primary reason for Python's slowness is lack of support for multiple processors. I'm not sure I understand how you come to that conclusion--if that is what you're saying. I'm no expert, but the problems with Python's slowness are considerably more pervasive than lack of support for multiple processors.
> > ...multiple CPUs could obliterate the "Python is slow" argument.
>
> You seem to be implying that the primary reason for Python's slowness is lack of support for multiple processors. I'm not sure I understand how you come to that conclusion.
I agree. C is slow, C++ is slow, Java is slow -- but how many applications are written in hand-tuned assembly today? Unless you say Python is too slow to do X on Y hardware in a particular way Z, saying "Python is slow" is a lame excuse that should be ignored. There is no blanket solution to improve performance.
> > I'm no expert, but the problems with Python's slowness are considerably more pervasive than lack of support for multiple processors.
>
> I thought it was interesting that in another one of these threads there was a benchmark showing that Jython was substantially faster (> 2.5X) than CPython when both were single-threaded.
>
> http://blogs.warwick.ac.uk/dwatkins/entry/benchmarking_parallel_python_1_2/
I ran the benchmarks on my machine (Windows XP, Pentium(R) D 2.8 GHz, 1 GB, Jython 2.2, Python 2.5, Java 1.6.0_02 and also IronPython 1.1) and the results I obtained were far from the ones Daniel published. His setup was different from mine (including OS), so I wonder if anyone was able to reproduce his results. His scripts (using one worker) showed results compatible with pystone: IronPython performed better, followed by CPython. Jython was last, running almost twice as slow as CPython (instead of being 2.5x faster).
> I ran the benchmarks on my machine (Windows XP, Pentium(R) D 2.8 GHz, 1 GB, Jython 2.2, Python 2.5, Java 1.6.0_02 and also IronPython 1.1) and the results I obtained were far from the ones Daniel published. His setup was different from mine (including OS), so I wonder if anyone was able to reproduce his results. His scripts (using one worker) showed results compatible with pystone: IronPython performed better, followed by CPython. Jython was last, running almost twice as slow as CPython (instead of being 2.5x faster).
Your results are more along the lines of what I would expect. But just for kicks, do you have time to download Jython 2.3 and rerun? I don't think the Jython version was specified in the post.
> saying "Python is slow" is a lame excuse that should be > ignored. There is no blanket solution to > improve performance.
Except optimizing code by hand.
People think parallelism is a kind of magic wand: just add some more threads and everything will run 73.5% faster. No, that's WRONG. Parallelism takes substantial work to be done properly. This is especially true for non-von Neumann architectures, such as the threading model.
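A small sketch of the "substantial work" involved, with illustrative names of my own: even the trivial operation `n += 1` on shared state is really a load, an add, and a store, and two threads can interleave those steps and lose updates. The lock below makes the counter correct -- and in doing so serializes exactly the part of the work that is shared:

```python
# Sketch: why "just add threads" is not free speedup. Without the lock,
# concurrent read-modify-write on the shared counter can drop updates;
# with it, the shared part of the work is correct but serialized.
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # serialize the load/add/store sequence
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 100_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # → 400000, only because every increment is locked
```

Coordination like this is the tax on shared memory; the share-nothing process model discussed elsewhere in this thread avoids it by construction.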
There is the age-old discussion on threads happening here, but threads as in pthreads etc. are of course not really threads in the true sense. If you create two pthreads on a single-core, single-threaded processor, you have only one thread executing at any one time.
My laptop has two cores (i.e. two hardware threads in the real sense). The minimum spec of machines at my company is now 4-CPU SPARC IV+ (8 threads), and many of our apps are hitting 8 or 12 CPUs in a minimum configuration (that means someone somewhere is not getting the response times they expect), all simply to handle thousands of queries on terabytes of data; there's no rocket science, just pure volume. Ignoring Intel and AMD, take a look at the SPARC T1, a low-wattage 8-core CPU: it's a wonderful piece of kit, power efficient and so on. It's obviously successful, and so Sun are releasing the T2, which could have up to 32 cores (real threads).
It's not up for debate that we must go parallel to keep up, but threads (a la pthreads) and the GIL are just not the solution; they are only one mechanism, and really just lightweight processes.
At my firm, our most critical apps are now going multi-host on commodity hardware. This is only feasible with decent middleware support: caching, efficient data transports, networking and so on. What we need are easy-to-use distributed models. As we implement some of these libraries, there may (or may not) emerge changes to the core language that could ease and support these strategies.
Read the Google white paper on MapReduce if you haven't done so already, and look at the Java-based Nutch project.
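For readers who haven't seen the paper, the core MapReduce idea fits in a few lines of Python. This single-process sketch (all names are illustrative) shows the map, shuffle, and reduce phases that a real framework distributes across many machines:

```python
# Sketch of the MapReduce pattern from the Google paper, shrunk to one
# process: map emits (key, value) pairs, shuffle groups them by key,
# and reduce folds each group. Word count is the paper's own example.
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Fold one key's values; here, a word-count sum."""
    return key, sum(values)

documents = ["the gil the gil", "threads share the gil"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["gil"])  # → 3
```

The appeal for this discussion is that the map and reduce calls are independent of each other, so the framework can fan them out across cores or hosts without the programmer writing any locking.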
My feeling is not to focus the core of Python on removing the GIL and better support for pthreads (as that route is essentially a dead man's shoes scenario, waiting for CPU vendors to give us more cores), but to take a leaf out of the books of languages such as Erlang, and to look at Open MPI or the older PVM. There are concurrent versions of Haskell and Lisp (MultiLisp, I think).
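The Erlang-style model can be approximated today with stdlib primitives: processes that own their state and communicate only by messages on queues, never through shared memory. A sketch with illustrative names (not a recommended framework):

```python
# Sketch of Erlang-style share-nothing concurrency: a worker process
# receives messages on an inbox queue, replies on an outbox queue,
# and stops on a sentinel. No memory is shared between the processes.
from multiprocessing import Process, Queue

def worker(inbox, outbox):
    """Receive numbers, reply with their doubles, stop on 'stop'."""
    while True:
        msg = inbox.get()
        if msg == "stop":
            break
        outbox.put(msg * 2)

def main():
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    for n in (1, 2, 3):
        inbox.put(n)                         # send requests as messages
    results = [outbox.get() for _ in range(3)]  # collect replies in order
    inbox.put("stop")
    p.join()
    return results

if __name__ == "__main__":
    print(main())  # → [2, 4, 6]
```

Because the only coupling is the message protocol, swapping the in-machine queue for a network transport (MPI, sockets, middleware) changes the plumbing but not the program's shape, which is exactly the appeal of the Erlang route.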
Python is rock-solid and stable in its core, and Guido is right in the sense that there seem to be no tangible gains from threads (pthreads), because the real gains in concurrency come from the strategies employed to solve the particular tasks at hand.
Oracle could be described as a mildly successful database company, and their major strategy (whilst they may employ pthreads beneath the covers on a per-process basis) is to have multiple, long-running processes.
Good points on both sides; that is what makes a discussion lively, anyway ;o)
A) First of all, the question is not whether Python is slow or fast but the performance of a system written in Python. That means the ability to leverage multi-core architectures, as well as control: the ability to pin one process/task to a core, to pin one or more homogeneous tasks to specific cores, and to avoid waiting on a global lock and similar primitives. (Before anybody jumps to a conclusion, this is not about the GIL by any means ;o))
B) Second, it is clear that we need a good solution (not THE solution) for moderately massive parallelism on multi-core architectures (i.e. 8-32 cores). Share-nothing might not be optimal; we need some form of memory sharing, not just copying all data via messages. Maybe functional programming based on the blackboard pattern would work; who knows.
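The middle ground asked for here -- real memory sharing between processes, rather than copying everything through messages -- is available via multiprocessing's shared ctypes wrappers. A sketch with illustrative names (multiprocessing itself postdates this discussion, landing in Python 2.6):

```python
# Sketch: two processes writing into one shared C-int array, instead of
# copying data via messages. Each worker owns a disjoint slice, so no
# lock is needed here; overlapping writers would need Array's lock.
from multiprocessing import Process, Array

def fill(shared, start, stop):
    """Write squares into our slice of the shared array."""
    for i in range(start, stop):
        shared[i] = i * i

def main():
    n = 8
    shared = Array("i", n)   # 8 C ints in shared memory
    workers = [Process(target=fill, args=(shared, 0, n // 2)),
               Process(target=fill, args=(shared, n // 2, n))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return list(shared)

if __name__ == "__main__":
    print(main())  # → [0, 1, 4, 9, 16, 25, 36, 49]
```

A blackboard-style design could grow out of exactly this: a shared region that many workers read and write, with synchronization only where their accesses overlap.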
I have seen saturated systems still showing only ~25% CPU utilization (on a 4-core system!), because we didn't leverage multiple cores and parallelism. So while py3k will not be slow, the lack of a cohesive multi-core strategy will show up in system performance and byte us later (pun intended!).
C) As Guido and Bruce (and others) echo, this is a call for participative action. Conversation is an excellent start; let us extend the conversation to a PEP addressing multi-core parallelism in Python and an implementation thereof. The good news is that there are at least 2 or 3 paradigms with implementations and rough benchmarks.
D) While Guido is almost right in saying that this is a (standard) library problem, it is not fully so. We would need a few primitives from the underlying PVM substrate. Possibly one reason for Guido's position is the lack of clarity as to what needs to be changed and why. IMHO, just saying "take the GIL off" does not solve the problem.
E) And Guido is right in insisting on speed, and Bruce is right in asking for language constructs. Without pragmatic speed, folks won't use it; the same is the case without the required constructs. Both are barriers to adoption. We have an opportunity to offer a solution for multi-core architectures, so let us seize it; we will rush in where angels fear to tread!
F) What would everybody suggest? A PEP on support for multi-core parallelism, or should we de-scope the PEP to (say) declarative multi-core for web applications, applying our creativity and ingenuity to that one domain and constraining the problem?
P.S.: Bruce, you had written: "Actually it was not purely via library constructs; there was a very important low-level change made to ensure cache coherency based on a seminal paper that came out in recent years and changed everyone's thinking about the issue (I don't have the reference, but Scott Meyers wrote about it, and I'm pretty sure Brian Goetz has as well)." Can you help me find this paper?
> P.S.: Bruce, you had written: "Actually it was not purely via library constructs; there was a very important low-level change made to ensure cache coherency based on a seminal paper that came out in recent years and changed everyone's thinking about the issue (I don't have the reference, but Scott Meyers wrote about it, and I'm pretty sure Brian Goetz has as well)." Can you help me find this paper?