Removing the hype around the multicore (non) revolution and some (hopefully) sensible comments about threads and other forms of concurrency.
Threads, processes and concurrency in Python: some thoughts
I attended the EuroPython conference in Birmingham last week. Nice
place and nice meeting overall. There were lots of interesting talks
on many subjects. I want to focus on the talks about concurrency here.
We had a keynote by Russel Winder about the "multicore
revolution" and various talks about different approaches to
concurrency (Python-CSP, Twisted, stackless, etc). Since this is a hot
topic in Python (and in other languages) and everybody wants to have
their say, I will take the occasion to make a comment.
First of all, I want to say that I believe in the multicore non
revolution: I claim that essentially nothing will change for the average
programmer with the advent of multicore machines. Actually, the
multicore machines are already here and you can already see that
nothing has changed.
For instance, I am interacting with my database just as before: yes,
internally the database may have support for multiple cores, it may be
able to perform parallel restore and other neat tricks, but as a
programmer I do not see any difference in my day to day SQL
programming, except (hopefully) on the performance side.
I am also writing my web application as before the revolution: perhaps
internally my web server is using processes and not threads, but I do
not see any difference at the web framework user level. Ditto if I am
writing a desktop application: the GUI framework provides a way to
launch processes or threads in the background: I just perform
the high level calls and I do not fiddle with locks.
At work we have a Linux cluster with hundreds of CPUs, running
thousands of processes per day in parallel: still, all of the
complication of scheduling and load balancing is managed by the Grid
engine, and what we write is just single threaded code interacting with
a database. The multicore revolution did not change anything for the
way we code. On the other extreme of the spectrum, people developing
for embedded platforms will just keep using platform-specific
The only programmers that (perhaps) may see a difference are
scientific programmers, or people writing games, but they are a
minority of the programmers out there. Besides, they already know
how to write parallel programs, since in the scientific community
people have discussed parallelization for thirty years, so no
revolution for them either.
For the rest of the world I expect that frameworks will appear
abstracting the implementation details away, so that people will not
see big differences when using processes and when using threads. This
is already happening in the Python world: for instance the
multiprocessing module in the standard library is modeled on the
threading module API, and the recently accepted PEP 3148 (the one
about futures) works in the same way for both threads and processes.
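As a minimal sketch of that thread/process-agnostic style, the snippet below uses concurrent.futures, the standard library module that came out of PEP 3148. The helper run_all and the function square are names made up for this illustration. Only the thread-based executor is exercised here; passing ProcessPoolExecutor instead should work the same way, provided the function is picklable and the call sits under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Stand-in for an independent unit of work.
    return x * x


def run_all(executor_cls, func, items, max_workers=4):
    """Apply func to every item using the given executor class.

    executor_cls can be ThreadPoolExecutor or ProcessPoolExecutor:
    both expose the same map/submit interface.
    """
    with executor_cls(max_workers=max_workers) as executor:
        # map returns the results in the order of the inputs.
        return list(executor.map(func, items))


results = run_all(ThreadPoolExecutor, square, range(5))
print(results)  # [0, 1, 4, 9, 16]
```

The calling code never mentions threads or processes directly; the choice is reduced to a single argument.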
At the conference there was a lot of bias against threads, as usual in
the Python world, just more so. I have heard people saying bad things
against threads from my first day with Python, 8 years ago, and
frankly I am getting tired. It seems this is an area filled with
misinformation and FUD. And I am not even talking of the endless rants
against the GIL.
I do not like threads particularly, but after 8 years of hearing
things like "it is impossible to get threads right, and if you are
thinking so you are a delusional programmer" one gets a bit tired. Of
course it is possible to get threads right, because all mainstream
operating systems use them, most web servers use them, and thousands
of applications use them, and they are all working (I will not claim
that they are all bug-free, though).
The problem is that the people bashing threads are typically system
programmers who have in mind use cases that the typical application
programmer will never encounter in her life. For instance, I recommend
the article by Bryan Cantrill "A spoon of sewage", published in the
Beautiful Code book: it is a horror story about the intricacies of
locking in the core of the Solaris operating system (you can find part
of the article in this blog post). Those kinds of things are terribly
tricky to get right indeed; my point however is that really few people
have to deal with that level of sophistication.
In 99% of the use cases an application programmer is likely to run
into, the simple pattern of spawning a bunch of independent threads
and collecting the results in a queue is everything one needs to
know. There are no explicit locks involved and it is definitely
possible to get it right. One may actually argue that this is a case
that should be managed with a higher level abstraction than threads: a
witty writer could even say that the one case when you can get threads
right is when you do not need them. I have no issues with that
position: but I have issues with the bold claim that threads are impossible
to use in all situations!
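The simple pattern described above (independent threads, results collected in a queue) can be sketched as follows. The worker function fetch_length and the sample words are invented for the example; the only synchronization is the one hidden inside queue.Queue.

```python
import queue
import threading


def fetch_length(word, results):
    # Stand-in for an independent, possibly slow, piece of work.
    results.put((word, len(word)))


words = ["spam", "eggs", "bacon"]
results = queue.Queue()  # thread-safe: no explicit lock in user code

threads = [
    threading.Thread(target=fetch_length, args=(word, results))
    for word in words
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every worker has finished, so exactly len(words) items are waiting.
collected = dict(results.get() for _ in words)
print(sorted(collected.items()))
```

The workers never touch each other's data, which is why this case is easy to get right.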
In my experience even the trivial use cases are rare and actually in 8
years of Python programming I have never once needed to implement a
hairy use case. Even more: I never needed to perform a concurrent
update using locks directly (except for learning purposes). I do
write concurrent applications, but all of my concurrency needs are
taken care of by the database and the web framework. I use
threadlocal objects occasionally, to make sure everything works
properly, but that's all. Of course threadlocal objects (I mean
instances of threading.local in Python) use locks internally, but
I do not need to think about the locks, they are hidden from my user
experience. Similarly, when I use SQLAlchemy, the thread-related
complications are taken care of by the framework. This is why in
practice threads are usable and are actually used by everybody,
sometimes even without knowing it (did you know that using the
standard library logging module turns your program into a
multi-threaded program behind your back?).
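For readers who have not met threading.local, here is a small sketch of what it gives you: each thread sees its own copy of the attributes, with no locking in user code. The names context, handle and seen are chosen for this illustration only.

```python
import threading

context = threading.local()  # attributes set here are per-thread
seen = {}


def handle(name):
    context.user = name        # visible only in the current thread
    seen[name] = context.user  # each thread writes to its own key


threads = [threading.Thread(target=handle, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
# The main thread never set the attribute, so it does not see one.
print(hasattr(context, "user"))
```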
There is more to say about threads: if you want to run your
concurrent/parallel application on Windows or in any platform lacking
fork, you have no other choice. Yes, in theory one could use the
asynchronous approach (Twisted docet) but in practice even Twisted uses
threads underneath to manage blocking input (say from the database):
there is no way out.
At the conference various people conflated parallelism with
concurrency, and I feel compelled to rectify that misunderstanding.
Parallelism is really quite trivial: you just split a computation into
many independent tasks which interact very little or do not
interact at all (for the so-called embarrassingly parallel problems) and
you collect the results at the end. The MapReduce pattern of Google
fame is a well known example of simple parallelism.
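A toy version of this split/compute/collect scheme might look like the sketch below. It is only in the spirit of MapReduce, not an implementation of Google's system, and the helper names (chunks, sum_of_squares) are assumptions of the example. A thread pool is used to keep the snippet self-contained; for CPU-bound work in CPython a process pool would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def chunks(seq, size):
    # Split the input into independent pieces.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def sum_of_squares(chunk):
    # "Map" step: each chunk is processed without talking to the others.
    return sum(x * x for x in chunk)


numbers = list(range(10))
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(sum_of_squares, chunks(numbers, 4)))

# "Reduce" step: combine the partial results at the end.
total = reduce(add, partials, 0)
print(partials, total)  # [14, 126, 145] 285
```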
Concurrency is very much nontrivial instead: it is all about modifying
things from different threads/processes/tasklets/whatever without
running into hairy bugs. Concurrent updates are the key aspect of
concurrency. A true example of concurrency is an OS-level task
The nice thing is that most people don't need true concurrency, they
need just parallelism of the simplest strain. Of course one needs a
mechanism to start/stop/resume/kill tasks, and a way to wait for a
task to finish, but this is quite simple to implement if the tasks are
independent. Heck, even my own plac module is enough to manage simple
parallelism! (more on that later)
I also believe people have been unfair to the poor old shared memory
model, looking only at its faults and not at its advantages. Most of
the problems are with locks, not with the shared memory model. In
particular, in parallel situations (say read-only situations, with no
need for locks) shared memory is quite good since you have access to
Moreover, the shared memory model has the non-negligible advantage
that you can pass non-pickleable objects between tasks. This is quite
convenient, as I often use non-pickleable objects such as generators
and closures in my programs (and tracebacks are unpickleable too).
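The difference is easy to see in a few lines. In this sketch (the names countdown, channel and consumer are invented for the example) a generator cannot be pickled, which is what sending it to another process would require, but it can be handed to another thread as-is.

```python
import pickle
import queue
import threading


def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)

# Sending an object to another process requires pickling it.
try:
    pickle.dumps(gen)
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False

# Between threads the very same object is passed, with no copy involved.
channel = queue.Queue()
received = []


def consumer():
    received.extend(channel.get())  # consumes the generator in the worker


worker = threading.Thread(target=consumer)
worker.start()
channel.put(gen)
worker.join()

print(picklable, received)
```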
Even if you need to manage true concurrency with shared memory, you
are not forced to use threads and locks directly. For instance, there
is a nice example of concurrency in Haskell in the Beautiful Code
book titled "Beautiful concurrency" (the PDF is public) which uses
Software Transactional Memory (STM). The same example can be
implemented in Python in a completely different way by using
cooperative multitasking (i.e. generators and a scheduler) as
documented in a nice blog post by Christian Wyglendowski. However:
the asynchronous approach is single-core;
if a single generator takes too long to run, the whole program will block,
so that extra care should be taken to ensure cooperation.
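To make "generators and a scheduler" concrete, here is a minimal round-robin scheduler. It is a generic sketch, not the code from the blog post mentioned above; task and run are names made up for the illustration. Each yield is the point where a task voluntarily gives control back, which is also why a task that never yields blocks everybody else.

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task finished: do not reschedule it
        ready.append(current)      # otherwise put it back in the queue


log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

Everything runs in a single thread, so only one core is ever used.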
Recently I have released a module named plac which
started out as a command-line argument parser but immediately evolved
into a tool to write command-line interpreters. Since I wanted to be
able to execute long running commands without blocking the interpreter
loop I implemented some support for running commands in the background
by using threads or processes. That made me rethink various
things I have learned about concurrency in the last 8 years: it
also gave me the occasion to implement something not completely
trivial with the multiprocessing module.
In plac commands are implemented as generators
wrapped in task objects. When the command raises an exception, plac
catches it and stores it in three attributes of the task object:
etype (the exception class), exc (the
exception object) and tb (the exception traceback). When working
in threaded mode it is possible to re-raise the exception after the
failure of the task, with the original traceback. This is convenient
if you are collecting the output of different commands, since you
can process the error later on.
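The idea can be sketched with a hypothetical Task class. This is not plac's actual implementation, only an illustration that reuses the three attribute names mentioned above (etype, exc, tb); the methods start and result are assumptions of the example.

```python
import sys
import threading
import traceback


class Task:
    """Hypothetical task wrapper around a generator (not plac's class)."""

    def __init__(self, genfunc, *args):
        self.gen = genfunc(*args)
        self.outputs = []
        self.etype = self.exc = self.tb = None
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        try:
            for value in self.gen:
                self.outputs.append(value)
        except Exception:
            # Store the failure instead of letting the thread die silently.
            self.etype, self.exc, self.tb = sys.exc_info()

    def start(self):
        self.thread.start()

    def result(self):
        self.thread.join()
        if self.exc is not None:
            # Re-raise in the caller's thread with the original traceback.
            raise self.exc.with_traceback(self.tb)
        return self.outputs


def failing_command():
    yield "step 1"
    raise ValueError("boom")


task = Task(failing_command)
task.start()
try:
    task.result()
except ValueError:
    # The frames of the worker thread are still there for debugging.
    frames = [f.name for f in traceback.extract_tb(sys.exc_info()[2])]
    print(frames)
```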
In multiprocessing mode instead, since the exception happened in a
separate process and the traceback is not pickleable, it is
impossible to get your hands on the traceback. As a workaround plac
is able to store the string representation of the traceback, but it is
clearly losing debugging power.
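The limitation and the workaround can both be shown without starting a process, since the obstacle is pickling itself. This is a generic sketch (the function risky is invented), not the code used in plac.

```python
import pickle
import traceback


def risky():
    raise RuntimeError("failed in the worker")


try:
    risky()
except RuntimeError as exc:
    tb = exc.__traceback__             # the real traceback object
    tb_text = traceback.format_exc()   # its string representation

# A traceback object cannot cross a process boundary...
try:
    pickle.dumps(tb)
    tb_picklable = True
except (TypeError, pickle.PicklingError):
    tb_picklable = False

# ...whereas a plain string survives the round trip unchanged.
restored = pickle.loads(pickle.dumps(tb_text))
print(tb_picklable)
print(restored)
```

The string is good enough for logging, but you can no longer walk the frames or start a post-mortem debugger on it.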
Moreover, plac is based on generators
which are not pickleable, so it is difficult to port to Windows
the current multiprocessing implementation, whereas the threaded
implementation works fine both on Windows and Unices.
Another difference worth noticing is that the
multiprocessing model forced me to specify explicitly which variables
are shared amongst processes; as a consequence, the multiprocessing
implementation of tasks in plac is slightly longer than the threaded
implementation. In particular, I needed to implement the shared attributes as
properties over a multiprocessing.Namespace object. However, I
must admit that I like to be forced to specify the shared
variables (explicit is better than implicit).
I am not touching here the issue of the overhead due to processes and
process intercommunication, since I am not interested in performance
issues, but there is certainly an issue if you need to pass a large
amount of data so certainly there are cases where using threads has
Still, at EuroPython it seemed that everybody was dead set against
threads. This is a feeling which is quite common amongst Python
developers (actually I am not a thread lover myself) but sometimes
things get too unbalanced. There is so much talk
against threads and then if you look at the reality it turns out that
essentially all Web frameworks and database libraries are using them!
Of course, there are exceptions, like Twisted and Tornado, or psycopg2
which is able to access the asynchronous features of PostgreSQL, but
they are exactly that: exceptions. Let's be honest.
In practice it is difficult to get rid of threads and no amount of
thread bashing will have any effect. It is best to have a positive
attitude and to focus on ways to make threads easier to use for the
simple cases, and to provide thread/process agnostic high level APIs:
PEP 3148 is a step in that direction. For instance, an application
could use threads on Windows and processes on Unices,
transparently (at least to a certain extent: it is impossible to be
perfectly transparent in the general case).
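One naive way to make that choice is sketched below. The function pick_executor_class is hypothetical, and the policy (threads on Windows, processes elsewhere) simply mirrors the sentence above; a real application would also have to cope with pickling constraints, which is where the transparency breaks down.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def pick_executor_class(platform=sys.platform):
    """Return a thread pool class on Windows, a process pool class elsewhere."""
    if platform.startswith("win"):
        return ThreadPoolExecutor
    return ProcessPoolExecutor


# The rest of the application only sees the common executor interface.
executor_cls = pick_executor_class()
print(executor_cls.__name__)
```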
In the long run I assume that Windows will grow some good way to run
processes, because it looks like it is technologically impossible to
sustain the shared memory model when the number of cores becomes
large, so that the multiprocessing model will win in the end. Then
there will be fewer reasons to complain about the GIL. Not that
there are many reasons to complain even now, since the GIL affects
CPU-dominated applications, and typically CPU-dominated applications
such as computations are not done in pure Python, but in C-extensions
which can release the GIL as they like. BTW, the GIL itself will never go
away in CPython because of backward compatibility concerns with
C-extensions, even if it will improve in Python 3.2.
So, what are my predictions for the future? That concurrency will be
even further hidden from the application programmer and that the
underlying mechanism used by the language will matter even less than
it matters today. This is hardly a deep prediction; it is already
happening. Look at the new languages: Clojure or Scala are using Java
threads internally, but the concurrency model exposed to the
programmer is quite different. At the moment I would say that all
modern languages (including Python) are converging towards some form
of message passing concurrency model (remember the Go meme "don't
communicate by sharing memory; share memory by communicating"). The
future will tell if the synchronous message passing mechanism
(CSP-like) will dominate, or if the Erlang-style asynchronous message
passing will win, or if they will coexist (which looks likely).
Event-loop based programming will continue to work fine as always and
raw threads will be only for people implementing operating
systems. Actually I should probably remove the future tense since a
lot of people are already working in this scenario.
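Even with plain threads one can already program in the message passing style. In the sketch below (all names are invented for the example) the running total is owned by a single worker, and the rest of the program talks to it only through queues. The sends do not wait for the receiver, so this is closer to the asynchronous, Erlang-like flavor than to CSP.

```python
import queue
import threading

STOP = object()  # sentinel message asking the worker to finish


def accumulator(inbox, outbox):
    total = 0                      # state private to this worker
    while True:
        message = inbox.get()
        if message is STOP:
            break
        total += message
    outbox.put(total)              # reply with a message, not shared state


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=accumulator, args=(inbox, outbox))
worker.start()

for n in (1, 2, 3):
    inbox.put(n)                   # asynchronous send: returns immediately
inbox.put(STOP)

worker.join()
total = outbox.get()
print(total)  # 6
```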
I leave further comments to my readers.
UPDATE: I see today a very interesting (as always!) article by Dave Beazley
on the subject of threads and generators. He suggests cooperation between
threads and generators instead of just replacing threads with generators.
Kind of interesting to me, since plac uses the same trick of wrapping
a generator inside a thread, even if for different reasons (I am just
interested in making the thread killable, Dave is interested in performance).
BTW, all articles by Dave are a must read if you are interested in concurrency
in Python, do yourself a favor and read them!
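For the curious, the trick of wrapping a generator inside a thread to make it killable can be sketched like this. It is an illustration of the general idea under my own naming (GeneratorThread, kill, ticker), not the code used in plac: the thread checks a flag every time the generator yields, so it can be stopped at the next yield point.

```python
import threading
import time


class GeneratorThread(threading.Thread):
    """Run a generator in a thread, checking for a stop request at each yield."""

    def __init__(self, gen):
        super().__init__()
        self.gen = gen
        self.steps = 0
        self._stop_requested = threading.Event()

    def run(self):
        for _ in self.gen:
            self.steps += 1
            if self._stop_requested.is_set():
                self.gen.close()   # lets the generator run its cleanup code
                break

    def kill(self):
        self._stop_requested.set()
        self.join()                # returns once the next yield is reached


def ticker():
    # A command that would otherwise run forever.
    while True:
        time.sleep(0.001)
        yield


worker = GeneratorThread(ticker())
worker.start()
time.sleep(0.02)
worker.kill()
print(worker.is_alive(), worker.steps >= 1)
```

The thread is only as killable as the generator is cooperative: a long computation between two yields still cannot be interrupted.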
-- For instance, I am interacting with my database just as before: yes, internally the database may have support for multiple cores, it may be able to perform parallel restore and other neat tricks, but as a programmer I do not see any difference in my day to day SQL programming, except (hopefully) on the performance side.
Which is the main reason I've been arguing for some time that the move to "distribute" logic to the client (back to the COBOL/VSAM 60'S future, that was) is over. Single task apps will run slower as cores/processors multiply at stagnant clocks, and the multi-machines running RDBMS will have all the horsepower needed to do ACID, just as Dr. Codd said we should.
About "the one case when you can get threads right is when you do not need then": I sometimes use threads in Gambit-C scheme; they are continuations-based threads, so their usage doesn't use multi-core. Anyway, I find I can write more readable code this way, than, for example using an event-loop/reactor. But I think it's a subjective matter.
OT: Glad to hear that plac evolved in a command-line interpreter. Are you still sure that Domain Specific Languages aren't important for a real-world programmer? .)
I imagine this argument can be rephrased differently: threads are not the abstraction for concurrency. They give too much power that is not needed most of the time, and this power allows for more bugs that can be avoided by using the right abstraction. Given that concurrency is not equivalent to threads, the prediction can also be rephrased: application programmers will be more aware of concurrency. Rather than having concurrency as a tacked-on library, threads or otherwise, it will be part of the language proper. Erlang, Clojure and Scala are examples where concurrency is more prominent.
> The multicore non revolution
> First of all, I want to say that I believe in the multicore non revolution: I claim that essentially nothing will change for the average programmer with the advent of multicore machines. Actually, the multicore machines are already here and you can already see that nothing has changed.
Depends what you mean by an "average programmer". For an average system programmer or a game programmer things have changed dramatically.
> Depends what you mean by an "average programmer". For an average system programmer or a game programmer things have changed dramatically.
If you take the view that "average" or "median" programming tasks is determined across all programmers, then the article is quite correct. Of all programmers, system programmers (using X86 architectures, which is where this article is based) may be 1%; those writing windoze and those writing linux. One may view server writers, database and web mostly, as system programmers. That might get us to 2%. Gamers, can't say, but last I looked, most code to rendering engines, not compilers, much the way web coders write to some java framework, not the compiler.
Amdahl's law has not been repealed, nor will it be. Unless, and until, it is, the usefulness of parallel code is minimal outside of server development.
> Are you still sure that Domain Specific Languages aren't important for a real-world programmer? .)
I don't remember having bashed DSLs. For sure I have written that in my opinion an enterprise-oriented language should not have macros, but I never said that an enterprise-oriented language should have no way to implement a DSL external to the language. Actually DSLs are quite useful, especially the command-oriented ones, but you don't need macros to implement them.
> Of all programmers, system programmers (using X86 architectures, which is where this article is based) may be 1%; those writing windoze and those writing linux. One may view server writers, database and web mostly, as system programmers. That might get us to 2%.
> > Of all programmers, system programmers (using X86 architectures, which is where this article is based) may be 1%; those writing windoze and those writing linux. One may view server writers, database and web mostly, as system programmers. That might get us to 2%.
> Interesting numbers. May I ask for the source?
I thought it was pretty clear: I made 'em up. OTOH, if the entirety of M$ and linux kernel programmers reach even 1%, I'd be shocked. There just aren't that many of 'em.
The linux kernel mailing list is a few thousand; call it 3,000. Say M$ has 5,000 devoted to winX development. Total here of 8,000. Double that to cover database engines and web engines. (Aside: I knew the core developers of the Progress database engine, and it was about two dozen.)
That gets us to 16,000. So, we need 1,600,000 total X86 programmers to make my SWAG. I don't see any problem with that.
"First of all, I want to say that I believe in the multicore non revolution: I claim that essentially nothing will change for the average programmer with the advent of multicore machines. Actually, the multicore machines are already here and you can already see that nothing has changed."
Yes - I don't see any change. Yet.
However, I'm thinking that in 5 years or so multicores will be the norm. Your bigbox store may sell machines with 16+ cores and, perhaps 128gig+ of memory.
More importantly, the average programmer will have access to parallel programming libraries like the stuff coming out of M$ for .NET. There's likely a spate of such libraries for Java and in the open source space as well (although none come to mind!)
I think it is a matter of teaching and training. The world is parallel. The industry needs to move toward strategies to teach students and aged fogs like myself how to think about exploiting parallelism.
I remember when Objects hit the corporate world (at least my corner of it). There were folk saying that the average programmer could not understand objects and could not effectively develop good object-based applications. Yes, it's an open question as to whether or not the average programmer 'understands' objects but it's a fact that corporate America's IT shops are using object technologies - maybe poorly - to develop mission-critical apps.
I expect the same thing to happen with parallel programming in the near future.