The Artima Developer Community

Weblogs Forum
It isn't Easy to Remove the GIL

54 replies on 4 pages. Most recent reply: Sep 12, 2008 5:12 PM by Patrick Stinson

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 7:14 AM
> The envy is
> more from the fact that in so many other languages, even
> those that I personally don't like as much as Python, I
> can take advantage of multiple threads 'natively'.

What other mainstream languages besides Java and C++? Perl's threads are less functional than Python's, Tcl is mostly based around a single-threaded event model, Ruby has a GIL like Python.

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:00 AM
> We have posted here some solutions that are
> indeed working right now and i desperately want to hear
> Guido's opinion on that...

I'm not familiar with any of the solutions proposed and it will take me some time to become sufficiently familiar with some of them before I can judge them. The last thing I want to do is rush something into the standard library without understanding what problem it solves, how stable it is, etc.

But (as Bruce seems to understand better than some of the commenters on his blog or this one) before we rush to a solution we need to look more closely at the problem.

I'm not at all sure that the people crying for GIL removal and the people asking for more concurrency support are after the same thing at all. GIL removal is the "obvious" solution if you already have a multi-threaded program, like a web server: All web servers need some form of concurrency, and most solve it through threads, as it is the most convenient solution.

You can get quite far using the one-thread-per-request model (I think frameworks like Django and TurboGears/Pylons use this), since any individual web thread is typically I/O-bound: first you have to wait until the entire request is received, then you wait for your database, finally you wait until the client has received the last byte of your response. By the time your server is no longer I/O-bound but CPU-bound, you have likely hit upon a successful web concept, and the last thing you want to do is have to rethink everything in order to speed it up. So GIL removal sounds attractive. (It also helps that most databases already address the problem of concurrent access in one way or another, so this won't be a stumbling block.)
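To make the thread-per-request shape concrete, here is a minimal sketch (the module names are the Python 2 standard library ones; real frameworks layer far more on top of this):

    import time
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
    from SocketServer import ThreadingMixIn

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Each request gets its own thread, and most of that thread's life
            # is spent blocked on I/O (reading the request, waiting on the
            # database, writing the response), during which the GIL is
            # released anyway.
            time.sleep(0.1)  # stand-in for a database round trip
            body = "hello\n"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
        pass

    if __name__ == "__main__":
        ThreadingHTTPServer(("", 8000), Handler).serve_forever()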

But I've got a feeling that Bruce isn't thinking of this scenario when he asks for actors (which I remember him bringing up in 2001-2003, so at least he's consistent :-). Unfortunately I can't quite think what problem area he wants to address. There are many different ways one can use multiple CPUs to make a given algorithm faster, but it depends a lot on the algorithm how you have to code it to benefit. E.g. I believe that in the numpy world, GIL removal is pretty much a non-issue: all their heavy lifting is done by C, C++ or Fortran code, which can easily benefit from multiple CPUs by using special vectorizing operations or by creating OS-level threads that aren't constrained by the GIL (since they don't touch Python objects, only arrays of numbers).
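A rough sketch of the numpy situation: the Python threads below are just thin dispatchers into compiled code, and for operations that release the GIL (large BLAS-backed matrix products, for example) the real work can run on multiple cores. Whether any particular operation releases the GIL depends on the numpy build, so treat this purely as an illustration:

    import threading
    import numpy

    def work(a, b, results, i):
        # The multiplication happens inside compiled BLAS code; when that
        # code releases the GIL, these threads can genuinely run in parallel.
        results[i] = numpy.dot(a, b).sum()

    a = numpy.random.rand(1000, 1000)
    b = numpy.random.rand(1000, 1000)
    results = [None, None]
    threads = [threading.Thread(target=work, args=(a, b, results, i))
               for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print results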

At Google we use a little hack called MapReduce, which tackles a different problem space again: algorithms involving massive amounts of data, so massive that they may not fit on a single computer's hard drive. Yet its abstraction is powerful enough to efficiently describe less resource-hungry algorithms as well -- as long as they are highly (embarrassingly) parallelizable.
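Google's MapReduce API isn't public, so the following is only a toy, single-process illustration of the programming model: the user writes a map function that emits key/value pairs and a reduce function that folds each group, and the framework worries about partitioning and distribution.

    from collections import defaultdict

    def map_fn(line):
        # Emit (key, value) pairs; here, one count per word.
        return [(word, 1) for word in line.split()]

    def reduce_fn(word, counts):
        return (word, sum(counts))

    def toy_mapreduce(lines):
        groups = defaultdict(list)
        for line in lines:                 # the map phase is embarrassingly parallel
            for key, value in map_fn(line):
                groups[key].append(value)
        return [reduce_fn(k, v) for k, v in sorted(groups.items())]

    print toy_mapreduce(["the cat sat", "the cat ran"])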

Then there are models like Linda. Etc., etc. There are numerous other styles of concurrent problems as well, and I don't think a single solution can be made to fit all.

In CPython's specific case, I believe that an additional constraint is that it is often used as a *glue language*. This means that any solution must continue to have great support for writing extension modules in C/C++ and linking against libraries written in those languages. A solution that is to find widespread acceptance must continue to support OS-level threads and file descriptors in some way that makes it easy to bounce between the Python and C/C++ layers. Python (at least CPython) mustn't lose the ability to talk to databases, fork subprocesses and talk to them using pipes, link to GUI libraries that have their own event loop, or use system libraries that may have their own concurrency constraints.
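As one small illustration of the glue point, ctypes already drops the GIL around calls made through CDLL, so C-level work can proceed on other cores while Python threads keep running. (The math-library lookup below assumes a Unix-like system; it is only a sketch.)

    import threading
    from ctypes import CDLL, c_double
    from ctypes.util import find_library

    libm = CDLL(find_library("m"))     # the GIL is released for the duration of each call
    libm.sin.restype = c_double
    libm.sin.argtypes = [c_double]

    def crunch():
        total = 0.0
        for i in xrange(100000):
            total += libm.sin(i)       # the C code runs without holding the GIL
        return total

    threads = [threading.Thread(target=crunch) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()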

One application at Google with which I'm rather intimately familiar uses the following forms of I/O, wrapped in various extension modules:

- sockets
- mysql
- Bigtable (an internal Google database)
- file system via NFS
- SSH to remote workstations
- Perforce API (a C++ library for which we have no source)

I'd like to continue to be able to use Python for such applications!

anthony boudouvas

Posts: 14
Nickname: objectref
Registered: Jun, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:23 AM
> I'm not familiar with any of the solutions proposed and it
> will take me some time to become sufficiently familiar
> with some of them before I can judge them. The last
> thing I want to do is rush something into the standard
> library without understanding what problem it solves, how
> stable it is, etc.

Ok, I completely understand this. But if you find the time, have a look at these proposals; I think it will be worth the effort.

> I'm not at all sure that the people crying for GIL removal
> and the people asking for more concurrency support are
> after the same thing at all.

As I said, I have started to work with one of the proposals we made here ("parallel python") and I see that it is working very, very nicely without being disturbed by the GIL. So, I think that if we have THAT luxury then there is no reason to spend time removing the GIL.
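(For reference, this is roughly what Parallel Python looks like in use, reconstructed from memory, so check the project's own docs for the exact signatures; it side-steps the GIL by farming jobs out to worker processes:)

    import pp

    def sum_range(n):
        return sum(xrange(n))

    job_server = pp.Server()                  # autodetects the number of CPUs
    jobs = [job_server.submit(sum_range, (10000000,)) for _ in range(4)]
    print [job() for job in jobs]             # calling a job blocks until its result is ready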

Of course I do not know your everyday work schedule, but there are modules out there that are worth your time to have a look at. For example, I used psyco the other day and I was in complete shock when I saw the performance improvements it can give us in some circumstances...
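(Psyco's basic usage really is just a couple of lines; psyco.bind() specializes a single hot function and psyco.full() specializes everything:)

    import psyco

    def hot_loop(n):
        total = 0
        for i in xrange(n):
            total += i * i
        return total

    psyco.bind(hot_loop)     # JIT-specialize just this function...
    # psyco.full()           # ...or everything in the program
    print hot_loop(1000000)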

Juergen Brendel

Posts: 8
Nickname: jbrendel
Registered: Sep, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 11:27 AM
> > The envy is more from the fact that in so many other languages, even
> > those that I personally don't like as much as Python, I
> > can take advantage of multiple threads 'natively'.
>
> What other mainstream languages besides Java and C++?
> Perl's threads are less functional than Python's, Tcl is
> mostly based around a single-threaded event model, Ruby
> has a GIL like Python.

Ah, yes, you got me there. I should have phrased this more carefully. Java and C/C++ are the languages I am most familiar with, outside of Python. So these are the languages I am mostly referring to.

However, I would say that most people who are or have been in systems development are familiar with at least one of those languages, aren't they? And many will then have come across threading in those contexts.

That is why - in my opinion - Python could benefit so much from supporting the familiar threading API and multi-core hardware.

Consider also that in Python things mostly work as expected, which is one of the nicest features of the language and leads to comparisons with 'executable pseudo code' and all that. You just write it and it works as intended. The fact that threading does not quite work as one would expect breaks this philosophy.

There are people here on the mailing list who say: If you need performance, use another language, or write a C extension... Well, sadly having to resort to a 'hack' (which it is compared to the elegance of pure Python) does not speak well of Python. It's nice that it is possible to easily do that but it simply shouldn't be necessary.

In the end, though, after following the discussion here for a bit, I have to agree: This is really not about the GIL at all. It is about being able to take advantage of multiple CPUs easily and effectively. If at the end of the day multi-processing is hidden behind the scenes then this should be perfectly fine, as long as the API works the same on all platforms, is as straightforward as the threading API, and data can actually be shared without having to resort to message passing behind the scenes. The 'processing' module seems to be a good start in that direction from what I can see, but I have not tried it yet.
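(For readers who haven't seen it, its examples look roughly like this, from memory, so check the package's own docs; the API deliberately mirrors threading.Thread, which is much of the appeal:)

    from processing import Process, Queue    # the third-party 'processing' package

    def worker(q):
        q.put(sum(xrange(10000000)))          # runs in a separate OS process, no GIL contention

    if __name__ == '__main__':
        q = Queue()
        procs = [Process(target=worker, args=(q,)) for _ in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print [q.get() for _ in range(2)]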

There has been a suggestion here in the discussion about an agent-based approach, so basically a different API rather than the threading API. I'm not so sure about that. Not so much for technical reasons, but for the above mentioned reasons of familiarity. The threading API is nice and simple, and is instantly understood by those coming from a C/C++ and Java world. I would think that for those reasons alone it would be a benefit to use that API.

And one last comment about the descriptions of threads as 'evil' or difficult or impossible to get right, which I have read here and in some other discussions. I think the 'evilness' of threads depends very much on the system you are working on. I have worked on multi-threaded systems of varying levels of complexity and it is very much possible to get them to work correctly. I never understood why some people have such an aversion against threads. In many cases, threads are a very natural way of expressing program logic. And unless your data structures are madly complex you should be able to get the locking right as well. It's definitely possible, and I simply don't buy the 'threads are wrong/evil/bad' argument at all.

Tom Poindexter

Posts: 1
Nickname: tpoind
Registered: Sep, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 2:47 PM
> What other mainstream languages besides Java and C++?
> Perl's threads are less functional than Python's, Tcl is
> mostly based around a single-threaded event model, Ruby
> has a GIL like Python.

Tcl's event model goes a long way toward providing the appearance of multi-threading without the hassle of threading issues like the GIL. Even so, if your application absolutely needs threads, Tcl has thread support, and a fairly elegant solution at that.

Tcl's threading model is 'one interp per thread', and thus avoids global locks. Inter-thread communication is done by sending events (using each interpreter's event queue) containing a bit of Tcl code to be executed in the target thread. The Tcl interpreter has to be built with threading enabled. The performance penalty of running a single-threaded application in a threaded interpreter is about 5% (IIRC).

For reference, see: http://www.tcl.tk/doc/howto/thread_model.html
and http://wiki.tcl.tk/1339
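
The same one-interp-per-thread idea translates into Python as "send work to the thread that owns the data"; the sketch below makes no claim to match Tcl's actual API, it just mimics the model with a per-thread queue of callables:

    import threading, Queue

    class Worker(threading.Thread):
        # Analog of a Tcl thread with its own interpreter: all state is
        # private to this thread, and other threads communicate only by
        # posting work (callables) onto its event queue.
        def __init__(self):
            threading.Thread.__init__(self)
            self.events = Queue.Queue()
            self.state = {}

        def run(self):
            while True:
                job = self.events.get()
                if job is None:          # sentinel: shut down
                    break
                job(self.state)

    w = Worker()
    w.start()
    w.events.put(lambda state: state.update(hits=1))
    w.events.put(None)
    w.join()
    print w.state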

Michel Alexandre Salim

Posts: 4
Nickname: salimma
Registered: Sep, 2005

Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:35 PM
> But I've got a feeling that Bruce isn't thinking of this
> scenario when he asks for actors (which I remember him
> bringing up in 2001-2003, so at least he's consistent :-).
> Unfortunately I can't quite think what problem area he
> wants to address.

http://lamp.epfl.ch/~phaller/doc/ActorsTutorial.html

As I understand it, it allows concurrency with a lower overhead than multi-threading. It would be a nice feature to have in Python -- I wonder if it can be implemented as a library (the Scala implementation is).

By the way, Guido, the e-mail from Greg that you cited didn't claim a slowdown of 2x -- the slowdown was from 0.95 unit to 0.6, which is more like, what, 35%? Though Greg also noted that even at the time of writing, Python had started using more and more global structures, which might mean an updated patch would perform worse.

John M Camara

Posts: 15
Nickname: camara
Registered: Dec, 2004

Re: It isn't Easy to Remove the GIL Posted: Sep 12, 2007 7:25 PM
> I never understood why some
> people have such an aversion against threads.

Well, for some developers, threads are the only mechanism they use to solve concurrency issues, so unfortunately they are often used and abused in the wrong places. I'm sure threading wouldn't get the bad rap it does from some people if most developers were willing to accept that threading is no golden solution to all their problems and learned a few additional approaches to solving concurrency issues.

I don't have any expectations that people will stop abusing threads for all their problems. It's just as unlikely as trying to get most developers to realize that relational databases are not always the best storage solution. People can be really stubborn about learning new tricks.

Adam Olsen

Posts: 11
Nickname: rhamph
Registered: Jun, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 12, 2007 10:49 PM
I have been experimenting with GIL removal for a while now. I've avoided publicly posting to avoid the inevitable bikeshed discussions, but that seems pointless now, doesn't it? ;)

Before you consider performance you need to decide what sort of semantics you want. On my wishlist are easy spawning of threads, "sane" handling of exceptions, gracefully killing threads when the user doesn't handle an exception, and cheap sharing of read-only data. I have a lot of plans for how to do this, but the development schedule for 3.0 is already full, so I'm not going into them until 3.1 development starts.

The requirement to cheaply share read-only data eliminates process models. They may work for some tasks, but they're just too coarse for many things.

Now, on to the performance. I'm going to assume there's another scheme in place to prevent concurrent modification of dicts and the like (as my plans use), and that refcounts are the only issue. So, what are our options for safely handling reference counts?

Atomic refcounting via inline assembly is the most obvious solution. I've modified Python to use it and found only 12% overhead on existing code. Sounds perfect, right? Wrong! It's 12% *uncontended*. Once you have two threads modifying the same reference count the costs skyrocket! Two threads become slower than a single thread. It only gets worse as you add more cores, so ultimately this is unviable.

Another option would be to use a traditional tracing GC. There's simply too much code in CPython to rewrite it all, though. The Boehm-Demers-Weiser conservative GC *might* work, but I don't like how it might interact with arbitrary C libraries (never mind if Python is loaded after the rest of the app!). Also, Python uses a lot of very short-lived temporaries, so the memory costs could be very high with a tracing GC.

The third option, which I'm currently pursuing, is an odd sort of hybrid. The refcount for an object starts out "owned" by a single thread. So long as you are that thread, you can update the refcount directly (although the memory barriers and branches are still more expensive than traditional refcounting). If you don't own the refcount, you ask the owner to promote the object to "asynchronous" refcounting. This means that all future refcount operations go through a hashtable and get flushed to memory in batches. Dropping to 0 is not checked until later, when the tracing GC runs. I estimate this has a cost of around 30%, but the important thing is that (due to the batching) it's scalable! Switch from 4 cores to 64 cores and your performance could go up all the way.
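A very rough Python model of the hybrid scheme described above; the real thing would live in C and the names here are made up, so this is purely illustrative: the owning thread updates the count directly, other threads buffer deltas, and zero is only detected later when the tracing collector reconciles the buffers.

    import threading

    class HybridRefcount(object):
        # Illustrative model only. The owner thread touches 'count' directly
        # (the cheap, uncontended path); any other thread buffers a +1/-1
        # delta, standing in for the batched per-thread hashtable. Dropping
        # to zero is only noticed when reconcile() runs, i.e. at GC time.
        def __init__(self):
            self.owner = threading.current_thread()
            self.count = 1
            self.buffers = {}                  # thread -> pending delta
            self.lock = threading.Lock()

        def incref(self, delta=1):
            me = threading.current_thread()
            if me is self.owner:
                self.count += delta
            else:
                with self.lock:
                    self.buffers[me] = self.buffers.get(me, 0) + delta

        def decref(self):
            self.incref(-1)

        def reconcile(self):
            # What the tracing GC would do: fold in the buffered deltas and
            # only now decide whether the object is actually garbage.
            with self.lock:
                self.count += sum(self.buffers.values())
                self.buffers.clear()
            return self.count == 0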

Okay, back to reality. I am working on a patch/fork that includes the semantics on my wishlist and uses the hybrid refcount/tracing GC. When 3.1 development starts I plan to propose the semantics and (as a compile-time option!) the performance-affecting aspects. However, the cost so far is a lot more than 30%!

Unmodified py3k (revision 57858):
28000 pystones/second
Modified, single thread (dual core box):
11500 pystones/second
Modified, two threads (dual core box):
22800 pystones/second

I think the bulk of the difference is due to de-optimization. I've had to disable free lists, pymalloc, etc to get this far. Hopefully competitive replacements can be found in time.

Anyway, I do plan to maintain this patch/fork for a while (but I'm not posting until 3.0 is out the door! If you want to help, help 3.0!)

Jesse Noller

Posts: 3
Nickname: jnoller
Registered: Sep, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 13, 2007 5:03 PM
Hey Adam, I've started examining alternatives and benchmarking/mocking them up - I would love to talk to you more and maybe even try out your patches. If you're interested, shoot me an email at j-noller at gmail dot com

Adam Olsen

Posts: 11
Nickname: rhamph
Registered: Jun, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 14, 2007 4:53 PM
> Unmodified py3k (revision 57858):
> 28000 pystones/second
> Modified, single thread (dual core box):
> 11500 pystones/second
> Modified, two threads (dual core box):
> 22800 pystones/second

For those keeping score, my current numbers:
Unmodified py3k (revision 57858):
28000 pystones/second
Modified, single thread (dual core box):
18800 pystones/second
Modified, two threads (dual core box):
36700 pystones/second

My most recent change was to give each "asynchronous" object an index into a per-thread table of refcount changes. This was precipitated by realizing that my thread-pystones benchmark shares fewer than 500 objects. It's not clear what the costs would be to flush such a table when it becomes full, so it may not be as practical as a hash table.

Most of my performance improvements have come from tweaking INCREF/DECREF. I guess my earlier prediction that they were only 30% of the performance reduction was wrong. It's not clear how much of a hit glibc's malloc is.

Mike Sandman

Posts: 6
Nickname: n8behavior
Registered: Feb, 2004

Re: It isn't Easy to Remove the GIL Posted: Sep 14, 2007 5:21 PM
> Does that mean that jython and ironpython are "better" in
> this regard?

Has it been mentioned (did I miss it?) whether Jython or IronPython scale fine on multi-core systems? The obvious answer would seem to be yes, since this is an issue of the VM, not the language.

Guy Kloss

Posts: 2
Nickname: xemacs
Registered: Sep, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 15, 2007 2:37 PM
Transactional memory also sounds quite interesting in this context: do the execution optimistically, then sort out conflicts afterwards. After all, in many (most?) cases conflicts are not the common execution path ...

"""
One of the workloads that we've tried to parallelize with transactions is the Python interpreter. While the Python programming language provides threads as part of the language, its canonical implementation (CPython) provides only limited concurrency as bytecode execution is performed while holding a ``global interpreter lock.'' With the expectation that the bytecodes would likely be embarrassingly parallel, one of my students undertook replacing this overly conservative concurrency control mechanism with the optimistic execution and fine-grain conflict detection provided by wrapping each bytecode with a (hardware) transaction. Our lessons learned were two-fold: 1) it was remarkably easy to eliminate the false conflicts resulting from the interpreter's use of global variables, and 2) it was remarkably hard to correctly deal with the bytecodes (and therefore the transactions) that resulted in system calls and I/O, in part because they may be encapsulated in native code. In hindsight, we recognized that exposing Python's concurrency would have been much easier by programming to a hardware abstraction that speculatively executed lock-based critical sections (i.e., Speculative Lock Elision (SLE)), which would have transparently handled the system call and I/O issues correctly. We believe that this conclusion likely generalizes to many of the workloads that have been executed on TM prototypes.
"""

source: http://www-sal.cs.uiuc.edu/~zilles/tm.html

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 1:23 PM
> But I've got a feeling that Bruce isn't thinking of this
> scenario when he asks for actors (which I remember him
> bringing up in 2001-2003, so at least he's consistent :-).
> Unfortunately I can't quite think what problem area he
> wants to address. There are many different ways one can
> use multiple CPUs to make a given algorithm faster, but it
> depends a lot on the algorithm how you have to code it to
> benefit.

Again, started writing a reply to this and ended up with another weblog entry:
http://www.artima.com/weblogs/viewpost.jsp?thread=214627

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 1:42 PM
> The threading API is
> nice and simple, and is instantly understood by those
> coming from a C/C++ and Java world.

This is one of the characteristic comments made by people who trivialize the complexity of threading. And it's an excellent example: the Python API is indeed one of the simpler threading APIs, but to say it is "instantly understood" by people coming from C/C++/Java is completely wrong. They *think* they understand it, and only later discover some very important things about it. Like the GIL, and like the fact that it won't actually use more than one CPU. If someone uses "instantly understand" and "threading" in the same sentence, my brain rewrites that sentence to say "doesn't really understand threading."

> I have worked on multi-threaded systems of
> varying levels of complexity and it is very much possible
> to get them to work correctly. I never understood why some
> people have such an aversion against threads. In many
> cases, threads are a very natural way of expressing
> program logic. And unless your data structures are madly
> complex you should be able to get the locking right as
> well. It's definitely possible, and I simply don't buy the
> 'threads are wrong/evil/bad' argument at all.

If you really understood how complex it is to write a correct threaded program, you would be concerned, not about yourself (because you are clearly one of the brilliant few to whom threading is transparently obvious), but about all the other programmers who aren't as smart as you are.

Such brilliant people exist; they look at something and the answer seems obvious and they don't understand why it isn't obvious to everyone else. But (1) those aren't most programmers and (2) I generally find that this kind of overconfidence eventually produces catastrophic results.

And I keep running into people who appear to be very smart but don't seem to agree with you. Like Brian Goetz, who varies between saying that threads are "extremely difficult" to "impossible" to get right.

Not that this will have any effect, since you've already decided that threads are trivial to understand and to program correctly. But try to understand in this moment that the vast majority of programmers may not have your insights. And also that you may have one or two "aha" moments in the future where you realize that you didn't really understand threads before. The second or third time this happens you'll begin to doubt your ability to ever understand them completely. Or at least, that's what happened to me. However, I have studied concurrency for years, and I probably just learn a lot slower than you do.

Juergen Brendel

Posts: 8
Nickname: jbrendel
Registered: Sep, 2007

Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 5:31 PM
Bruce, I'm a bit disappointed that you sank to such a low with your response. A personal attack of that form - nicely veiled as it was - clearly was not necessary. Especially when you could have just assumed a little less and actually bothered to read what I wrote.

Oh well, I'll try to answer anyway.

> This is one of the characteristic comments made by people
> who trivialize the complexity of threading. And it's an
> excellent example: the Python API is indeed one of the
> simpler threading APIs, but to say it is "instantly
> understood" by people coming from C/C++/Java is completely
> wrong. They *think* they understand it, and only later
> discover some very important things about it. Like the
> GIL, and like the fact that it won't actually use more
> than one CPU.

Yeah, and why do I have to reiterate to you, of all people, that the GIL is something CPython-specific, while the threading API in itself is completely independent of that? I thought we went over this many times. So, when I talk about using the API, I obviously talk about - you know - writing a program, creating threads, communicating between threads, and so forth. Nothing about the GIL.

You are absolutely right, you only learn later that the GIL actually prevents the threading in CPython from being useful in some cases, but that certainly didn't prevent me from understanding how to use the API. Maybe not even all the functions it offers, but certainly enough to get started. After all, the same threading API in Jython works just fine.

So, I have no idea what you are on about when you throw the GIL and the threading API into the same sentence.


> If someone uses "instantly understand" and
> "threading" in the same sentence, my brain rewrites that
> sentence to say "doesn't really understand threading."

Oh please! That is your problem. You assume too much, obviously. I didn't think of you that way, which is why I'm so disappointed in your post.


> > I have worked on multi-threaded systems of varying levels of
> > complexity and it is very much possible to get them to work
> > correctly. I never understood why some people have such an aversion
> > against threads. In many cases, threads are a very natural way of
> > expressing program logic. And unless your data structures are madly
> > complex you should be able to get the locking right as well. It's
> > definitely possible, and I simply don't buy the 'threads are
> > wrong/evil/bad' argument at all.
>
> If you really understood how complex it is to write a
> correct threaded program, you would be concerned, not
> about yourself (because you are clearly one of the
> brilliant few to whom threading is transparently obvious),

Yes, yes, here we go. The usual lame old approach to beat another person's argument to death. Nice one, Bruce. And after that whatever I say is going to be disqualified from the start, right? Really lame.

If you had read what I wrote, you would have seen this: "In many cases, threads are a very natural way of expressing program logic." I wrote: In many (!) cases. Not all!

I also said that the difficulty of 'getting threading right' will depend on the complexity of the data structures you are dealing with. You didn't see this either? Apparently not. You just felt like going on a rant like this.


> but about all the other programmers who aren't as smart as
> you are.

Same old, same old.


> Such brilliant people exist; they look at something and
> the answer seems obvious and they don't understand why it
> isn't obvious to everyone else. But (1) those aren't most
> programmers and (2) I generally find that this kind of
> overconfidence eventually produces catastrophic results.

Yes, well, good for them if they are so smart. It's not always obvious for me at all. And sure, baseless overconfidence can produce catastrophic results.


> And I keep running into people who appear to be very smart
> but don't seem to agree with you. Like Brian Goetz, who
> varies between saying that threads are "extremely
> difficult" to "impossible" to get right.

As a general case, I can believe that. But making threads out to be evil because of that is not right either. As I said (do I have to repeat it?): There are some problems for which they work very well. I won't use them for everything, but I have come across many issues where they just worked and worked well.

Hey, maybe I have a completely bizarre and odd career path which led me to have those experiences, but frankly, I don't think so. Maybe the smart people I was lucky enough to work with were all just totally overconfident or those few exceptions you talk about, but somehow, I don't think so.


> Not that this will have any effect, since you've already
> decided that threads are trivial to understand and to
> program correctly.

[ the usual lame rhetoric designed to discredit whatever else I could say ahead of time... ]

> But try to understand in this moment
> that the vast majority of programmers may not have your
> insights. And also that you may have one or two "aha"
> moments in the future where you realize that you didn't
> really understand threads before.

Oh come on, I didn't just come crawling out from under a rock, you know? Don't you think I have had moments where it was tough to get the threading right? Of course I did! That's why I said (over and over again): Threads are a natural fit for many problems, but 'many' does not mean 'all' as you may faintly recall. If you stick to the problems for which they are a fit then you won't have all those 'terrible, evil' problems you are going on about here.

Juergen
