The Artima Developer Community
Weblogs Forum
It isn't Easy to Remove the GIL

54 replies. Most recent reply: Sep 12, 2008 5:12 PM by Patrick Stinson

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

It isn't Easy to Remove the GIL Posted: Sep 10, 2007 10:50 AM
Summary
A response to a blog post by Juergen Brendel pleading for the removal of the GIL.

Yesterday, Juergen Brendel blogged at length about the disadvantages of the GIL. He claims it is an architectural decision I'm making that limits his productivity.

I don't expect that this or any response will stop the requests for the GIL's removal, despite a well-reasoned FAQ entry about the issue. But I also don't expect it to go away until someone other than me goes through the effort of removing it, and showing that its removal doesn't slow down single-threaded Python code.

This has been tried before, with disappointing results, which is why I'm reluctant to put much effort into it myself. In 1999 Greg Stein (with Mark Hammond?) produced a fork of Python (1.5 I believe) that removed the GIL, replacing it with fine-grained locks on all mutable data structures. He also submitted patches that removed many of the reliances on global mutable data structures, which I accepted. However, after benchmarking, it was shown that even on the platform with the fastest locking primitive (Windows at the time) it slowed down single-threaded execution nearly two-fold, meaning that on two CPUs, you could get just a little more work done without the GIL than on a single CPU with the GIL. This wasn't enough, and Greg's patch disappeared into oblivion. (See Greg's writeup on the performance.)
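The arithmetic behind "just a little more work" can be sketched in a few lines. This is only an illustration: the function name and the slowdown values are hypothetical, and the post gives no exact figure beyond "nearly two-fold".

```python
def relative_throughput(cpus, slowdown, scaling=1.0):
    """Work done per unit time, relative to one CPU running GIL-based Python.

    slowdown: factor by which each thread runs slower without the GIL
              (hypothetical input; the post only says "nearly two-fold").
    scaling:  fraction of the ideal parallel speedup actually achieved
              (assumed perfect by default, which is optimistic).
    """
    return cpus * scaling / slowdown


# Illustrative values only, not measurements from Greg Stein's patch.
for slowdown in (1.6, 1.8, 2.0):
    print(slowdown, round(relative_throughput(2, slowdown), 2))
```

With a slowdown of exactly 2.0, two CPUs only break even with one GIL-based CPU; anything "nearly" two-fold leaves a small gain, and any imperfect scaling eats into it.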

I'd welcome it if someone did another experiment along the lines of Greg's patch (which I haven't found online), and I'd welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease.

I would also be happy if someone volunteered to maintain a GIL-free fork of Python, in case the single-threaded performance goal can't be met but there is significant value for multi-threaded CPU-bound applications. We might even end up with all the changes permanently part of the code base, but enabled only on request at compile time.

However, I want to warn that there are many downsides to removing the GIL. It complicates life for extension modules, which can no longer expect that they are invoked in a "safe zone" protected by the GIL -- as soon as an extension has any global mutable data, it will have to be prepared for concurrent calls from multiple threads. There might also be changes in the Python/C API necessitated by the need to lock certain objects for the duration of a sequence of calls.
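As a rough analogue of what "being prepared for concurrent calls" means, here is a minimal sketch in Python rather than C. Real extension modules are written against the Python/C API; the names `_call_counts`, `record_call` and `hammer` are hypothetical and exist only for this illustration.

```python
import threading

# Global mutable state, analogous to a static table inside an extension module.
_call_counts = {}
_call_counts_lock = threading.Lock()


def record_call(name):
    """Count one call; the lock makes the read-modify-write step atomic."""
    with _call_counts_lock:
        _call_counts[name] = _call_counts.get(name, 0) + 1


def hammer(n_threads=4, calls_per_thread=10000):
    """Call record_call from several threads and return the final count."""
    _call_counts.clear()

    def worker():
        for _ in range(calls_per_thread):
            record_call("f")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return _call_counts["f"]
```

Every function that touches the shared table has to take the lock, which is exactly the kind of bookkeeping extension authors would have to add throughout their code.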

While it is my personal opinion, based upon the above considerations, that there isn't enough value in removing the GIL to warrant the effort, I will welcome and support attempts to show that times have changed. However, there is no point in pleading alone -- Python is open source and I have my hands full dealing with the efforts to produce a quality 3.0 language definition and implementation on time. I want to point out one more time that the language doesn't require the GIL -- it's only the CPython virtual machine that has historically been unable to shed it.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 11:42 AM
Posted by: Jesse Noller    Posts: 3 / Nickname: jnoller / Registered: Sep, 2007
Thanks for posting this - pointing out (yet again) that this is not a limitation of the language itself, but rather a limitation of the cPython "reference" interpreter implementation, may help some people realize that this is more appropriate for implementation in a project outside of the "core" cPython team.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 12:01 PM
Posted by: Ron Stephens    Posts: 2 / Nickname: awaretek / Registered: Jan, 2003
Guido, this is a well reasoned response. Please do not become distracted by calls for feature Y or change Z or by criticism X. Too many times, some of us in the community get carried away with marketing hype for feature XYZ that Python lacks, and that language or implementation DEF has, and we fail to notice that it is often the good decisions on what to focus on (and thus what not to focus on) that make Python better overall than the alternatives.

Do we notice that the cPython implementation runs relatively fast and efficiently, that it scales, that it is rock solid? Do we take comfort in the fact that the language is so well defined that it has numerous excellent alternative implementations, each of which offers selective advantages for some uses (Jython, Stackless, PyPy, IronPython, and others)?

Maybe we do appreciate all of the above, but we also at times want Python core development to have unlimited resources and we lust after a language that is clearly superior in every way for every application.

Please forgive us for our hubris, but don't let us distract you. Thanks for listening, though! ;-))

Ron Stephens


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 12:10 PM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> Guido, this is a well reasoned response. Please do not
> become distracted by calls for feature Y or change Z or by
> criticism X. [...]

Thanks for the support. I have said numerous times that I don't want Python development to be driven by e.g. Ruby-envy. (Ruby BTW has a GIL too.)

But it didn't seem appropriate to ignore Juergen's "open letter to Guido van Rossum". I hope that Juergen's response echoes yours, but I'm skeptical until I see it.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 7:55 PM
Posted by: Juergen Brendel    Posts: 8 / Nickname: jbrendel / Registered: Sep, 2007
Hello Guido,

> Thanks for the support. I have said numerous times that I
> don't want Python development to be driven by e.g.
> Ruby-envy. (Ruby BTW has a GIL too.)

I don't 'speak' Ruby, so no envy there. :-) The envy is more from the fact that in so many other languages, even those that I personally don't like as much as Python, I can take advantage of multiple threads 'natively'. Using the thread API there is all I need. It allows me to utilize modern hardware easily.

Yes, I understand it's cPython, not the language itself. That's what makes it even more startling: The language and libraries offer a beautifully simple threading API, but modern hardware cannot be taken advantage of with it.


> But it didn't seem appropriate to ignore Juergen's "open
> letter to Guido van Rossum". I hope that Juergen's
> response echoes yours, but I'm skeptical until I see it.

Echoing the "don't become distracted" call? Yes, I guess I can't really echo that, since I was writing about it in the first place.

I think the discussion here is beginning to point out possibilities, which is wonderful. Keeping the GIL, but allowing multiple threads to run simultaneously. That would work wonders already. As you said in another response: "...a project to add GIL-free threading to Python might work...".

You know, as I said in my article: I really like Python and it's my language of choice for most anything these days. I posted a couple of articles to that effect before. I don't need to reiterate the points I made about threading here, but I guess you could put it this way: If you really care about something (in this case Python) then having strong opinions one way or the other should be understandable. If I didn't care about Python and didn't want it to become applicable for a wide range of applications, I wouldn't have bothered writing that article.


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 7:14 AM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> The envy is
> more from the fact that in so many other languages, even
> those that I personally don't like as much as Python, I
> can take advantage of multiple threads 'natively'.

What other mainstream languages besides Java and C++? Perl's threads are less functional than Python's, Tcl is mostly based around a single-threaded event model, Ruby has a GIL like Python.


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 11:27 AM
Posted by: Juergen Brendel    Posts: 8 / Nickname: jbrendel / Registered: Sep, 2007
> > The envy is more from the fact that in so many other languages, even
> > those that I personally don't like as much as Python, I
> > can take advantage of multiple threads 'natively'.
>
> What other mainstream languages besides Java and C++?
> Perl's threads are less functional than Python's, Tcl is
> mostly based around a single-threaded event model, Ruby
> has a GIL like Python.

Ah, yes, you got me there. I should have phrased this more carefully. Java and C/C++ are the languages I am most familiar with, outside of Python. So these are the languages I am mostly referring to.

However, I would say that most people who are or have been in system development are familiar with one of those languages, aren't they? And many then will have come across threading in those contexts.

That is why - in my opinion - Python could benefit so much from supporting the familiar threading API and multi-core hardware.

Consider also that in Python things mostly work as expected, which is one of the nicest features of the language and leads to comparisons with 'executable pseudo code' and all that. You just write it and it works as intended. The fact that threading does not quite work as one would expect breaks this philosophy.

There are people here on the mailing list who say: If you need performance, use another language, or write a C extension... Well, sadly having to resort to a 'hack' (which it is compared to the elegance of pure Python) does not speak well of Python. It's nice that it is possible to easily do that but it simply shouldn't be necessary.

In the end, though, after following the discussion here for a bit, I have to agree: This is really not about the GIL at all. It is about being able to take advantage of multiple CPUs easily and effectively. If at the end of the day multi-processing is hidden behind the scenes then this should be perfectly fine, as long as the API works the same on all the platforms, is as straightforward as the threading API, and if data can actually be shared without having to resort to message passing behind the scenes. The 'processing' module seems to be a good start in that direction from what I can see, but I have not tried it yet.
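For readers who have not seen the approach, here is a minimal sketch of CPU-bound work split across processes. The 'processing' package was later added to the standard library as `multiprocessing` (Python 2.6), and its `Process` class deliberately mirrors `threading.Thread`. The helper names below are hypothetical, and the sketch assumes a Unix-like system because it asks for the "fork" start method explicitly. Note that results still travel back through a queue, which is the message passing mentioned above.

```python
import multiprocessing


def count_primes(lo, hi):
    """CPU-bound work: count primes in [lo, hi) by trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def _worker(lo, hi, results):
    # Runs in a child process; the result is sent back as a message.
    results.put(count_primes(lo, hi))


def parallel_count_primes(limit, workers=2):
    """Split [0, limit) into chunks and count primes in separate processes."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix-like OS
    results = ctx.Queue()
    step = -(-limit // workers)  # ceiling division
    procs = [
        ctx.Process(
            target=_worker,
            args=(i * step, min((i + 1) * step, limit), results),
        )
        for i in range(workers)
    ]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in procs)  # drain before joining
    for p in procs:
        p.join()
    return total
```

Swapping `ctx.Process` for `threading.Thread` (and the queue for `queue.Queue`) gives the same program shape, which is the familiarity argument made here; only the process version can occupy more than one core under the GIL.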

There has been a suggestion here in the discussion about an agent-based approach, so basically a different API rather than the threading API. I'm not so sure about that. Not so much for technical reasons, but for the above mentioned reasons of familiarity. The threading API is nice and simple, and is instantly understood by those coming from a C/C++ and Java world. I would think that for those reasons alone it would be a benefit to use that API.

And one last comment about the descriptions of threads as 'evil' or difficult or impossible to get right, which I have read here and in some other discussions. I think the 'evilness' of threads depends very much on the system you are working on. I have worked on multi-threaded systems of varying levels of complexity and it is very much possible to get them to work correctly. I never understood why some people have such an aversion against threads. In many cases, threads are a very natural way of expressing program logic. And unless your data structures are madly complex you should be able to get the locking right as well. It's definitely possible, and I simply don't buy the 'threads are wrong/evil/bad argument at all'.


Re: It isn't Easy to Remove the GIL Posted: Sep 12, 2007 7:25 PM
Posted by: John M Camara    Posts: 15 / Nickname: camara / Registered: Dec, 2004
> I never understood why some
> people have such an aversion against threads.

Well, for some developers, threads are the only mechanism they use to solve concurrency issues so unfortunately they are often used and abused in the wrong places. I'm sure threads wouldn't get the bad rap they do from some if most developers were willing to accept that threading is no golden solution to all their problems and learn a few additional approaches to solving concurrency issues.

I don't have any expectations that people will stop abusing the use of threads for all their problems. It's just as unlikely as trying to get most developers to realize that relational databases are not always the best storage solution. People can be real stubborn about learning new tricks.


Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 1:42 PM
Posted by: Bruce Eckel    Posts: 875 / Nickname: beckel / Registered: Jun, 2003
> The threading API is
> nice and simple, and is instantly understood by those
> coming from a C/C++ and Java world.

This is one of the characteristic comments made by people who trivialize the complexity of threading. And it's an excellent example: the Python API is indeed one of the simpler threading APIs, but to say it is "instantly understood" by people coming from C/C++/Java is completely wrong. They *think* they understand it, and only later discover some very important things about it. Like the GIL, and like the fact that it won't actually use more than one CPU. If someone uses "instantly understand" and "threading" in the same sentence, my brain rewrites that sentence to say "doesn't really understand threading."

> I have worked on multi-threaded systems of
> varying levels of complexity and it is very much possible
> to get them to work correctly. I never understood why some
> people have such an aversion against threads. In many
> cases, threads are a very natural way of expressing
> program logic. And unless your data structures are madly
> complex you should be able to get the locking right as
> well. It's definitely possible, and I simply don't buy the
> 'threads are wrong/evil/bad argument at all'.

If you really understood how complex it is to write a correct threaded program, you would be concerned, not about yourself (because you are clearly one of the brilliant few to whom threading is transparently obvious), but about all the other programmers who aren't as smart as you are.

Such brilliant people exist; they look at something and the answer seems obvious and they don't understand why it isn't obvious to everyone else. But (1) those aren't most programmers and (2) I generally find that this kind of overconfidence eventually produces catastrophic results.

And I keep running into people who appear to be very smart but don't seem to agree with you. Like Brian Goetz, who varies between saying that threads are "extremely difficult" to "impossible" to get right.

Not that this will have any effect, since you've already decided that threads are trivial to understand and to program correctly. But try to understand in this moment that the vast majority of programmers may not have your insights. And also that you may have one or two "aha" moments in the future where you realize that you didn't really understand threads before. The second or third time this happens you'll begin to doubt your ability to ever understand them completely. Or at least, that's what happened to me. However, I have studied concurrency for years, and I probably just learn a lot slower than you do.


Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 5:31 PM
Posted by: Juergen Brendel    Posts: 8 / Nickname: jbrendel / Registered: Sep, 2007
Bruce, I'm a bit disappointed about you sinking to such a low with your response. A personal attack of that form - nicely veiled as it was - clearly was not necessary. Especially, if you could have just assumed a little less and actually bothered to read what I wrote.

Oh well, I'll try to answer anyway.

> This is one of the characteristic comments made by people
> who trivialize the complexity of threading. And it's an
> excellent example: the Python API is indeed one of the
> simpler threading APIs, but to say it is "instantly
> understood" by people coming from C/C++/Java is completely
> wrong. They *think* they understand it, and only later
> discover some very important things about it. Like the
> GIL, and like the fact that it won't actually use more
> than one CPU.

Yeah, and why do I have to reiterate to you of all people that the GIL is something cPython specific, while the threading API in itself is completely independent of that? I thought we went over this many times? So, when I talk about using the API, I obviously talk about - you know - writing a program, creating threads, communicating between threads, and so forth. Nothing about the GIL.

You are absolutely right, you only learn later that the GIL actually prevents threading in cPython from being useful in some cases, but that certainly didn't prevent me from understanding how to use the API. Maybe not even all the functions it offers, but certainly enough to get started. After all, the same threading API in Jython works just fine.

So, I have no idea what you are on about when you throw the GIL and the threading API into the same sentence.


> If someone uses "instantly understand" and
> "threading" in the same sentence, my brain rewrites that
> sentence to say "doesn't really understand threading."

Oh please! That is your problem. You assume too much, obviously. I didn't think of you that way, which is why I'm so disappointed in your post.


> > I have worked on multi-threaded systems of
> > varying levels of complexity and it is very much possible
> > to get them to work correctly. I never understood why some
> > people have such an aversion against threads. In many
> > cases, threads are a very natural way of expressing
> > program logic. And unless your data structures are madly
> > complex you should be able to get the locking right as
> > well. It's definitely possible, and I simply don't buy the
> > 'threads are wrong/evil/bad argument at all'.
>
> If you really understood how complex it is to write a
> correct threaded program, you would be concerned, not
> about yourself (because you are clearly one of the
> brilliant few to whom threading is transparently obvious),

Yes, yes, here we go. The usual lame old approach to beat another person's argument to death. Nice one, Bruce. And after that whatever I say is going to be disqualified from the start, right? Really lame.

If you had read what I wrote, you would have seen this: "In many cases, threads are a very natural way of expressing program logic." I wrote: In many (!) cases. Not all!

I also said that the difficulty of 'getting threading right' will depend on the complexity of the data structures you are dealing with. You didn't see this either? Apparently not. You just felt like going on a rant like this.


> but about all the other programmers who aren't as smart as
> you are.

Same old, same old.


> Such brilliant people exist; they look at something and
> the answer seems obvious and they don't understand why it
> isn't obvious to everyone else. But (1) those aren't most
> programmers and (2) I generally find that this kind of
> overconfidence eventually produces catastrophic results.

Yes, well, good for them if they are so smart. It's not always obvious for me at all. And sure, baseless overconfidence can produce catastrophic results.


> And I keep running into people who appear to be very smart
> but don't seem to agree with you. Like Brian Goetz, who
> varies between saying that threads are "extremely
> difficult" to "impossible" to get right.

As a general case, I can believe that. But making threads out to be evil because of that is not right either. As I said (do I have to repeat it?): There are some problems for which they work very well. I won't use them for everything, but I have come across many issues where they just worked and worked well.

Hey, maybe I have a completely bizarre and odd career path which led me to have those experiences, but frankly, I don't think so. Maybe the smart people I was lucky enough to work with were all just totally overconfident or those few exceptions you talk about, but somehow, I don't think so.


> Not that this will have any effect, since you've already
> decided that threads are trivial to understand and to
> program correctly.

[ the usual lame rhetoric designed to discredit whatever else I could say ahead of time... ]

> But try to understand in this moment
> that the vast majority of programmers may not have your
> insights. And also that you may have one or two "aha"
> moments in the future where you realize that you didn't
> really understand threads before.

Oh come on, I didn't just come crawling out from under a rock, you know? Don't you think I have had moments where it was tough to get the threading right? Of course I did! That's why I said (over and over again): Threads are a natural fit for many problems, but 'many' does not mean 'all' as you may faintly recall. If you stick to the problems for which they are a fit then you won't have all those 'terrible, evil' problems you are going on about here.

Juergen


Re: It isn't Easy to Remove the GIL Posted: Sep 17, 2007 9:42 AM
Posted by: Bruce Eckel    Posts: 875 / Nickname: beckel / Registered: Jun, 2003
> Bruce, I'm a bit disappointed about you sinking to such a
> low with your response. A personal attack of that form -
> nicely veiled as it was - clearly was not necessary.
> Especially, if you could have just assumed a little less
> and actually bothered to read what I wrote.

You're right. I apologize. Your post pushed my sarcasm button and I should have reworked it before posting it.

The thing that pushed the button is hearing "but threading just isn't that hard" one too many times. It took me a lot of study and experimentation -- years -- to begin realizing that threading really is that hard. And there were a number of periods where I thought I actually understood it, so I need to be more understanding when someone else is in one of those phases as well.

I have written giant chapters about it both in C++ and in Java and that may have convinced some people of the complexity. I probably need to write a much shorter chapter or article that can somehow make the case that threads in general are not the right solution. But it's not easy and people who think threads are easy are probably not going to be convinced. Threads are fascinating and they draw you in with the feeling that if you can just put one more locking mechanism into the system then threads will work OK. Not unlike "just one more static typing mechanism" in the static-vs-dynamic discussions.

We do have a fundamental disagreement, though. I have come to the conclusion that shared-memory concurrency is impossible for most programmers to get right, and you feel that threads are a reasonable solution for a significant class of problems, and that you are able to get it right when you need to.

I shouldn't have replied sarcastically, but your reply seemed to suggest that it was obvious that most programmers could write correct threaded programs. I guess that frustrates me because I don't know how to explain the difficulties of threading well enough.


Re: It isn't Easy to Remove the GIL Posted: Sep 17, 2007 11:10 AM
Posted by: Juergen Brendel    Posts: 8 / Nickname: jbrendel / Registered: Sep, 2007
> You're right. I apologize. Your post pushed my sarcasm
> button and I should have reworked it before posting it.

It's ok, don't worry. I'm gladly accepting your apology.

When someone greatly cares about a topic then it is often difficult to keep the discussion free of any emotions. You obviously care and know very much about the topic of threading, which makes this understandable.

I like to avoid conflicts when I can, and I'm happy about your graceful response, which helps us all to get back to the basics of this discussion.


> The thing that pushed the button is hearing "but threading
> just isn't that hard" one too many times. It took me a lot
> of study and experimentation -- years -- to begin
> realizing that threading really is that hard. And there
> were a number of periods where I thought I actually
> understood it, so I need to be more understanding when
> someone else is in one of those phases as well.

Ok. It's interesting to be described as being 'in a phase', but Ok. :-)


> I have written giant chapters about it both in C++ and in
> Java and that may have convinced some people of the
> complexity. I probably need to write a much shorter
> chapter or article that can somehow make the case that
> threads in general are not the right solution. But it's
> not easy and people who think threads are easy are
> probably not going to be convinced. Threads are
> fascinating and they draw you in with the feeling that if
> you can just put one more locking mechanism into the
> system then threads will work OK. Not unlike "just one
> more static typing mechanism" in the static-vs-dynamic
> discussions.

While I haven't written books about this, and also wouldn't say that I have studied threading for years, I would say that I have written multi-threading programs on a number of different platforms for many years now.

I don't believe for a moment that I am some sort of special threading superstar. I'm not. I have had my fair share of deadlocks, head-scratching, hard-resets on machines with locked kernel threads and so on. I can tell you that. I have written multi-threaded programs that worked 'really well' ... until I actually tried to run them on an SMP machine.

The scariest thing is when you get a memory corruption in your shared data structure and there doesn't seem to be any decent debugging tool available that tells you what's going on. Valgrind informs you where the corruption occurs, but you just can't see 'why' it happens, because it should be 'impossible'. Then you sit with your team at the table, staring at the print-out of the code and just hope that someone suddenly says 'Aha!' and points at the one line of code where your intricate locking falls apart. And if you don't get that 'Aha!' moment you will be royally stuck...

I've been through all of this plenty of times. I guess that is why I don't see myself as being particularly naive about the topic.


> We do have a fundamental disagreement, though. I have come
> to the conclusion that shared-memory concurrency is
> impossible for most programmers to get right, and you feel
> that threads are a reasonable solution for a significant
> class of problems, and that you are able to get it right
> when you need to.

Yeah, well, sometimes it was a pretty significant struggle, as you can see from what I wrote above. There were moments when we weren't so sure whether we would ever 'get it right'. :-)

I think that a shared-nothing approach, in which all communication between 'threads' (or processes) takes place via message queues definitely makes it easier to arrive at a correct program. You can program with this model quite easily, even using the normal threading API. I have had several problems where this was exactly the right approach to take. If this model is appropriate for a problem then one should stick to it.
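A minimal sketch of that shared-nothing style with the ordinary threading API might look like this. The names `squarer` and `run_pipeline` are hypothetical; the only objects the threads share are the two queues.

```python
import queue
import threading


def squarer(inbox, outbox):
    """Worker owns no shared state; it only reads inbox and writes outbox."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        outbox.put(item * item)


def run_pipeline(values, n_workers=3):
    """Square each value using worker threads that communicate via queues."""
    values = list(values)
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=squarer, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for v in values:
        inbox.put(v)
    for _ in workers:
        inbox.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    # Arrival order depends on scheduling, so sort for a stable result.
    return sorted(outbox.get() for _ in values)
```

No explicit lock appears in the program: `queue.Queue` does its own locking internally, so the only synchronization the author has to reason about is the message flow.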

Maybe out of sheer (mis)fortune, though, I had to work on several projects where this approach didn't work in all cases. Sometimes, a legacy system with a large, 'central' data structure needed to be made to take advantage of multiple CPU cores, and so the work had to be broken out across threads without having to re-write or re-architect the core of the system.

In other cases, the nature of the shared data didn't lend itself to being communicated via messages. The overhead of doing so would have been prohibitive. For example, some large tree. You have locks on certain branches, or even the individual nodes at times, which allows multiple threads to work at the same time in the tree. This is very difficult to get right, but the performance requirements (and other considerations) pretty much mandated that this was the approach that had to be taken.
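To give a flavour of per-branch locking, here is a deliberately small sketch. The `Node` class and helper names are hypothetical, and the example sidesteps the hard parts (structural changes to the tree, lock ordering across several nodes) that make the real thing so difficult.

```python
import threading


class Node:
    """Tree node with its own lock, so disjoint branches can be updated in parallel."""

    def __init__(self, name):
        self.name = name
        self.value = 0
        self.children = {}
        self.lock = threading.Lock()


def add_to_branch(branch, amount, repeats):
    """Repeatedly update one branch while holding only that branch's lock."""
    for _ in range(repeats):
        with branch.lock:  # locks this branch, not the whole tree
            branch.value += amount


def demo(repeats=5000):
    """Two threads per branch update 'left' and 'right' concurrently."""
    root = Node("root")
    for name in ("left", "right"):
        root.children[name] = Node(name)
    threads = [
        threading.Thread(
            target=add_to_branch, args=(root.children[name], 1, repeats)
        )
        for name in ("left", "right")
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {name: child.value for name, child in root.children.items()}
```

Threads working in the left branch never wait for threads in the right branch, and the data is modified in place with no copying.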

Once something like this has been wrestled to the ground and has been made to work, the advantage is that all data can be modified in place. No additional copying (for messages) is necessary, no marshalling and unmarshalling when sending data, and so on. Sure, you need to consider the benefit of in-place modification if your individual cores have individual memory caches, but it still can often work out to your advantage.

So, I guess my point is this: Yes, avoid the shared memory if you can, but realize that sometimes it still is the right approach. The current threading API (all GIL issues aside) allows me to write shared-nothing, message-based systems. But it also allows me to use locks and shared data to my heart's content. That's why I like the current threading API. It gives me a choice. And sometimes, I need to make the (hopefully) informed choice to bite the bullet and go for the shared memory approach.

I'd be happy if all documentation chapters that mention multi-threading and shared memory come with a warning label, similar to cigarette packages...


> I shouldn't have replied sarcastically, but your reply
> seemed to suggest that it was obvious that most
> programmers could write correct threaded programs.

I hope that my reply clarified my view on this: Writing correct threaded programs with shared memory is not obvious, and it's not always easy. Not for many good programmers and definitely not for me either.

> I guess that frustrates me because I don't know how to explain the difficulties of threading well enough.

May I humbly suggest that in your next article or book about this you relay the experience I described above: Looking with your team at the print-outs, not knowing if you will ever find the problem with your program? In moments like this you can easily get that terrible, sinking feeling, especially if you are supposed to check in the 'fix' for the deadlock problem tonight for the upcoming release. It's already 6.30 pm and you have no idea when or even if you will ever fix it.

Of course, there are also plenty of other areas in software development where you can get just as stuck. But the potential for this certainly is there when you are dealing with shared memory multi-threading.

"Warning: Shared data multi-threading ahead. Proceed with caution!"

Sometimes I still have to proceed.


Re: It isn't Easy to Remove the GIL Posted: Sep 19, 2007 12:00 AM
Posted by: Adam Olsen    Posts: 11 / Nickname: rhamph / Registered: Jun, 2007
> So, I guess my point is this: Yes, avoid the shared memory
> if you can, but realize that sometimes it still is the
> right approach. The current threading API (all GIL issues
> aside) allows me to write shared-nothing, message-based
> systems. But it also allows me to use locks and shared
> data to my heart's content. That's why I like the current
> threading API. It gives me a choice. And sometimes, I need
> to make the (hopefully) informed choice to bite the bullet
> and go for the shared memory approach.
>
> I'd be happy if all documentation chapters that mention
> multi-threading and shared memory come with a warning
> label, similar to cigarette packages...

I'd be happy if cigarettes were made an illegal/prescription-only substance, after an appropriate deprecation period and with free addiction clinics. ;)

If it's not possible to provide a reasonably safe way to use threads then python shouldn't provide them, and we should use processes or event-driven instead. (Threads would still be available in C of course.)

I think we can make it work though. The trick is to not provide shared state as such. Language-enforced monitors or actors can partition away the data in a way that doesn't allow low-level corruption and encourages better structuring.

Although there are some obvious cases where monitors or actors aren't sufficient, you can always write a specialized container in C that provides a safe API. For example, if you have a large list that you want to subdivide and process amongst many threads, you could have "views" that give exclusive access to a section of it at a time.

I see this as the same as pointer arithmetic. There are some tricks you have to give up in exchange for a much safer and more reliable programming environment. You can always write in C if you need them, but only a small minority of code does.
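To make the actor idea above a little more concrete, here is a minimal sketch in today's Python. It is only an illustration, not anything proposed in this thread: the `CounterActor` class and `run_demo` function are hypothetical names, and the "enforcement" here is by convention rather than by the language. The point is that the counter is only ever touched by the actor's own thread, and everyone else just sends messages.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: owns a private counter, reachable only via messages."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # Process messages one at a time, so no lock on _count is needed.
        while True:
            message, reply = self._inbox.get()
            if message == "increment":
                self._count += 1
            elif message == "stop":
                reply.put(self._count)
                return

    def increment(self):
        self._inbox.put(("increment", None))

    def stop(self):
        """Ask the actor to finish and hand back its final count."""
        reply = queue.Queue()
        self._inbox.put(("stop", reply))
        result = reply.get()
        self._thread.join()
        return result


def run_demo(workers=4, per_worker=1000):
    """Have several threads send increments, then return the final count."""
    actor = CounterActor()

    def work():
        for _ in range(per_worker):
            actor.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return actor.stop()
```

Calling `run_demo()` should return `workers * per_worker`, because every increment is queued before the stop message and the actor handles them strictly in order.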


The Problem with Threads Posted: Oct 2, 2007 12:56 AM
Posted by: Andrew Inggs    Posts: 2 / Nickname: aminggs / Registered: Apr, 2006
I too have gone through phases where I thought I fully understood threading, only to find yet deeper flaws in my understanding. Here's an article from IEEE Computer that helped me come to the conclusion that threading is the wrong paradigm for writing concurrent programs:

The Problem with Threads
by Edward A. Lee
University of California, Berkeley
http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover.xml&xsl=article.xsl


Re: The Problem with Threads Posted: Oct 2, 2007 11:53 AM
Posted by: Adam Olsen    Posts: 11 / Nickname: rhamph / Registered: Jun, 2007
> I too have gone through phases where I thought I fully
> understood threading, only to find yet deeper flaws in my
> understanding. Here's an article from IEEE Computer that
> helped me come to the conclusion that threading is the
> wrong paradigm for writing concurrent programs:

But how can threads be so wrong when processes are so right? Having a shared address space (efficiently shared data) isn't the problem. Using it for mutable state, requiring a memory model to determine correctness, is.

Halfway down they discuss the difficulty in writing a listener pattern, but this has nothing to do with threads! I could have the same problems (wanting to send asynchronous, ordered notifications without risk of unbounded memory usage) with actors, processes, CSP, or UML. Even gtk's signals, which aren't threaded, must confront this problem. Solving it is more about making the cycles obvious (so they can be handled) than making them impossible.

I think much of the problem in the concurrency world is the focus on creating pure, elegant solutions that inherently avoid all problems (i.e. a silver bullet), rather than building a set of convenience functions that avoid them well in practise.

My approach to python "threading" (which I could just as well call processes, actors, etc) is to build a solid foundation. The convenience tools to solve a broad set of use cases can be added later, as a library.


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 2:47 PM
Posted by: Tom Poindexter    Posts: 1 / Nickname: tpoind / Registered: Sep, 2007
> What other mainstream languages besides Java and C++?
> Perl's threads are less functional than Python's, Tcl is
> mostly based around a single-threaded event model, Ruby
> has a GIL like Python.

Tcl's event model goes a long way to provide the appearance of multi-threading without the hassle of threading issues, like the GIL. Even so, if your application absolutely needs threads, Tcl has thread support, and a fairly elegant solution at that.

Tcl's threading model is 'one interp per thread', and thus avoids global locks. Inter-thread communication is done by sending events (using each interpreter's event queue) of some bit of Tcl code to be executed in the target thread. The Tcl interpreter has to be built with threading enabled. The performance penalty of running a single-threaded application in a threaded interpreter is about 5% (IIRC).

For reference, see: http://www.tcl.tk/doc/howto/thread_model.html
and http://wiki.tcl.tk/1339


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 11:57 AM
Posted by: Jason Petrone    Posts: 1 / Nickname: jpetrone / Registered: Sep, 2007
Greg's free threading patch can be found at:

http://www.python.org/ftp/python/contrib-09-Dec-1999/System/threading.tar.gz


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 12:19 PM
Posted by: Bruce Eckel    Posts: 875 / Nickname: beckel / Registered: Jun, 2003
I think this is a situation where there's a leap to implementation details rather than getting clear about the problem.

The problem is that cpython can't use more than one processor at a time, and is thus passing up what might be the biggest opportunity to eliminate the speed argument against Python.

I actually don't think removing the GIL is a good solution. But I don't think threads are a good solution, either. They're too hard to get right, and I say that after spending literally years studying threading in both C++ and Java. Brian Goetz has taken to saying that no one can get threading right.

We do need some kind of solution, but it probably shouldn't be threads. I think a process-based approach is probably best. I'd like to see if it's possible to, from within one cpython instance, easily start up a second one in a different process and easily communicate between them. Then you could use an agent system and the programming would become very easy and safe, while effortlessly making use of multiple processors. And no GIL removal would be necessary.
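As a rough illustration of "start up a second cpython in a different process and communicate with it", the following sketch uses the standard `subprocess` module as it exists in current Python 3 (3.7 or later for `capture_output`). It is an assumption-laden toy, not the agent system described above: `CHILD_SOURCE` and `sum_of_squares_in_child` are made-up names, and JSON over stdin/stdout is just one convenient way to pass messages.

```python
import json
import subprocess
import sys

# Source code run by the second interpreter: read a JSON list from stdin,
# do some work, and write a JSON reply to stdout.
CHILD_SOURCE = """
import json, sys
numbers = json.load(sys.stdin)
json.dump({"total": sum(n * n for n in numbers)}, sys.stdout)
"""


def sum_of_squares_in_child(numbers):
    """Send the numbers to a separate CPython process and return its answer."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],  # a second interpreter process
        input=json.dumps(numbers),             # message to the child
        capture_output=True,
        text=True,
        check=True,                            # raise if the child fails
    )
    return json.loads(completed.stdout)["total"]
```

Because the two interpreters share nothing, each has its own GIL and the operating system is free to schedule them on different processors. The cost is that every message has to be serialized and copied.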


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 12:28 PM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> We do need some kind of solution, but it probably
> shouldn't be threads. I think a process-based approach is
> probably best. I'd like to see if it's possible to, from
> within one cpython instance, easily start up a second one
> in a different process and easily communicate between
> them. Then you could use an agent system and the
> programming would become very easy and safe, while
> effortlessly making use of multiple processors. And no GIL
> removal would be necessary.

This begs the same question though: if you want this, come up with a proposed design and implementation. I am not an expert in this area so I need some help beyond "look at this other language's solution". (And arguably you should have posted this in response to Juergen's blog, since that's where the request to eliminate the GIL originated.)

FWIW, I don't think that solving the GIL issue will remove the speed argument against Python -- If a Python program is X times slower than a Java program, using N CPUs doesn't change the factor X -- both the Python version (with the GIL removed) and the Java version will run approximately N times faster, so the speed advantage of Java is still X.
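The arithmetic behind this argument can be spelled out in a few lines. The numbers below are invented for illustration, and the function assumes idealized, perfectly linear scaling on N CPUs for both languages, which real programs rarely achieve.

```python
def speed_ratio(java_seconds, slowdown_x, cpus_n):
    """Return how many times slower Python is than Java on cpus_n CPUs.

    Assumes both versions speed up by exactly cpus_n (an idealization).
    """
    python_seconds = java_seconds * slowdown_x  # single-CPU Python time
    python_parallel = python_seconds / cpus_n   # Python with the GIL removed
    java_parallel = java_seconds / cpus_n       # Java on the same hardware
    return python_parallel / java_parallel


# Hypothetical figures: a 10-second Java job, Python 30x slower, 4 CPUs.
ratio = speed_ratio(10.0, 30, 4)
```

The N cancels out, so `ratio` comes back as the original factor X whatever number of CPUs is used.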


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 1:02 PM
Posted by: Fazal Majid    Posts: 1 / Nickname: fazalmajid / Registered: Sep, 2007
I dislike the GIL, but can understand the reasons for its presence, and why it is hard to get rid of. Ruby would not be an example to hold up, in any case. When I first heard about it, I discarded it almost immediately as it actually used cooperative multitasking (this may have changed since). Erlang would be a better source of inspiration.

As Bruce says, we need to find a solution to the concurrency problem, now that Moore's law is running out of steam for single-thread execution and the trend is moving towards multicore CPUs.

Python needs better batteries-included IPC, i.e. easy to use and included out of the box. Something like the Queue module, except that it allows multiple processes to communicate. There are a number of middleware options available (I use omniORB) but none of them is seamless or included in the standard distribution.

We also need a better inter-process object sharing mechanism. POSH would be great but seems orphaned and last time I tried to use it would just segfault on me. PyLinda is simple, but requires a server process rather than shared memory.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 1:55 PM
Posted by: Bruce Eckel    Posts: 875 / Nickname: beckel / Registered: Jun, 2003
> If you want this, come
> up with a proposed design and implementation. I am not an
> expert in this area so I need some help beyond "look at
> this other language's solution".

I have a similar problem, in that I have a fair bit of understanding of concurrency but not much understanding of Python internals. I wonder if there would be enough interest to try to organize a small conference around this topic to bring together the different fields of expertise that might solve the problem. I've gotten good at organizing small conferences, but not publicizing them, so I'd need help with that.

At the very least, a sprint at the next Pycon, which I'd be willing to organize if there's enough interest.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 2:03 PM
Posted by: anthony boudouvas    Posts: 14 / Nickname: objectref / Registered: Jun, 2007
> > We do need some kind of solution, but it probably
> > shouldn't be threads. I think a process-based approach
> is
> > probably best. I'd like to see if it's possible to,
> from
> > within one cpython instance, easily start up a second
> one
> > in a different process and easily communicate between
> > them. Then you could use an agent system and the
> > programming would become very easy and safe, while
> > effortlessly making use of multiple processors. And no
> GIL
> > removal would be necessary.

There is a very nice solution that is also mentioned in Juergen's blog and which I started to experiment with some time ago: I am talking about parallel python (http://www.parallelpython.com/), which does the described job (distributing work on 2 or more cores) quite nicely.

Are you aware of this (beautiful) module?


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 2:17 PM
Posted by: anthony boudouvas    Posts: 14 / Nickname: objectref / Registered: Jun, 2007
I would also think that such a module may be a candidate to be included in the next Python version, and thus we could forget about the GIL for a long time...


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 2:12 AM
Posted by: Lila Saksa    Posts: 1 / Nickname: lilas / Registered: Feb, 2006
> > We do need some kind of solution, but it probably
> > shouldn't be threads. I think a process-based approach
> is
> > probably best.

I think there are various solutions out there but may not be very well known. Dr. Dobb's published for example an article on a "Python-based coordination system called 'NetWorkSpaces'"

http://www.ddj.com/web-development/200001971


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 5:19 PM
Posted by: Kevin Mitchell    Posts: 1 / Nickname: kamitchell / Registered: Sep, 2007
Bruce Eckel wrote:

> We do need some kind of solution, but it probably
> shouldn't be threads. I think a process-based approach is
> probably best. I'd like to see if it's possible to, from
> within one cpython instance, easily start up a second one
> in a different process and easily communicate between
> them. Then you could use an agent system and the
> programming would become very easy and safe, while
> effortlessly making use of multiple processors. And no GIL
> removal would be necessary.

I very much agree. Having each interpreter communicate is much better than having them share memory. Sharing means locking, and locking is expensive.

What if we make changes so that you could have multiple interpreters each in its own thread, sharing nothing? Each interpreter can keep its own GIL, and so there needs to be no fine-grained and expensive locking. There should be no performance change for the single-thread case.

This would work well for systems where process creation is expensive, or for embedding into programs that already start threads and would want to have a Python interpreter in many threads.

With a distributed object mechanism, calls could be made to objects in other threads, or serialized objects can be sent between threads.

A good remote-object-call mechanism that works between processes would also work between threads. Then end users could have the choice of using threads or processes to spawn off more interpreters.

I realize that this would place a new requirement on extension writers to lock or make extension globals thread-local. With Py3K coming out, this might be a good opportunity to suggest small changes in the way extensions are written for the new era.

It sounds like there just need to be a few things cleaned up in the thread state for the interpreter, in terms of shared small integers and single character strings.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 5:24 PM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> What if we make changes so that you could have multiple
> interpreters each in its own thread, sharing nothing? Each
> interpreter can keep its own GIL, and so there needs to be
> no fine-grained and expensive locking. There should be no
> performance change for the single-thread case.

Unfortunately, there are many data structures currently shared between interpreters, e.g. obmalloc (our custom super-fast small-block allocator), and immutable singleton objects like 0- and 1-char strings, the empty tuple, None, and all built-in exceptions, functions and classes. Having each interpreter have a separate None would require quite a bit of change in the VM.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 5:44 PM
Posted by: Ian Bicking    Posts: 900 / Nickname: ianb / Registered: Apr, 2003
> Unfortunately, there are many data structures currently
> shared between interpreters, e.g. obmalloc (our custom
> super-fast small-block allocator), and immutable singleton
> objects like 0- and 1-char strings, the empty tuple, None,
> and all built-in exceptions, functions and classes. Having
> each interpreter have a separate None would require quite
> a bit of change in the VM.

All of those objects would be safe to share anyway, wouldn't they? Being immutable, they don't need locking, do they?


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 6:12 PM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> All of those objects would be safe to share anyway,
> wouldn't they? Being immutable, they don't need locking,
> do they?

Well, their reference counts still change, so you'd have to use a thread-safe reference count update macro *everywhere* (since Py_INCREF(x) doesn't know whether x could be None or not). Also, some objects have other invisible state, e.g. PyUnicode objects have an internal reference to their PyString rendition. You don't want to leak that.

All of this may not be insurmountable, but once it's all done I'm not sure I'd recognize the Python/C API, and extension writers would have to start from scratch (more so than with py3k I expect).
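The point that an immutable object's reference count still changes can be observed from Python itself with `sys.getrefcount`. This is a small CPython-specific sketch; `refcount_delta` is a hypothetical helper name. It uses a tuple built at run time rather than None, on the assumption that newer CPython releases treat some singletons specially.

```python
import sys


def refcount_delta():
    """Return how much an immutable object's refcount moves when aliased."""
    value = tuple(range(3))          # immutable, created at run time
    before = sys.getrefcount(value)
    alias = value                    # no mutation, but one more reference
    after = sys.getrefcount(value)
    del alias                        # drops the extra reference again
    return after - before
```

Merely binding a second name writes to the object's hidden counter, and that write is what would need to be made thread-safe if such objects were shared without the GIL.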


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 7:53 PM
Posted by: Matt Knox    Posts: 2 / Nickname: mattknoxca / Registered: Sep, 2007
> We do need some kind of solution, but it probably
> shouldn't be threads. I think a process-based approach is
> probably best. I'd like to see if it's possible to, from
> within one cpython instance, easily start up a second one
> in a different process and easily communicate between
> them. Then you could use an agent system and the
> programming would become very easy and safe, while
> effortlessly making use of multiple processors. And no GIL
> removal would be necessary.

For what it's worth (not much), I completely agree with Bruce (and the others who have voiced similar opinions here). A standard built-in module for doing inter-process communication and spawning such processes would be awesome. Sure, you can already do this stuff with any number of 3rd party modules, or hack up your own specialized solution, but the need for easy multi-core processing is such a fundamental thing with today's hardware that it would be a shame for Python 3.0 to not have such capabilities out of the box. The complaints are only going to get louder and more frequent on this topic going forward.

That being said, I love Python and have great respect for all the Python developers and the amazing things they have given us all for free. I am not in a position to champion such an addition to Python (I don't have the knowledge, talent, or ambition :P ), so I will just sit back and keep my fingers crossed that one day it will happen! Until then, mpi4py 4me.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 1:59 PM
Posted by: damjan georgievski    Posts: 1 / Nickname: gdamjan / Registered: Sep, 2007
This issue arises from time to time, because a lot of people are using Python with Pylons or Django, and when they hit a performance barrier they (understandably) hope that putting the software on a shiny new quad-CPU server will solve their problems. Only then do they learn about the GIL and become frustrated by it.


OTOH, maybe a multi-process, shared-memory QUEUE implementation in the stdlib would defer these discussions once and for all?


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 2:37 PM
Posted by: Paul Boddie    Posts: 26 / Nickname: pboddie / Registered: Jan, 2006
I want to reply to lots of comments, but here goes with responses to two of them:

Ron Stephens writes: "Please do not become distracted by calls for feature Y or change Z or by criticism X."

I concur with Ron! I tend to err on harsh criticism of new features in Python as it is, but the open letter and surrounding "give me it" chorus from people who claim they have credentials (but don't seem to translate them into code) is not far short of posturing for posturing's sake. If they abandon Python for some other language over a check/tick in some feature box, ignoring Jython, IronPython, PyPy (and so on) in the process, then so be it; if their judgement is that impaired we shouldn't be listening to them anyway.

anthony boudouvas writes: "I would also think that such a module maybe a candidate to be included in next Python version and thus forget about the GIL for a long time..."

Look here for a collection of reasonable solutions: http://wiki.python.org/moin/ParallelProcessing

Some people have their own favourites, and I declare interest in one of those listed. A nice API that we can all get behind, plus a standard built-in solution, would be a good start, though.


How about removing GIL-dependency in the API? Posted: Sep 10, 2007 3:05 PM
Posted by: Jack Jansen    Posts: 1 / Nickname: jackjansen / Registered: Sep, 2007
Can I suggest a half-way point to removing the GIL (again:-)?

Currently the C API design is dependent on the GIL. For example, PyArg_Parse(..., "s", ...) requires the GIL, because otherwise the pointer returned might get garbage-collected before your code has actually looked at it.

By balancing all such calls (i.e. PyArg_ParseGILSafe() would return a cookie that you would eventually pass to PyArgParseGILSafeDone()) C developers could start writing extension modules that are GIL-safe.

(Probably superfluous explanation: the first call would incref all objects to which borrowed references are returned, the second call would decref those. In a non-GIL-free interpreter a bit of preprocessor magic would make these calls do the current thing).

Offer: I'm still looking for a masters project, and while this is rather out-of-scope for my normal work I'd be willing to check whether anyone at the VU could be found to supervise this, but only if it stands a chance of getting accepted into the mainstream....


Re: How about removing GIL-dependency in the API? Posted: Sep 10, 2007 3:17 PM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> Can I suggest a half-way point to removing the GIL
> (again:-)?
>
> Currently the C API design is dependent on the GIL. For
> example, PyArg_Parse(..., "s", ...) requires the GIL,
> because otherwise the pointer returned might get
> garbage-collected before your code has actually looked at
> it.

Not really -- your *caller* is holding on to your arguments, and it won't be resumed until you return.

> By balancing all such calls (i.e. PyArg_ParseGILSafe()
> would return a cookie that you would eventually pass to
> PyArgParseGILSafeDone()) C developers could start writing
> extension modules that are GIL-safe.

I'd be worried about error returns forgetting to do the cleanup, causing the GIL to be held forever (unless you modify the caller to release that lock if it's still held when your code returns).

> (Probably superfluous explanation: the first call would
> incref all objects to which borrowed
> references are returned, the second call would decref
> those. In a non-GIL-free interpreter a bit of preprocessor
> magic would make these calls do the current thing).
>
> Offer: I'm still looking for a masters project, and while
> this is rather out-of-scope for my normal work I'd be
> willing to check whether anyone at the VU could be found
> to supervise this, but only if it stands a chance of
> getting accepted into the mainstream....

*This* particular approach doesn't sound quite right, but a project to add GIL-free threading to Python might work. I recommend looking beyond just getting rid of the GIL while keeping the existing thread/threading API though; Bruce Eckel's suggestion of introducing a new API for dealing with threads (actor-based?) might be more promising (and is more likely to find a sponsor amongst your professors :-).


Re: How about removing GIL-dependency in the API? Posted: Sep 10, 2007 3:49 PM
Posted by: anthony boudouvas    Posts: 14 / Nickname: objectref / Registered: Jun, 2007
Ok, Guido told us to propose some ideas, but I fail to see why some of those mentioned here are not even discussed further...

I insist, why not incorporate the "parallel python" module in the language distribution? It is something that works, is stable and can distribute work load to n processors, even on other computers.

If removing the GIL is so difficult, what is the technical reason for not adopting the above (pp) solution (or some other solution similar to it)?


Re: How about removing GIL-dependency in the API? Posted: Sep 10, 2007 5:58 PM
Posted by: Guy Kloss    Posts: 2 / Nickname: xemacs / Registered: Sep, 2007
> I insist, why not incorporate in the language distribution
> the "parallel python" module ? It is something that works,
> is stable and can distribute work load to n-processors,
> even on other computers.

I second that!

Using PP is extremely easy and follows the nice ease of use Python is well known for. Only one thing for parallel processing along the lines of this thread's discussions is missing: Thread/process communication. For something to go into Python there needs to be some mechanism to communicate between the processes (as in the case of PP). Maybe something along the line of channels as done in Stackless, maybe somehow as "lived" in ProActive within the Java world, ...

So far PP is more along the lines of submit and retrieve (final) results. But I just love it.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 4:07 PM
Posted by: Ian Bicking    Posts: 900 / Nickname: ianb / Registered: Apr, 2003
Can I suggest something just like os.fork() (except implemented directly in CPython) would be incredibly great...? Shared memory, fast interpreter creation, sharing special kinds of objects over explicit channels (e.g., open files or sockets)... that would be wonderful.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 5:08 PM
Posted by: S Deibel    Posts: 9 / Nickname: sdeibel / Registered: Apr, 2003
It's all well and good to say something should be done, but it's unrealistic to think Guido or others are suddenly going to leap on the idea after years of discussion and at least one proof against the concept.

This is the open source world where I think "put up or shut up" is a good standard to live by. In other words, if you have a good idea, bring it up when you're ready to actually work on it, write and release actual code yourself, try things out, and get others to test and contribute ideas and code. Then you're on your way to proving the concept and refining things until eventually your code could be accepted as a patch.

But personally, I wouldn't bother with this. I think threading is a bad idea for most things and that the GIL's advantages outweigh its disadvantages most of the time.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 7:11 PM
Posted by: Leonardo Santagada    Posts: 6 / Nickname: santagada / Registered: Jul, 2006
Including PP is a start. Channels and Stackless is another (maybe complementary) step. Saying "use os.fork()" is not a good answer. What I mean is that we should have a good answer to threads in the core of python, not a do-it-yourself way (which defeats the batteries included motto).

Maybe the only real answer is not using refcount at all, have a real good garbage collector and have a lesser dependency on c libs, which means pypy... but that is going to take some time still.

All in all, what most people are saying is that almost all new computers have more than one core (or processor), and that Python should be prepared for it, not that it is slower than Java and that this would make it faster.


Re: It isn't Easy to Remove the GIL Posted: Sep 10, 2007 8:15 PM
Posted by: Bartek Jarocki    Posts: 2 / Nickname: bartek / Registered: Feb, 2006
Hi

I am not much concerned about the CPython GIL: there are different languages which maximize computing efficiency, and in many scenarios fork/RPC can help scaling much better than (evil..) threads. Just a thought though.. perhaps naive

Removing the GIL is difficult and past results defeated the purpose, so what if instead the interpreter itself was parallelized and started to scale on duos, quads and whatever comes next (even with a single-threaded Python program!)?

Is it something potentially possible with (C)Python?

Bartek


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 1:26 AM
Posted by: anthony boudouvas    Posts: 14 / Nickname: objectref / Registered: Jun, 2007
> so what if instead the interpreter itself was
> parallelized and started to scale on duos, quads and
> whatever comes next (even with single threaded Python
> program!).

I think that you need one interpreter for each core for this to work. We have posted here some solutions that are indeed working right now and I desperately want to hear Guido's opinion on that...


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:00 AM
Posted by: Guido van Rossum    Posts: 359 / Nickname: guido / Registered: Apr, 2003
> We have posted here some solutions that are
> indeed working right now and i desperately want to hear
> Guido's opinion on that...

I'm not familiar with any of the solutions proposed and it will take me some time to become sufficiently familiar with some of them before I can judge them. The last thing I want to do is rush something into the standard library without understanding what problem it solves, how stable it is, etc.

But (as Bruce seems to be understanding better than some of the commenters on his or this blog) before we rush to a solution we need to look more at the problem.

I'm not at all sure that the people crying for GIL removal and the people asking for more concurrency support are after the same thing at all. GIL removal is the "obvious" solution if you already have a multi-threaded program, like a web server: All web servers need some form of concurrency, and most solve it through threads, as it is the most convenient solution.

You can get quite far using the one-thread-per-request model (I think frameworks like Django and TurboGears/Pylons use this), since any individual web thread is typically I/O-bound: first you have to wait until the entire request is received, then you wait for your database, finally you wait until the client has received the last byte of your response. By the time your server is no longer I/O-bound but CPU-bound, you have likely hit upon a successful web concept, and the last thing you want to do is have to rethink everything in order to speed it up. So GIL removal sounds attractive. (It also helps that most databases already address the problem of concurrent access in one way or another, so this won't be a stumbling block.)
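A small sketch of why I/O-bound threads coexist happily with the GIL: in the toy below, `time.sleep` stands in for waiting on a socket or a database (an assumption made for illustration; `handle_request` and `serve` are hypothetical names, not part of any framework mentioned here). CPython releases the GIL while a thread is blocked, so the waits overlap.

```python
import threading
import time


def handle_request(delay, results, index):
    """Pretend to serve one request; the sleep models waiting on I/O."""
    time.sleep(delay)  # the GIL is released while this thread is blocked
    results[index] = "done"


def serve(n_requests=8, delay=0.2):
    """Run one thread per request and report the total wall-clock time."""
    results = [None] * n_requests
    threads = [
        threading.Thread(target=handle_request, args=(delay, results, i))
        for i in range(n_requests)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return results, elapsed
```

With these defaults the elapsed time should be close to a single delay rather than eight of them. Replace the sleep with a pure-Python computation and the threads would instead take turns holding the GIL, which is the CPU-bound situation described above.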

But I've got a feeling that Bruce isn't thinking of this scenario when he asks for actors (which I remember him bringing up in 2001-2003, so at least he's consistent :-). Unfortunately I can't quite think what problem area he wants to address. There are many different ways one can use multiple CPUs to make a given algorithm faster, but it depends a lot on the algorithm how you have to code it to benefit. E.g. I believe that in the numpy world, GIL removal is pretty much a non-issue: all their heavy lifting is done by C, C++ or Fortran code, which can easily benefit from multiple CPUs by using special vectorizing operations or by creating OS-level threads that aren't constrained by the GIL (since they don't touch Python objects, only arrays of numbers).

At Google we use a little hack called MapReduce, which tackles a different problem space again: algorithms involving massive amounts of data, so massive that they may not fit on a single computer's hard drive. Yet its abstraction is powerful enough to efficiently describe less resource-hungry algorithms as well -- as long as they are highly (embarrassingly) parallelizable.
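For readers who have not met the abstraction, here is a toy, single-process word count written in the map/shuffle/reduce style. It only sketches the shape of the programming model and makes no claim about how Google's system is implemented; all function names are made up for this example.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Group all emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def word_count(documents):
    """Run the three steps in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_words(doc)]
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())
```

Each call to `map_words` depends only on its own document and each call to `reduce_counts` only on its own key, which is what makes the work embarrassingly parallel: a real framework can hand those calls to different machines.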

Then there are models like Linda. Etc., etc. There are numerous other styles of concurrent problems as well, and I don't think a single solution can be made to fit all.

In CPython's specific case, I believe that an additional constraint is that it is often used as a *glue language*. This means that any solution must continue to have great support for writing extension modules in C/C++ and linking against libraries written in those languages. A solution that is to find widespread acceptance must continue to support OS-level threads and file descriptors in some way that makes it easy to bounce between the Python and C/C++ layers. Python (at least CPython) mustn't lose the ability to talk to databases, fork subprocesses and talk to them using pipes, link to GUI libraries that have their own event loop, or use system libraries that may have their own concurrency constraints.

One application at Google with which I'm rather intimately familiar uses the following forms of I/O, wrapped in various extension modules:

- sockets
- mysql
- Bigtable (an internal Google database)
- file system via NFS
- SSH to remote workstations
- Perforce API (a C++ library for which we have no source)

I'd like to continue to be able to use Python for such applications!


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:23 AM
Posted by: anthony boudouvas    Posts: 14 / Nickname: objectref / Registered: Jun, 2007
> I'm not familiar with any of the solutions proposed and it
> will take me some time to become sufficiently familiar
> with some of them before I can judge them. The last
> thing I want to do is rush something into the standard
> library without understanding what problem it solves, how
> stable it is, etc.

OK, I completely understand this. But if you find the time, have a look at these proposals; I think it will be worth the effort.

> I'm not at all sure that the people crying for GIL removal
> and the people asking for more concurrency support are
> after the same thing at all.

As I said, I have started to work with one of the proposals made here ("parallel python"), and I see that it works very nicely without being disturbed by the GIL. So I think that if we have THAT luxury, there is no reason to spend time removing the GIL.

I do not know your everyday work schedule, of course, but there are modules out there that are worth your time to look at. For example, I used psyco the other day and was in complete shock when I saw the performance improvements it can give us in some circumstances...


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 8:35 PM
Posted by: Michel Alexandre Salim    Posts: 4 / Nickname: salimma / Registered: Sep, 2005
> But I've got a feeling that Bruce isn't thinking of this
> scenario when he asks for actors (which I remember him
> bringing up in 2001-2003, so at least he's consistent :-).
> Unfortunately I can't quite think what problem area he
> wants to address.

http://lamp.epfl.ch/~phaller/doc/ActorsTutorial.html

As I understand it, it allows concurrency with a lower overhead than multi-threading. It would be a nice feature to have in Python -- I wonder if it can be implemented as a library (the Scala implementation is).

By the way, Guido, the e-mail from Greg that you cited didn't claim a slowdown of 2x -- the slowdown was from 0.95 unit to 0.6, which is more like, what, 35%? Though Greg also noted that even at the time of writing, Python has started using more and more global structures which might mean an updated patch would perform worse.
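
The arithmetic behind that estimate can be checked quickly. This sketch assumes the two figures are relative throughput (work done per unit time), which is how they are read above; the variable names are just for illustration.

```python
with_gil = 0.95      # relative throughput quoted above, with the GIL
without_gil = 0.60   # relative throughput quoted above, without the GIL

# Fraction of throughput lost by removing the GIL.
throughput_loss = (with_gil - without_gil) / with_gil

# Equivalent increase in run time for the same amount of work.
time_ratio = with_gil / without_gil

print(f"throughput loss: {throughput_loss:.1%}")  # throughput loss: 36.8%
print(f"run-time ratio:  {time_ratio:.2f}x")      # run-time ratio:  1.58x
```

So under that reading the slowdown is about 37% in throughput, or roughly 1.6 times the run time, rather than two-fold.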


Re: It isn't Easy to Remove the GIL Posted: Sep 16, 2007 1:23 PM
Posted by: Bruce Eckel    Posts: 875 / Nickname: beckel / Registered: Jun, 2003
> But I've got a feeling that Bruce isn't thinking of this
> scenario when he asks for actors (which I remember him
> bringing up in 2001-2003, so at least he's consistent :-).
> Unfortunately I can't quite think what problem area he
> wants to address. There are many different ways one can
> use multiple CPUs to make a given algorithm faster, but it
> depends a lot on the algorithm how you have to code it to
> benefit.

Again, I started writing a reply to this and ended up with another weblog entry:
http://www.artima.com/weblogs/viewpost.jsp?thread=214627


Re: It isn't Easy to Remove the GIL Posted: Sep 11, 2007 1:09 AM
Posted by: Joachim König    Posts: 1 / Nickname: joachim / Registered: Sep, 2007
> I want to point out one more time that the language doesn't
> require the GIL -- it's only the CPython virtual machine
> that has historically been unable to shed it.

Does that mean that jython and ironpython are "better" in this regard?
Then someone could at least demonstrate what
CPython might gain when the GIL is removed.

Are there any useful python code examples that use threads
that run considerably faster in the jython or the .net VM,
on UP (uniprocessor) and MP (multiprocessor) systems?
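
A minimal sketch of the kind of test this question has in mind: a pure-Python, CPU-bound loop run in one thread and then in two. The function names `count_down` and `run_threads` are made up for this example, and the code is written for a current CPython 3; running it on Jython or IronPython may need small syntax changes, which is an assumption and has not been tried here.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; does no I/O and never blocks."""
    while n > 0:
        n -= 1


def run_threads(num_threads, n):
    """Run count_down(n) in num_threads threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    one = run_threads(1, n)
    two = run_threads(2, n)
    # With a GIL, two threads do twice the work in roughly twice the time;
    # without one, on two or more cores, the ratio should approach 1.
    print(f"1 thread: {one:.2f}s  2 threads: {two:.2f}s  ratio: {two / one:.2f}")
```

Comparing the printed ratio across interpreters on the same multiprocessor machine would give a first, very rough answer; it says nothing about workloads that share data between threads.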


Re: It isn't Easy to Remove the GIL Posted: Sep 14, 2007 5:21 PM
Posted by: Mike Sandman    Posts: 6 / Nickname: n8behavior / Registered: Feb, 2004
> Does that mean that jython and ironpython are "better" in
> this regard?

Has it been mentioned (did I miss it?) whether Jython or IronPython scale well on multi-core systems? The obvious answer would seem to be yes, since this is an issue of the VM, not the language.


Re: It isn't Easy to Remove the GIL Posted: Sep 15, 2007 2:37 PM
Posted by: Guy Kloss    Posts: 2 / Nickname: xemacs / Registered: Sep, 2007
This approach, using transactional memory, also sounds quite interesting on that account: execute optimistically, then sort out conflicts afterwards. After all, in many (most?) cases conflicts are not the common execution path...

"""
One of the workloads that we've tried to parallelize with transactions is the Python interpreter. While the Python programming language provides threads as part of the language, its canonical implementation (CPython) provides only limited concurrency as bytecode execution is performed while holding a ``global interpreter lock.'' With the expectation that the bytecodes would likely be embarrassingly parallel, one of my students undertook replacing this overly conservative concurrency control mechanism with the optimistic execution and fine-grain conflict detection provided by wrapping each bytecode with a (hardware) transaction. Our lessons learned were two-fold: 1) it was remarkably easy to eliminate the false conflicts resulting from the interpreter's use of global variables, and 2) it was remarkably hard to correctly deal with the bytecodes (and therefore the transactions) that resulted in system calls and I/O, in part because they may be encapsulated in native code. In hindsight, we recognized that exposing Python's concurrency would have been much easier by programming to a hardware abstraction that speculatively executed lock-based critical sections (i.e., Speculative Lock Elision (SLE)), which would have transparently handled the system call and I/O issues correctly. We believe that this conclusion likely generalizes to many of the workloads that have been executed on TM prototypes.
"""

source: http://www-sal.cs.uiuc.edu/~zilles/tm.html


Re: It isn't Easy to Remove the GIL Posted: Sep 12, 2007 10:49 PM
Posted by: Adam Olsen    Posts: 11 / Nickname: rhamph / Registered: Jun, 2007
I have been experimenting with GIL removal for a while now. I've avoided publicly posting to avoid the inevitable bikeshed discussions, but that seems pointless now, doesn't it? ;)

Before you consider performance you need to decide what sort of semantics you want. On my wishlist is easy spawning of threads, "sanely" handling exceptions, gracefully killing threads when the user doesn't handle an exception, and cheap sharing of read-only data. I have a lot of plans for how to do this, but the development schedule for 3.0 is already full, so I'm not going into them until 3.1 development starts.

The requirement to cheaply share read-only data eliminates process models. They may work for some tasks, but they're just too coarse for many things.

Now, on to the performance. I'm going to assume there's another scheme in place to prevent concurrent modification of dicts and the like (as my plans use), and that refcounts are the only issue. So, what are our options for safely handling reference counts?

Atomic refcounting via inline assembly is the most obvious solution. I've modified python to use these and found only 12% overhead to existing code. Sounds perfect, right? Wrong! It's 12% *uncontended*. Once you have two threads modifying the same reference count the costs skyrocket! Two threads becomes slower than a single thread. It only gets worse as you add more cores, so ultimately this is unviable.

Another option would be to use a traditional tracing GC. There's simply too much code in CPython to rewrite it all, though. The Boehm-Demers-Weiser conservative GC *might* work, but I don't like how it might interact with arbitrary C libraries (never mind if Python is loaded after the rest of the app!). Also, Python uses a lot of very short-lived temporaries, so the memory costs could be very high with a tracing GC.

The third option, which I'm currently pursuing, is an odd sort of hybrid. The refcount for an object starts out "owned" by a single thread. So long as you are that thread, you can update the refcount directly (although the memory barriers and branches are still more expensive than traditional refcounting.) If you don't own the refcount, you ask the owner to promote the object to "asynchronous" refcounting. This means that all future refcount operations go through a hashtable and get flushed to memory in batches. Dropping to 0 is not checked until later when the tracing GC runs. I estimate this has a cost of around 30%, but the important thing is that (due to the batching) it's scalable! Switch from 4 core to 64 core and your performance could go up all the way.
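
To make the batching idea concrete, here is a toy model in Python. It is not the C patch described above and leaves out ownership, promotion, and the deferred zero check; the classes `SharedCounts` and `ThreadBuffer` and the `batch_size` parameter are hypothetical names for this sketch. Each thread records refcount changes in its own table and only takes the shared lock when it flushes a batch.

```python
import threading
from collections import Counter


class SharedCounts:
    """Toy stand-in for the refcounts of objects shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = Counter()
        self.flushes = 0

    def apply(self, deltas):
        # One lock acquisition pays for a whole batch of updates.
        with self._lock:
            self.counts.update(deltas)
            self.flushes += 1


class ThreadBuffer:
    """Per-thread table of pending refcount changes."""

    def __init__(self, shared, batch_size=1000):
        self.shared = shared
        self.batch_size = batch_size
        self.pending = Counter()
        self.ops = 0

    def incref(self, obj_id):
        self._record(obj_id, +1)

    def decref(self, obj_id):
        self._record(obj_id, -1)

    def _record(self, obj_id, delta):
        self.pending[obj_id] += delta  # touches thread-local state only
        self.ops += 1
        if self.ops >= self.batch_size:
            self.flush()

    def flush(self):
        if self.ops:
            self.shared.apply(self.pending)
            self.pending = Counter()
            self.ops = 0


def worker(shared, obj_id, n):
    buf = ThreadBuffer(shared)
    for _ in range(n):
        buf.incref(obj_id)
        buf.incref(obj_id)
        buf.decref(obj_id)
    buf.flush()  # push out whatever is still pending


def demo(num_threads=4, n=10_000):
    shared = SharedCounts()
    threads = [threading.Thread(target=worker, args=(shared, "obj", n))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.counts["obj"], shared.flushes
```

With the defaults, `demo()` returns `(40000, 120)`: the total is correct, and the shared state was locked 120 times instead of once for each of the 120,000 individual refcount operations. The shared count is only accurate after a flush, which is why a scheme like this has to defer the check for a count of zero.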

Okay, back to reality. I am working on a patch/fork that includes the semantics on my wishlist and uses the hybrid refcount/tracing GC. When 3.1 development starts I plan to propose the semantics and (as a compile-time option!) the performance-affecting aspects. However, the cost so far is a lot more than 30%!

Unmodified py3k (revision 57858):
28000 pystones/second
Modified, single thread (dual core box):
11500 pystones/second
Modified, two threads (dual core box):
22800 pystones/second

I think the bulk of the difference is due to de-optimization. I've had to disable free lists, pymalloc, etc to get this far. Hopefully competitive replacements can be found in time.

Anyway, I do plan to maintain this patch/fork for a while (but I'm not posting until 3.0 is out the door! If you want to help, help 3.0!)


Re: It isn't Easy to Remove the GIL Posted: Sep 13, 2007 5:03 PM
Posted by: Jesse Noller    Posts: 3 / Nickname: jnoller / Registered: Sep, 2007
Hey Adam, I've started examining alternatives and benchmarking/mocking them up - I would love to talk to you more and maybe even try out your patches. If you're interested, shoot me an email at j-noller at gmail dot com


Re: It isn't Easy to Remove the GIL Posted: Sep 14, 2007 4:53 PM
Posted by: Adam Olsen    Posts: 11 / Nickname: rhamph / Registered: Jun, 2007
> Unmodified py3k (revision 57858):
> 28000 pystones/second
> Modified, single thread (dual core box):
> 11500 pystones/second
> Modified, two threads (dual core box):
> 22800 pystones/second

For those keeping score, my current numbers:
Unmodified py3k (revision 57858):
28000 pystones/second
Modified, single thread (dual core box):
18800 pystones/second
Modified, two threads (dual core box):
36700 pystones/second

My most recent change was to give each "asynchronous" object an index into a per-thread table of refcount changes. This was prompted by realizing that my thread-pystones benchmark shares fewer than 500 objects. It's not clear what the costs would be to flush such a table when it becomes full, so it may not be as practical as a hash table.

Most of my performance improvements have come from tweaking INCREF/DECREF. I guess my earlier prediction that they were only 30% of the performance reduction was wrong. It's not clear how much of a hit glibc's malloc is.
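
For comparison, the ratios implied by the two sets of pystone figures can be computed directly. This sketch assumes the two-thread numbers are the combined throughput of both threads; the labels are made up for this example.

```python
baseline = 28000  # unmodified py3k, pystones/second

runs = {
    "first patch, 1 thread": 11500,
    "first patch, 2 threads": 22800,
    "current patch, 1 thread": 18800,
    "current patch, 2 threads": 36700,
}

for label, pystones in runs.items():
    print(f"{label}: {pystones / baseline:.2f}x of baseline")
# first patch, 1 thread: 0.41x of baseline
# first patch, 2 threads: 0.81x of baseline
# current patch, 1 thread: 0.67x of baseline
# current patch, 2 threads: 1.31x of baseline

# Two-thread scaling relative to one thread of the same build.
print(f"first patch scaling: {22800 / 11500:.2f}x")    # 1.98x
print(f"current patch scaling: {36700 / 18800:.2f}x")  # 1.95x
```

In both versions two threads scale almost linearly; what changed is the single-thread cost, which went from about 59% below the baseline to about 33% below it.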


Re: It isn't Easy to Remove the GIL Posted: Nov 23, 2007 6:39 PM
Posted by: Myroslav Opyr    Posts: 1 / Nickname: myroslav / Registered: Nov, 2007
I was pointed to a good write-up that gives a hardware/software overview of trends in multi-core computing. It sheds some light on the progress made, in different languages, systems, and platforms, in the area Guido touched on.

http://notes-on-haskell.blogspot.com/2007/10/defanging-multi-core-werewolf.html


Re: It isn't Easy to Remove the GIL Posted: Jan 1, 2008 2:19 PM
Posted by: Bernd Will    Posts: 1 / Nickname: berndwill / Registered: Jan, 2008
Hello everybody,

I think the GIL shouldn't be removed, since many modules implicitly rely on this CPython-based technology. But I think it is a shortcoming of the language itself if it can only use one CPU core when more CPUs are available. Maybe there could be more built-in functionality for parallel algorithms and code, spreading threads across several CPUs while one central process controls them?

Then the GIL would be for robust single-threaded processes, while multitasking/multithreading functionality that controls the other CPUs would be an extra benefit.

Regards
Bernd


Re: It isn't Easy to Remove the GIL Posted: Jun 12, 2008 8:24 AM
Posted by: Ray Horn    Posts: 1 / Nickname: raychorn / Registered: Jun, 2008
What about simply making each Python OS thread use a separate Python namespace? (See http://python2.near-by.info for the details.)


Re: It isn't Easy to Remove the GIL Posted: Sep 12, 2008 5:11 PM
Posted by: Patrick Stinson    Posts: 2 / Nickname: patrickkid / Registered: Sep, 2008
We have made an interesting case along the lines of multi-threaded audio applications (meaning more than one audio thread). I have described it on my blog here:

http://pkaudio.blogspot.com/2008/07/multiple-rt-threads-and-gil.html

Advantages to us of removing or migrating the GIL, which would let us use threads instead of processes:

- Low startup speed overhead
- Low long-term memory footprint
- Easy debugging (very important)
- Being nice to the host sequencer app (they don't expect many processes)

Since we don't use extension modules and therefore have more need for the language than the entire VM platform, the problem becomes more the execution environment instead of the algorithmic environment. Since we run as a plugin in many host apps we should ideally run a light-weight thread to do audio compilation. We want to be able to script some control-rate computation, and are never allowed to block in the audio thread.

interesting problem, really.



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use