Re: It isn't Easy to Remove the GIL
Posted: Sep 17, 2007 11:10 AM
> You're right. I apologize. Your post pushed my sarcasm
> button and I should have reworked it before posting it.
It's OK, don't worry. I gladly accept your apology.
When someone cares greatly about a topic, it is often difficult to keep emotion out of the discussion. You obviously care about threading and know a great deal about it, which makes this understandable.
I like to avoid conflicts when I can, and I'm happy about your graceful response, which helps us all to get back to the basics of this discussion.
> The thing that pushed the button is hearing "but threading
> just isn't that hard" one too many times. It took me a lot
> of study and experimentation -- years -- to begin
> realizing that threading really is that hard. And there
> were a number of periods where I thought I actually
> understood it, so I need to be more understanding when
> someone else is in one of those phases as well.
Ok. It's interesting to be described as being 'in a phase', but Ok. :-)
> I have written giant chapters about it both in C++ and in
> Java and that may have convinced some people of the
> complexity. I probably need to write a much shorter
> chapter or article that can somehow make the case that
> threads in general are not the right solution. But it's
> not easy and people who think threads are easy are
> probably not going to be convinced. Threads are
> fascinating and they draw you in with the feeling that if
> you can just put one more locking mechanism into the
> system then threads will work OK. Not unlike "just one
> more static typing mechanism" in the static-vs-dynamic
While I haven't written books about this, and wouldn't say that I have studied threading for years, I have written multi-threaded programs on a number of different platforms for many years now.
I don't believe for a moment that I am some sort of special threading superstar. I'm not. I have had my fair share of deadlocks, head-scratching, hard-resets on machines with locked kernel threads and so on. I can tell you that. I have written multi-threaded programs that worked 'really well' ... until I actually tried to run them on an SMP machine.
The scariest thing is when you get memory corruption in a shared data structure and no decent debugging tool seems able to tell you what's going on. Valgrind tells you where the corruption occurs, but you just can't see 'why' it happens, because it should be 'impossible'. Then you sit with your team at the table, staring at the print-out of the code, and just hope that someone suddenly says 'Aha!' and points at the one line of code where your intricate locking falls apart. And if you don't get that 'Aha!' moment you will be royally stuck...
I've been through all of this plenty of times. I guess that is why I don't see myself as being particularly naive about the topic.
> We do have a fundamental disagreement, though. I have come
> to the conclusion that shared-memory concurrency is
> impossible for most programmers to get right, and you feel
> that threads are a reasonable solution for a significant
> class of problems, and that you are able to get it right
> when you need to.
Yeah, well, sometimes it was a pretty significant struggle, as you can see from what I wrote above. There were moments when we weren't so sure whether we would ever 'get it right'. :-)
I think that a shared-nothing approach, in which all communication between 'threads' (or processes) takes place via message queues, definitely makes it easier to arrive at a correct program. You can program with this model quite easily, even using the normal threading API. I have had several problems where this was exactly the right approach to take. If this model is appropriate for a problem then one should stick to it.
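A minimal sketch of that model, assuming modern Python 3 (where the module is spelled `queue`). The names `worker` and `sum_of_squares` are made up for illustration and are not from any real project. Each worker keeps its state private and only talks to the rest of the program through two queues:

```python
import queue
import threading


def worker(inbox, outbox):
    """Keep all state local; communicate only through the two queues."""
    total = 0  # private to this thread, so no lock is needed
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            break
        total += item * item
    outbox.put(total)  # report the result as a message


def sum_of_squares(numbers, n_workers=4):
    inbox = queue.Queue()
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        inbox.put(n)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every worker has put exactly one partial total on the outbox.
    return sum(outbox.get() for _ in threads)
```

The only objects the threads share are the queues themselves, and those do their own locking internally, so there is no lock anywhere in the user code.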
Maybe out of sheer (mis)fortune, though, I had to work on several projects where this approach didn't work in all cases. Sometimes, a legacy system with a large, 'central' data structure needed to be made to take advantage of multiple CPU cores, and so the work had to be broken out across threads without having to re-write or re-architect the core of the system.
In other cases, the nature of the shared data didn't lend itself to being communicated via messages. The overhead of doing so would have been prohibitive. Take a large tree, for example: you hold locks on certain branches, or at times even on individual nodes, which allows multiple threads to work in the tree at the same time. This is very difficult to get right, but the performance requirements (and other considerations) pretty much mandated that this was the approach that had to be taken.
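To give a rough idea of what per-node locking looks like with the normal threading API, here is a deliberately simplified sketch. It is not the code from any of those projects, and `Node`, `get_or_create_child`, `add_at` and `run_demo` are hypothetical names. It assumes nodes are never deleted and that a thread holds at most one lock at a time, which sidesteps deadlocks. Real trees with deletion or rebalancing need much more care than this.

```python
import threading


class Node:
    def __init__(self, value=0):
        self.value = value
        self.children = {}
        self.lock = threading.Lock()  # guards this node's value and children


def get_or_create_child(parent, key):
    # Only the parent is locked, so work under other nodes can proceed.
    with parent.lock:
        child = parent.children.get(key)
        if child is None:
            child = Node()
            parent.children[key] = child
        return child


def add_at(root, path, amount):
    """Walk down `path`, creating nodes as needed, then update the leaf in place."""
    node = root
    for key in path:
        node = get_or_create_child(node, key)
    with node.lock:  # one lock at a time: the parent's lock is already released
        node.value += amount


def run_demo(n_threads=8, n_updates=1000):
    root = Node()
    paths = [("a", "x"), ("a", "y"), ("b", "x")]

    def work():
        for i in range(n_updates):
            add_at(root, paths[i % len(paths)], 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return root
```

Threads updating different leaves only meet briefly at the shared ancestors. Under CPython the GIL still prevents the threads from running Python code in parallel, but the locks remain necessary, because a compound update such as `node.value += amount` is not atomic.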
Once something like this has been wrestled to the ground and has been made to work, the advantage is that all data can be modified in place. No additional copying (for messages) is necessary, no marshalling and unmarshalling when sending data, and so on. Sure, you need to weigh the benefit of in-place modification against cache effects if each core has its own memory cache, but it can still often work out to your advantage.
So, I guess my point is this: Yes, avoid shared memory if you can, but realize that sometimes it still is the right approach. The current threading API (all GIL issues aside) allows me to write shared-nothing, message-based systems. But it also allows me to use locks and shared data to my heart's content. That's why I like the current threading API. It gives me a choice. And sometimes, I need to make the (hopefully) informed choice to bite the bullet and go for the shared memory approach.
I'd be happy if all documentation chapters that mention multi-threading and shared memory came with a warning label, similar to cigarette packages...
> I shouldn't have replied sarcastically, but your reply
> seemed to suggest that it was obvious that most
> programmers could write correct threaded programs.
I hope that my reply clarified my view on this: Writing correct threaded programs with shared memory is not obvious, and it's not always easy. Not for many good programmers and definitely not for me either.
> I guess that frustrates me because I don't know how to explain the difficulties of threading well enough.
May I humbly suggest that in your next article or book about this you relay the experience I described above: Looking at the print-outs with your team, not knowing if you will ever find the problem with your program? In moments like this you can easily get that terrible, sinking feeling, especially if you are supposed to check in the 'fix' for the deadlock problem tonight for the upcoming release. It's already 6:30 pm and you have no idea when, or even if, you will ever fix it.
Of course, there are also plenty of other areas in software development where you can get just as stuck. But the potential for this certainly is there when you are dealing with shared memory multi-threading.
"Warning: Shared data multi-threading ahead. Proceed with caution!"
Sometimes I still have to proceed.