The Artima Developer Community
Python Buzz Forum
My views on OpenMP

Andrew Dalke

Andrew Dalke is a consultant and software developer in computational chemistry and biology.
My views on OpenMP Posted: Jan 15, 2012 8:04 PM

This post originated from an RSS feed registered with Python Buzz by Andrew Dalke.
Original Post: My views on OpenMP
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.

In private email a correspondent observed that OpenMP makes threading very easy, but "it really seems under utilized in the community." (Here, 'community' is 'scientific programming.') I was surprised to find out that I had strong views on the topic.

OpenMP sits between several other technologies, namely:

  • GPU computing
  • cloud computing
  • POSIX and other common threading libraries

The new hotness is GPUs. Wes Faler gave a presentation at the recent 28th Chaos Communication Congress on "Evolving Custom Communication Protocols". He mentioned that they ported C++ code over to the GPU. A single evaluation of the unoptimized version was 7 times slower on the GPU than on the CPU. However, they do many evaluations using the same function, and because the GPU has so many compute threads, the overall time was a factor of 7 faster. Similarly, Haque et al. showed that a properly tuned 4-core desktop machine was "only" about 5x slower than a GPU card.

It looks like GPU computing is currently the approach to take if you do many evaluations of similar tasks, assuming you have the GPUs and the programming time available. That performance (and the novel way of computing) interests people who might otherwise use OpenMP.
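The two factors of 7 quoted from the talk can be combined with a little arithmetic. The sketch below is only a back-of-the-envelope model, not something from the presentation: it assumes the CPU runs one evaluation at a time, that the GPU evaluations overlap perfectly, and that data transfer costs nothing. The function names are made up for illustration.

```python
def overall_speedup(per_task_slowdown, concurrent_tasks):
    """Speedup over a serial CPU run when each task is `per_task_slowdown`
    times slower but `concurrent_tasks` of them run at once (idealized)."""
    return concurrent_tasks / per_task_slowdown


def implied_concurrency(per_task_slowdown, speedup):
    """How many tasks must effectively run at once to reach `speedup`
    despite each one being `per_task_slowdown` times slower (idealized)."""
    return per_task_slowdown * speedup


# Each evaluation 7x slower, overall 7x faster: about 49 evaluations
# would have to be in flight at once under these assumptions.
print(implied_concurrency(7, 7))
```

Under this model, a GPU that is 7 times slower per evaluation only breaks even once it keeps 7 evaluations running at the same time, which is why the approach pays off mainly when there are many similar evaluations to do.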

Cloud computing is another hotness. Alex Martelli was recently interviewed by Larry Hastings in Radio Free Python episode #2. At 33:47 Larry asked about Python's global interpreter lock and Alex's reply was:

"I hate threading anyway. Multiprocessing is the way to go, and message-passing, not shared memory. That just doesn't scale. I use multithreading so I can use all of my 16 cores, or whatever is the average number of cores in a machine these days. Big furry deal. I've got a few thousand servers waiting for me in the data center and how do I use those with threading?"

The topic comes up several times in the ensuing discussion.

What good indeed is OpenMP, which might be used on a 16-core machine, if you're working on problems which involve 10,000 distributed servers?

Even single nodes have multiple cores these days, and a good OpenMP implementation might help make good use of the nodes in that cloud. However, you have to compare OpenMP to traditional POSIX multithreading. OpenMP works for C/C++ and Fortran, but not for Python, (it seems) Java, or the other languages which support pthreads. You're out of luck if you want to use OpenMP with one of those other languages.

Some things scale up wonderfully well by adding one or two OpenMP directives, but parallelism is rarely as trivial as giving a few hints to the compiler. I think that the non-trivial cases of parallelizing with OpenMP are about as much work as using pthreads, or a system like Grand Central Dispatch. I'll work through an example of doing that in my next essay.

I do believe that OpenMP scales better than these alternatives for some cases, in part because the compiler is doing the work rather than using a library API. My tests so far show that pthreads and OpenMP have about the same scaling with two processors, and I need four or more cores to show a strong OpenMP advantage.

Most desktop/laptop computers just don't yet have 8+ cores. (Alex Martelli said otherwise, but perhaps he's talking about Google's data centers.) Most people develop for their own computers, which lessens the incentive to work on good multicore scaling.

I have a four-core machine, and I'm willing to write a Python extension in C which uses OpenMP. Even then I've run into some difficulties. It took a while, but I figured out how to configure Python's setup.py so it passes the right "use OpenMP" flag to each compiler. The setup.py includes a hard-coded list of which compilers do and do not support OpenMP. Also, did you know that on a Mac you must run OpenMP tasks in the main thread, and not in a pthread? Otherwise your program crashes, even when you have a single OpenMP thread! I had to figure out a workaround so I could use my library unchanged inside Django.
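To give a feel for the setup.py problem, here is a minimal sketch of a hard-coded compiler table. It is not the author's actual setup.py: the table contents and the helper names `openmp_flags` and `add_openmp_flags` are assumptions made for illustration. The keys are the `compiler_type` strings that distutils/setuptools report, and the flags are the usual ones for GCC-style compilers (`-fopenmp`) and Microsoft Visual C++ (`/openmp`).

```python
# Hypothetical table: compiler type -> (compile flags, link flags).
# Anything not listed is treated as "no OpenMP support".
OPENMP_FLAGS = {
    "unix": (["-fopenmp"], ["-fopenmp"]),     # assumes a GCC-style compiler
    "mingw32": (["-fopenmp"], ["-fopenmp"]),  # GCC on Windows
    "msvc": (["/openmp"], []),                # Microsoft Visual C++
}


def openmp_flags(compiler_type):
    """Return (compile_args, link_args) for the given compiler type.

    Two empty lists mean "build the extension without OpenMP".
    """
    compile_args, link_args = OPENMP_FLAGS.get(compiler_type, ([], []))
    return list(compile_args), list(link_args)


def add_openmp_flags(compiler_type, extensions):
    """Append the OpenMP flags to each extension's existing flag lists."""
    compile_args, link_args = openmp_flags(compiler_type)
    for ext in extensions:
        ext.extra_compile_args = list(ext.extra_compile_args or []) + compile_args
        ext.extra_link_args = list(ext.extra_link_args or []) + link_args
```

One plausible way to wire this in is to override `build_extensions` in a `build_ext` subclass and call `add_openmp_flags(self.compiler.compiler_type, self.extensions)` before delegating to the parent class. Note that "unix" covers every cc-style compiler, including ones that reject `-fopenmp`, so a table like this is only a starting point and a real build may need to probe the compiler.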

People are interested in OpenMP development, but some who might use OpenMP are drawn to other technologies. Some tasks are very appropriate for OpenMP, but they are almost as appropriate for other, more common technologies. OpenMP scales well, but most people don't have the hardware where OpenMP shines. Even when they do, they have to work in one of a handful of languages, and in somewhat restricted circumstances.

All of these factors diminish OpenMP utilization in the community.

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use