This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: 100,000 tasklets: Stackless and Go
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
People are talking about Google's Go language. It supports
concurrency, which here means light-weight user-space threads and
inter-thread communications channels. In Stackless terms these are
"tasklets" and "channels."
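To make the terminology concrete, here is a rough analogue of a tasklet and a channel built from only the standard library: a thread stands in for the tasklet, and one Queue per direction stands in for a channel. This is my own illustrative sketch, not Stackless code; real tasklets are far lighter than OS threads, and a Stackless channel is a single rendezvous object rather than a pair of queues.

```python
import threading
import queue

# One Queue per direction stands in for a Stackless channel.
request = queue.Queue()
reply = queue.Queue()

def worker():
    value = request.get()    # like channel.receive()
    reply.put(value + 1)     # like channel.send(...)

t = threading.Thread(target=worker)
t.start()
request.put(41)              # hand a value to the "tasklet"
result = reply.get()         # block until it answers
t.join()
print(result)                # prints 42
```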
In Rob Pike's
Google Tech Talk on Go he shows an example which builds 100,000
"goroutines" (their equivalent of tasklets, and a pun on the word 'coroutine') and
does a simple operation in each goroutine. It's at 43:45 into the
video. I've transcribed it here by hand, so this might contain typos:
package main

import ("flag"; "fmt")

var ngoroutine = flag.Int("n", 100000, "how many")

func f(left, right chan int) { left <- 1 + <-right }

func main() {
    flag.Parse();
    leftmost := make(chan int);
    var left, right chan int = nil, leftmost;
    for i := 0; i < *ngoroutine; i++ {
        left, right = right, make(chan int);
        go f(left, right);
    }
    right <- 0;       // bang!
    x := <-leftmost;  // wait for completion
    fmt.Println(x);   // 100000
}
On the Stackless list, Richard Tew commented:
They have a nice example where they chain 100000 microthreads each
wrapping the same function that increases the value of a passed
argument by one, with channels in between. Pumping a value through the
chain takes 1.5 seconds. I can't imagine that Stackless will be
anything close to that, given the difference between scripting and
compiled code.
I was curious so I wrote something up which suggested that Stackless
Python was a lot faster than Go for this task. That was on a benchmark
of my own devising, based on reading Richard's comment. Yesterday I
finally tracked down the code and wrote a direct translation into
Stackless:
import stackless
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-n", type="int", dest="num_tasklets",
                  help="how many", default=100000)

def f(left, right):
    left.send(right.receive() + 1)

def main():
    options, args = parser.parse_args()
    leftmost = stackless.channel()
    left, right = None, leftmost
    for i in xrange(options.num_tasklets):
        left, right = right, stackless.channel()
        stackless.tasklet(f)(left, right)
    right.send(0)
    x = leftmost.receive()
    print x

stackless.tasklet(main)()
stackless.run()
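For readers without a Stackless build, the same daisy chain can be sketched in standard CPython using OS threads and Queues. This is my own analogue, not code from the talk or the Stackless version above; OS threads are far too heavy to run 100,000 of them, so the chain is kept short here, but the wiring is identical.

```python
import threading
import queue

def f(left, right):
    left.put(right.get() + 1)   # receive from the right, send left

N = 100   # the tasklet/goroutine demos use 100,000; OS threads
          # are too heavy for that, so keep the chain short
leftmost = queue.Queue()
left, right = None, leftmost
for i in range(N):
    left, right = right, queue.Queue()
    threading.Thread(target=f, args=(left, right)).start()

right.put(0)                    # bang!
result = leftmost.get()         # wait for completion
print(result)                   # prints 100
```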
It's a bit longer and more verbose than Go because Python has no
syntactic support for the Stackless additions. The question is: how
long does it take to run?
My reference timing is the one Pike showed:
wally:~ r$ goc goroutine.go
wally:~ r$ 6.out
100000
wally:~ r$ time 6.out
100000
real 0m1.507s
user 0m0.875s
sys 0m0.626s
wally:~ r$
He jokingly apologized for how long the 'goc' step took, which drew a
laugh because it was quite fast, even on his "little Mac here, it's
not very fast."
Let me reproduce that timing test in Stackless:
josiah:~/src dalke$ uptime
14:01 up 2 days, 13:53, 4 users, load averages: 0.46 0.68 0.74
josiah:~/src dalke$ time spython go_100000.py
100000
real 0m0.655s
user 0m0.512s
sys 0m0.136s
josiah:~/src dalke$
("spython" is the name of my local Stackless installation.) I ran my
tests on a 2.5-year-old MacBook Pro, which should be comparable to his
Mac. My Stackless example was faster in every measure than the
reference Go code, and that includes the time Python spent parsing the
.py file and compiling it to byte-code.
Remember, Go is a compiled language designed for concurrency. I'm
working with the C implementation of Python, which compiles only to
byte codes, not machine code, and which was not designed for
concurrency. Yet the Python code is faster.
Why then does Pike sound so proud about the performance of Go on this
timing test? I don't know.
One other observation about Go. Pike's video and another I watched
stressed the fast compile times. It seems modern development bogs down
doing compiles. They gave numbers: 1,000 lines of Go code in 0.2
seconds on some sort of Mac. I took my copy of
Python 0.9p1,
at 25,000 lines of C code. It compiled in 7 seconds on my MacBook Pro
laptop. Scaling up, the same number of lines of Go code would compile
(assuming we have identical machines) in 5 seconds. If I compile with
CFLAGS=-g then Python compiles in 2.75 seconds.
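The scaling in that paragraph can be spelled out as back-of-the-envelope arithmetic. The figures are the ones quoted above (the talk's Go numbers and my own Python 0.9p1 line count); the machines differ, so this is only a rough comparison.

```python
# Figures quoted in the talk and in the paragraph above.
go_lines, go_seconds = 1000, 0.2   # Go compile speed from the talk
python_lines = 25000               # lines of C in Python 0.9p1

# At the quoted Go rate, 25,000 lines would take:
go_scaled = python_lines * go_seconds / go_lines
print(go_scaled)                   # prints 5.0 (seconds)
```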
Since I'm not testing this on the same machine as them it's hard to
say anything concrete, but I would be surprised to find out that my
laptop was twice as fast as theirs, which it would have to be for my
numbers to be worse than what they report for Go. Plus, I've heard
the Intel compilers are a lot faster than gcc.
Something doesn't make sense here. Why do they tout Go's performance
both for goroutine message passing and for compilation as exceptional?
The timings seem worse than existing comparables.