This post originated from an RSS feed registered with Ruby Buzz
by Eric Hodel.
Original Post: Net::HTTP is not slow
Feed Title: Segment7
Feed URL: http://blog.segment7.net/articles.rss
Feed Description: Posts about and around Ruby, MetaRuby, ruby2c, ZenTest and work at The Robot Co-op.
Some time back there was a blog post about Net::HTTP being slow, but that's not true anymore, and probably wasn't as true then as it was claimed to be.
The way to make Net::HTTP go fast is to use a persistent connection so you don't have to re-connect to the server every time. Unfortunately the original benchmarks referenced above don't seem to make more than one request per implementation so Net::HTTP couldn't give its best possible showing.
If you're doing a one-off file transfer or only fetching content from one site at a time, it's OK to pass over Net::HTTP for another library. If you're requesting data from the same server over and over, as with a web service, it's nearly immoral to open a new connection for every request.
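To make the cost of reconnecting concrete, here is a minimal sketch using only the standard library. It counts how many TCP connections a throwaway local server accepts when requests reconnect each time versus when they share one Net::HTTP.start block. The CountingServer class, its two-byte response, and the request count of 3 are all made up for the illustration.

```ruby
require 'net/http'
require 'socket'

# Toy keep-alive HTTP server, for illustration only. It answers every
# request with a two-byte body and counts the TCP connections it accepts.
class CountingServer
  attr_reader :connections, :port

  def initialize
    @server = TCPServer.new '127.0.0.1', 0 # port 0 lets the OS pick a free port
    @port = @server.addr[1]
    @connections = 0
    Thread.new { accept_loop }
  end

  def accept_loop
    loop do
      client = @server.accept
      @connections += 1
      Thread.new(client) { |socket| serve socket }
    end
  end

  def serve(socket)
    while socket.gets               # request line; nil once the client hangs up
      while (header = socket.gets)  # skip headers up to the blank line
        break if header == "\r\n"
      end
      socket.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
    end
  rescue SystemCallError, IOError
    # the client went away mid-request; nothing to clean up
  ensure
    socket.close
  end
end

server = CountingServer.new
uri = URI.parse "http://127.0.0.1:#{server.port}/"

# Reconnecting: Net::HTTP.get_response connects, fetches, and closes each time.
3.times { Net::HTTP.get_response uri }
reconnecting = server.connections

# Persistent: every request inside the start block shares one socket.
Net::HTTP.start uri.host, uri.port do |http|
  3.times { http.request Net::HTTP::Get.new(uri.request_uri) }
end
persistent = server.connections - reconnecting

puts "reconnecting: #{reconnecting} connections, persistent: #{persistent}"
```

The first loop should account for three accepted connections and the second for one. On loopback the difference costs almost nothing, but each extra connection is at least one extra network round trip to a remote server.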
In order to help you use Net::HTTP the right way I've released net-http-persistent. It's a thread-safe wrapper for Net::HTTP that performs persistent connections for you. Here's an example:
require 'net/http/persistent'
uri = URI.parse 'http://example.com/awesome/web/service'
http = Net::HTTP::Persistent.new
response = http.request uri # performs a GET
# perform a POST
post_uri = uri + 'create'
post = Net::HTTP::Post.new post_uri.path
post.set_form_data 'some' => 'cool data'
response = http.request post_uri, post # URI is always required
net-http-persistent is incredibly tiny, so maybe you can add some convenience methods to it. I haven't had a need to.
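As one idea of what such convenience methods might look like, here is a hedged sketch. The WebService class and its get and post_form methods are hypothetical and not part of net-http-persistent. The only thing it assumes about the http object is that it responds to request(uri, req) the way the example above uses it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical convenience wrapper; not part of net-http-persistent.
# +http+ is any object that responds to request(uri, req = nil).
class WebService
  def initialize(http)
    @http = http
  end

  # GET +uri+ and return whatever the underlying http object returns.
  def get(uri)
    @http.request uri
  end

  # POST +params+ to +uri+ as form data.
  def post_form(uri, params)
    post = Net::HTTP::Post.new uri.request_uri
    post.set_form_data params
    @http.request uri, post # the URI is passed along with the request
  end
end
```

Wrapping the http object from the example above would then allow calls like service.post_form(post_uri, 'some' => 'cool data').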
Benchmark
I wrote the following three benchmark blocks to return the same
request body for a URL I’m sure will work (return 200 OK with a
payload). A static file was used to minimize server processing latency.
Each iteration:
sends an HTTP request
cleans up after itself (to be friendly to the network)
extracts the body
Loopback
When running across loopback with all three benchmarks I received the
following result with N=20_000 using uri_2k:
With N=50_000 and the Net::HTTP benchmark disabled:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
Rehearsal ---------------------------------------------------------
TCPSocket               3.290000   5.340000   8.630000 ( 24.503025)
Net::HTTP::Persistent  20.090000   2.160000  22.250000 ( 28.822468)
----------------------------------------------- total: 30.880000sec

                            user     system      total        real
TCPSocket               3.290000   5.340000   8.630000 ( 23.874741)
Net::HTTP::Persistent  20.100000   2.150000  22.250000 ( 29.188237)
So raw TCPSocket is about 20% faster than Net::HTTP::Persistent.
This was expected as the initial connection setup and teardown round-trips
will be very fast on the loopback interface which gives Net::HTTP::Persistent the worst-possible showing.
Unfortunately you miss out on easy error checking and all that other
Net::HTTP and Net::HTTP::Persistent goodness using TCPSocket.
Real Internet
Depending upon your link speed, creating a new TCPSocket for every request
across the Real Internet may drastically reduce its performance.
This benchmark was run with N=500 from my home internet connection and
uri_2k. traceroute shows 16 hops between the client and server. At the
time of the benchmark run ping -c 20 showed:
20 packets transmitted, 19 packets received, 5.0% packet loss
round-trip min/avg/max/stddev = 74.564/91.412/147.863/18.092 ms
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
Rehearsal ---------------------------------------------------------
TCPSocket               0.180000   0.220000   0.400000 ( 99.048004)
Net::HTTP::Persistent   0.340000   0.120000   0.460000 ( 46.229385)
------------------------------------------------ total: 0.860000sec

                            user     system      total        real
TCPSocket               0.210000   0.280000   0.490000 (112.646966)
Net::HTTP::Persistent   0.340000   0.140000   0.480000 ( 47.381403)
In this case Net::HTTP::Persistent is about 140% faster than TCPSocket.
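The percentages quoted for both runs can be checked from the "real" column, and a rough latency model goes some way toward explaining the Real Internet gap. The sketch below assumes a simplified model that is not from the benchmark itself: a request on an open connection costs one round trip, and a fresh connection adds one more for the TCP handshake. Packet loss and transfer time are ignored.

```ruby
# Wall-clock ("real") seconds copied from the second pass of each table.
LOOPBACK = { tcp_socket: 23.874741, persistent: 29.188237 }
INTERNET = { tcp_socket: 112.646966, persistent: 47.381403 }

# How much faster, in percent, a run taking +fast+ seconds is than one
# taking +slow+ seconds.
def percent_faster(slow, fast)
  (slow / fast - 1) * 100
end

loopback_gain = percent_faster LOOPBACK[:persistent], LOOPBACK[:tcp_socket]
internet_gain = percent_faster INTERNET[:tcp_socket], INTERNET[:persistent]

# Simplified latency model (an assumption, not a measurement).
AVERAGE_RTT = 0.091412 # seconds, the average from the ping output
REQUESTS    = 500      # N used for the Real Internet run

persistent_estimate = REQUESTS * AVERAGE_RTT     # one round trip per request
reconnect_estimate  = REQUESTS * 2 * AVERAGE_RTT # handshake plus request

puts format('loopback: TCPSocket is %.0f%% faster', loopback_gain)
puts format('internet: Net::HTTP::Persistent is %.0f%% faster', internet_gain)
puts format('model: persistent %.1fs, reconnecting %.1fs',
            persistent_estimate, reconnect_estimate)
```

Under that model the persistent run would take about 45.7 seconds and the reconnecting run about 91.4. That is in the same ballpark as the measured 47 and 113 seconds, and the packet loss seen in the ping output may account for part of the remainder.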
If you’re running this benchmark repeatedly, wait until the sockets fall
out of TIME_WAIT before re-running. The following should print 0 (or
near 0):
netstat -an | grep TIME | wc -l
TCPSocket and Net::HTTP::Persistent should show similar times on a fast
link (like loopback). If TCPSocket ends up vastly slower you’ve
probably run out of sockets.
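If you would rather script the wait between runs, something along these lines could work. This is a sketch: time_wait_count and wait_for_time_wait are hypothetical helper names, and it assumes netstat -an prints one socket per line with the state spelled TIME_WAIT.

```ruby
# Count sockets in TIME_WAIT in the text printed by `netstat -an`.
def time_wait_count(netstat_output)
  netstat_output.lines.grep(/TIME_WAIT/).size
end

# Sleep until at most +limit+ sockets remain in TIME_WAIT.
# +sample+ is any callable returning netstat-style text; the default
# shells out to netstat, which is assumed to be installed.
def wait_for_time_wait(limit = 0, interval = 1, sample = -> { %x(netstat -an) })
  sleep interval while time_wait_count(sample.call) > limit
end

# Usage, between benchmark runs:
#   wait_for_time_wait 10
```

Passing the sampler in as an argument keeps the counting logic separate from the netstat call, so it can be checked against canned text.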
When running this benchmark with high N you may need to increase the
ephemeral port range.
With an N of 50_000 and the following configuration I can run the TCPSocket
or the Net::HTTP requests along with Net::HTTP::Persistent, but not both.
I tried to write a benchmark using curb 0.7.1 but failed to make one that
performed even as well as plain Net::HTTP.
I couldn’t get curb to use a persistent connection.
curl_easy_perform(3) says that libcurl will create a persistent connection
if you call it multiple times on the same handle. I can see this
behavior using `strace curl URL URL`.
With curb I see a new socket created per sendto(2)/recvfrom(2) pair. I
also see a bunch of calls to close(2) when ruby performs its final GC pass.
I couldn’t see a way to make curb shut down its socket manually. The
only way to do this is to wait for the GC to collect the socket. Leaving
file descriptors hanging around for the GC is not good. (It also seemed to spend most of the time in the benchmark waiting for sockets to close.)
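For contrast, this is roughly what deterministic cleanup looks like with plain Ruby sockets, where the caller decides when the descriptor is released. It is a sketch against a throwaway local listener and says nothing about how curb manages its handles internally.

```ruby
require 'socket'

# Throwaway local listener so there is something to connect to.
server = TCPServer.new '127.0.0.1', 0 # port 0 lets the OS pick a free port
port = server.addr[1]

# Block form: the socket is closed as soon as the block exits,
# even if the block raises.
block_socket = nil
TCPSocket.open '127.0.0.1', port do |socket|
  block_socket = socket
  # ... talk to the server here ...
end

# Manual form: pair every open with an ensure.
manual_socket = TCPSocket.open '127.0.0.1', port
begin
  # ... talk to the server here ...
ensure
  manual_socket.close
end

puts block_socket.closed?  # closed by the block form
puts manual_socket.closed? # closed by the ensure clause
server.close
```

Either way the file descriptor is released at a known point instead of whenever the GC next runs.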
I started looking through curb to see why it would behave this way, but
Curl::Easy.new calls curl_easy_init(3) and doesn’t check the return value,
despite the man page saying it may return NULL, so I gave up.
require 'rubygems'
require 'benchmark'
require 'net/http'
require 'net/http/persistent'

uri_1k = URI.parse 'http://localhost/~drbrain/zeros-1k'
uri_2k = URI.parse 'http://localhost/~drbrain/zeros-2k'
uri_10k = URI.parse 'http://localhost/~drbrain/zeros-10k'

uri = uri_2k

N = 5_000

Benchmark.bmbm do |bm|
  bm.report 'TCPSocket' do
    # HTTP/1.1 requires handling of chunked transfer-encoding
    tcp_request = <<-HTTP
GET #{uri.request_uri} HTTP/1.0\r
Host: #{uri.host}\r
Connection: close\r
\r
    HTTP

    N.times do
      s = TCPSocket.open uri.host, uri.port
      s.write tcp_request
      data = s.read
      s.close # hopefully reduces TIME_WAIT duration
      data.split("\r\n\r\n", 2).last # get body
    end
  end

  bm.report 'Net::HTTP' do
    N.times do
      response = nil
      Net::HTTP.start uri.host, uri.port do |http|
        # Net::HTTPRequest can't be recycled
        request = Net::HTTP::Get.new uri.request_uri
        response = http.request request
      end
      response.body
    end
  end

  bm.report 'Net::HTTP::Persistent' do
    http_p = Net::HTTP::Persistent.new
    N.times do
      response = http_p.request uri
      response.body
    end
  end
end