Ruby Buzz Forum - Killing me softly: Keeping dispatchers alive


Patrick Lenz

Posts: 168
Nickname: scoop
Registered: Apr, 2005

Patrick Lenz is the lead developer at freshmeat.net and a contributor to the typo weblog engine

Killing me softly: Keeping dispatchers alive

Posted: Feb 14, 2006 8:38 AM

This post originated from an RSS feed registered with Ruby Buzz by Patrick Lenz.
Original Post: Killing me softly: Keeping dispatchers alive
Feed Title: poocs.net
Feed URL: http://feeds.feedburner.com/poocsnet
Feed Description: Personal weblog about free and open source software, personal development projects and random geek buzz.

This is an interim post ahead of my long-promised in-depth review of my attempts to scale a site to a million dynamic page impressions a day on Rails.

When the site in question finally stabilized somewhat, a new problem cropped up that I've been unable to fully resolve over the past weeks. The net effect is that my FastCGI dispatchers become unresponsive after a while, possibly following a huge traffic spike. They just sit there doing nothing, and lighttpd is unable to talk to them.

The site is powered by 4 application servers running 7 dispatchers each and a dedicated lighttpd proxy. After a while, half of those dispatchers are unresponsive and no longer serve any requests. Page load times slow to a crawl.

Currently, I'm on Ruby 1.8.4, lighttpd 1.4.10 and Rails 1.0 on Linux 2.6.14.

I've tried everything from upgrading Ruby and all gems, to debugging potentially exceeded TCP connection limits on my servers, to even talking to weigon, the brains behind lighttpd. To no avail.

The weird thing is that it doesn't matter which end I restart: be it the dispatcher *or* lighttpd, everything goes back to normal. As a result, I cannot even tell for sure whether Ruby or my application is to blame. It could just as well be lighttpd or my local machine configuration.

Since I was in desperate need of an operational site, I whipped up a script that probes all the available dispatchers for responsiveness and kills them by brute force if they don't respond. I'm using the process scripts, namely the spinner/spawner duo that comes with Rails. As such, a killed dispatcher is immediately restarted and becomes available to lighttpd again within a couple of seconds.

As this is obviously more of a band-aid than anything else, my script is provided as-is, with no claims made about it being functional for anyone else, being pretty, being well documented or not eating your cat. You absolutely need Net::SSH installed in order to kill dispatchers that are not running on localhost. I'm running the script inside a screen session to keep an eye on what's happening with my dispatchers and how often they get killed. Your mileage may vary.
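For readers who only want the general idea, here is a minimal sketch of the probe-and-kill approach described above. It is not my actual script. It assumes a reasonably modern Ruby, and the method names (responsive?, hung_dispatchers, kill_dispatcher) are made up for illustration. The probe is a plain TCP connect with a timeout, which only catches dispatchers that stop accepting connections. How you find the pid behind a given port is left out entirely.

```ruby
require "socket"

# True if a TCP connection to host:port opens within `timeout` seconds.
# Assumption: a hung dispatcher stops accepting connections. One that
# accepts but never answers would need a real FastCGI request instead.
def responsive?(host, port, timeout = 5)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Takes a hash of host => list of ports and returns the [host, port]
# pairs that failed the probe.
def hung_dispatchers(dispatchers, timeout = 5)
  dispatchers.flat_map do |host, ports|
    ports.reject { |port| responsive?(host, port, timeout) }
         .map { |port| [host, port] }
  end
end

# Crude check for "this machine" (hypothetical helper).
def local?(host)
  ["localhost", "127.0.0.1", "::1", Socket.gethostname].include?(host)
end

# Sends SIGKILL to `pid` on `host`. Local processes are signalled
# directly; remote ones get a `kill -9` over Net::SSH. Returns false
# if a local process was already gone.
def kill_dispatcher(host, pid, user = ENV["USER"])
  pid = Integer(pid)
  if local?(host)
    Process.kill("KILL", pid)
  else
    require "net/ssh" # only loaded when a remote host is involved
    Net::SSH.start(host, user) { |ssh| ssh.exec!("kill -9 #{pid}") }
  end
  true
rescue Errno::ESRCH
  false
end
```

A caller would pass something like { "127.0.0.1" => [8000, 8001] } to hung_dispatchers (the ports here are placeholders), look up the pid for each pair returned, hand it to kill_dispatcher, and repeat after a short sleep. The spinner/spawner pair then takes care of bringing the dispatcher back up.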

In case you're having similar issues with your Rails application, feel free to leave a comment. The script only takes care of dispatchers that are already hung. It is by no means meant as a final cure and I'm more than eager to find out what's causing the freezes in the first place.

The script is available in the body of this article or as a download here.
