The Artima Developer Community

Ruby Code & Style
Linux Clustering with Ruby Queue: Small is Beautiful
by Ara Howard
October 10, 2005

<<  Page 2 of 3  >>

Putting up walls

Using posixlock and SQLite made coding a persistent NFS-safe priority queue class relatively straightforward. Of course, there were performance issues to address, and I added a lease-based locking system to detect the possible lockd starvation issues I'd heard rumors about on the SQLite mailing list. I posted many questions to the NFS mailing lists during this development stage, and developers such as Trond Myklebust were invaluable resources to me.

I'm not too smart, especially when it comes to guessing the state of programs I myself wrote. Wise programmers know that there is no substitute for good logging. Ruby ships with a built-in Logger class that offers features like automatic log rolling. Using this class as a foundation, I was able to abstract a small module that's used by all the classes in rq to give consistent, configurable, and pervasive logging to all its objects in only a few lines of code. Being able to leverage built-in libraries to abstract important building blocks like logging is a time- and mind-saver.
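To make the idea concrete, here is a minimal sketch of such a shared logging module. It is not the module that ships with rq; the names Logging and Feeder, and the message text, are assumptions made for illustration. Only Ruby's standard Logger class is used.

```ruby
require 'logger'

# Hypothetical mixin (not rq's actual module): every class that includes it
# shares one configurable Logger instance.
module Logging
  class << self
    # Swap in any Logger here. For rolling log files one could use, for
    # example, Logger.new('rq.log', 'daily'); the file name is an assumption.
    attr_writer :logger

    def logger
      @logger ||= Logger.new($stderr)
    end
  end

  # Instance-level accessor used by the including classes.
  def logger
    Logging.logger
  end
end

# Hypothetical class showing how little code each user of the module needs.
class Feeder
  include Logging

  def feed
    # The class name is passed as the progname, the block supplies the message.
    logger.info(self.class.name) { 'feeding from queue' }
  end
end

Feeder.new.feed
```

Because the logger lives in one place, changing its destination or level once reconfigures logging for every object that includes the module.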

If you are still using XML as a data serialization format and yearn for something easier and more readable, I urge you to check out YAML [10]. Ruby Queue uses YAML extensively as both an input and an output format. For instance, the rq command line tool shows jobs marked "important" as:

-
  jid: 1
  priority: 0
  state: pending
  submitted: 2004-11-12 15:06:49.514387
  started:
  finished:
  elapsed:
  submitter: redfish
  runner:
  pid:
  exit_status:
  tag: important
  restartable:
  command: my_job.sh
-
  jid: 2
  priority: 42
  state: finished
  submitted: 2004-11-12 17:37:10.312094
  started: 2004-11-12 17:37:13.132700
  finished: 2004-11-12 17:37:13.739824
  elapsed: 0.015724
  submitter: redfish
  runner: bluefish
  pid: 5477
  exit_status: 0
  tag: important
  restartable:
  command: my_high_priority_job.sh

This format is easy for humans to read and friendly to Linux commands like egrep(1). But best of all, the document above, when used as the input to a command, can be loaded into Ruby as an array of hashes with a single command:

require 'yaml'
jobs = YAML::load STDIN

It can then be used as a native Ruby object with no complex API required:

jobs.each do |job|
  priority = job['priority']
  ...
end
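As a slightly fuller sketch of working with such a listing, the snippet below picks out the pending jobs and orders them by priority. The helper name pending_by_priority and the trimmed sample document are assumptions for illustration; the sample leaves out the timestamp fields so that it loads with YAML.safe_load without extra options.

```ruby
require 'yaml'

# Hypothetical sample, shaped like the rq listing but with fewer fields.
SAMPLE_LISTING = <<~LISTING
  -
    jid: 1
    priority: 0
    state: pending
    tag: important
    command: my_job.sh
  -
    jid: 2
    priority: 42
    state: finished
    tag: important
    command: my_high_priority_job.sh
LISTING

# Hypothetical helper: return the pending jobs, highest priority first.
def pending_by_priority(yaml_text)
  jobs = YAML.safe_load(yaml_text) || []   # an empty document loads as nil
  jobs.select { |job| job['state'] == 'pending' }
      .sort_by { |job| -job['priority'].to_i }
end

pending_by_priority(SAMPLE_LISTING).each do |job|
  puts "#{job['jid']}: #{job['command']}"
end
```

Each job is a plain Hash, so the usual Array and Hash methods are all the API that is needed.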

Perhaps the best summary of YAML for Ruby [11] is by its author, "_why":

"Really, it's quite fantastic. Spreads right on your Rubyware like butter on bread!"

The roof

I actually had a prototype in production (which we do a lot in the DMSP group) when a subtle bug cropped up.

There is a feature of NFS known as "silly renaming." This happens when two clients have an NFS file open and one of them removes it, causing the NFS server to rename the file as something like ".nfs123456789" until the second client is done with it and the file can truly be removed.

The general mode of operation for rq when feeding on a queue (running jobs from it) is to start a transaction on the SQLite database, find a job to run, fork a child process to run the job, update the database with information such as the pid of the job, and end the transaction. As it turns out, transactions in SQLite involve some temporary files, which are removed at the end of the transaction. The problem was that I was forking in the middle of a transaction, which left the file handle of the temporary file open in both the child and the parent. When the parent then removed the temporary file at the end of the transaction, a "silly rename" occurred so that the child's file handle was still valid. I started seeing dozens of these "silly" files cluttering my queue directories; they would eventually disappear, but they were ugly and unnerving to users.
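The underlying mechanism, a forked child inheriting the parent's open file handles, can be sketched without SQLite or NFS at all. In the snippet below the file name journal.tmp and the helper name are assumptions, and an ordinary temporary directory stands in for the queue directory. On a local filesystem the unlinked file simply lingers until the child closes it; on NFS, this same situation is what produces the ".nfs" files.

```ruby
require 'tmpdir'

# Hypothetical demonstration: the parent opens a scratch file, forks, and
# removes the file, yet the child can still read it through the inherited
# handle. Returns [file_still_listed?, data_read_by_child].
def read_after_parent_unlink
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'journal.tmp')     # stand-in for a transaction temp file
    File.write(path, "transaction scratch data\n")
    file = File.open(path)

    result_r, result_w = IO.pipe             # child -> parent: what the child read
    gate_r, gate_w = IO.pipe                 # parent -> child: "file is removed now"

    pid = fork do
      result_r.close
      gate_w.close
      gate_r.read                            # blocks until the parent closes its end
      result_w.write(file.read)              # inherited handle is still valid
      result_w.close
      exit!(0)
    end

    result_w.close
    gate_r.close
    file.close
    File.unlink(path)                        # "end of transaction" cleanup
    gate_w.close                             # let the child proceed

    data = result_r.read
    Process.wait(pid)
    [File.exist?(path), data]
  end
end

listed, data = read_after_parent_unlink
puts "still listed: #{listed}, child read: #{data.inspect}"
```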

I initially looked into closing the file handle somehow after forking but received some bad news from Dr. Richard Hipp, the creator of SQLite, on the mailing list. He said forking in the middle of a transaction results in "undefined" behavior and was not recommended.

This was bad news, as my design depended heavily on forking in a transaction in order to preserve the atomicity of starting a job and updating its state. What I needed to be able to do was "fork without forking." More specifically, I needed another process to fork, run the job, and wait for it on my behalf. Now, the idea of setting up a co-process and using IPC to achieve this made me break out in hives. Fortunately Ruby had a hiveless solution.

DRb, or Distributed Ruby, is a built-in library for working with remote objects. It's like Java RMI or SOAP, only about a million times easier to get going. What do remote objects have to do with forking in another process? I coded a tiny class that does the forking, job running, and waiting for me, and an instance of this class can then set up a local DRb server in a child process. Communication is done transparently via Unix domain sockets. In other words, the DRb server is the co-process that does all the forking and waiting for me. Interacting with this object is like interacting with any other Ruby object. The entire JobRunnerDaemon class is 101 lines of code, including setting up the child process. Following are some excerpts from the Feeder class that show the key points of its usage.

An instance of a JobRunnerDaemon is started in a child process, and a handle on that remote (but on localhost) object is returned:

jrd = JobRunnerDaemon::daemon

A JobRunner object is created for a job. It is created by pre-forking a child in the JobRunnerDaemon's process, which is used later to run the job. Note that the actual fork takes place in the child process, so it does not affect the parent's transaction:

runner = jrd.runner job
pid = runner.pid
runner.run

Later the DRb handle on the JobRunnerDaemon can be used to wait on the child. This blocks just as a normal wait would, even though we are waiting on the child of a totally different process!

cid, status = jrd.waitpid2(-1, Process::WUNTRACED)
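For readers who want to try the "fork without forking" idea, here is a rough, self-contained sketch of a DRb co-process reached over a Unix domain socket. It is not the JobRunnerDaemon from rq: the class TinyRunnerDaemon, its methods start_job and wait_for, the helper with_runner_daemon, and the socket file name are all assumptions made for illustration. It assumes the drb library is installed, which is the case for a standard Ruby installation.

```ruby
require 'drb'
require 'drb/unix'
require 'tmpdir'

# Hypothetical stand-in for a job-running co-process (not rq's code).
class TinyRunnerDaemon
  # Fork inside the daemon process and exec the command through sh.
  # Returns the pid of the job, which is a child of the daemon.
  def start_job(command)
    fork { exec('sh', '-c', command) }
  end

  # Block until the job exits and return its integer exit status.
  def wait_for(pid)
    _, status = Process.waitpid2(pid)
    status.exitstatus
  end
end

# Hypothetical helper: start the daemon in a child process, yield a DRb
# handle on it, and shut the daemon down afterwards.
def with_runner_daemon
  Dir.mktmpdir do |dir|
    uri = "drbunix:#{File.join(dir, 'runner.sock')}"
    ready_r, ready_w = IO.pipe

    daemon_pid = fork do
      ready_r.close
      DRb.start_service(uri, TinyRunnerDaemon.new)
      ready_w.close                          # EOF tells the caller the socket is up
      DRb.thread.join
    end

    ready_w.close
    ready_r.read                             # wait for the daemon to be listening
    ready_r.close

    begin
      yield DRbObject.new_with_uri(uri)
    ensure
      Process.kill('KILL', daemon_pid)
      Process.wait(daemon_pid)
    end
  end
end

status = with_runner_daemon do |runner|
  pid = runner.start_job('exit 3')           # forked by the daemon, not by us
  runner.wait_for(pid)                       # the wait also happens in the daemon
end
puts "job exit status: #{status}"
```

The calling process never forks while it talks to the runner, which is the property the transaction needs. The sketch returns a plain integer from wait_for so that the result can be copied back over DRb without any special handling.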

We go through "Run it. Break it. Fix it." cycles like this a lot in my group, the philosophy being that there is no test like production. The scientists I work most closely with, Kim Baugh and Jeff Safran, are more than happy to have programs explode in their faces if the end result is better, more reliable code. Programs written in a dynamic language like Ruby enable me to fix bugs fast, which keeps their enthusiasm for testing high. The combined effect is a rapid evolutionary development cycle.


