The Artima Developer Community
Ruby Code & Style
Linux Clustering with Ruby Queue: Small is Beautiful
by Ara Howard
October 10, 2005


Moving in

I'll walk through the actual sequence of rq commands used to set up an instant Linux cluster composed of four nodes: onefish, twofish, redfish, and bluefish. Each host is identified in its prompt, below. In my home directory on each of the hosts I have the symbolic link "~/nfs" pointing at a common NFS directory.

The first thing we have to do is initialize the queue:

redfish:~/nfs > rq queue create
created <~/nfs/queue>

Next we start feeder daemons on all four hosts:

onefish:~/nfs > rq queue feed --daemon --log=~/rq.log
twofish:~/nfs > rq queue feed --daemon --log=~/rq.log
redfish:~/nfs > rq queue feed --daemon --log=~/rq.log
bluefish:~/nfs > rq queue feed --daemon --log=~/rq.log

In practice you would not want to start feeders by hand on each node, so rq supports being "kept alive" via a crontab entry. When rq runs in daemon mode it acquires a lockfile that effectively limits it to one feeding process per host, per queue. Starting a feeder daemon will simply fail if another one is already feeding on the same queue. Thus a crontab entry like:

*/15 * * * * rq queue feed --daemon --log=log

will check every fifteen minutes to see whether a daemon is running and start one if, and only if, one is not already running. In this way an ordinary user can set up a process that will be running at all times, even after a machine reboot.
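The per-host, per-queue guard described above is the classic exclusive-lockfile pattern. The sketch below illustrates the idea with Ruby's `File#flock`; it is illustrative only, not rq's actual implementation, and the lockfile path is invented for the demo:

```ruby
require 'tmpdir'

# A sketch of a per-host lockfile guard: an exclusive, non-blocking flock
# means at most one feeder daemon can run per host (illustrative only --
# not rq's actual implementation).
def acquire_feeder_lock(path)
  lock = File.open(path, File::RDWR | File::CREAT, 0644)
  if lock.flock(File::LOCK_EX | File::LOCK_NB)
    lock        # lock held; keep the handle open for the daemon's lifetime
  else
    lock.close  # another feeder already holds the lock on this host
    nil
  end
end

lockfile = File.join(Dir.tmpdir, 'rq_feeder_demo.lock')
first  = acquire_feeder_lock(lockfile)  # first daemon: acquires the lock
second = acquire_feeder_lock(lockfile)  # second attempt: returns nil
```

Because the lock is advisory and released automatically when the process exits, a crashed daemon never leaves the host permanently blocked.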

Jobs can be submitted from the command line, from an input file, or, in the Linux tradition, from standard input as part of a process pipeline. When submitting via an input file or stdin, the format is either YAML (such as that produced as the output of other rq commands) or a simple list of jobs, one job per line; the format is auto-detected. Any host that sees the queue can run commands on it:

onefish:~/nfs > cat joblist
echo 'job 0' && sleep 0
echo 'job 1' && sleep 1
echo 'job 2' && sleep 2
echo 'job 3' && sleep 3

onefish:~/nfs > cat joblist | rq queue submit
-
  jid: 1
  priority: 0
  state: pending
  submitted: 2005-05-12 13:35:31.757662
  started:
  finished:
  elapsed:
  submitter: onefish
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 'job 0' && sleep 0
-
  jid: 2
  priority: 0
  state: pending
  submitted: 2005-05-12 13:35:31.757662
  started:
  finished:
  elapsed:
  submitter: onefish
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 'job 1' && sleep 1
-
  jid: 3
  priority: 0
  state: pending
  submitted: 2005-05-12 13:35:31.757662
  started:
  finished:
  elapsed:
  submitter: onefish
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 'job 2' && sleep 2
-
  jid: 4
  priority: 0
  state: pending
  submitted: 2005-05-12 13:35:31.757662
  started:
  finished:
  elapsed:
  submitter: onefish
  runner:
  pid:
  exit_status:
  tag:
  restartable:
  command: echo 'job 3' && sleep 3
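Since rq speaks plain YAML, its output can be post-processed from Ruby itself. A minimal sketch, in which the heredoc stands in for piped-in submit output (trimmed to the fields it uses), extracts each job's id and command:

```ruby
require 'yaml'

# Parse the YAML job records that `rq queue submit` prints. The heredoc
# stands in for piped-in output, trimmed to the fields used here.
submit_output = <<~YAML
  - jid: 1
    state: pending
    command: echo 'job 0' && sleep 0
  - jid: 2
    state: pending
    command: echo 'job 1' && sleep 1
YAML

jobs = YAML.load(submit_output)
jobs.each { |job| puts "job #{job['jid']}: #{job['command']}" }
```

In a real pipeline the same parse would read `$stdin` instead of a heredoc.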

We see, in the output of submitting to the queue, all of the information about each job in YAML format. At this point we check the status of the queue:

redfish:~/nfs > rq queue status
---
jobs:
  pending: 2
  holding: 0
  running: 2
  finished: 0
  dead: 0
  total: 4
temporal:
  pending:
    earliest: { jid: 3, metric: submitted, time: 2005-05-12 13:35:31.757662 }
    latest: { jid: 4, metric: submitted, time: 2005-05-12 13:35:31.757662 }
    shortest:
    longest:
  holding:
    earliest:
    latest:
    shortest:
    longest:
  running:
    earliest: { jid: 1, metric: started, time: 2005-05-12 13:35:37.155667 }
    latest: { jid: 2, metric: started, time: 2005-05-12 13:35:40.111865 }
    shortest:
    longest:
  finished:
    earliest:
    latest:
    shortest:
    longest:
  dead:
    earliest:
    latest:
    shortest:
    longest:
performance:
  avg_time_per_job: 0
  n_jobs_in_last_1_hrs: 0
  n_jobs_in_last_2_hrs: 0
  n_jobs_in_last_4_hrs: 0
  n_jobs_in_last_8_hrs: 0
  n_jobs_in_last_16_hrs: 0
  n_jobs_in_last_32_hrs: 0
exit_status:
  successes: 0
  failures: 0

As you can see, many statistics about the queue are tracked. Right now we see only that two of the jobs have been picked up by a node and are running while the other two have yet to be started. Once many jobs have been submitted and run, the status command gives valuable information about the health of the cluster at a glance.
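Because the status report is itself YAML, such health checks are easy to script. A minimal sketch, with a heredoc standing in for piped `rq queue status` output (trimmed to the jobs section):

```ruby
require 'yaml'

# Summarize the job counts from `rq queue status`. The heredoc stands in
# for piped-in output, trimmed to the jobs section used here.
status_output = <<~YAML
  ---
  jobs:
    pending: 2
    holding: 0
    running: 2
    finished: 0
    dead: 0
    total: 4
YAML

counts  = YAML.load(status_output)['jobs']
waiting = counts['pending'] + counts['holding']
puts "#{counts['running']} running, #{waiting} waiting, #{counts['total']} total"
```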

We can find out which nodes are running our jobs using:

onefish:~/nfs > rq queue list running | egrep 'jid|runner'
 jid: 1
 runner: redfish
 jid: 2
 runner: bluefish

The record for a finished job remains in the queue until it is deleted, since a user will generally want to collect this information. At this point we expect all jobs to be complete, so we check their exit status:

bluefish:~/nfs > rq queue list finished | egrep 'jid|command|exit_status'
 jid: 1
 exit_status: 0
 command: echo 'job 0' && sleep 0
 jid: 2
 exit_status: 0
 command: echo 'job 1' && sleep 1
 jid: 3
 exit_status: 0
 command: echo 'job 2' && sleep 2
 jid: 4
 exit_status: 0
 command: echo 'job 3' && sleep 3

All commands have finished successfully. We can now delete any successful job from the queue:

twofish:~/nfs > rq queue query exit_status=0 | rq queue delete

There are many other useful operations rq can perform. For a description, type "rq help".
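The manual egrep inspection of exit statuses above can also be automated; since `rq queue list finished` emits YAML, a short Ruby sketch (heredoc standing in for the piped listing, trimmed to the fields it uses) can flag any failures:

```ruby
require 'yaml'

# Check that every finished job exited cleanly. The heredoc stands in
# for `rq queue list finished` output, trimmed to the fields used here.
finished_output = <<~YAML
  - jid: 1
    exit_status: 0
    command: echo 'job 0' && sleep 0
  - jid: 2
    exit_status: 0
    command: echo 'job 1' && sleep 1
YAML

jobs   = YAML.load(finished_output)
failed = jobs.reject { |job| job['exit_status'] == 0 }
puts failed.empty? ? 'all jobs succeeded' : "#{failed.size} job(s) failed"
```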

Looking backward and forward

Making the choice to "roll your own" is always a tough one because it breaks Programmer's Rule Number 42, which clearly states:

Every problem has been solved. It is Open Source. And it is the first link on Google.

Having a tool like Ruby is critical when you decide to break this rule, and the fact that a project like Ruby Queue can be written in 3,292 lines of code is a testament to the language. With few major enhancements planned, it is likely this small number will not grow much as the code base is refined and improved. The goals of rq remain simplicity and ease of use.

Ruby Queue [12] set out to lower the barrier scientists had to overcome in order to realize the power of Linux clusters. A simple, easy-to-understand tool that harnesses the power of many CPUs lets them shift their focus away from the mundane details of complicated distributed-computing systems and back to the task of actually doing science. Sometimes small is beautiful.

Note

At the time this article was first written, rq was a promising but barely tested piece of software. After nearly nine months of 24/7 use it has proved to be a viable solution: our group has now run millions of jobs with zero bugs filed and zero admin time dedicated to the software.


Resources

[0] openMosix is a Linux kernel extension for single-system image clustering which turns a network of ordinary computers into a supercomputer.
http://openmosix.sourceforge.net/

[1] The Grid Engine project is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems.
http://gridengine.sunsource.net/

[2] The Linux Network File System is the backbone of many laboratories and Linux clusters.
http://nfs.sourceforge.net/

[3] The tommy gun of text editors.
http://www.vim.org/

[4] The main Ruby language site.
http://www.ruby-lang.org/

[5] The National Geophysical Data Center.
http://www.ngdc.noaa.gov/

[6] The Solar-Terrestrial Physics group pays Ara to write Ruby and do other fun things.
http://www.ngdc.noaa.gov/stp/

[7] The Defense Meteorological Satellite Program is responsible for the slew of data and associated processing that necessitated the development of Ruby Queue.
http://dmsp.ngdc.noaa.gov/

[8] SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine.
http://www.sqlite.org/

[9] Ruby bindings for the SQLite library.
http://rubyforge.org/projects/sqlite-ruby/

[10] YAML (rhymes with "camel") is a straightforward machine parsable data serialization format designed for human readability and interaction with scripting languages such as Perl, Python and Ruby.
http://www.yaml.org/

[11] For Ruby developers, YAML is a natural fit for object serialization and general data storage. Really, it's quite fantastic. Spreads right on your Rubyware like butter on bread!
http://yaml4r.sourceforge.net/

[12] Get the latest release of Ruby Queue here today and rev up the CPU cycles you're throwing at your projects!
http://raa.ruby-lang.org/project/rq/

About the author

Ara Howard is a Research Associate for the Cooperative Institute for Research in Environmental Sciences (CIRES). He spends his time programming Ruby, or mountain biking and skiing with his wife Jennifer and trio of border collies: Eli, Joey, and Zipper.

This article was first published in Linux Journal, in December 2004.
