I finally bit the bullet and rewrote eigenclass using the Ocsigen web server and framework for OCaml. It is simpler, faster, more reliable, and easier to extend than the customized wiki implementation in Ruby (Hiki) I'd been using. It is also easier to deploy because it's self-contained: a single (native code) executable contains both the Ocsigen web server and the application code, so I don't need any special Apache modules, FastCGI or any other sort of adapter. (The ability to create standalone, native-code executables was added to Ocsigen recently, so it is only available on the development branch, which will soon be released as Ocsigen 1.2.)
I'd read somewhere that the Ocsigen server hadn't received much (any?) optimization work, so I benchmarked it against Lighttpd, Apache and mongrel, both for static file serving and for dynamic content (a minimal "hello world" service), to see whether that would be a problem. It isn't: the OCaml+Ocsigen combo is very fast. It serves minimal dynamic requests an order of magnitude faster than Rails with a pack of mongrels behind nginx, and uses 40 times less memory. More surprisingly, it handles more requests per core than lighttpd with a minimal FastCGI server written in C! (lighttpd couldn't handle ab's load with max_procs = 1 and generated far too many 5xx errors, so I had to use several FastCGI processes.) It also serves static files at higher rates than Apache (per core).
The following figures were obtained with ApacheBench (ab), run locally on a 3GHz dual-core Athlon 64 X2.
Dynamic content                                      Reqs/sec   Mem usage (resident memory, RSS)
Rails with mongrel, 1 process                        260        49MB
Rails with mongrel via nginx (rev. proxy), 1 proc    220        ~51MB
Rails with mongrel, 4 processes via nginx            430        ~200MB
Ocsigen (1 process)                                  5800       4.5MB
lighttpd with FastCGI app in C, 20 procs             9300       4.5MB
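The per-core comparison can be checked against the table with a few lines of OCaml. This is a back-of-the-envelope sketch, assuming that the single-process Ocsigen server uses only one core while the multi-process Rails and lighttpd setups spread their load over both cores of the X2; those core counts are an assumption, not something measured in these runs.

```ocaml
(* Figures taken from the dynamic-content table. *)

(* Requests per second per core. Assumption: a single-process server uses
   one core, the multi-process setups use both cores of the dual-core box. *)
let per_core ~reqs_per_sec ~cores =
  float_of_int reqs_per_sec /. float_of_int cores

let ocsigen = per_core ~reqs_per_sec:5800 ~cores:1          (* 5800.0 *)
let lighttpd_fcgi = per_core ~reqs_per_sec:9300 ~cores:2    (* 4650.0 *)
let rails_4_mongrels = per_core ~reqs_per_sec:430 ~cores:2  (* 215.0 *)

(* Total throughput ratio: one Ocsigen process vs. 4 mongrels behind nginx. *)
let total_ratio = 5800.0 /. 430.0                           (* ~13.5 *)

(* Resident memory ratio: ~200MB for 4 mongrels vs. 4.5MB for Ocsigen. *)
let mem_ratio = 200.0 /. 4.5                                (* ~44.4 *)

let () =
  Printf.printf "per core: Ocsigen %.0f, lighttpd+FastCGI %.0f, Rails %.0f\n"
    ocsigen lighttpd_fcgi rails_4_mongrels;
  Printf.printf "Ocsigen vs Rails: %.1fx throughput, %.1fx less memory\n"
    total_ratio mem_ratio
```

Even crediting lighttpd with both cores, a single Ocsigen process comes out ahead per core (5800 vs. about 4650 reqs/sec), and the Rails ratios line up with the "order of magnitude" and "40 times less memory" figures.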
Obviously, these figures represent only upper bounds, since the "dynamic" content was merely "hello world", and few sites (certainly not eigenclass.org) need to handle thousands of requests per second. The interesting thing is that, if anything, the difference will become even more favorable to Ocsigen+OCaml if the page involves any significant amount of computation, as for CPU-bound code OCaml is typically 100 times faster than interpreted languages like Ruby. For instance, the OCaml code that processes the markdown-like markup used for this very page is fast enough to sustain over 2000 requests per second without caching the generated HTML. A quick test shows that Ruby's bluecloth library is around 200 times slower, so with Rails + Mongrel + nginx I would be getting maybe 20 reqs/sec (using both cores) on the AMD64 box, which is much faster than the one running eigenclass.org. Of course, caching would solve this; it is not a panacea, though, as it introduces other problems (expiration, invalidation, resource limitation, etc.) and is not always applicable.
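To make the caching trade-off concrete, here is a minimal OCaml sketch of a time-to-live cache for rendered HTML. It is hypothetical: the names `create`, `find_or_render` and `invalidate` are made up for illustration, it uses only the standard `Hashtbl` module, and the clock is passed in explicitly so the behaviour is deterministic. Even this toy version has to pick how long entries stay valid and relies on callers remembering to invalidate edited pages.

```ocaml
(* Hypothetical cache of rendered HTML keyed by page name, with a fixed TTL. *)
type entry = { html : string; stored_at : float }

type cache = { ttl : float; table : (string, entry) Hashtbl.t }

let create ~ttl = { ttl; table = Hashtbl.create 16 }

(* Return the cached HTML if it is still fresh at time [now]; otherwise call
   [render], store its result with the current timestamp and return it. *)
let find_or_render cache ~now ~key ~render =
  match Hashtbl.find_opt cache.table key with
  | Some e when now -. e.stored_at < cache.ttl -> e.html
  | _ ->
      let html = render key in
      Hashtbl.replace cache.table key { html; stored_at = now };
      html

(* Explicit invalidation, e.g. after a page is edited. Forgetting to call
   this serves stale pages until the TTL runs out. *)
let invalidate cache ~key = Hashtbl.remove cache.table key

let () =
  let renders = ref 0 in
  let render key = incr renders; "<p>" ^ key ^ "</p>" in
  let c = create ~ttl:60.0 in
  ignore (find_or_render c ~now:0.0 ~key:"home" ~render);   (* miss: rendered *)
  ignore (find_or_render c ~now:30.0 ~key:"home" ~render);  (* fresh: cache hit *)
  ignore (find_or_render c ~now:90.0 ~key:"home" ~render);  (* expired: re-rendered *)
  Printf.printf "renders: %d\n" !renders                     (* renders: 2 *)
```

A real deployment would also have to bound the table's size, which is the resource-limitation problem; skipping the cache altogether avoids all of this when rendering is already fast enough.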
At the end of the day, this means that OCaml + Ocsigen let me write code that can be deployed trivially (I can even link the executable statically so that it doesn't depend on libraries like SQLite or libssl) and that is more than fast enough with a single process (no load balancing needed) and no caching (no memcached or the like).