Following on the heels of Hadoop, and in keeping with its intention to open-source significant parts of its infrastructure, Yahoo has released Traffic Server, a high-performance HTTP/1.1 proxy server:
Traffic Server fills the need for a fast, extensible and scalable HTTP 1.1 proxy and cache. We have a production proven piece of software that can deliver HTTP traffic at high rates, and can scale well on modern SMP hardware. We have benchmarked Traffic Server to handle in excess of 35,000 RPS on a single box. Traffic Server has a rich feature set, implementing most of HTTP/1.1 to the RFC specifications.
Traffic Server's source code is hosted as an Apache incubator project.
Traffic Server is a battle-hardened package with more than 200,000 lines of C++ code. Yahoo originally got the software through its acquisition of Inktomi earlier this decade, and it's been using it ever since. Today, the software delivers 30 billion Web objects and 400 terabytes of data each day... And Yahoo can rightly be proud of Traffic Server's performance: that comes from a surprisingly small number of Yahoo servers--between 100 and 150.
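As a back-of-the-envelope check on those figures, the quoted daily totals can be turned into per-server averages. This is only a rough sketch: it assumes decimal terabytes, a perfectly even spread of load across servers and across the day, and function names that are purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted above; the byte count assumes decimal terabytes (10**12 bytes).
OBJECTS_PER_DAY = 30e9
BYTES_PER_DAY = 400e12


def per_server_rate(objects_per_day, servers):
    """Average objects per second per server, assuming evenly spread load."""
    return objects_per_day / SECONDS_PER_DAY / servers


def average_object_bytes(bytes_per_day, objects_per_day):
    """Mean size of a delivered object in bytes."""
    return bytes_per_day / objects_per_day


for servers in (100, 150):
    rate = per_server_rate(OBJECTS_PER_DAY, servers)
    print(f"{servers} servers: about {rate:,.0f} objects/s per server")

size_kb = average_object_bytes(BYTES_PER_DAY, OBJECTS_PER_DAY) / 1000
print(f"average object size: about {size_kb:.1f} KB")
```

Under these assumptions the daily average works out to roughly 2,300 to 3,500 objects per second per server, with a mean object size of about 13 KB. Real traffic is bursty, so peak rates would be higher than this average.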
One area where Yahoo uses Traffic Server is at Yahoo Sports for handling scores. A regular Web server sends out the Web page to a person's browser, but Traffic Server handles the JavaScript technology that periodically refreshes the contents of a scoreboard element on that page.
It's only a "trickle" of data, but at Yahoo's scale, that can be some pretty heavy work...
Another part of Yahoo operations retrofitted with the software is Yahoo Mail... Traffic Server can be used to process the cookie text files on a person's browser to figure out whether that person can be logged in automatically or the person needs to authenticate anew. It also can route traffic appropriately when, for example, a person who is "homed" to Yahoo's servers in India visits the site while in the United States...
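To make the cookie-and-routing idea concrete, here is a minimal Python sketch of the kind of decision described above. It is not Traffic Server's plugin API (which is C/C++), and everything specific in it is an assumption made for illustration: the cookie names `session` and `home`, the backend host names, and the rule that any non-empty session cookie counts as logged in. A real deployment would verify the cookie's signature and expiry.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from a user's "home" region to a backend pool.
BACKENDS = {
    "in": "mail-in.example.internal",
    "us": "mail-us.example.internal",
}
DEFAULT_BACKEND = "mail-us.example.internal"


def route_request(cookie_header):
    """Return (needs_login, backend) for a raw Cookie header value.

    Assumes two hypothetical cookies: "session" (present when the user
    is logged in) and "home" (the region where the account is homed).
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header)

    session = cookies.get("session")
    home = cookies.get("home")

    # No session cookie, or an empty one, means the user must authenticate.
    needs_login = session is None or not session.value

    # Route by the account's home region, not by where the request came from.
    backend = DEFAULT_BACKEND
    if home is not None:
        backend = BACKENDS.get(home.value, DEFAULT_BACKEND)

    return needs_login, backend


print(route_request("session=abc123; home=in"))
print(route_request("home=in"))
```

The point of the sketch is that the proxy can make both decisions from the request headers alone, before any application server is involved.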
What are your preferred tools to cache, and in general to speed up, the processing of HTTP requests?