Frank Thoughts
Scaling: Not Just About Architecture
by Frank Sommers
December 16, 2006

Summary
At SD Forum 2006, two eBay architects presented an overview of how eBay's architecture handles a billion page requests a day, and how that architecture evolved from a few Perl scripts to 15,000 application instances running in eight data centers. One conclusion from the presentation is that scaling is only in part a question of architecture.

At SD Forum 2006, Randy Shoup and Dan Pritchett, both with eBay, gave a presentation on eBay's architecture. Pritchett subsequently posted the presentation slides on his blog, The eBay Architecture [PDF].

Predictably, the presentation contained a few awe-inspiring statistics, such as:

212,000,000 registered users
1 billion page views per day
26 billion SQL queries and updates per day
Over 2 petabytes of data
$1,590 worth of goods traded per second
Over 1 billion photos
7 languages
99.94% uptime
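To put the daily figures above into per-second terms, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes load is spread evenly over the day, which the presentation does not claim, so real peaks would be higher. The variable names are mine, not eBay's.

```python
# Back-of-the-envelope averages derived from the statistics quoted above.
# Assumption: traffic is spread evenly across the day (real traffic has peaks).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

page_views_per_day = 1_000_000_000
sql_ops_per_day = 26_000_000_000
uptime = 0.9994
dollars_traded_per_second = 1_590

page_views_per_second = page_views_per_day / SECONDS_PER_DAY
sql_ops_per_second = sql_ops_per_day / SECONDS_PER_DAY
sql_ops_per_page_view = sql_ops_per_day / page_views_per_day
downtime_hours_per_year = (1 - uptime) * 365 * 24
dollars_traded_per_day = dollars_traded_per_second * SECONDS_PER_DAY

print(f"{page_views_per_second:,.0f} page views per second on average")
print(f"{sql_ops_per_second:,.0f} SQL operations per second on average")
print(f"{sql_ops_per_page_view:.0f} SQL operations per page view")
print(f"{downtime_hours_per_year:.1f} hours of downtime per year at 99.94% uptime")
print(f"${dollars_traded_per_day:,} traded per day")
```

Under that even-load assumption, the averages come to roughly 11,600 page views and 300,000 SQL operations per second, and 99.94% uptime still leaves a little over five hours of downtime a year.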

Other stats in the presentation related to development process and features, such as:

Over 300 new features released each quarter
Over 100,000 lines of code released every two weeks

That scale notwithstanding, according to the presentation, the goal of eBay's current architecture is to handle an additional ten-fold increase in traffic, something eBay expects to reach within a few short years. Another architecture objective is to be able to handle peak loads, and for components to gracefully degrade under unusual load or in the case of system failures.

According to the presentation, the system architecture is currently moving to Version 4. Predictably, the most interesting technical pieces of the presentation focus on that version, including, for instance, what the presenters said was the first step in scaling the application tier: Throwing out most of J2EE. Instead, they noted that "eBay scales on servlets and a re-written connection pool."

Another interesting aspect of application-layer scaling is that, according to the presentation, no session state is maintained in the application. Instead, "transient state [is] maintained in cookie or scratch database." For data access, eBay uses an internally-developed Java O/R mapping solution.
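The presentation does not show how the cookie-based state works, so the following is only a minimal sketch of the general idea, not eBay's implementation. It assumes a secret key shared by all application instances (the hypothetical `SECRET_KEY`) and uses an HMAC signature so that state carried by the client cannot be altered unnoticed. The function names and the `pages_seen` field are invented for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional, Tuple

# Assumption: every application instance is configured with the same secret.
SECRET_KEY = b"hypothetical-shared-secret"


def encode_state(state: dict) -> str:
    """Serialize transient state and append an HMAC so tampering is detectable."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_state(cookie_value: str) -> Optional[dict]:
    """Return the state carried in the cookie, or None if the signature is invalid."""
    payload, _, signature = cookie_value.rpartition(".")
    payload_bytes = payload.encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload_bytes, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_bytes))


def handle_request(cookie_value: Optional[str]) -> Tuple[str, str]:
    """Handle one request using only the cookie; nothing is kept in server memory."""
    state = decode_state(cookie_value) if cookie_value else None
    if state is None:
        state = {"pages_seen": 0}  # start fresh on a missing or tampered cookie
    state["pages_seen"] += 1
    body = f"You have viewed {state['pages_seen']} page(s) in this visit."
    return body, encode_state(state)


# Two consecutive requests could be served by two different instances,
# because all the state needed travels with the request.
body1, cookie = handle_request(None)
body2, cookie = handle_request(cookie)
print(body2)
```

Because no instance holds session data in memory, any of the application instances can serve any request, which is what makes adding instances straightforward. State too large or too sensitive for a cookie would presumably go to the scratch database the presenters mention.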

In scaling up the search aspect of the site, the presenters noted a unique requirement not encountered by general Web search engines, such as Google: eBay users expect changes to their data to show up in search results right away. As well, auction listers know exactly the expected search results—for instance, the items they just listed must show up in all relevant searches. Apparently, just updating the search index took about 9 hours prior to the latest re-architecting of eBay's search.
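To illustrate the requirement that a new or changed listing be searchable right away, here is a toy inverted index that is updated in place on every change instead of being rebuilt in a batch. This is a sketch under my own assumptions, not a description of eBay's search engine; the `ListingIndex` class and its methods are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (a deliberately naive tokenizer)."""
    return text.lower().split()


class ListingIndex:
    """Toy inverted index that reflects each listing change immediately."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of listings containing it
        self._titles = {}                  # listing id -> currently indexed title

    def add_listing(self, listing_id, title):
        """Index a new listing, or re-index one whose title has changed."""
        self.remove_listing(listing_id)
        self._titles[listing_id] = title
        for term in tokenize(title):
            self._postings[term].add(listing_id)

    def remove_listing(self, listing_id):
        """Drop a listing from the index; a no-op if it was never indexed."""
        title = self._titles.pop(listing_id, None)
        if title is None:
            return
        for term in tokenize(title):
            self._postings[term].discard(listing_id)

    def search(self, query):
        """Return the ids of listings whose titles contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = ListingIndex()
index.add_listing(1, "Vintage Perl camel book")
print(index.search("perl book"))   # the listing is searchable immediately
index.add_listing(1, "Vintage Java book")
print(index.search("perl"))        # the old title no longer matches
```

The point of the sketch is the contrast with a batch rebuild: each update touches only the terms of the affected listing, so the seller sees the change on the next search. Doing this across many machines and at eBay's volume is, of course, the hard part the presenters were describing.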

The presentation is full of similarly challenging problems, as well as insights into their solutions. To me, however, the most interesting aspect of the presentation is the overview it provides of how eBay's architecture itself evolved. It's worth considering some aspects of Version 1, for instance:

Built over a weekend in 1995 by Pierre Omidyar
Every item was a separate file, generated by a Perl script
No search, browsing only by category
System hardware from commodity parts that could be purchased at Fry's.

This architecture was in place between 1995 and September, 1997. By then, eBay was one of the better-known Web sites, and the architecture maxed out at 50,000 listings, according to the presentation.

The next few iterations involved a move to a 3-tier architecture, at first on Microsoft's IIS server, and then moving to Java. The final few versions indicate a move away from J2EE, and are highly customized to meet eBay's unique demands.

One way to look at the four main architecture versions is as an evolution. Another way to look at it, however, is as coming full circle: starting with a custom-designed solution, moving to a standards-inspired solution, and then moving again into a custom solution.

Based on the overview of the various architecture stages, one cannot help but wonder to what extent eBay's architects were solving urgent present scaling problems, and to what extent they were looking to build scalability into the system to handle future loads. And even if the plan was to design for the future, to what extent could architects truly forecast the scalability of the system at some imagined point in the future?

One problem with such predictions is that even if plenty of data is available on the currently operational system, usage patterns of the system may change: for instance, users may start to favor video over simple images, or voice calls as part of interacting with the system. Such usage pattern changes can happen fairly fast, especially given that the average architecture lifespan is around 2-3 years, based on the presentation. Not many people had heard of YouTube two or three years ago, for example, and in the short lifespan of that company millions of users grew comfortable posting videos online.

Scaling: Organizational Capability + Architecture

That last issue brings me to what I think is the main message of the eBay presentation. The most amazing aspect of this evolution to me is not necessarily the technical brilliance of the solutions at each architecture stage, but the fact that eBay was able to meet the challenges of its growth with subsequent refinements to its system, all the while keeping the site operational.

That is interesting because it suggests that you can start with almost any architecture, even with Perl or Rails or JSP pages, as long as you know how to migrate to the next step, and have the capability to do so, if and when you need to scale your app. That, in turn, suggests that the key test of scalability is not so much how each architecture stage scales, but how readily a company or an organization can move an application from one architecture step to the next. That indicates that scaling is as much an individual or organizational question as a technical one.

That should not be surprising, of course, since scale has always had operational as well as architectural design aspects. (The last segment of the eBay presentation is devoted to the operational aspects of scaling; for instance, it illustrates how 15,000 application instances are managed across 8 data centers.) Approaching scaling from that broader perspective suggests, however, that two common ways of looking at scale may prove unhelpful in practice.

The first is an overt emphasis on design for scalability from the start. Most developers know that no architecture scales infinitely, but on occasion architects expend much effort on trying to design one architecture that will scale to some long-term need of an application. Pierre Omidyar likely didn't share that view, which is perhaps why he went down the path of Perl scripts and one-file-per-item in his initial version.

The second less-than-helpful view treats scalability and performance as mere afterthoughts, and discourages scalability considerations at the initial stages of an application's development. This view is sometimes expounded by XP proponents, who would much rather code something up quickly than worry about how that code will scale to handle some future application workload.

In practice, neither view may be very helpful. A third, more realistic, view would consider scaling as partly an organizational, even business-level, capability. Recognizing that predicting future workloads is hard, if not impossible, this view would aim at an architecture that handles some near-term scaling goal, and at the same time allows the rapid deployment of features so that the application's real users can generate a business rationale for supporting future architecture upgrades. Far from considering scaling as an afterthought, however, this view would also aim to develop from the start the organizational, and even business, capabilities to handle architectural changes to the system. That seems to be the view presented by the eBay architects at SD Forum.

In your projects, when do you start thinking about scalability?

About the Blogger

Frank Sommers is a Senior Editor with Artima Developer. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld. Frank also serves as chief editor of the Web zine ClusterComputing.org, the IEEE Technical Committee on Scalable Computing's newsletter. Prior to that, he edited the Newsletter of the IEEE Task Force on Cluster Computing. Frank is also founder and president of Autospaces, a company dedicated to bringing service-oriented computing to the automotive software market.

Prior to Autospaces, Frank was vice president of technology and chief software architect at a Los Angeles system integration firm. In that capacity, he designed and developed that company's two main products: A financial underwriting system, and an insurance claims management expert system. Before assuming that position, he was a research fellow at the Center for Multiethnic and Transnational Studies at the University of Southern California, where he participated in a geographic information systems (GIS) project mapping the ethnic populations of the world and the diverse demography of southern California. Frank's interests include parallel and distributed computing, data management, programming languages, cluster and grid computing, and the theoretic foundations of computation. He is a member of the ACM and IEEE, and the American Musicological Society.