Re: Seven Lessons on Scalability from Reddit
Posted: May 30, 2010 11:02 AM
Hi John -
> What most programmers lack is insight, or data mining, on
> how they design systems. That is why I pressed James to
> tell me WHY reddit switched to CassandraDB. What
> STATISTICAL MEASURES did they make to evaluate the
> effectiveness of the move? HOW EFFECTIVE was it? WHAT
> configuration options and partitioning strategies did they
> try while using RDBMSes? WHAT was their time/cost
> expenditure while using a RDBMS? WHICH RDBMS? WHAT are the
> read/write characteristics? WHAT was the mean-time-
> to-failure? You know, a REAL engineering discussion. Not a
> superfluous list.
I've been thinking about your comments for a few days, and there is something basic in the underlying premise that I do disagree with. What you are describing are good engineering practices, and given enough time (e.g. a few decades), they could be used in an iterative manner to determine what architectural change should be made in order to address a scalability issue that is -- due to a lack of foresight, initial ignorance, or more likely initial haste -- architected into an application. So while you are absolutely correct that yours are tried and true methods for achieving local and perhaps even global maxima, the company (in this case Reddit) would have been long gone before the proper route would have been selected.
Architectural challenges (such as the one being discussed) are capable of being examined in one's mind, and the potential solutions evaluated not against the particular issue du jour, but against systemic factors. The reason why sites such as Reddit tend to switch from a SQL RDBMS to a "noSQL" database / in-memory caching system is that the SQL RDBMS guarantees too much, and those guarantees limit scalability (by some combination of concurrency control, ordering and consistency). Since those guarantees are either not required by the application or otherwise achievable by some (unfortunate) increase in application logic, and since trading them off allows the data access layer and/or durable store to scale close to linearly, large-scale applications without highly transactional workflows will naturally tend toward the so-called "noSQL" approaches.
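As a minimal sketch of that trade-off (an illustration only, not a description of Reddit's or any product's actual design), consider a toy key-value store whose keys are hash-partitioned across independent shards. The class name `ShardedStore`, the key format, and the shard count are all hypothetical. Each single-key read or write touches exactly one shard, so shards never coordinate with each other, which is what lets capacity grow roughly with the number of shards.

```python
import hashlib


class ShardedStore:
    """Toy key-value store partitioned across independent in-memory shards.

    Hypothetical illustration: each dict stands in for a separate server.
    """

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # Use a stable hash so a given key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        # A single-key write touches one shard; no cross-shard locking.
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # A single-key read also touches only the owning shard.
        return self._shard_for(key).get(key, default)


store = ShardedStore(shard_count=4)
store.put("link:1", {"title": "Seven Lessons", "score": 10})
store.put("link:2", {"title": "Another link", "score": 3})
print(store.get("link:1")["score"])
```

What the sketch gives up is exactly what the RDBMS provided: there is no way here to update "link:1" and "link:2" atomically, or to read them under one consistent snapshot, if they live on different shards. Any such requirement has to be handled in application logic.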
That said, I do agree that many of your questions were valid from the point of view of learning from the decision:
* What STATISTICAL MEASURES did they make to evaluate the effectiveness of the move?
* HOW EFFECTIVE was it?
I would add:
* How expensive was it to switch in terms of infrastructure changes? Application changes?
* Was additional complexity introduced to the application logic in order to compensate for capabilities that were no longer available (i.e. things the RDBMS did)?
* Was the RDBMS already sharded? If not, would it have been possible to shard the RDBMS instead of moving to a non-RDBMS? Was it considered? What were the perceived pros and cons?
> All the issues you appear to struggle with really come
> down to the lack of a deterministic, reliable build
> You can mathematically do semi-automatic script
> generation using program invariants.
What I disagree with is the notion that issues-in-the-large can be efficiently solved using tried-and-true engineering approaches that are ideal for issues-in-the-small. Knowing the difference (i.e. knowing when something is truly an architectural issue) is often the hardest part.
Cameron Purdy | Oracle Coherence