Artima Developer Spotlight Forum - Seven Lessons on Scalability from Reddit

Artima Developer Spotlight Forum
Seven Lessons on Scalability from Reddit

51 replies on 4 pages. Most recent reply: Jun 29, 2010 5:15 PM by John Zabroski

Slava Imeshev

Posts: 114
Nickname: imeshev
Registered: Sep, 2004

Re: Seven Lessons on Scalability from Reddit

Posted: May 25, 2010 11:40 AM

Hi Bill,

> Sorry you didn't like the presentation, but please keep in
> mind that the discussion here is a big part of the
> educational value we try to provide.

I agree completely. Even though some approaches in the presentation are not necessarily applicable to everyone, or even seem just wrong, this kind of discussion helps others who are not as experienced to see the pros and cons. Keep on rolling :-)

Regards,

Slava Imeshev
http://www.cacheonix.com

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Seven Lessons on Scalability from Reddit

Posted: May 27, 2010 5:40 AM

>> Why does Artima insist on posting really dumb stuff lately?

> Why is this dumb exactly? I don't necessarily disagree but I'm curious.

Because they used Memcached instead of Coherence ;-)

The first 5 items on the list looked pretty obvious. I'm curious to hear the reasons cited for 6 & 7 (I'm not doubting that they're good reasons, but I'd like to hear them).

Peace,

Cameron Purdy | Oracle Coherence
http://coherence.oracle.com/

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Seven Lessons on Scalability from Reddit

Posted: May 30, 2010 8:02 AM

Hi John -

> What most programmers lack is insight, or data mining, on
> how they design systems. That is why I pressed James to
> tell me WHY reddit switched to CassandraDB. What
> STATISTICAL MEASURES did they make to evaluate the
> effectiveness of the move? HOW EFFECTIVE was it? WHAT
> configuration options and partitioning strategies did they
> try while using RDBMSes? WHAT was their time/cost
> expenditure while using a RDBMS? WHICH RDBMS? WHAT are the
> read/write characteristics? WHAT was the mean-time-
> to-failure? You know, a REAL engineering discussion. Not a
> superfluous list.

I've been thinking about your comments for a few days, and there is something basic in the underlying premise that I do disagree with. What you are describing are good engineering practices, and given enough time (e.g. a few decades), they could be used in an iterative manner to determine what architectural change should be made in order to address a scalability issue that is -- due to a lack of foresight, initial ignorance, or more likely initial haste -- architected into an application. So while you are absolutely correct that yours are tried and true methods for achieving local and perhaps even global maxima, the company (in this case Reddit) would have been long gone before the proper route would have been selected.

Architectural challenges (such as the one being discussed) are capable of being examined in one's mind, and the potential solutions evaluated not against the particular issue du jour, but against systemic factors. The reason why sites such as Reddit tend to switch from a SQL RDBMS to a "noSQL" database / in-memory caching system is that the SQL RDBMS guarantees too much, and those guarantees limit scalability (by some combination of concurrency control, ordering and consistency). Since those guarantees are either not required by the application or otherwise achievable by some (unfortunate) increase in application logic, and since trading them off allows the data access layer and/or durable store to scale close to linearly, large-scale applications without highly transactional workflows will naturally tend toward the so-called "noSQL" approaches.

That said, I do agree that many of your questions were valid from the point of view of learning from the decision:

* What STATISTICAL MEASURES did they make to evaluate the effectiveness of the move?
* HOW EFFECTIVE was it?

I would add:

* How expensive was it to switch in terms of infrastructure changes? Application changes?
* Was additional complexity introduced to the application logic in order to compensate for capabilities that were no longer available (i.e. things the RDBMS did)?
* Was the RDBMS already sharded? If not, would it have been possible to shard the RDBMS instead of moving to a non-RDBMS? Was it considered? What were the perceived pros and cons?
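
To make the sharding question concrete, here is a minimal sketch of hash-based partitioning. It is an illustration only, not anything Reddit is known to have run: the `ShardedStore` class, its method names, and the choice of MD5 as the hash are all assumptions made for the example, and plain dictionaries stand in for the database servers. The point it tries to show is that a single-key read or write touches exactly one shard and needs no coordination, while an operation spanning several keys may touch several shards, which is where the RDBMS-style guarantees become expensive.

```python
import hashlib


class ShardedStore:
    """Toy key-value store split across independent in-memory shards.

    Hypothetical illustration: each dict stands in for a separate server.
    """

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_index(self, key):
        # Use a stable hash so a given key always maps to the same shard,
        # regardless of which process computes it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        # A single-key write goes to one shard only.
        self.shards[self.shard_index(key)][key] = value

    def get(self, key, default=None):
        # A single-key read goes to one shard only.
        return self.shards[self.shard_index(key)].get(key, default)

    def shards_touched(self, keys):
        # Set of shards a multi-key operation would have to involve.
        return {self.shard_index(k) for k in keys}


store = ShardedStore(num_shards=4)
for user_id in range(100):
    store.put(f"user:{user_id}", {"karma": user_id})

print(store.get("user:42"))
print(store.shards_touched(["user:1", "user:2", "user:3"]))
```

Whatever `shards_touched` returns for a multi-key request is the set of nodes that would have to agree if that request needed transactional behaviour, so the fewer multi-key operations an application has, the closer it gets to the near-linear scaling described above.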

> All the issues you appear to struggle with really come
> down to the lack of a deterministic, reliable build
> process.

.. and:

> You can mathematically do semi-automatic script
> generation using program invariants.

What I disagree with is the notion that issues-in-the-large can be efficiently solved using tried-and-true engineering approaches that are ideal for issues-in-the-small. Knowing the difference (i.e. knowing when something is truly an architectural issue) is often the hardest part.

Peace,

Cameron Purdy | Oracle Coherence
http://coherence.oracle.com/

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Seven Lessons on Scalability from Reddit

Posted: Jun 17, 2010 9:02 AM

Cameron,

I actually covered some of the things you "added" on your list.

Have you noticed that Google Apps had to freeze billing for its data storage engine due to scaling problems? That Postini is now also having scaling problems?

Tell me what your solution is, if it is not real engineering.

Richard Henderson

Posts: 1
Nickname: rickaitch
Registered: Jun, 2010

Re: Seven Lessons on Scalability from Reddit

Posted: Jun 22, 2010 8:21 AM

@"invariants", how do you know what is invariant ahead of time?

20/20 hindsight is all very well.

Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Seven Lessons on Scalability from Reddit

Posted: Jun 22, 2010 6:36 PM

Hi John -

> I actually covered some of the things you "added" on your
> list.

You may have.

> Have you noticed that Google Apps had to freeze billing
> for its data storage engine due to scaling problems? That
> Postini is now also having scaling problems?

It doesn't surprise me that companies would have scaling issues. Even companies with a few smart people like Google.

> Tell me what your solution is, if it is not real
> engineering.

Thinking. Being able to explain a solution before one starts coding it. Being able to defend it in the company of people who have both succeeded and failed with similar requirements.

As Brooks said, it's "only slightly removed from pure thought stuff":
http://www.jroller.com/cpurdy/entry/the_joy_of_programming

I am not quite smart enough to explain the process of solving these problems, but in my job I do get to witness it almost every day.

Peace,

Cameron Purdy | Oracle Coherence
http://coherence.oracle.com/

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Seven Lessons on Scalability from Reddit

Posted: Jun 29, 2010 5:15 PM

Usually, the process is pretty ugly, but eventually a consensus emerges.

For what it is worth, I appreciated your feedback, even if I sounded a bit hasty. I am just extremely closed-minded and visionary and like challenging people rather than merely accepting their answers as-is, even if it means being a little caustic.


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use