This post originated from an RSS feed registered with Java Buzz
by Brian McCallister.
Original Post: Really Big Apps (Initial Braindump)
Feed Title: Waste of Time
Feed URL: http://kasparov.skife.org/blog/index.rss
Feed Description: A simple waste of time and weblog experiment
Martin and I had a fantastic discussion a week or so ago (when I started writing this, it was "earlier today"; holy crap, having a two-month-old on hand stops the blogging) about really big applications. It was fantastic partly because we hashed out some good ideas (which I want to start sharing) but also because it reinforced something Pat Helland has been saying, which I strongly agree with: that we lack good language for talking about really big apps.
There has been a bit of a blogging theme going on looking at this. The aforementioned Pat talks about it, Werner Vogels certainly talks a lot about it, the High Scalability Blog has been posting lots of profiles, there was lots of talk around the Seattle Conference on Scalability, and most recently there has been much patter about Amazon's Dynamo.
Once past the operational issues (which are extremely non-trivial), the scaling
bottleneck for big apps is generally the handling of data. At JavaOne this
past summer, Diego specifically narrowed it to handling update rates, and I
agree in principle but want to expand it into enough detail to be meaningful.
Of the folks with really large, industrial datasets, not many really talk
about how they handle them.
Google has done the most talking about how they handle large contiguous
datasets -- first with GFS, MapReduce, and Sawzall, then BigTable and Chubby. This has led
lots of folks to think about the problems in the same terms -- not a bad
thing, but not the only way. Hadoop is basically reverse engineering this, and
I know of a couple other private implementations modeling the same thing. It
is kind of funny, actually. I am super-jazzed to have Amazon talking about
different angles on big-data stuff as well :-)
Going to go post this before another week sneaks by. A lot more thoughts
but not enough time to brain dump them. Hopefully get a chance soon.