> Good points. I also think that OS virtualization will work well with replicated databases. Does Artima really wants to be in the business of managing physical servers? I suspect not. Imagine a scenario where a certain service level can be purchased from a hosting company - lets say we want a constant response time - and virtual servers will be started automatically in appropriate geographical locations based on real-time usage analysis. I know folks are working on such technology, the question is when it will be cost effective for a small company?
It was fascinating to read this ancient discussion - from almost a decade ago! - which I came across on a random search. It's funny that the fundamental issue discussed here - how to plan your database to allow for scalability - is still pertinent for many if not most application developers, despite the advent of new paradigms like cloud computing and NoSQL which were supposed to provide easy solutions. In reality, as most practitioners are aware, database partitioning and scalability are still far from trivial. However, I wanted to refresh this discussion with a perspective on current technologies and how they help solve this problem in new ways that did not exist back in 2006 - in line with @David's visionary comment about virtual/hosting services - cloud services as they are known today - that he hoped would make scalability easier.
Today many enterprise applications run on MySQL, which has a robust clustering / partitioning mechanism. This is quite similar to the Postgres-based Artima database that was originally discussed. But as @David mentions, this setup comes at the cost of ongoing maintenance, both of the physical hardware that runs the databases and of the clustering setup itself. Also, these clustering setups are not "elastic" (a term that was hardly in use in 2006 but is bon ton today): they support a certain level of scalability, and beyond that level the cluster needs to be re-tooled. Artima's "site"-based clustering is relatively elastic, but it's important to note that it required rewriting the entire application, hence the dilemma of whether to "plan for scale" to begin with. Today we are getting closer to solutions that will allow partitioning and scaling without facing this dilemma - taking an existing application and scaling it transparently.
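To make the "re-tooling" point concrete, here is a minimal Python sketch of application-level partitioning. It is not how MySQL clustering or Artima's setup works; it only assumes the common approach of routing each key to `hash(key) % number_of_shards`. The function names and the `user:N` keys are made up for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and so is not stable across restarts.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after resizing."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical keys, purely for illustration.
    user_keys = [f"user:{i}" for i in range(10_000)]
    fraction = moved_fraction(user_keys, old_shards=4, new_shards=5)
    print(f"{fraction:.0%} of keys change shard when growing from 4 to 5 shards")
```

With a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one key in five. Going from 4 to 5 shards therefore relocates roughly 80% of the data, which is why growing a cluster partitioned this way is a migration project rather than a configuration change.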
The "NoSQL" concept gained popularity from around 2008 onwards, mainly due to the massive scalability it enabled at major technology brands like Google, Facebook and Twitter. Today there are many mature NoSQL databases in widespread use, but in fact few of them solve our fundamental problem - allowing applications to scale their data transparently. To take a few examples, Cassandra has a non-transparent ring-based clustering model, Neo4j requires complex sharding to scale, and Redis does not have a working clustering feature, and so is effectively limited to the memory of a single node. An exception is MongoDB, which was built on the premise of elastic scalability, and to a certain extent delivers on this promise. But even so, it requires manually provisioning the hardware and handling failures, whether in your local data center or in the cloud; the cloud, counter-intuitively, makes clustered scenarios even more difficult to handle.
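For readers who have not met ring-based partitioning, the following Python sketch shows the general technique (consistent hashing with virtual nodes). It is a generic illustration under my own assumptions, not Cassandra's actual implementation; the class, the node names and the keys are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 128-bit position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring: a key belongs to the next node position clockwise."""

    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes    # virtual positions per physical node
        self._positions = []     # sorted ring positions
        self._owners = {}        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at many positions to even out its share.
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        # First position strictly after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, _hash(key))
        idx %= len(self._positions)
        return self._owners[self._positions[idx]]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-e")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved after adding a fifth node")
```

Adding a node to the ring only takes over the keys that now fall just before its positions, so a fifth node picks up roughly its fair share and the rest stay put. The catch, and the sense in which this is "non-transparent", is that the application or operator still has to understand token placement, replication and rebalancing for it to work well.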
Going back to @David's wish that we would be able to purchase "a certain service level ... from a hosting company", this is only now becoming a reality with hosted database services, which build on some of the new and more scalable database solutions, but with the additional aspect of fully-managed clustering and high availability on the cloud. Here are a few services I have heard of, in descending order of query capabilities (from full SQL to simple key-value storage):
* http://www.cleardb.com - MySQL database as a service running on Amazon or Windows Azure, with built-in high availability, using master-master replication to scale.
* https://mongolab.com - MongoDB's document-based NoSQL database as a managed service, claims unlimited scalability with multi-cloud support.
* http://cloudant.com - another document-based database provided as a service, with a "regular" shared server plan that provides transparent scaling, and a "heavy" dedicated cluster option.
* http://redis-cloud.com - managed hosting for the in-memory Redis database (with query functionality somewhere in between document store DBs and key-value stores) with automatic clustering and fault tolerance.
* http://aws.amazon.com/dynamodb - Amazon's high performance (running on SSD) key-value database, with very basic functionality but correspondingly high simplicity, offering transparent scalability.
These services are in their early days but they are certainly within the reach of small companies in terms of cost, and at least claim to provide transparent scalability for existing applications with no code changes. Supposedly, this solves the dilemma of planning for scale - you can just write your application and hook up to these or similar services to gain unlimited scaling, and correspondingly, a fixed service level regardless of throughput or load. Of course in reality things are always more complex. I invite the community to share insights and practical experience with these types of database services - do they really solve the fundamental problem? Is this category of products a true step forward or just another "plaster" on the proverbial scalability bruise?