This post originated from an RSS feed registered with Ruby Buzz
by Rick DeNatale.
Original Post: When I Say No...
Feed Title: Talk Like A Duck
Feed URL: http://talklikeaduck.denhaven2.com/articles.atom
Feed Description: Musings on Ruby, Rails, and other topics by an experienced object technologist.
Lately, most of the web development community, or at least the parts I see regularly, seems to be swept up in the NoSQL wave. Some of us seem to be completely eschewing relational databases in favor of newer technology like mongoDB, couchDB, Riak, the project which must not be named, or one of the many other NoSQL databases which are au courant.
Of late, I had been working on a project for a client which had me swimming against that trend. My task was to convert a sizable Rails application from mongoDB to SQL.
It is tempting to see a new shiny technology and wield it as if it were Maslow's Hammer. We see it, find it attractive, and attempt to use it as a panacea. We might convince ourselves that because an application has a "Document" model, a "document"-oriented database like couchDB or mongoDB is more appropriate than SQL. Maybe it is, and maybe it isn't.
A lot of developers seem to look for "best practices," or "best practice patterns," a set of templates for code or processes which will ensure success. The world just ain't that simple. In reality we face a series of choices, each with different characteristics and consequences. Rather than a single Maslovian hammer, we need to have a fully stocked toolbox, and consider when we should apply each tool. The job of design, whether it is designing software, houses, or bridges, is to resolve a system of conflicting and reinforcing forces.
The 1980s and early 1990s saw a lot of discussion about the relative merits of relational vs. object-oriented databases. There is a natural tension between traditional databases, which strive to separate the data representation from the application code, and object-oriented programming, which strongly couples data representation and object methods, and loosely couples objects. This was often called an "impedance mismatch". In the Smalltalk community we saw the genesis of object-oriented databases like GemStone/S, which majored on making the graph of objects persistent, with transactional semantics.
We also saw the advent of object relational mappers. My first encounter with an ORM was TOPLink. This work provided the seeds for today's implementations like ActiveRecord. An ORM acts as a kind of "impedance matching transformer" if you like.
"Big" Dave Thomas used to make the analogy back then that some data was like a corn field, where a combine harvester(SQL) was appropriate, while sometimes you really wanted a Japanese garden, and therefore different tools.
When today's NoSQL movement came up on my radar, my history led me to think of it as the revival of object-oriented databases. The more I look at it, the clearer it becomes that it is not.
Most of the NoSQL movement is trying to solve a different problem than an impedance mismatch between the database and objects. The seminal events which led to today's NoSQL movement seem to be:
The announcement in 2000 of Eric Brewer's conjecture which led to...
A proof of the conjecture in 2002 by Seth Gilbert and Nancy Lynch, after which the conjecture became known as Brewer's CAP theorem.
The 2007 publication of an academic paper on Amazon's Dynamo key-value store, which applied a combination of known technologies to provide a distributed data store that allows for high availability in the face of failing network components.
The key problem is how to create large-scale distributed data stores. Brewer's CAP theorem proves that you cannot have all three of Consistency, Availability, and tolerance of network Partitioning (caused by component failures) at the same time. Dynamo trades off consistency for availability in the face of an unreliable network. The use case presented in the paper is managing Amazon shopping carts.
Relational database systems tend to prioritize consistency over availability and partition tolerance. Features such as two-phase commit allow the implementation of ACID transactions which are atomic (they either succeed totally or fail totally) by ensuring that a transaction either fails when a network failure is detected, or waits for communications to complete before making its results globally visible.
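To make the all-or-nothing behavior concrete, here is a toy, in-memory sketch of the two-phase commit idea. The names Participant and two_phase_commit are hypothetical, and a real protocol also has to deal with coordinator failure, timeouts, and durable logging, none of which is modeled here.

```python
class Participant:
    """One database node taking part in a distributed transaction (toy model)."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.committed = {}   # data that is globally visible
        self.staged = None    # prepared work, not yet visible

    def prepare(self, changes):
        # Phase 1: vote "yes" only if the node can be reached and can stage the work.
        if not self.reachable:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        # Phase 2, success path: make the staged changes visible.
        self.committed.update(self.staged)
        self.staged = None

    def rollback(self):
        # Phase 2, failure path: discard anything that was staged.
        self.staged = None


def two_phase_commit(participants, changes):
    """Return True if every participant committed, False if all rolled back."""
    votes = [p.prepare(changes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


# Both nodes reachable: the change becomes visible everywhere.
nodes = [Participant("a"), Participant("b")]
ok = two_phase_commit(nodes, {"balance": 90})

# One node unreachable: the change becomes visible nowhere.
nodes_with_failure = [Participant("a"), Participant("b", reachable=False)]
failed = two_phase_commit(nodes_with_failure, {"balance": 90})
```

The point of the sketch is only that nothing becomes visible on any node unless every node has first agreed to commit.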
More recently it has been pointed out that there is an asymmetry between the three legs of the CAP stool. There seems to be another conjecture here. CAP implies you can build a system which is consistent and available but does not tolerate partitioning (CA); a system which is consistent and tolerates partitioning at a sacrifice to availability (CP); or one which provides availability in the face of partitioning by sacrificing consistency (AP). Not tolerating partitioning does not outlaw it, though. The way a CP system reacts to partitioning is to suspend operation until the partition is healed, and in practice a CA system does the same thing: it cannot tolerate a partition, so it makes itself unavailable to maintain consistency.
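The difference between the CP and AP reactions can be sketched with a two-node toy store. TwoNodeStore, Unavailable, and the mode labels are hypothetical names used only for illustration; no real database works this simply.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request in order to protect consistency."""


class TwoNodeStore:
    def __init__(self, mode):
        self.mode = mode             # "CP" or "AP", following the labels in the text
        self.replicas = [{}, {}]     # one dict per node
        self.partitioned = False     # True when the two nodes cannot talk

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica[key] = value
            return
        if self.mode == "CP":
            # Consistency first: suspend operation until the partition heals.
            raise Unavailable("partition detected, refusing write")
        # Availability first: accept the write locally and let replicas diverge.
        self.replicas[node][key] = value

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partition detected, refusing read")
        return self.replicas[node].get(key)


cp = TwoNodeStore("CP")
cp.write(0, "cart", ["book"])
cp.partitioned = True
try:
    cp.write(0, "cart", ["book", "pen"])
    cp_refused = False
except Unavailable:
    cp_refused = True

ap = TwoNodeStore("AP")
ap.write(0, "cart", ["book"])
ap.partitioned = True
ap.write(0, "cart", ["book", "pen"])
ap_diverged = ap.read(0, "cart") != ap.read(1, "cart")
```

Under a partition the CP store stays correct but stops answering, while the AP store keeps answering but its two nodes now disagree.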
Also, the realization has come that there is another leg, which is latency. Actually I think that partitioning and latency are related. Network partitioning is really just a drastic increase in latency, isn't it? The NoSQL guys talk about "eventual" consistency, which means that the distributed datastore will reach consistency eventually after a partitioning (or after an abnormally long latency message).
I think that the reason for this difference between practice and theory is that whether or not the data store provides consistency makes a big difference to the programming model. An application which is written to deal with the possibility of inconsistency, or eventual consistency, must of necessity be designed differently than one which can rely on the data store to provide consistency.
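As a rough illustration of "eventual", the sketch below lets two replicas accept writes independently during a partition and then reconcile. It assumes a last-write-wins rule keyed on a logical timestamp, which is only one of several possible reconciliation strategies and is chosen here for brevity; the merge function and the sample data are hypothetical.

```python
def merge(mine, theirs):
    """Return a copy of `mine` updated with any newer entries from `theirs`.

    Each replica maps key -> (timestamp, value). The newer timestamp wins.
    """
    merged = dict(mine)
    for key, (timestamp, value) in theirs.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Both replicas start out in agreement.
replica_a = {"color": (1, "red")}
replica_b = {"color": (1, "red")}

# During a partition each side accepts writes on its own.
replica_a["color"] = (2, "green")
replica_b["size"] = (3, "large")
inconsistent = replica_a != replica_b

# Once the partition heals, the replicas exchange state in both directions.
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)
converged = replica_a == replica_b
```

Between the partition and the exchange, a reader can see different answers depending on which replica it asks; that window is what the application has to be designed around.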
There is another pitfall for developers used to relational data modeling when they start to use NoSQL approaches, which is what to do about 'relations' between objects. For the most part, and I might be wrong here, NoSQL systems leave you to roll your own here, usually either by putting explicit hash-key links between value blobs in the key-value store, even though those blobs are probably mapped by JSON or the like, or by "de-normalizing" the data by copying it, which leads to the additional problem of how to deal with updates. It is not that these are necessarily insurmountable problems, just that solving them requires thinking them through with fresh eyes.
Which brings me to the title of this little essay. To some it appears that others are taking the No in NoSQL to mean just that, "no SQL." In reality, the proponents seem to be saying that it stands for "not only SQL." Whether to use relational or NoSQL data technology for all or part of an application's data is not an all-or-nothing proposition.
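The two roll-your-own options might look something like the following, using a plain dict of JSON strings as a stand-in for a key-value store. The key naming scheme, the helper functions, and the sample documents are all hypothetical.

```python
import json

store = {}  # stand-in for a key-value store: string keys, JSON blob values


def put(key, doc):
    store[key] = json.dumps(doc)


def get(key):
    return json.loads(store[key])


# Option 1: explicit key links between blobs.
put("author:1", {"name": "Alice"})
put("post:1", {"title": "First post", "author_key": "author:1"})


def post_with_author(post_key):
    post = get(post_key)
    # The second lookup is a hand-rolled join performed by the application.
    post["author"] = get(post["author_key"])
    return post


# Option 2: de-normalize by copying the author into the post.
put("post:2", {"title": "Second post", "author": {"name": "Alice"}})


def rename_author(author_key, new_name, denormalized_post_keys):
    author = get(author_key)
    author["name"] = new_name
    put(author_key, author)
    # Every copy must be found and rewritten, or the copies go stale.
    for post_key in denormalized_post_keys:
        post = get(post_key)
        post["author"]["name"] = new_name
        put(post_key, post)


rename_author("author:1", "Alicia", ["post:2"])
linked = post_with_author("post:1")
copied = get("post:2")
```

The linked form pays with an extra read on every fetch; the copied form pays on every update, and the application has to know where all the copies live.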
It seems to me that the strength of NoSQL is in meeting non-functional requirements: it makes it easier to scale out systems or parts of systems which do not need cross-user ACID characteristics but need to have lots of distributed parallel processing to achieve high throughput and high availability. There are certainly parts of many systems which have those characteristics. Things like shopping carts and other user session data (the use case for Dynamo) do not really need to be consistent globally. A weaker form of consistency, consistency as viewed by an individual user, is often enough, and is where NoSQL systems tend to focus their efforts, by providing features like "read what you wrote."
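One way to picture "read what you wrote" is a client session that remembers the versions it has written and refuses to accept anything older from a lagging replica. This is a minimal sketch under that assumption; the Session class and its primary/replica dicts are hypothetical and do not describe any particular product.

```python
class Session:
    """Client session that guarantees a user sees their own writes."""

    def __init__(self, primary, replica):
        self.primary = primary    # dict: key -> (version, value), always current
        self.replica = replica    # same shape, but possibly stale
        self.last_written = {}    # key -> version this session wrote

    def write(self, key, value):
        version = self.primary.get(key, (0, None))[0] + 1
        self.primary[key] = (version, value)
        self.last_written[key] = version

    def read(self, key):
        version, value = self.replica.get(key, (0, None))
        if version < self.last_written.get(key, 0):
            # The replica has not caught up with this session's own write,
            # so fall back to the primary for this key.
            version, value = self.primary[key]
        return value


primary, replica = {}, {}          # the replica never receives updates here
alice = Session(primary, replica)
bob = Session(primary, replica)

alice.write("cart", "book")
alice_sees = alice.read("cart")    # served from the primary after the fallback
bob_sees = bob.read("cart")        # served from the stale replica
```

Alice sees her own cart immediately, while Bob may see an older state for a while, which is exactly the weaker, per-user form of consistency described above.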
The application needs to deal with what happens when consistency is lost or delayed. This is not unlike what we do with optimistic locking in a traditional database, but I would posit that it is not exactly the same thing.
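For comparison, optimistic locking in a relational database is commonly done with a version column that every update checks and increments. The sketch below uses Python's built-in sqlite3 module; the carts table, its columns, and the helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE carts (id INTEGER PRIMARY KEY, items TEXT, version INTEGER)"
)
conn.execute("INSERT INTO carts VALUES (1, 'book', 0)")
conn.commit()


def read_cart(cart_id):
    return conn.execute(
        "SELECT items, version FROM carts WHERE id = ?", (cart_id,)
    ).fetchone()


def save_cart(cart_id, items, expected_version):
    """Write only if nobody else changed the row since it was read."""
    cursor = conn.execute(
        "UPDATE carts SET items = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (items, cart_id, expected_version),
    )
    conn.commit()
    return cursor.rowcount == 1


# Two users read the same row, then both try to save.
items_a, version_a = read_cart(1)
items_b, version_b = read_cart(1)

first_saved = save_cart(1, items_a + ",pen", version_a)
second_saved = save_cart(1, items_b + ",mug", version_b)  # stale version
final_items, final_version = read_cart(1)
```

Here the database itself detects the conflict at write time and the loser simply retries. With an eventually consistent store the conflict may only surface later, after both writes have been accepted, which is one reason the two situations are not quite the same.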
For other parts of the system, like the transactions needed to turn a shopping cart into an order, or credit/debit transactions (e.g., for payment or inventory control), an ACID transaction semantic is needed. Building such a semantic on top of a NoSQL base is problematic. Building these parts using a more traditional relational approach seems to make sense.
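A small sketch of what the relational side buys you, again using Python's built-in sqlite3 module: recording an order and decrementing inventory either both happen or neither does. The schema and the place_order helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory (
        sku TEXT PRIMARY KEY,
        on_hand INTEGER CHECK (on_hand >= 0)
    );
    CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER);
    INSERT INTO inventory VALUES ('book', 1);
    """
)


def place_order(sku, quantity):
    """Record the order and decrement stock as one atomic unit."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO orders (sku, quantity) VALUES (?, ?)",
                (sku, quantity),
            )
            conn.execute(
                "UPDATE inventory SET on_hand = on_hand - ? WHERE sku = ?",
                (quantity, sku),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected negative stock; the order row is gone too.
        return False


first = place_order("book", 1)    # one copy in stock, so this succeeds
second = place_order("book", 1)   # nothing left, so the whole unit is undone
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
on_hand = conn.execute(
    "SELECT on_hand FROM inventory WHERE sku = 'book'"
).fetchone()[0]
```

The failed second attempt leaves no orphaned order row behind, which is the guarantee that is awkward to rebuild by hand on top of a store that only offers per-key operations.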
So the lesson I take from this is that although I want to have NoSQL in my toolbox for use when appropriate, I don't want to see it, or any other tool, as a Maslovian hammer.
Original article written by Rick DeNatale and published on Talk Like A Duck. If you are reading this article elsewhere than Talk Like A Duck, it has been illegally reproduced and without proper authorization.