This post originated from an RSS feed registered with Agile Buzz
by Martin Fowler.
Original Post: Bliki: AggregateOrientedDatabase
Feed Title: Martin Fowler's Bliki
Feed URL: http://martinfowler.com/feed.atom
Feed Description: A cross between a blog and wiki of my partly-formed ideas on software development
One of the first topics to spring to mind as we worked on
NosqlDistilled was that NoSQL databases use different
data models than the relational model. Most sources I've looked at
mention at least four groups of data model: key-value, document,
column-family, and graph. Looking at
this list, there's a big similarity between the first three - all
have a fundamental unit of storage which is a rich structure of
closely related data: for key-value stores it's the value, for
document stores it's the document, and for column-family stores it's the
column family. In DDD terms, this group of data is an aggregate.
The rise of NoSQL databases has been driven primarily by the
desire to store data effectively on large clusters - such as the
setups used by Google and Amazon. Relational databases were not
designed with clusters in mind, which is why people have cast around
for an alternative. Storing aggregates as fundamental units makes a
lot of sense for running on a cluster. Aggregates make natural units
for distribution strategies such as sharding, since you have a large
clump of data that you expect to be accessed together.
An aggregate also makes a lot of sense to an application
programmer. If you're capturing a screenful of information and
storing it in a relational database, you have to decompose that
information into rows before storing it away.
An aggregate makes for a much simpler mapping - which is why many
early adopters of NoSQL databases report that it's an easier
programming model.
This synergy between the programming model and the distribution
model is very valuable. It allows the database to use its knowledge
of how the application programmer clusters the data to help
performance across the cluster.
There is a significant downside - the whole approach works really
well when data access is aligned with the aggregates, but what if you
want to look at the data in a different way? Order entry naturally
stores orders as aggregates, but analyzing product sales cuts across
the aggregate structure. The advantage of not using an aggregate
structure in the database is that it allows you to slice and dice
your data different ways for different audiences.
This is why aggregate-oriented stores talk so much about
map-reduce - which is a programming pattern that's well suited to
running on clusters. Map-reduce jobs can reorganize the data
into different groups for different readers - what many people refer
to as materialized views. But it's more work to do this than using the
relational model.
This is part of the argument for PolyglotPersistence -
use aggregate-oriented databases when you are manipulating clear
aggregates (especially if you are running on a cluster) and use
relational databases (or a graph database) when you want to
manipulate that data in different ways.