This post originated from an RSS feed registered with Agile Buzz
by Martin Fowler.
Original Post: Bliki: PolyglotPersistence
Feed Title: Martin Fowler's Bliki
Feed URL: http://martinfowler.com/feed.atom
Feed Description: A cross between a blog and wiki of my partly-formed ideas on software development
In 2006, my colleague Neal Ford coined the term Polyglot
Programming, to express the idea that applications should be written
in a mix of languages to take advantage of the fact that different
languages are suitable for tackling different problems. Complex applications
combine different types of problems, so picking the right language
for the job may be more productive than trying to fit all
aspects into a single language.
Over the last few years there's been an explosion of interest in
new languages, particularly functional languages, and I'm often
tempted to spend some time delving into Clojure, Scala, Erlang, or
the like. But my time is limited and I'm giving a higher priority to
another, more significant shift, that of the
DatabaseThaw. The first drips have been coming through
from clients and other contacts and the prospects are enticing. I'm
confident to say that if you starting a new strategic enterprise application
you should no longer be assuming that your persistence should be
relational. The relational option might be the right one - but you
should seriously look at other alternatives.
One of the interesting consequences of this is that we are
gearing up for a shift to polyglot persistence [1] - where any decent
sized enterprise will have a variety of different data storage
technologies for different kinds of data. There will still be large
amounts of it managed in relational stores, but increasingly we'll
be first asking how we want to manipulate the data and only then
figuring out what technology is the best bet for it.
This polyglot affect will be apparent even within a single
application[2]. A complex enterprise application uses different
kinds of data, and already usually integrates information from
different sources. Increasingly we'll see such applications manage
their own data using different technologies depending on how the
data is used. This trend will be complementary to the trend of
breaking up application code into separate components integrating
through web services. A component boundary is a good way to wrap a
particular storage technology chosen for the way its data in
manipulated.
This will come at a cost in complexity. Each data storage
mechanism introduces a new interface to be learned. Furthermore data
storage is usually a performance bottleneck, so you have to
understand a lot about how the technology works to get decent speed.
Using the right persistence technology will make this easier, but
the challenge won't go away.
Many of these NoSQL option involve running on large clusters.
This introduces not just a different data model, but a whole range
of new questions about consistency and availability. The
transactional single point of truth will no longer hold sway
(although its role as such has often been illusory).
So polyglot persistence will come at a cost - but it will come
because the benefits are worth it. When relational databases are
used inappropriately, they exert a significant drag on application
development. I was recently talking to a team whose application was
essentially composing and serving web pages. They only looked up
page elements by ID, they had no need for transactions, and no need
to share their database. A problem like this is much better suited
to a key-value store than the corporate relational hammer they had
to use. A good public example of using the right NoSQL choice for
the job is The Guardian - who have felt a definite productivity gain
from using MongoDB over their previous relational option.
Another benefit comes in running over a cluster. Scaling to lots
of traffic gets harder and harder to do with vertical scaling - a
fact we've known for a long time. Many NoSQL databases are designed
to operate over clusters and can tackle larger volumes of traffic
and data than is realistic with single server. As enterprises look to
use data more, this kind of scaling will become increasingly
important. The Danish medication system described at
gotoAarhus2011 was a good example of this.
All of this leads to a big change, but it won't be rapid one -
companies are naturally conservative when it comes to their data
storage.
The more immediate question is which types of projects should
consider an alternative persistence model? My thinking is that
firstly you should only consider projects that are at the strategic
end of the UtilityVsStrategicDichotomy. That's because
utility projects don't have enough benefit to be worth a new
technology.
Given a strategic project, you then have two drivers that raise
alternatives: either reducing development drag or dealing with
intensive data needs. Even here I suspect many projects, probably a
majority, are better off sticking with the relational orthodoxy. But
the minority that shouldn't is a significant one.
One factor that is perhaps less important is whether the project
is new, or already established. The Guardian's shift to MongoDB has
been happening over the last year or so on a code base developed
several years ago. Polyglot persistence is something you can
introduce on an existing code base.
What all of this means is that if you're working in the
enterprise application world, now is the time to start familiarizing
yourself with alternative data storage options. This won't be a fast
revolution, but I do believe the next decade will see the
database thaw progress rapidly.
1:
As far as I can tell, Scott Leberknight was the first person to
start using the term "polyglot persistence".
2:
Don't take the example in the diagram too seriously. I'm not
making any recommendations about which database technology to
use for what kind of service. But I do think that people should
consider these kinds of technologies as part of application
architecture.