This post originated from an RSS feed registered with Agile Buzz
by Martin Fowler.
Original Post: Bliki: OrmHate
Feed Title: Martin Fowler's Bliki
Feed URL: http://martinfowler.com/feed.atom
Feed Description: A cross between a blog and wiki of my partly-formed ideas on software development
While I was at the QCon conference in London a couple of months
ago, it seemed that every talk included some snarky remarks about
Object/Relational mapping (ORM) tools. I guess I should read the
conference emails sent to speakers more carefully, doubtless there
was something in there telling us all to heap scorn upon ORMs at
least once every 45 minutes. But as you can tell, I want to push
back a bit against this ORM hate - because I think a lot of it is unwarranted.
The charges against them can be summarized in that they are
complex, and provide only a leaky abstraction over a relational data
store. Their complexity implies a grueling learning curve and often
systems using an ORM perform badly - often due to naive interactions
with the underlying database.
There is a lot of truth to these charges, but such charges miss a
vital piece of context. The object/relational mapping problem is
hard. Essentially what you are doing is synchronizing between
two quite different representations of data, one in the relational
database, and the other in-memory. Although this is usually referred
to as object-relational mapping, there is really nothing to do with
objects here. By rights it should be referred to as
in-memory/relational mapping problem, because it's true of mapping
RDBMSs to any in-memory data structure. In-memory data structures
offer much more flexibility than relational models, so to program
effectively most people want to use the more varied in-memory
structures and thus are faced with mapping that back to relations
for the database.
The mapping is further complicated because you can make changes
on either side that have to be mapped to the other. More
complication arrives since you
can have multiple people accessing and modifying the database
simultaneously. The ORM has to handle this concurrency because you can't just rely
on transactions- in most cases, you can't hold transactions
open while you fiddle with the data in-memory.
I think that if you if you're going to dump on something in the
way many people do about ORMs, you have to state the alternative.
What do you do instead of an ORM? The cheap shots I usually hear
ignore this, because this is where it gets messy. Basically it
boils down to two strategies, solve the problem differently (and
better), or avoid the problem. Both of these have significant
flaws.
A better solution
Listening to some critics, you'd think that the best thing for a
modern software developer to do is roll their own ORM. The
implication is that tools like
Hibernate and Active Record have just become bloatware, so you
should come up
with your own lightweight alternative. Now I've spent many an hour
griping at bloatware, but ORMs really don't fit the bill - and I say
this with bitter memory. For much of the 90's I saw project after
project deal with the object/relational mapping problem by writing
their own framework - it was always much tougher than people
imagined. Usually you'd get enough early success to commit deeply to
the framework and only after a while did you realize you were in a
quagmire - this is where I sympathize greatly with Ted Neward's
famous quote that object-relational mapping is the Vietnam
of Computer Science[1].
The widely available open source ORMs (such as iBatis, Hibernate,
and Active Record) did a great deal to remove this problem [2].
Certainly they are not trivial tools to use, as I said the
underlying problem is hard, but you don't have to deal with the full
experience of writing that stuff (the horror, the horror). However much
you may hate using an ORM, take my word for it - you're better off.
I've often felt that much of the frustration with ORMs is about
inflated expectations. Many people treat the relational database "like a
crazy aunt who's shut up in an attic and whom nobody wants to talk
about"[3]. In this world-view they just want to deal
with in-memory data-structures and let the ORM deal with the
database. This way of thinking can work for small applications and
loads, but it soon falls apart once the going gets tough.
Essentially the ORM can handle about 80-90% of the mapping problems,
but that last chunk always needs careful work by somebody who really
understands how a relational database works.
This is where the criticism comes that ORM is a leaky
abstraction. This is true, but isn't necessarily a reason to avoid
them. Mapping to a relational database involves lots of repetitive,
boiler-plate code. A framework that allows me to avoid 80% of that
is worthwhile even if it is only 80%. The problem is in me for
pretending it's 100% when it isn't. David Heinemeier Hansson, of
Active Record fame, has always argued that if you are writing an
application backed by a relational database you should damn well
know how a relational database works. Active Record is designed with
that in mind, it takes care of boring stuff, but provides manholes
so you can get down with the SQL when you have to. That's a far
better approach to thinking about the role an ORM should play.
There's a consequence to this more limited expectation of what
an ORM should do. I often hear people complain that they are
forced to compromise their object model to make it more relational
in order to please the ORM. Actually I think this is an inevitable consequence
of using a relational database - you either have to make your
in-memory model more relational, or you complicate your mapping
code. I think it's perfectly reasonable to have a
more relational domain model in order to simplify your
object-relational mapping. That doesn't mean you should always
follow the relational model exactly, but it does mean that you
take into account the mapping complexity as part of your domain
model design.
So am I saying that you should always use an existing ORM rather
than doing something yourself? Well I've learned to always avoid
saying "always". One exception that comes to mind is when you're
only reading from the database. ORMs are complex because they
have to handle a bi-directional mapping. A uni-directional problem is
much easier to work with, particularly if your needs aren't too
complex and you are comfortable with SQL. This is one of the
arguments for CQRS.
So most of the time the mapping is a complicated problem, and
you're better off using an admittedly complicated tool than starting a
land war in Asia. But then there is the second alternative I
mentioned earlier - can you avoid the problem?
Avoiding the problem
To avoid the mapping problem you have two alternatives. Either you use
the relational model in memory, or you don't use it in the database.
To use a relational model in memory basically means programming
in terms of relations, right the way through your application. In
many ways this is what the 90's CRUD tools gave you. They work very
well for applications where you're just pushing data to the screen
and back, or for applications where your logic is well expressed in
terms of SQL queries. Some problems are well suited for this
approach, so if you can do this, you should. But
its flaw is that often you can't.
When it comes to not using relational databases on the disk, there
rises a whole bunch of new champions and old memories. In the 90's
many of us (yes including me) thought that object databases would
solve the problem by eliminating relations on the disk. We all know
how that worked out. But there is now the new crew of NoSQL
databases - will these allow us to finesse the ORM quagmire and give
allow us to shock-and-awe our data storage?
As you might have gathered, I think NoSQL is technology to be
taken very seriously. If you have an application problem that maps
well to a NoSQL data model - such as aggregates or graphs - then you
can avoid the nastiness of mapping completely. Indeed this is often
a reason I've heard teams go with a NoSQL solution. This is, I
think, a viable route to go - hence my interest in increasing our
understanding of NoSQL systems. But even so it only works when the
fit between the application model and the NoSQL data model is good.
Not all problems are technically suitable for a NoSQL database. And
of course there are many situations where you're stuck with a
relational model anyway. Maybe it's a corporate standard that you
can't jump over, maybe you can't persuade your colleagues to accept
the risks of an immature technology. In this case you can't avoid
the mapping problem.
So ORMs help us deal with a very real problem for most enterprise
applications. It's true they are often misused, and sometimes the
underlying problem
can be avoided. They aren't pretty tools, but then the problem they
tackle isn't exactly cuddly either. I think they deserve a little
more respect and a lot more understanding.
1:
I have to confess a deep sense of conflict with the Vietnam
analogy. At one level it seems like a case of the pathetic
overblowing of software development's problems to compare a
tricky technology to war. Nasty the programming may be, but
you're still in a relatively comfy chair, usually with air
conditioning, and bug-hunting doesn't involve bullets coming at
you. But on another level, the phrase certainly resonates with
the feeling of being sucked into a quagmire.
2:
There were also commercial ORMs, such as TOPLink and Kodo. But the
approachability of open source tools meant they became dominant.
3:
I like this phrase so much I feel
compelled to subject it to re-use.