Dataset vs. O/R mapping rumble at TechEd MVP Dinner
So, last night I was at the MVP dinner at TechEd and everything was nice. We had a nice meal, the conversation was nice, the weather was... nice. And then the volume started to rise, slowly at first, so that you don't quite notice it. After a bit, you kind of stop talking and look around. And then I hear it...
<WWF announcer voice>
Are you ready... to RUMBLE!?!?
</WWF announcer voice>
It was datasets vs. O/R mapping, a slight twist on the infamous datasets vs. custom objects debate, all over again. They pulled me in, kicking and screaming, I swear, I really do. The lines were drawn: maintainability, performance, all the things that architects like to philosophize about in terms of other people's work.
Anyway, I won't give you the play-by-play 'cause we were there almost all night. I'll just cut to the chase.
First things first: any comparison of solutions without the context of a problem leads nowhere, fast, and stays there. So the first question I asked (when I got the chance to speak) was "are we talking about querying/reporting here?" and the answer was something like "well, yeah, but a lot of other things too". So my suggestion was that we discuss the solutions in terms of two contexts: querying/reporting and OLTP.
What I mean by OLTP is the data-updating kind of work that you do on individual entities. Examples include "insert order", "change customer address", and "discount product". Querying/reporting doesn't change data, and it often involves dealing with large sets of data pulled from different kinds of entities (in ERD terms).
Luckily, my suggestion to deal with the two separately was accepted. Next, I proposed that an object model (specifically, one implementing the Domain Model pattern) designed for OLTP would perform poorly when used for querying/reporting, simply because it wasn't designed for that. The structure of a domain model is such that it makes it possible to define and implement business rules in one place. That's possible, mind you, not easy.
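To make "business rules in one place" concrete, here's a minimal sketch of a domain entity. The Product class and its discount rule are hypothetical, purely to illustrate where such a rule would live:

```csharp
using System;

// A hypothetical domain entity. The discounting rule lives in exactly
// one place - on the entity itself - rather than being scattered across
// UI code, stored procedures, and batch jobs.
public class Product
{
    public int Id { get; private set; }
    public decimal Price { get; private set; }

    public Product(int id, decimal price)
    {
        Id = id;
        Price = price;
    }

    public void Discount(decimal percent)
    {
        // The business rule is defined and enforced here, and only here.
        if (percent <= 0m || percent > 50m)
            throw new ArgumentOutOfRangeException(
                "percent", "Discounts must be between 0 and 50 percent.");

        Price -= Price * (percent / 100m);
    }
}
```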
Well, the dataset people weren't going to just hand me the OLTP side of the equation without a fight, so they mentioned how easy it was to just call "AcceptChanges", and that my way was much more complex. My rebuttal came in the form of a question (are you seeing a pattern here?): do you just swallow DBConcurrencyExceptions, or do you throw all the user's changes away when one happens? I didn't quite make out the answer since there was a lot of mumbling going on, but I'm pretty sure they had one. I mean, you can't develop multi-user systems using datasets without running into this situation.
The example that clinched OLTP was this: two users change the same entity at the same time; one updates the customer's marital status, the other changes their address. At the business level, there is no concurrency problem here. Both changes should go through. But when you're using datasets, each of those changes gets bundled up with a bunch of other changes, the whole snapshot is sent over from each user, and you get a DBConcurrencyException. Like I said, I'm sure there's a solution to it, I just haven't heard it yet.
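For the curious, here's a rough simulation of why that happens. It uses plain in-memory collections rather than real ADO.NET, and made-up column names, but it mimics the whole-row, all-original-values check that generated dataset updates typically perform in their WHERE clause:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An in-memory simulation (plain collections, not real ADO.NET) of the
// whole-row optimistic concurrency check behind dataset updates: the
// update only succeeds when EVERY original column value still matches
// what's in the database.
class WholeRowUpdateDemo
{
    // The "database" row.
    static Dictionary<string, string> dbRow = new Dictionary<string, string>
    {
        { "MaritalStatus", "Single" },
        { "Address", "1 Main St" }
    };

    static void Update(Dictionary<string, string> originalSnapshot,
                       string column, string newValue)
    {
        // Mimics the generated WHERE clause: compare every original value.
        if (originalSnapshot.Any(kv => dbRow[kv.Key] != kv.Value))
            throw new Exception(
                "DBConcurrencyException: the update affected 0 rows.");

        dbRow[column] = newValue;
    }

    static void Main()
    {
        // Both users read the same snapshot of the row.
        var user1 = new Dictionary<string, string>(dbRow);
        var user2 = new Dictionary<string, string>(dbRow);

        Update(user1, "MaritalStatus", "Married"); // succeeds

        // Throws, even though this edit touches a different column and
        // doesn't conflict with the first one at the business level.
        Update(user2, "Address", "2 Oak Ave");
    }
}
```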
Now, here's where things get interesting. I didn't say that using a domain model automatically solves this problem. Rather, I described how each client could send a specific message (one a ChangeMaritalStatusMessage, the other a ChangeAddressMessage) to the server, in essence giving the server the context in which each bit of data is relevant. The server could just open a transaction, get the customer object by its Id, call a method on the customer (ChangeMaritalStatus or ChangeAddress), and commit the transaction. If two of these messages reached the server at the same time, the transactions would simply be performed serially, and both would succeed. The important part here is not losing the context of the changes.
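Here's a rough sketch of what one of those message handlers might look like. The repository and handler types are illustrative names, not a prescribed API:

```csharp
using System.Transactions;

// One message per business intent: the server knows exactly which bit
// of data the user meant to change, and why.
public class ChangeAddressMessage
{
    public int CustomerId { get; set; }
    public string NewAddress { get; set; }
}

public class Customer
{
    public int Id { get; private set; }
    public string Address { get; private set; }
    public string MaritalStatus { get; private set; }

    public Customer(int id) { Id = id; }

    // Each method captures a single business intent.
    public void ChangeAddress(string newAddress) { Address = newAddress; }
    public void ChangeMaritalStatus(string status) { MaritalStatus = status; }
}

public interface ICustomerRepository
{
    Customer GetById(int customerId);
    void Save(Customer customer);
}

public class ChangeAddressHandler
{
    private readonly ICustomerRepository repository;

    public ChangeAddressHandler(ICustomerRepository repository)
    {
        this.repository = repository;
    }

    public void Handle(ChangeAddressMessage message)
    {
        // Open a transaction, load the entity, invoke the behavior that
        // matches the user's intent, and commit. Two such messages arriving
        // together just get serialized by the database, and both succeed.
        using (var scope = new TransactionScope())
        {
            Customer customer = repository.GetById(message.CustomerId);
            customer.ChangeAddress(message.NewAddress);
            repository.Save(customer);
            scope.Complete();
        }
    }
}
```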
When we talked about querying/reporting, things seemed quite a bit clearer. Datasets, or rather datatables, seemed like a fine solution; most third-party controls support them out of the box. One guy mentioned that datasets performed poorly for large result sets and that by designing custom entities for the result set he could improve performance and memory utilization by something like 70%. To tell you the truth, I think that if you need the performance, do it; if not, just use datasets. There isn't much of an issue of correctness either way.
Just as a closing comment, in response to something someone said about scalability, I asked if they were reporting against the live OLTP data. The response was "yes". Well, there's a database scalability problem if I ever saw one. OLTP works most correctly when employing transactions with an isolation level of serializable. The problem with those is that they lock up the whole table, or get blocked when a table scan is going on, and querying often results in a table scan. You can see the problem. Anyway, a common solution is to just reduce the isolation level, a quick fix that improves performance almost immediately. You take one hit in that your reports may show incorrect data, especially if they do aggregate-type work. You might take another hit if your OLTP transactions need to do aggregate-type work themselves; that second hit is pretty much unacceptable. A different solution is to accept the fact that the heaviest querying can usually live with data that isn't up to date to the second.
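For the record, that quick fix really is a one-liner per transaction. A rough sketch, with a placeholder connection string and query:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// The "quick fix": run the report at a lower isolation level so it stops
// blocking (and being blocked by) OLTP transactions.
class ReportAtLowerIsolation
{
    static void Main()
    {
        using (var connection = new SqlConnection("<your connection string>"))
        {
            connection.Open();

            // ReadUncommitted means the report no longer waits on OLTP
            // locks - but it may read in-flight (dirty) data, which is a
            // real problem for aggregate-style reports.
            using (var tx = connection.BeginTransaction(
                       IsolationLevel.ReadUncommitted))
            using (var cmd = new SqlCommand(
                       "SELECT COUNT(*) FROM Orders", connection, tx))
            {
                Console.WriteLine("Order count: {0}", cmd.ExecuteScalar());
                tx.Commit();
            }
        }
    }
}
```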
In that second solution, you would have another database for reporting. It wouldn't be just a replica of the OLTP database, but rather a lot more denormalized, which is a really not-nice way of saying "designed for reporting". You could then move the data from your OLTP database to the reporting database in some way (more to come on this topic), and you'd increase the scalability of your database. To define that a bit better: your OLTP database will be able to handle more transactions per unit of time, and reports will run faster, meaning you improve both their latency and the number of queries that can be handled per unit of time.
Anyway, I was pretty tired after all that, but if I had to sum it up I'd say something like this: before debating solutions, define the problem. You get a lot more insight into the solutions, and you get it faster. That's just win-win all around.