Bruce Eckel: How does the .NET framework support object persistence?
Anders Hejlsberg: There's no single approach to object persistence that satisfies everybody. Sometimes you want persistence because you're going to put an object on the wire, send it to another thread, and immediately take it off the wire. For that, you just want some sort of binary serialization, and you probably don't care about versioning. For longer term persistence, you probably want something that versions better. So you can have version 1.0 of your application write an object, and have version 2.0 read and understand that object. In that case, you're willing to trade off some representational efficiency in order to solve the versioning problem. Other times you will want to store objects in a database and query them. In that case, what you really want is an object-relational (O/R) mapping. We have one of each of those in .NET, and they are continuously evolving.
Bruce Eckel: What are the main issues with O/R mappings?
Anders Hejlsberg: All of these O/R mappings usually live and die by whether they are flexible enough in their caching policies, and most of them are not. We've actually tried hard in .NET to make sure that the caching policy can be entirely under your control, or entirely non-existent. In many cases, you just want to fire up a query and suck down the results. You'd like to use the results as objects, but you don't want the infrastructure to try to cache them so that if you ask for that object again you get the same exact object. A lot of systems can only operate that way, and as a result, they have horrible performance overheads that you often just don't need. On a middle-tier, for example, you quite often don't care about caching, because you're just serving some incoming HTTP request that's immediately going to go away thereafter. So why cache?
Bruce Eckel: So caching should be something you can ask for, but not something you're forced to use by default.
Anders Hejlsberg: Exactly. Part of the problem with most O/R mappings has
been that they immediately took on the problem of caching and referential identity. If you
ask for a particular customer and get back a
Customer object, the next time
you ask for that customer you get back exactly the same object. Well that's a tough
problem. It requires a gigantic hash table that contains everything you've ever seen.
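The hash table Anders describes is often called an identity map. As a rough illustration only, and not how .NET or any real O/R mapper implements it, the Python sketch below shows a toy mapper whose caching policy is chosen by the caller. The names `Customer`, `CustomerMapper`, `ROWS`, and `use_identity_map` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    cust_id: int
    name: str


# Hypothetical stand-in for a database table: custID -> row.
ROWS = {100: {"name": "Ada"}}


class CustomerMapper:
    """Toy O/R mapper; the caller decides whether objects are cached."""

    def __init__(self, use_identity_map: bool):
        self.use_identity_map = use_identity_map
        # The hash table of every object loaded so far (only used when caching).
        self._seen = {}

    def get(self, cust_id: int) -> Customer:
        if self.use_identity_map and cust_id in self._seen:
            return self._seen[cust_id]  # same object reference as before
        row = ROWS[cust_id]
        customer = Customer(cust_id, row["name"])
        if self.use_identity_map:
            self._seen[cust_id] = customer
        return customer


cached = CustomerMapper(use_identity_map=True)
uncached = CustomerMapper(use_identity_map=False)

# With the identity map, two queries yield the very same object.
same_object = cached.get(100) is cached.get(100)

# Without it, two queries yield equal but distinct copies.
equal_copies = uncached.get(100) == uncached.get(100)
distinct_objects = uncached.get(100) is not uncached.get(100)
```

In this sketch the cached mapper's table grows with every distinct customer it has ever loaded, which is the overhead a short-lived middle-tier request may not want to pay.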
Bill Venners: Why would I care if it's exactly the same?
Anders Hejlsberg: Let's say you fetch the customer with
custID 100. Internally in an object-oriented program, if you ask for that
customer in a query, and then you ask for it again later in another query, what would you
expect to get the second time?
Bill Venners: A
Customer that's semantically equal to the one I
got the first time.
Anders Hejlsberg: Would you expect to get the same object reference?
Bill Venners: I don't see why I would care, so long as the two were semantically equal.
Anders Hejlsberg: Really? Because it has a profound difference in how your program
works. Do you think of the customer as an object, of which there's only one, or do you
think of the objects you operate on as copies of the database? Most O/R
mappings try to give the illusion that there is just that one customer with
custID 100, and it literally is that customer. If
you get the customer and set a field on it, then you have now changed that customer. That
contrasts with: you have changed this copy of the customer, but not that copy. And if
two people update the customer on two copies of the object, whoever updates first, or
maybe last, wins.
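To make the "copies of the database" model concrete, here is a minimal Python sketch, assuming a hypothetical in-memory `database` dictionary and hypothetical `fetch_copy` and `save` helpers that do no conflict detection. It shows two independent copies of the same customer being edited and saved, with the later save overwriting the earlier one.

```python
import copy

# Hypothetical "database": a single stored row for custID 100.
database = {100: {"name": "Ada", "city": "London"}}


def fetch_copy(cust_id):
    """Return an independent copy of the stored row (no identity map)."""
    return copy.deepcopy(database[cust_id])


def save(cust_id, row):
    """Overwrite the stored row wholesale, with no conflict detection."""
    database[cust_id] = copy.deepcopy(row)


first = fetch_copy(100)
second = fetch_copy(100)

first["city"] = "Paris"    # changes this copy only
second["name"] = "Grace"   # changes the other copy only

save(100, first)
save(100, second)          # the later save discards the "Paris" edit

final_row = database[100]  # {"name": "Grace", "city": "London"}
```

Here the last writer wins. A system that let the first writer win would need some form of version check on save, which this sketch deliberately leaves out.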
Bruce Eckel: Really, if you're going to all this trouble it's nice for it to be transparent.
Anders Hejlsberg: It's funny. It reminds me of the discussion we had earlier about CORBA and attempting to provide the illusion that an application is not distributed. Well, this is the same. You may want to have the illusion that the data is not in a database. You can have that illusion, but it comes at a cost.
Bruce Eckel: With CORBA, they were trying to have the illusion that there is basically no network. With Jini, they said, "No, there is a network. We have to acknowledge it at this certain level, otherwise things get excessively complicated." The trick in design is where do you make that acknowledgement? Where do you say, "Here is this boundary that we always have to see." And I think those kinds of issues exist with an O/R mapping. The challenge is figuring out what's the right abstraction.
Eric Gunnerson: The big question is: Do you need the abstraction? In a lot of cases you don't. We have something similar in our current implementation of remoting in .NET that tries to be transparent. Most people say, "Yeah, I know I'm doing remoting. I know the object lives over there. Don't go to all this effort to try and make it look like it's local."
Bruce Eckel: Sometimes you discover that if you try and use an abstraction like
local-remote transparency, suddenly the complexity around it gets huge. Whereas if you
just say, "I'm going to make a call here. The network may fail, and I have to acknowledge
that," then things get clearer. With an object-oriented database, it seems there is that kind
of choice in there as well. I have to accept that maybe I have multiple representations of a
Customer object. Maybe I have to tell the object I'm done.
Maybe there has to be a transaction.
Anders Hejlsberg: And that's actually better, because then the user thinks deeply about the things that might possibly happen. As a designer, you try to give users that capability as best you can.
Bruce Eckel: And you try to put the abstraction at the right level, so that the users are not going to so much trouble to try and make things work because of the wrong abstraction.
Eric Gunnerson: The trouble with the wrong abstraction is there's no way out of it. In practice, though, it's very hard for class designers to make reasonable guesses about even the scenarios in which their designs will be used, much less the relative frequency of each kind of use. You may think your users will want transparency, because it lets them do really cool things, so you implement transparency. But if it turns out 99% of your users never care, guess what? Those people pay the tax.
Dan Fernandez: Another problem is that a lot of developers want to rubber stamp the same methods across everything. People will say, "OK there's an object-relational mapping. We're going to use it for absolutely everything in our application." It could be useful in certain places, but for something that's going to change a lot, like a stock trading system, you may not really want to have a persistence level. But you use it because you think of it as the way to solve the problem. There are problems for which an object-relational mapping is the right solution, but sometimes people want to make a blanket statement that it is the right solution for every problem. That's what really hurts people.
Bruce Eckel: But you can understand why, right? The reason is that now I can learn just one persistence model and then just use it everywhere.
Dan Fernandez: Exactly.
Bruce Eckel: Maybe the answer is to have some kind of interface, and then varying implementations depending on how it's used. That way I could learn the single interface, and then either choose the implementation or have the system choose the implementation for me depending on the methods I call.
Eric Gunnerson: Of course, an interface is yet another abstraction.
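One way to picture the "single interface, varying implementations" idea is the Python sketch below. It is an illustration under stated assumptions, not an API from .NET or any library: `Persister`, `BinaryPersister`, `VersionedJsonPersister`, and the `email` field assumed to have been added in version 2 are all hypothetical. The binary implementation stands for short-lived transfer where versioning does not matter; the JSON one trades compactness for a recorded schema version.

```python
from abc import ABC, abstractmethod
import json
import pickle


class Persister(ABC):
    """Hypothetical single interface a developer would learn once."""

    @abstractmethod
    def dumps(self, obj: dict) -> bytes: ...

    @abstractmethod
    def loads(self, data: bytes) -> dict: ...


class BinaryPersister(Persister):
    """Short-lived transfer: binary format, no versioning story."""

    def dumps(self, obj: dict) -> bytes:
        return pickle.dumps(obj)

    def loads(self, data: bytes) -> dict:
        return pickle.loads(data)


class VersionedJsonPersister(Persister):
    """Longer-term storage: larger output, but records a schema version."""

    VERSION = 2

    def dumps(self, obj: dict) -> bytes:
        envelope = {"version": self.VERSION, "data": obj}
        return json.dumps(envelope).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        envelope = json.loads(data.decode("utf-8"))
        record = dict(envelope["data"])
        if envelope["version"] < 2:
            # Assumption: version 2 added an "email" field; default it for old data.
            record.setdefault("email", None)
        return record


def round_trip(persister: Persister, obj: dict) -> dict:
    """Write and re-read an object through whichever implementation is given."""
    return persister.loads(persister.dumps(obj))


customer = {"name": "Ada", "email": "ada@example.com"}
via_binary = round_trip(BinaryPersister(), customer)
via_json = round_trip(VersionedJsonPersister(), customer)

# Data written by a hypothetical version 1 of the application.
v1_payload = json.dumps({"version": 1, "data": {"name": "Ada"}}).encode("utf-8")
upgraded = VersionedJsonPersister().loads(v1_payload)
```

The caller only sees `dumps` and `loads`, but as Eric points out, that shared interface is itself an abstraction whose costs differ per implementation.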
Come back Monday, December 22 for part III of a conversation with Ruby's creator Yukihiro Matsumoto. I am now staggering the publication of several interviews at once, to give the reader variety. The next installment of this interview with Anders Hejlsberg will appear in the near future. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.