Inappropriate Abstractions

A Conversation with Anders Hejlsberg, Part VI

by Bill Venners with Bruce Eckel

December 12, 2003

Summary

Anders Hejlsberg, the lead C# architect, talks with Bruce Eckel and Bill Venners about the trouble with distributed systems infrastructures that attempt to make the network transparent, and object-relational mappings that attempt to make the database invisible. The conversation is also joined by Dan Fernandez, Microsoft's Product Manager for C#, and Eric Gunnerson, C# Compiler Program Manager.

Anders Hejlsberg, a distinguished engineer at Microsoft, led the team that designed the C# (pronounced C Sharp) programming language. Hejlsberg first vaulted onto the software world stage in the early eighties by creating a Pascal compiler for MS-DOS and CP/M. A very young company called Borland soon hired Hejlsberg and bought his compiler, which was thereafter marketed as Turbo Pascal. At Borland, Hejlsberg continued to develop Turbo Pascal and eventually led the team that designed Turbo Pascal's replacement: Delphi. In 1996, after 13 years with Borland, Hejlsberg joined Microsoft, where he initially worked as an architect of Visual J++ and the Windows Foundation Classes (WFC). Subsequently, Hejlsberg was chief designer of C# and a key participant in the creation of the .NET framework. Currently, Anders Hejlsberg leads the continued development of the C# programming language.

On July 30, 2003, Bruce Eckel, author of Thinking in C++ and Thinking in Java, and Bill Venners, editor-in-chief of Artima.com, met with Anders Hejlsberg in his office at Microsoft in Redmond, Washington. In this interview, which will be published in multiple installments on Artima.com and on an audio CD-ROM to be released this fall by Bruce Eckel, Anders Hejlsberg discusses many design choices of the C# language and the .NET framework.

In Part I: The C# Design Process, Hejlsberg discusses the process used by the team that designed C#, and the relative merits of usability studies and good taste in language design.
In Part II: The Trouble with Checked Exceptions, Hejlsberg discusses versionability and scalability issues with checked exceptions.
In Part III: Delegates, Components, and Simplexity, Hejlsberg discusses delegates and C#'s first-class treatment of component concepts.
In Part IV: Versioning, Virtual, and Override, Hejlsberg explains why C# instance methods are non-virtual by default and why programmers must explicitly indicate an override.
In Part V: Contracts and Interoperability, Hejlsberg discusses DLL hell and interface contracts, strong names, and the importance of interoperability.
In this sixth installment, Hejlsberg and other members of the C# team discuss the trouble with distributed systems infrastructures that attempt to make the network transparent, and object-relational mappings that attempt to make the database invisible.

In this installment, comments are also contributed by Dan Fernandez, Microsoft's Product Manager for C#, and Eric Gunnerson, C# Compiler Program Manager.

Loosely-Coupled Distributed Systems

Bill Venners: In an interview published by O'Reilly, you said, "When we first sat down to design the .NET framework we took a step back and looked at what's actually happening on the Web. It's becoming this loosely connected, very distributed world, and we tried to understand what that does to your underlying programming model. And so we designed from the ground up with the assumption in place that distributed apps are built in a loosely connected, stateless fashion that gives you great scalability. You just scale out. You roll in more racks and plug them in. And once you make that fundamental assumption, it changes everything." What does it change?

Anders Hejlsberg: The prevailing wisdom five or ten years ago about how distributed systems would be built in the future was CORBA, IIOP, object request brokers. The rave at the time was to make the world look like objects, in particular, to have a bunch of infrastructure that shrouds the fact that objects are distributed. The nirvana ideal was that you could just say Object obj = CreateMeAnObject(), and then call obj.ThisMethod(), obj.ThatMethod(), and you wouldn't know if that object was over in Thailand, right next door, or in the same process. The problem with that type of programming is: it works great in a single process; it works quite well across processes; it works fairly well in a small intranet; but then it completely sucks thereafter.

If you hide the fact that messages go across a network, and don't know when they go across, you end up with chatty conversations. And all of a sudden, the speed of light can become a big problem for you. You can't engage in a conversation with an object out in New York that goes, obj.LetMeGetX(), obj.LetMeGetY(), obj.LetMeGetZ(). No, you need to say, obj.LetMeGetXYAndZ(), and have everything come back in one chunk. But you can't really do that unless you actually make people understand that they are building a distributed application. In other words, you shouldn't try to pretend that a remote object is just a local object, because there is a difference. That's one thing that works so well about web services.
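To make the chatty-versus-chunky distinction concrete, here is a minimal Python sketch. The `SimulatedRemoteObject` class, its method names (modeled on the ones Hejlsberg uses), and the 80 ms round-trip figure are all hypothetical; the only point is that total latency grows with the number of calls, not with the amount of data returned.

```python
from dataclasses import dataclass

# Hypothetical round-trip time to a server in New York, in milliseconds.
# Illustrative assumption only, not a measurement.
ROUND_TRIP_MS = 80


@dataclass
class SimulatedRemoteObject:
    """Stand-in for a proxy to a remote object: every method call is one network round trip."""
    x: int = 1
    y: int = 2
    z: int = 3
    round_trips: int = 0

    def _call(self, result):
        # Count the trip; the cost is paid per call, however small the payload.
        self.round_trips += 1
        return result

    # Chatty interface: one round trip per value.
    def let_me_get_x(self):
        return self._call(self.x)

    def let_me_get_y(self):
        return self._call(self.y)

    def let_me_get_z(self):
        return self._call(self.z)

    # Chunky interface: everything comes back in one chunk.
    def let_me_get_xy_and_z(self):
        return self._call((self.x, self.y, self.z))


chatty = SimulatedRemoteObject()
chatty_values = (chatty.let_me_get_x(), chatty.let_me_get_y(), chatty.let_me_get_z())
print("chatty:", chatty_values, chatty.round_trips * ROUND_TRIP_MS, "ms")  # chatty: (1, 2, 3) 240 ms

chunky = SimulatedRemoteObject()
chunky_values = chunky.let_me_get_xy_and_z()
print("chunky:", chunky_values, chunky.round_trips * ROUND_TRIP_MS, "ms")  # chunky: (1, 2, 3) 80 ms
```

Both versions return the same data, but the chatty one pays the network cost three times. A caller can only make that choice deliberately if the interface admits that the object is remote.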

Moreover, web services run on existing infrastructure that we know scales. Web services that run over HTTP are just machine-to-machine communication over precisely the same infrastructure that we all use daily when we use browsers. And we know perfectly well how to scale that. If there's anything we know about scaling up, that's it. So why not leverage that? That's what web services do. Whereas, we know precious little about how to scale CORBA systems in a geo-scalable fashion. We just don't. There's just no knowledge about it, and I've never heard of anyone being particularly successful doing it.

The Stateless Fashion

Bill Venners: When you said, "...distributed apps are built in a loosely connected, stateless fashion...", what did you mean by "stateless?"

Anders Hejlsberg: If you program in a remote-objects fashion, whenever you instantiate a new object you may actually end up holding a proxy to a remote object that truly gets instantiated in memory on some computer somewhere else in the distributed system. And as long as you hold onto the reference on this machine, you are keeping state alive on that box over there. And you can enter into very long-lived transactions--transactions that might keep state alive on that box for a long time. That's problematic in a failover scenario, and it's problematic in a scaling scenario, because you can't easily distribute. Once you pick that box, you have to go back to that box every time a call comes in.

Whereas with HTTP, you are sort of forced to reckon with the problem that HTTP is stateless. Since there is no memory in the system about the channel--the channel has no state in it--you are forced to design your system such that every incoming request can freely get routed to any CPU that can then fetch the state, jiggle it, and stuff it back. And then the next time around you can go somewhere else.
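A minimal sketch of the fetch-jiggle-stuff pattern Hejlsberg describes, assuming a shared store (a plain dictionary here, standing in for a database or cache tier). The `StateStore` class, the `make_worker` helper, and the shopping-cart example are hypothetical. Because no worker remembers anything between requests, each request can be routed to any worker at random and the result is the same.

```python
import random


class StateStore:
    """Hypothetical shared store standing in for a database or cache tier."""

    def __init__(self):
        self._data = {}

    def load(self, key):
        # Return a copy so a handler cannot mutate stored state in place.
        return list(self._data.get(key, []))

    def save(self, key, value):
        self._data[key] = list(value)


def make_worker(name):
    """Build a stateless request handler: it keeps nothing between requests."""

    def handle(store, cart_id, item):
        cart = store.load(cart_id)  # fetch the state
        cart.append(item)           # jiggle it
        store.save(cart_id, cart)   # stuff it back
        return name, len(cart)

    return handle


store = StateStore()
workers = [make_worker(f"cpu{i}") for i in range(4)]

# Route each incoming request to an arbitrary worker.
for item in ["book", "pen", "lamp"]:
    worker = random.choice(workers)
    served_by, size = worker(store, "cart-42", item)
    print(f"{item!r} handled by {served_by}; cart now has {size} items")

print(store.load("cart-42"))  # ['book', 'pen', 'lamp']
```

Which worker served each request varies from run to run; the final cart does not, because all state lives in the store rather than in the channel or in a particular box.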

Bill Venners: It's not necessarily stateless on the server. State is being stored.

Anders Hejlsberg: There is state, but no state is being kept alive by the distributed mechanism.

Bill Venners: I'm not sure I understand what the difference is. In both cases there's state. What's the advantage of not having state in the distribution mechanism?

Anders Hejlsberg: It's funny. In a sense, a web service takes away the capability for you to instantiate a new object, hold onto it, and call methods on it. A web service is just an entry point. Any state that goes in, you have to pass in. Any state that comes out, the web service passes back out to you and then forgets about it. It's not like there's some object that's inherently kept alive. There's no notion of a session.

Bill Venners: In the protocol, there's no notion of a session.

Anders Hejlsberg: Exactly.

Bill Venners: But usually there is a session on the server.

Anders Hejlsberg: Of course. But you end up designing how that session works. We don't tell you that there must be a particular kind of session concept that amounts to: the client instantiates a new object, and as long as they hold onto it, that state is alive on the server and you better figure out how to make that work.

Object-Relational Mappings

Bruce Eckel: How does the .NET framework support object persistence?

Anders Hejlsberg: There's no single approach to object persistence that satisfies everybody. Sometimes you want persistence because you're going to put an object on the wire, send it to another thread, and immediately take it off the wire. For that, you just want some sort of binary serialization, and you probably don't care about versioning. For longer term persistence, you probably want something that versions better. So you can have version 1.0 of your application write an object, and have version 2.0 read and understand that object. In that case, you're willing to trade off some representational efficiency in order to solve the versioning problem. Other times you will want to store objects in a database and query them. In that case, what you really want is an object-relational (O/R) mapping. We have one of each of those in .NET, and they are continuously evolving.
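The versioning trade-off can be sketched in a few lines of Python. This is not how .NET serialization works; it only assumes a self-describing text format (JSON) with an explicit version field, and the `write_v1`, `CustomerV2`, and `read_v2` names are hypothetical. The text encoding is larger than a compact binary one, which is the representational efficiency being traded for the ability of version 2.0 to read what version 1.0 wrote.

```python
import json
from dataclasses import asdict, dataclass


def write_v1(cust_id, name):
    """Version 1.0 of a hypothetical application writes customers with two fields."""
    return json.dumps({"version": 1, "cust_id": cust_id, "name": name})


@dataclass
class CustomerV2:
    """Version 2.0 adds an optional email field."""
    cust_id: int
    name: str
    email: str = ""

    def to_json(self):
        return json.dumps({"version": 2, **asdict(self)})


def read_v2(text):
    """Read records written by either version: extra keys are ignored, missing ones defaulted."""
    record = json.loads(text)
    return CustomerV2(
        cust_id=record["cust_id"],
        name=record["name"],
        email=record.get("email", ""),  # absent in version 1 records
    )


old = read_v2(write_v1(100, "Ada"))
print(old)  # CustomerV2(cust_id=100, name='Ada', email='')

new = read_v2(CustomerV2(101, "Grace", "grace@example.com").to_json())
print(new.email)  # grace@example.com
```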

Bruce Eckel: What are the main issues with O/R mappings?

Anders Hejlsberg: All of these O/R mappings usually live and die by whether they are flexible enough in their caching policies, and most of them are not. We've actually tried hard in .NET to make sure that the caching policy can be entirely under your control, or entirely non-existent. In many cases, you just want to fire up a query and suck down the results. You'd like to use the results as objects, but you don't want the infrastructure to try to cache them so that if you ask for that object again you get the same exact object. A lot of systems can only operate that way, and as a result, they have horrible performance overheads that you often just don't need. On a middle-tier, for example, you quite often don't care about caching, because you're just serving some incoming HTTP request that's immediately going to go away thereafter. So why cache?

Bruce Eckel: So caching should be something you can ask for, but not something you're forced to use by default.

Anders Hejlsberg: Exactly. Part of the problem with most O/R mappings has been that they immediately took on the problem of caching and referential identity. If you ask for a particular customer and get back a Customer object, the next time you ask for that customer you get back exactly the same object. Well, that's a tough problem. It requires a gigantic hash table that contains everything you've ever seen.
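The "gigantic hash table" is usually called an identity map. Here is a minimal Python sketch in which that map is optional, in the spirit of a caching policy that is entirely under your control or entirely absent. The `Customer` and `CustomerRepository` classes and the in-memory `ROWS` table are hypothetical; a real mapping would issue SQL queries instead.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    cust_id: int
    name: str


# Hypothetical rows standing in for a database table.
ROWS = {100: {"name": "Ada"}, 101: {"name": "Grace"}}


class CustomerRepository:
    def __init__(self, identity_map=False):
        # When enabled, this dict holds every customer ever fetched through the repository.
        self._identity_map = {} if identity_map else None

    def get(self, cust_id):
        if self._identity_map is not None and cust_id in self._identity_map:
            return self._identity_map[cust_id]
        customer = Customer(cust_id, ROWS[cust_id]["name"])  # "query" the database
        if self._identity_map is not None:
            self._identity_map[cust_id] = customer
        return customer


cached = CustomerRepository(identity_map=True)
print(cached.get(100) is cached.get(100))  # True: the same object every time

uncached = CustomerRepository()
a, b = uncached.get(100), uncached.get(100)
print(a == b, a is b)  # True False: two equal copies
a.name = "Ada L."      # changes this copy of the customer only
print(b.name)          # Ada
```

The uncached repository is cheaper and forgets everything after each call, which suits a middle tier serving a single HTTP request; the price is that two copies of the same customer can drift apart, which is exactly the semantic question raised in the exchange that follows.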

Bill Venners: Why would I care if it's exactly the same?

Anders Hejlsberg: Let's say you fetch the Customer with custID 100. Internally in an object-oriented program, if you ask for that customer in a query, and then you ask for it again later in another query, what would you expect to get the second time?

Bill Venners: A Customer that's semantically equal to the one I got the first time.

Anders Hejlsberg: Would you expect to get the same object reference?

Bill Venners: I don't see why I would care, so long as the two were semantically equal.

Anders Hejlsberg: Really? Because it has a profound difference in how your program works. Do you think of the customer as an object, of which there's only one, or do you think of the objects you operate on as copies of the database? Most O/R mappings try to give the illusion that there is just that one Customer object with custID 100, and it literally is that customer. If you get the customer and set a field on it, then you have now changed that customer. That contrasts with: you have changed this copy of the customer, but not that copy. And if two people update the customer on two copies of the object, whoever updates first, or maybe last, wins.

Bruce Eckel: Really, if you're going to all this trouble it's nice for it to be transparent.

Anders Hejlsberg: It's funny. It reminds me of the discussion we had earlier about CORBA and attempting to provide the illusion that an application is not distributed. Well, this is the same. You may want to have the illusion that the data is not in a database. You can have that illusion, but it comes at a cost.

Bruce Eckel: With CORBA, they were trying to have the illusion that there is basically no network. With Jini, they said, "No, there is a network. We have to acknowledge it at this certain level, otherwise things get excessively complicated." The trick in design is where do you make that acknowledgement? Where do you say, "Here is this boundary that we always have to see." And I think those kinds of issues exist with an O/R mapping. The challenge is figuring out what's the right abstraction.

Eric Gunnerson: The big question is: Do you need the abstraction? In a lot of cases you don't. We have something similar in our current implementation of remoting in .NET that tries to be transparent. Most people say, "Yeah, I know I'm doing remoting. I know the object lives over there. Don't go to all this effort to try and make it look like it's local."

Bruce Eckel: Sometimes you discover that if you try and use an abstraction like local-remote transparency, suddenly the complexity around it gets huge. Whereas if you just say, "I'm going to make a call here. The network may fail, and I have to acknowledge that," then things get clearer. With an object-oriented database, it seems there is that kind of choice in there as well. I have to accept that maybe I have multiple representations of the same Customer object. Maybe I have to tell the object I'm done. Maybe there has to be a transaction.

Anders Hejlsberg: And that's actually better, because then the user thinks deeply about the things that might possibly happen. As a designer, you try to give users that capability as best you can.

Bruce Eckel: And you try to put the abstraction at the right level, so that the users are not going to so much trouble to try and make things work because of the wrong abstraction.

Eric Gunnerson: The trouble with the wrong abstraction is there's no way out of it. In practice, though, it's very hard for class designers to make reasonable guesses about even the scenarios in which their designs will be used, much less the relative frequency of each kind of use. You may think your users will want transparency, because it lets them do really cool things, so you implement transparency. But if it turns out 99% of your users never care, guess what? Those people pay the tax.

Dan Fernandez: Another problem is that a lot of developers want to rubber-stamp the same methods across everything. People will say, "OK, there's an object-relational mapping. We're going to use it for absolutely everything in our application." It could be useful in certain places, but for something that's going to change a lot--like a stock trading system--you may not really want to have a persistence level. But you use it because you think of it as the way to solve the problem. There are problems for which an object-relational mapping is the right solution, but sometimes people want to make a blanket statement that it is the right solution for every problem. That's what really hurts people.

Bruce Eckel: But you can understand why, right? The reason is that now I can learn just one persistence model and then just use it everywhere.

Dan Fernandez: Exactly.

Bruce Eckel: Maybe the answer is to have some kind of interface, and then varying implementations depending on how it's used. That way I could learn the single interface, and then either choose the implementation or have the system choose the implementation for me depending on the methods I call.
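Eckel's idea of one interface with interchangeable implementations might look like the following Python sketch. The `PersistenceStore` interface and both implementations are hypothetical and deliberately tiny: one returns a fresh copy on every load, the other keeps live objects and returns the same instance.

```python
import pickle
from abc import ABC, abstractmethod


class PersistenceStore(ABC):
    """Hypothetical single interface a developer learns once."""

    @abstractmethod
    def save(self, key, obj): ...

    @abstractmethod
    def load(self, key): ...


class SerializingStore(PersistenceStore):
    """No caching: every load deserializes a fresh copy."""

    def __init__(self):
        self._blobs = {}

    def save(self, key, obj):
        self._blobs[key] = pickle.dumps(obj)

    def load(self, key):
        return pickle.loads(self._blobs[key])


class IdentityStore(PersistenceStore):
    """Keeps live objects: every load returns the same instance that was saved."""

    def __init__(self):
        self._objects = {}

    def save(self, key, obj):
        self._objects[key] = obj

    def load(self, key):
        return self._objects[key]


def loads_same_instance(store: PersistenceStore):
    """Code written only against the interface, yet its result depends on the implementation."""
    original = {"cust_id": 100, "name": "Ada"}
    store.save(100, original)
    return store.load(100) is original


print(loads_same_instance(SerializingStore()))  # False
print(loads_same_instance(IdentityStore()))     # True
```

The caller learns a single interface, but `loads_same_instance` still gives different answers depending on which implementation is plugged in, so the copy-versus-identity question has not gone away. That is Gunnerson's point: the interface is yet another abstraction.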

Eric Gunnerson: Of course, an interface is yet another abstraction.

Next Week

Come back Monday, December 22, for Part III of a conversation with Ruby's creator Yukihiro Matsumoto. I am now staggering the publication of several interviews at once, to give the reader variety. The next installment of this interview with Anders Hejlsberg will appear in the near future. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.

Resources

Deep Inside C#: An Interview with Microsoft Chief Architect Anders Hejlsberg:
http://windows.oreilly.com/news/hejlsberg_0800.html

A Comparative Overview of C#:
http://genamics.com/developer/csharp_comparative.htm

Microsoft Visual C#:
http://msdn.microsoft.com/vcsharp/

Dan Fernandez's Weblog:
http://blogs.msdn.com/danielfe/

Eric Gunnerson's Weblog:
http://blogs.msdn.com/ericgu/

About the authors

Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces with Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998; 2nd Edition, 2000; 3rd Edition, 2003; 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd Edition, 2000; Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.