The Artima Developer Community

CLR Design Choices
A Conversation with Anders Hejlsberg, Part VIII
by Bill Venners with Bruce Eckel
February 2, 2004
Summary
Anders Hejlsberg, the lead C# architect, talks with Bruce Eckel and Bill Venners about IL instructions, non-virtual methods, unsafe code, value types, and immutables.

Anders Hejlsberg, a distinguished engineer at Microsoft, led the team that designed the C# (pronounced C Sharp) programming language. Hejlsberg first vaulted onto the software world stage in the early eighties by creating a Pascal compiler for MS-DOS and CP/M. A very young company called Borland soon hired Hejlsberg and bought his compiler, which was thereafter marketed as Turbo Pascal. At Borland, Hejlsberg continued to develop Turbo Pascal and eventually led the team that designed Turbo Pascal's replacement: Delphi. In 1996, after 13 years with Borland, Hejlsberg joined Microsoft, where he initially worked as an architect of Visual J++ and the Windows Foundation Classes (WFC). Subsequently, Hejlsberg was chief designer of C# and a key participant in the creation of the .NET framework. Currently, Anders Hejlsberg leads the continued development of the C# programming language.

On July 30, 2003, Bruce Eckel, author of Thinking in C++ and Thinking in Java, and Bill Venners, editor-in-chief of Artima.com, met with Anders Hejlsberg in his office at Microsoft in Redmond, Washington. In this interview, which is being published in multiple installments on Artima.com and on an audio CD-ROM to be released by Bruce Eckel, Anders Hejlsberg discusses many design choices of the C# language and the .NET framework.

Interpreting and Adaptive Optimizations

Bill Venners: One difference between Java bytecodes and IL [Intermediate Language] is that Java bytecodes have type information embedded in the instructions, and IL does not. For example, Java has several add instructions: iadd adds two ints, ladd adds two longs, fadd adds two floats, and dadd adds two doubles. IL has add to add two numbers, add.ovf to add two numbers and trap signed overflow, and add.ovf.un to add two numbers and trap unsigned overflow. All of these instructions pop two values off the top of the stack, add them, and push the result back. But in the case of Java, the instruction indicates the type of the operands. A fadd means two floats are sitting on the top of the stack. A ladd means two longs are sitting on the top of the stack. By contrast, the CLR's [Common Language Runtime] add instructions are polymorphic: they add the two values on the top of the stack, whatever their type, although the trap overflow versions differentiate between signed and unsigned. Basically, the engine running IL code must keep track of the types of the values on the stack, so when it encounters an add, it knows which kind of addition to perform.

I read that Microsoft decided that IL will always be compiled, never interpreted. How does encoding type information in instructions help interpreters run more efficiently?

Anders Hejlsberg: If an interpreter can just blindly do what the instructions say without needing to track what's at the top of the stack, it can go faster. When it sees an iadd, for example, the interpreter doesn't first have to figure out which kind of add it is; it knows it's an integer add. Assuming someone has already verified that the stack looks correct, it's safe to cut some time there, and you care about that for an interpreter. In our case, though, we never intended to target an interpreted scenario with the CLR. We intended to always JIT [Just-in-time compile], and for the purposes of the JIT, we needed to track the type information anyway. Since we already have the type information, it doesn't actually buy us anything to put it in the instructions.
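
As an illustration that is not part of the interview, the difference can be sketched with a toy operand stack in Java. The class and method names are hypothetical and the code is not taken from any real JVM or CLR; it only shows why a typed opcode lets an interpreter skip the type check that a polymorphic add requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: names are hypothetical, not real JVM or CLR internals.
final class AddInstructionSketch {

    // Typed opcode in the style of Java's iadd. The opcode alone says both
    // operands are ints, so no inspection is needed (assuming a verifier
    // has already checked the stack).
    static void iadd(Deque<Integer> stack) {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    // Polymorphic opcode in the style of IL's add. The engine has to know
    // the operand types before it can pick the right kind of addition.
    static void add(Deque<Number> stack) {
        Number right = stack.pop();
        Number left = stack.pop();
        if (left instanceof Integer && right instanceof Integer) {
            stack.push(left.intValue() + right.intValue());
        } else if (left instanceof Long && right instanceof Long) {
            stack.push(left.longValue() + right.longValue());
        } else if (left instanceof Double && right instanceof Double) {
            stack.push(left.doubleValue() + right.doubleValue());
        } else {
            throw new IllegalStateException("operand types do not match");
        }
    }
}
```

A JIT compiler resolves the type question once, at compile time, which is why tracking types costs it nothing extra; an interpreter running the polymorphic version would pay for the checks on every execution.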

Bill Venners: Many modern JVMs [Java virtual machines] do adaptive optimization, where they start by interpreting bytecodes. They profile the app as it runs to find the 10% to 20% of the code that is executed 80% to 90% of the time, then they compile that to native. They don't necessarily just-in-time compile those bytecodes, though. A method's bytecodes can still be executed by the interpreter as they are being compiled to native and optimized in the background. When native code is ready, it can replace the bytecodes. By not targeting an interpreted scenario, have you completely ruled out that approach to execution in a CLR?

Anders Hejlsberg: No, we haven't completely ruled that out. We can still interpret. We're just not optimized for interpreting. We're not optimized for writing that highest performance interpreter that will only ever interpret. I don't think anyone does that anymore. For a set top box 10 years ago, that might have been interesting. But it's no longer interesting. JIT technologies have gotten to the point where you can have multiple possible JIT strategies. You can even imagine using a fast JIT that just rips quickly, and then when we discover that we're executing a particular method all the time, using another JIT that spends a little more time and does a better job of optimizing. There's so much more you can do JIT-wise.
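
The two-JIT idea can be pictured with a small Java sketch. Everything here is an assumption made for illustration: the TieredMethod class, the HOT_THRESHOLD value, and the use of lambdas to stand in for compiled code are hypothetical and do not describe how the CLR or any JVM is implemented.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical two-tier dispatcher, for illustration only.
final class TieredMethod {
    static final int HOT_THRESHOLD = 3; // arbitrary value chosen for the sketch

    private final IntUnaryOperator quickCode;     // stands in for fast-JIT output
    private final IntUnaryOperator optimizedCode; // stands in for optimizing-JIT output
    private int calls = 0;
    private boolean promoted = false;

    TieredMethod(IntUnaryOperator quickCode, IntUnaryOperator optimizedCode) {
        this.quickCode = quickCode;
        this.optimizedCode = optimizedCode;
    }

    int invoke(int arg) {
        // Count calls until the method proves to be hot, then switch tiers.
        // A real runtime would run the slower compiler at this point.
        if (!promoted && ++calls >= HOT_THRESHOLD) {
            promoted = true;
        }
        return promoted ? optimizedCode.applyAsInt(arg) : quickCode.applyAsInt(arg);
    }

    boolean isPromoted() {
        return promoted;
    }
}
```

With the threshold of 3 assumed above, the first two calls run the quick version and the third call onward runs the optimized one.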

Bill Venners: When I asked you earlier (in Part IV) about why non-virtual methods are the default in C#, one of your reasons was performance. You said:

We can observe that as people write code in Java, they forget to mark their methods final. Therefore, those methods are virtual. Because they're virtual, they don't perform as well. There's just performance overhead associated with being a virtual method.

Another thing that happens in the adaptive optimizing JVMs is they'll inline virtual method invocations, because a lot of times only one or two implementations are actually being used.

Anders Hejlsberg: They can never inline a virtual method invocation.

Bill Venners: My understanding is that these JVMs first check if the type of the object on which a virtual method call is about to be made is the same as the one or two they expect, and if so, they can just plow on ahead through the inlined code.

Anders Hejlsberg: Oh, yes. You can optimize for the case you saw last time and check whether it is the same as the last one, and then you just jump straight there. But there's always some overhead, though you can bring the overhead down to a fairly minimal level.
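
The guarded fast path described in this exchange can be written out by hand in Java. This is only a sketch of the shape of code an optimizer might produce; the Shape, Square, and Circle classes are hypothetical, and real JVMs perform this transformation on compiled code, not on source.

```java
// Hypothetical classes for illustration only.
abstract class Shape {
    abstract double area();
}

final class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

final class Circle extends Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override double area() { return Math.PI * radius * radius; }
}

final class GuardedInlineSketch {
    // What the programmer wrote: an ordinary virtual call.
    static double areaVirtual(Shape s) {
        return s.area();
    }

    // Roughly what an optimizer might emit after profiling shows that the
    // receiver is nearly always a Square: a type guard, the inlined body on
    // the fast path, and normal virtual dispatch as the fallback.
    static double areaGuarded(Shape s) {
        if (s.getClass() == Square.class) {
            Square sq = (Square) s;
            return sq.side * sq.side; // inlined body of Square.area()
        }
        return s.area(); // slow path: the guard failed
    }
}
```

The guard is the residual overhead Hejlsberg mentions: the call itself disappears on the fast path, but the type comparison remains.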

Unsafe Code in C# and the CLR

Bill Venners: The CLR has IL instructions, and C# has syntax, for unsafe activities such as pointer arithmetic. By contrast, Java's bytecodes and syntax have no support for unsafe activities. When you want to do something unsafe with a JVM, Java basically forces you to write C code and use the Java Native Interface (JNI). Why did you decide to make it possible to express unsafe code in IL and C#?

Anders Hejlsberg: The irony is that although there have been all kinds of debate and writing about how C# has unsafe code and "Oh my God, it is badness," the funny thing is that unsafe code is a lot safer than any kind of code you would ever do with JNI. Because in C#, unsafe code is integrated with the language and everybody understands what's going on.

First of all, let's just immediately do away with the notion that there is a security hole with unsafe code, because unsafe code never runs in an untrusted environment, just like JNI code never runs in an untrusted environment. The right way to think about unsafe code is that it takes the capabilities of JNI and integrates them into the programming language. That makes it easier, and therefore less error prone, and therefore less unsafe, to write code for interoperating with the outside world.

Bruce Eckel: Are you sorry you called it unsafe?

Anders Hejlsberg: No. I think you should call a spade a spade. It is unsafe, right?

Bill Venners: Are the marketing people sorry?

Anders Hejlsberg: Oh yeah. And we actually had those discussions. They said, "Oh, can't you call it..."

Bill Venners: Special code.

Bruce Eckel: Put a positive spin on it.

Anders Hejlsberg: We said no. We stood our ground and said, "No, it's unsafe. Let's call it unsafe," because we wanted it to stand out. If you can avoid writing unsafe code, you should. Sometimes you do need to write it, and then we want it to be clear in your code precisely where you wrote it. You can always search for the word unsafe in your code and find all those places.

Bill Venners: Your point is that the unsafe code approach, because it is less error prone than the JNI approach, is actually safer.

Anders Hejlsberg: Yes, and honestly I think experience bears us out too. People have a lot of problems writing JNI code.

Value Types

Bill Venners: C# and the CLR support value types, which can exist both as values on the stack and objects on the heap. By contrast, Java has separate primitive types and wrapper types. In the design of C# and the CLR, to what extent were value types a performance consideration versus a usability consideration?

Anders Hejlsberg: There is clearly a performance aspect to it. One possible solution would be to say, "There are no value types. All types are heap allocated. Now we have representational identity, and so we're done, right?" Except it performs like crap. We know that from Smalltalk systems that did it that way, so something better is needed.

Over time, we've seen two schools of thought. Either you are very object-oriented, pay the performance penalty, and get those capabilities; or you bifurcate a type system, like Java and C++. In a bifurcated type system, you have primitives, which are endowed with special capabilities, and the user-extensible realm of classes, in which you don't get to do certain things. And there's no über-type for everything. The notion that you can treat any piece of data as an object seems so benign. What's the big deal? When you can't treat ints as primitives, you can just use a wrapper type that has an identity. That's true, but all that manual wrapping is irritating and gets in your way.

The way we implemented it in C# and the CLR, I think we get to have our cake and eat it too. Value types are just as efficient as Java or C++ primitives, as long as you treat them as values. Only if you try to treat them as objects do they become heap allocated objects on demand, through boxing and unboxing. It gives you this beauty and simplicity.
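
The manual wrapping mentioned above looks roughly like this in Java. The sketch writes the boxing and unboxing steps explicitly to make them visible; the BoxingSketch class is hypothetical, and Java 5 and later can insert these conversions automatically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
final class BoxingSketch {

    // Values treated as values: plain ints, no heap objects involved.
    static int sumPrimitives(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Values treated as objects: each int is wrapped in an Integer so it can
    // sit in a collection of Object, then unwrapped to do arithmetic.
    static int sumWrapped(int[] values) {
        List<Object> objects = new ArrayList<>();
        for (int v : values) {
            objects.add(Integer.valueOf(v)); // box: int to Integer
        }
        int total = 0;
        for (Object o : objects) {
            total += ((Integer) o).intValue(); // unbox: Integer to int
        }
        return total;
    }
}
```

Both methods compute the same sum; the second one pays for a wrapper object per element, which is the cost that on-demand boxing confines to the cases where a value really has to be treated as an object.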

Immutables

Bill Venners: In addition to being a C# and CLR construct, the value type is a general object-oriented programming concept. Another such concept is immutable types.

When I went to the first JavaOne back in 1996, everyone seemed to have one complaint about what they missed from C++ that Java didn't have. Different people had different complaints, but they all seemed to have at least one complaint. My complaint was const. I really liked what I could do with const in C++, though somehow I have gotten along in Java just fine without it.

Did you consider including support for the concept of immutable directly in C# and the CLR?

Anders Hejlsberg: There are two questions in here. With respect to immutability, it's tricky because what you're saying when you say something is immutable, is that from an external perspective, I cannot observe any mutation. That doesn't necessarily mean that it doesn't have a cache inside that makes it go more efficiently. It's just on the outside it looks immutable. That's hard for a compiler to figure out. We could certainly have a rule that says you can only modify the fields of this object in the constructor. And we could make certain usability guarantees. But it actually rules out some scenarios that are in use. So we haven't codified immutability as a hard guarantee, because it's a hard guarantee to make. The concept of an immutable object is very useful, but it's just up to the author to say that it's immutable.
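
A small Java sketch shows the kind of class this answer has in mind: immutable to every caller, yet holding a cache that is written after construction. The Name class is hypothetical; the lazily cached hash code follows the same idiom that java.lang.String uses.

```java
// Hypothetical class for illustration only.
final class Name {
    private final String first;
    private final String last;
    private int cachedHash; // 0 means "not computed yet"; the only non-final field

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    String first() { return first; }
    String last() { return last; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Name)) {
            return false;
        }
        Name other = (Name) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            h = 31 * first.hashCode() + last.hashCode();
            cachedHash = h; // internal mutation that no caller can observe
        }
        return h;
    }
}
```

A rule that allowed field writes only in the constructor would reject this class, even though no caller can ever see it change.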

Bill Venners: Immutability is part of the semantics of the class.

Anders Hejlsberg: Yes. With respect to const, it's interesting, because we hear that complaint all the time too: "Why don't you have const?" Implicit in the question is, "Why don't you have const that is enforced by the runtime?" That's really what people are asking, although they don't come out and say it that way.

The reason that const works in C++ is because you can cast it away. If you couldn't cast it away, then your world would suck. If you declare a method that takes a const Bla, you could pass it a non-const Bla. But if it's the other way around you can't. If you declare a method that takes a non-const Bla, you can't pass it a const Bla. So now you're stuck. So you gradually need a const version of everything that isn't const, and you end up with a shadow world. In C++ you get away with it, because as with anything in C++ it is purely optional whether you want this check or not. You can just whack the constness away if you don't like it.
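
Java has no const, so the following is only an analogy, not C++ semantics: a read-only interface stands in for "const Bla" and the mutable class for "Bla". All names are hypothetical. The sketch shows the one-way compatibility described above and the cast that serves as the escape hatch.

```java
// Hypothetical types; an analogy for C++ const, not an equivalent.
interface ReadOnlyBla {
    int value();
}

final class Bla implements ReadOnlyBla {
    private int value;
    Bla(int value) { this.value = value; }
    @Override public int value() { return value; }
    void setValue(int value) { this.value = value; }
}

final class ConstSketch {
    // Takes the "const" view: callers may pass a Bla or a ReadOnlyBla.
    static int read(ReadOnlyBla b) {
        return b.value();
    }

    // Takes the "non-const" type: passing a ReadOnlyBla does not compile.
    static void increment(Bla b) {
        b.setValue(b.value() + 1);
    }

    // The escape hatch, loosely analogous to casting constness away.
    // Unlike the C++ cast, this one is checked at run time and throws
    // ClassCastException if b is not actually a Bla.
    static void incrementAnyway(ReadOnlyBla b) {
        increment((Bla) b);
    }
}
```

Without the cast, every helper like increment would need a read-only twin or a changed signature, which is the shadow world the answer describes.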

Next Week

Come back Monday, February 10 for an interview with Eric Gunnerson, C# Compiler Program Manager.




Copyright © 1996-2014 Artima, Inc. All Rights Reserved.