Anders Hejlsberg, the lead C# architect, talks with Bruce Eckel and Bill Venners about IL instructions, non-virtual methods, unsafe code, value types, and immutables.
Anders Hejlsberg, a distinguished engineer at Microsoft, led the team that designed the C# (pronounced C Sharp) programming language. Hejlsberg first vaulted onto the software world stage in the early eighties by creating a Pascal compiler for MS-DOS and CP/M. A very young company called Borland soon hired Hejlsberg and bought his compiler, which was thereafter marketed as Turbo Pascal. At Borland, Hejlsberg continued to develop Turbo Pascal and eventually led the team that designed Turbo Pascal's replacement: Delphi. In 1996, after 13 years with Borland, Hejlsberg joined Microsoft, where he initially worked as an architect of Visual J++ and the Windows Foundation Classes (WFC). Subsequently, Hejlsberg was chief designer of C# and a key participant in the creation of the .NET framework. Currently, Anders Hejlsberg leads the continued development of the C# programming language.
On July 30, 2003, Bruce Eckel, author of Thinking in C++ and Thinking in Java, and Bill Venners, editor-in-chief of Artima.com, met with Anders Hejlsberg in his office at Microsoft in Redmond, Washington. In this interview, which is being published in multiple installments on Artima.com and on an audio CD-ROM to be released by Bruce Eckel, Anders Hejlsberg discusses many design choices of the C# language and the .NET framework.
Bill Venners: One difference between Java bytecodes and IL [Intermediate Language] is that Java bytecodes have type information embedded in the instructions, and IL does not. For example, Java has several add instructions: iadd adds two ints, ladd adds two longs, fadd adds two floats, and dadd adds two doubles. IL has add to add two numbers, add.ovf to add two numbers and trap signed overflow, and add.ovf.un to add two numbers and trap unsigned overflow. All of these instructions pop two values off the top of the stack, add them, and push the result back. But in the case of Java, the instruction indicates the type of the operands. An fadd means two floats are sitting on the top of the stack. An ladd means two longs are sitting on the top of the stack. By contrast, the CLR's [Common Language Runtime] add instructions are polymorphic: they add the two values on the top of the stack, whatever their type, although the trap overflow versions differentiate between signed and unsigned. Basically, the engine running IL code must keep track of the types of the values on the stack, so when it encounters an add, it knows which kind of addition to perform.
I read that Microsoft decided that IL will always be compiled, never interpreted. How does encoding type information in instructions help interpreters run more efficiently?
Anders Hejlsberg: If an interpreter can just blindly do what the instructions say
without needing to track what's at the top of the stack, it can go faster. When it sees an
iadd, for example, the interpreter doesn't first have to figure out which kind of
add it is, it knows it's an integer add. Assuming someone has already verified that the stack
looks correct, it's safe to cut some time there, and you care about that for an interpreter. In
our case, though, we never intended to target an interpreted scenario with the CLR. We
intended to always JIT [Just-in-time compile], and for the purposes of the JIT, we needed to track the type
information anyway. Since we already have the type information, it doesn't actually buy us
anything to put it in the instructions.
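As a rough illustration of the trade-off discussed above, the following Java sketch contrasts the two styles. It is a toy model, not real JVM or CLR code: the class `AddDispatchSketch`, the `Kind` enum, and the use of raw `long` slots are all assumptions made for the example. The typed operations work directly on the slots, while the polymorphic `add` needs the tracked operand kind before it can pick an addition.

```java
// Toy model only: names and slot representation are hypothetical, not JVM or CLR internals.
final class AddDispatchSketch {
    // The kinds of values a stack slot may hold in this toy model.
    enum Kind { INT, DOUBLE }

    // Typed opcodes in the Java-bytecode style: the instruction names the operand
    // type, so the engine works on raw slots without consulting any type tag.
    static long iadd(long a, long b) {
        return (int) a + (int) b;
    }

    static long dadd(long a, long b) {
        double sum = Double.longBitsToDouble(a) + Double.longBitsToDouble(b);
        return Double.doubleToRawLongBits(sum);
    }

    // Polymorphic add in the IL style: the engine must already know the kind of
    // the operands (tracked elsewhere) to choose which addition to perform.
    static long add(Kind kind, long a, long b) {
        switch (kind) {
            case INT:
                return iadd(a, b);
            case DOUBLE:
                return dadd(a, b);
            default:
                throw new IllegalStateException("unknown kind: " + kind);
        }
    }

    public static void main(String[] args) {
        long x = Double.doubleToRawLongBits(1.5);
        long y = Double.doubleToRawLongBits(2.25);
        System.out.println(iadd(2, 3));                                      // 5
        System.out.println(Double.longBitsToDouble(dadd(x, y)));             // 3.75
        System.out.println(Double.longBitsToDouble(add(Kind.DOUBLE, x, y))); // 3.75
    }
}
```

In an interpreter the extra `switch` would run on every add, which is the cost Hejlsberg describes. A JIT compiler that tracks types anyway resolves the choice once, at compile time.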
Bill Venners: Many modern JVMs [Java virtual machines] do adaptive optimization, where they start by interpreting bytecodes. They profile the app as it runs to find the 10% to 20% of the code that is executed 80% to 90% of the time, then they compile that to native. They don't necessarily just-in-time compile those bytecodes, though. A method's bytecodes can still be executed by the interpreter as they are being compiled to native and optimized in the background. When native code is ready, it can replace the bytecodes. By not targeting an interpreted scenario, have you completely ruled out that approach to execution in a CLR?
Anders Hejlsberg: No, we haven't completely ruled that out. We can still interpret. We're just not optimized for interpreting. We're not optimized for writing that highest performance interpreter that will only ever interpret. I don't think anyone does that anymore. For a set-top box 10 years ago, that might have been interesting. But it's no longer interesting. JIT technologies have gotten to the point where you can have multiple possible JIT strategies. You can even imagine using a fast JIT that just rips quickly, and then when we discover that we're executing a particular method all the time, using another JIT that spends a little more time and does a better job of optimizing. There's so much more you can do JIT-wise.
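The two-tier idea described here can be sketched in Java as a simple invocation counter. This is only a simulation under stated assumptions: `TieredMethodSketch`, `Code`, and the `HOT_THRESHOLD` value are hypothetical names chosen for the example, and the two lambdas merely stand in for the output of a fast JIT and a slower optimizing JIT. No real runtime is claimed to work exactly this way.

```java
// Simulation only: all names and the threshold value are hypothetical.
final class TieredMethodSketch {
    // Stands in for a compiled method body.
    interface Code {
        int run(int n);
    }

    // Number of calls served by the baseline code before promotion (assumed value).
    static final int HOT_THRESHOLD = 3;

    private final Code baseline;   // stand-in for quickly generated, unoptimized code
    private final Code optimized;  // stand-in for slower-to-produce, optimized code
    private int calls = 0;
    private boolean promoted = false;

    TieredMethodSketch(Code baseline, Code optimized) {
        this.baseline = baseline;
        this.optimized = optimized;
    }

    // Counts invocations and switches to the optimized code once the method is hot.
    int invoke(int n) {
        if (!promoted && ++calls > HOT_THRESHOLD) {
            promoted = true;
        }
        return promoted ? optimized.run(n) : baseline.run(n);
    }

    boolean isPromoted() {
        return promoted;
    }

    public static void main(String[] args) {
        // Sum of 1..n: a straightforward loop versus the closed form.
        TieredMethodSketch sum = new TieredMethodSketch(
            n -> { int s = 0; for (int i = 1; i <= n; i++) s += i; return s; },
            n -> n * (n + 1) / 2);
        for (int i = 0; i < 5; i++) {
            System.out.println(sum.invoke(10) + " promoted=" + sum.isPromoted());
        }
    }
}
```

Both versions return the same result. Only the cost per call changes once the counter crosses the threshold.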
Bill Venners: When I asked you earlier (in Part IV) about why non-virtual methods are the default in C#, one of your reasons was performance. You said:
We can observe that as people write code in Java, they forget to mark their methods final. Therefore, those methods are virtual. Because they're virtual, they don't perform as well. There's just performance overhead associated with being a virtual method.
Another thing that happens in the adaptive optimizing JVMs is they'll inline virtual method invocations, because a lot of times only one or two implementations are actually being used.
Anders Hejlsberg: They can never inline a virtual method invocation.
Bill Venners: My understanding is that these JVMs first check if the type of the object on which a virtual method call is about to be made is the same as the one or two they expect, and if so, they can just plow on ahead through the inlined code.
Anders Hejlsberg: Oh, yes. You can optimize for the case you saw last time and check whether it is the same as the last one, and then you just jump straight there. But there's always some overhead, though you can bring the overhead down to a fairly minimal level.
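The check-then-inline technique both speakers describe can be pictured at the source level roughly as follows. This is a hand-written analogy, assuming a hypothetical `Shape` hierarchy and a profile that saw only `Triangle` receivers. A real optimizing JVM would do the equivalent in generated machine code rather than in Java source.

```java
// Source-level analogy only: the class hierarchy and method names are hypothetical.
final class GuardedInlineSketch {
    static abstract class Shape {
        abstract int sides();
    }

    static final class Triangle extends Shape {
        int sides() { return 3; }
    }

    static final class Square extends Shape {
        int sides() { return 4; }
    }

    // Ordinary virtual call: the target is looked up from the receiver's class.
    static int sidesVirtual(Shape s) {
        return s.sides();
    }

    // Roughly what an optimizer might produce after observing only Triangle
    // receivers at this call site: a type guard, then the inlined method body.
    static int sidesGuarded(Shape s) {
        if (s.getClass() == Triangle.class) {
            return 3;          // inlined body of Triangle.sides()
        }
        return s.sides();      // guard failed: fall back to virtual dispatch
    }

    public static void main(String[] args) {
        System.out.println(sidesGuarded(new Triangle())); // 3
        System.out.println(sidesGuarded(new Square()));   // 4
    }
}
```

The guard is the residual cost Hejlsberg points to: even on the fast path, the comparison against the expected class still executes before the inlined code runs.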