The Artima Developer Community
Sponsored Link

Generics in C#, Java, and C++
A Conversation with Anders Hejlsberg, Part VII
by Bill Venners with Bruce Eckel
January 26, 2004

<<  Page 2 of 3  >>

Advertisement

Generics in C#

Bill Venners: How do generics work in C#?

Anders Hejlsberg: In C# without generics, you are basically able to say class List {...}. In C# with generics, you can say class List<T> {...}, where T is the type parameter. Within List<T> you can use T as if it were a type. When it actually comes time to create a List object, you say List<int> or List<Customer>. You construct new types from that List<T>, and it is truly as if your type arguments get substituted for the type parameter. All of the Ts become ints or Customers, you don't have to downcast, and there is strong type checking everywhere.

In the CLR [Common Language Runtime], when you compile List<T>, or any other generic type, it compiles down to IL [Intermediate Language] and metadata just like any normal type. The IL and metadata contains additional information that knows there's a type parameter, of course, but in principle, a generic type compiles just the way that any other type would compile. At runtime, when your application makes its first reference to List<int>, the system looks to see if anyone already asked for List<int>. If not, it feeds into the JIT the IL and metadata for List<T> and the type argument int. The JITer, in the process of JITing the IL, also substitutes the type parameter.

Bruce Eckel: So it's instantiating at runtime.

Anders Hejlsberg: It's instantiating at runtime, exactly. It's producing native code specifically for that type at the point it is needed. And literally when you say List<int>, you will get a List of int. If the code in the generic type uses an array of T, that becomes an array of int.

Bruce Eckel: Does it garbage collect that class at some point?

Anders Hejlsberg: Yes and no, but that's an orthogonal issue. It creates the class in that app domain, and then the class lives forever in that app domain. If you kill the app domain, the class goes away, like any other class.

Bruce Eckel: But if I have an application that uses a List<int> and a List<Cat>, but it never goes down the branch that uses List<Cat>,...

Anders Hejlsberg: ...then the system won't instantiate a List<Cat>. Now, there are exceptions to that rule. If you're NGENing an image, that is, if you're generating a native image up front, you can generate instantiations early. But if you're running under normal circumstances, the instantiations are purely demand driven, and they are deferred to as late as possible.

Now, what we then do is for all type instantiations that are value types—such as List<int>, List<long>, List<double>, List<float>—we create a unique copy of the executable native code. So List<int> gets its own code. List<long> gets its own code. List<float> gets its own code. For all reference types we share the code, because they are representationally identical. It's just pointers.

Bruce Eckel: And you just need to cast.

Anders Hejlsberg: No, you don't actually. We can share the native image, but they actually have separate VTables. I'm just pointing out that we do fairly aggressive code sharing where it makes sense, but we are also very conscious about not sharing where you want the performance. Typically with value types, you really do care that List<int> is int. You don't want them to be boxed as Objects. Boxing value types is one way we could share, but boy it would be an expensive way.

Bill Venners: In the reference case, you actually have different classes. List<Elephant> is different from List<Orangutan>, but they actually share all the same method code.

Anders Hejlsberg: Yes. As an implementation detail, they actually share the same native code.

Comparing C# and Java Generics

Bruce Eckel: How do C# generics compare with Java generics?

Anders Hejlsberg: Java's generics implementation was based on a project originally called Pizza, which was done by Martin Odersky and others. Pizza was renamed GJ, then it turned into a JSR and ended up being adopted into the Java language. And this particular generics proposal had as a key design goal that it could run on an unmodified VM [Virtual Machine]. It is, of course, great that you don't have to modify your VM, but it also brings about a whole bunch of odd limitations. The limitations are not necessarily directly apparent, but you very quickly go, "Hmm, that's strange."

For example, with Java generics, you don't actually get any of the execution efficiency that I talked about, because when you compile a generic class in Java, the compiler takes away the type parameter and substitutes Object everywhere. So the compiled image for List<T> is like a List where you use the type Object everywhere. Of course, if you now try to make a List<int>, you get boxing of all the ints. So there's a bunch of overhead there. Furthermore, to keep the VM happy, the compiler actually has to insert all of the type casts you didn't write. If it's a List of Object and you're trying to treat those Objects as Customers, at some point the Objects must be cast to Customers to keep the verifier happy. And really all they're doing in their implementation is automatically inserting those type casts for you. So you get the syntactic sugar, or some of it at least, but you don't get any of the execution efficiency. So that's issue number one I have with Java's solution.

Issue number two, and I think this is probably an even bigger issue, is that because Java's generics implementation relies on erasure of the type parameter, when you get to runtime, you don't actually have a faithful representation of what you had at compile time. When you apply reflection to a generic List in Java, you can't tell what the List is a List of. It's just a List. Because you've lost the type information, any type of dynamic code-generation scenario, or reflection-based scenario, simply doesn't work. If there's one trend that's pretty clear to me, it's that there's more and more of that. And it just doesn't work, because you've lost the type information. Whereas in our implementation, all of that information is available. You can use reflection to get the System.Type for object List<T>. You cannot actually create an instance of it yet, because you don't know what T is. But then you can use reflection to get the System.Type for int. You can then ask reflection to please put these two together and create a List<int>, and you get another System.Type for List<int>. So representationally, anything you can do at compile time you can also do at runtime.

Comparing C# Generics to C++ Templates

Bruce Eckel: How do C# generics compare with C++ templates?

Anders Hejlsberg: To me the best way to understand the distinction between C# generics and C++ templates is this: C# generics are really just like classes, except they have a type parameter. C++ templates are really just like macros, except they look like classes.

The big difference between C# generics and C++ templates shows up in when the type checking occurs and how the instantiation occurs. First of all, C# does the instantiation at runtime. C++ does it at compile time, or perhaps at link time. But regardless, the instantiation happens in C++ before the program runs. That's difference number one. Difference number two is C# does strong type checking when you compile the generic type. For an unconstrained type parameter, like List<T>, the only methods available on values of type T are those that are found on type Object, because those are the only methods we can generally guarantee will exist. So in C# generics, we guarantee that any operation you do on a type parameter will succeed.

C++ is the opposite. In C++, you can do anything you damn well please on a variable of a type parameter type. But then once you instantiate it, it may not work, and you'll get some cryptic error messages. For example, if you have a type parameter T, and variables x and y of type T, and you say x + y, well you had better have an operator+ defined for + of two Ts, or you'll get some cryptic error message. So in a sense, C++ templates are actually untyped, or loosely typed. Whereas C# generics are strongly typed.

<<  Page 2 of 3  >>


Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us