Bill Venners: In the static versus dynamic typing debate, the proponents of strong typing often claim that although a dynamically typed language can help you whip up a prototype very quickly, to build a robust system you need a statically typed language. By contrast, the main message about static typing that I've gotten from you in your talks and writings has been that static typing can help an optimizer work more effectively. In your view, what are the benefits of static typing, both in C++ and in general?
Bjarne Stroustrup: There are a couple of benefits. First, I think you can understand things better in a statically typed program. If we can say there are certain operations you can do on an integer, and this is an integer, then we can know exactly what's going on.
Bill Venners: When you say we know what's going on, do you mean programmers or compilers?
Bjarne Stroustrup: Programmers. I do tend to anthropomorphize, though.
Bill Venners: Anthropomorphize programmers?
Bjarne Stroustrup: Anthropomorphize compilers. I tend to do that partly because it's tempting, and partly because I've written compilers. So as programmers, I feel we can better understand what goes on with a statically typed language.
In a dynamically typed language, you do an operation and basically hope the object is of the type where the operation makes some sense, otherwise you have to deal with the problem at runtime. Now, that may be a very good way to find out if your program works if you are sitting at a terminal debugging your code. There are nice quick response times, and if you do an operation that doesn't work, you find yourself in the debugger. That's fine. If you can find all the bugs, that's fine when it's just the programmer working, but for a lot of real programs, you can't find all the bugs that way. If bugs show up when no programmer is present, then you have a problem. I've done a lot of work with programs that should run in places like telephone switches. In such environments, it's very important that unexpected things don't happen. The same is true in most embedded systems. In these environments, there's nobody who can understand what to do if a bug sends them into a debugger.
With static typing, I find it easier to write the code. I find it easier to
understand the code. I find it easier to understand other people's code, because
the things they tried to say are expressed in something with a well-defined
semantics in the language. For example, if I specify my function takes an argument
Temperature_reading then a user does not have to look at my code to determine what
kind of object I need, looking at the interface will do. I don't need to check if the user gave me the wrong
kind of object, because the compiler will reject any argument that is not a
Temperature_reading. I can directly use my argument as
Temperature_reading without applying any type of cast.
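A minimal sketch of the point about interfaces, assuming a hypothetical definition of Temperature_reading (the interview names the type but never defines it) and a hypothetical function to_fahrenheit. The signature alone tells the caller what kind of object is needed, and the compiler rejects anything else:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type for illustration; its representation is an assumption.
struct Temperature_reading {
    explicit Temperature_reading(double c) : celsius(c) {
        assert(c >= -273.15);  // nothing colder than absolute zero
    }
    double celsius;
};

// The interface states exactly what is required; no runtime type check
// and no cast are needed inside the function.
double to_fahrenheit(Temperature_reading r) {
    return r.celsius * 9.0 / 5.0 + 32.0;
}

// A bare double or a string is rejected at compile time because there is
// no implicit conversion to Temperature_reading.
static_assert(!std::is_convertible<double, Temperature_reading>::value,
              "a plain double is not a Temperature_reading");
static_assert(!std::is_convertible<const char*, Temperature_reading>::value,
              "a string is not a Temperature_reading");
```

A call such as to_fahrenheit(Temperature_reading{100.0}) compiles; to_fahrenheit("warm") does not.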
I also find that developing those statically typed interfaces is a good
exercise. It forces me to think about what is essential, rather than just
letting anything remotely plausible through as arguments and return values,
hoping that the caller and the callee will agree and
that both will write the necessary runtime checks.
To quote Kristen Nygaard, programming is understanding. The meaning is: if
you don't understand something, you can't code it, and you gain understanding trying to code it.
That's the foreword vignette in my third
edition of The C++ Programming Language. That is pretty fundamental,
and I think it's much easier to read a piece of code where you know you have a
vector of integers rather than a pointer to an object. Sure, you can ask whether
the object is a
vector, and if so you can ask if it holds integers. Or perhaps
it holds some integers, some strings, and some shapes. If you want such
containers you can build them, but I think you should prefer homogeneous
vectors that hold a specific type as opposed to a generic collection of
generic objects. Why? It's really a variant of the argument for preferring
statically checked interfaces. If I have a vector<Apple>, then I know that its
elements are Apples. I don't have to cast an Object to an Apple to use it, and I
don't have to fear that you have treated my vector as a vector of some more
general element type and snuck a Pear into it, or treated it as a vector<Object>
and stuck a HydraulicPumpInterface in there. I thought that was pretty well
understood by now.
Even Java and C# are about to provide generic mechanisms to support that.
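The argument for homogeneous containers can be sketched as follows. Apple and Pear are hypothetical types with an assumed `variety` member, and the helper functions are illustrative names that do not appear in the interview:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fruit types, for illustration only.
struct Apple { std::string variety; };
struct Pear  { std::string variety; };

// Every element is statically known to be an Apple, so it can be used
// directly, with no cast and no runtime type query.
const Apple& pick_first(const std::vector<Apple>& basket) {
    assert(!basket.empty());  // precondition: there is something to pick
    return basket.front();
}

std::size_t count_variety(const std::vector<Apple>& basket,
                          const std::string& variety) {
    std::size_t n = 0;
    for (const Apple& a : basket)
        if (a.variety == variety) ++n;
    return n;
}

std::vector<Apple> make_basket() {
    std::vector<Apple> basket;
    basket.push_back(Apple{"Gala"});
    basket.push_back(Apple{"Fuji"});
    basket.push_back(Apple{"Gala"});
    // basket.push_back(Pear{"Bosc"});  // would not compile: a Pear is not an Apple
    return basket;
}
```

Because the element type is part of the container's type, the attempt to add a Pear is caught by the compiler rather than discovered later by whoever reads the vector.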
On the other hand, you can't build a system that is completely statically
typed, because you would have to deploy the whole system compiled as one unit
that never changes. The benefits of more dynamic techniques like virtual
functions are that you can connect to something you don't quite know enough about
to do complete static type checking. Then, you can check what interfaces it has using whatever
initial interfaces you know. You can ask an object a few questions and then start using
it based on the answers. The question is along the lines of, "Are you something that obeys the
Shape interface?" If you get yes, you start applying
Shape operations to it. If you get no, you say, "Oops," and you
deal with it. The C++ mechanism for that is
dynamic_cast. This "questioning" using dynamic_cast
contrasts with dynamically typed languages, where
you tend to just start applying the operations. If it doesn't work, you say,
"Oops." Often, that oops happens in the middle of a computation as opposed to the point when the object becomes known
to you. It's harder to deal with a later oops.
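A minimal sketch of this kind of questioning with dynamic_cast. The class hierarchy (Object, Shape, Square, Note) and the function try_area are assumptions made for illustration; only the Shape interface and dynamic_cast come from the discussion above:

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy, for illustration only.
struct Object { virtual ~Object() = default; };

struct Shape : Object {
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) { assert(s >= 0.0); }
    double area() const override { return side * side; }
    double side;
};

struct Note : Object {
    std::string text;
};

// Ask the question at the point where the object becomes known.
// Returns true and stores the area in `out` if the object obeys the Shape
// interface; returns false (the "Oops" case) and leaves `out` untouched otherwise.
bool try_area(const Object& obj, double& out) {
    if (const Shape* s = dynamic_cast<const Shape*>(&obj)) {
        out = s->area();  // safe: the answer was "yes"
        return true;
    }
    return false;
}
```

The "no" answer arrives before any Shape operation is attempted, so the caller can handle it up front instead of in the middle of a computation.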
Also, the benefits to the compiler in terms of optimization can be huge. The difference between a dynamically and a statically typed and resolved operation can easily be times 50. When I talk about efficiencies, I like to talk about factors, because that's where you can really see a difference.
Bill Venners: Factors?
Bjarne Stroustrup: When you get to percents, 10%, 50%, and such, you start arguing whether efficiency matters, whether next year's machine will be the right solution rather than optimization. But in terms of dynamic versus static, we're talking factors: times 3, times 5, times 10, times 50. I think a fair bit about real-time problems that have to be done on big computers, where a factor of 10 or even a factor of 2 is the difference between success and failure.
Bill Venners: You're not just talking about dynamic versus static method invocation. You're talking about optimization, right? The optimizer has more information and can do a better job.
Bjarne Stroustrup: Yes.
Bill Venners: How does that work? How does an optimizer use type information to do a better job of optimizing?
Bjarne Stroustrup: Let's take a very simple case. C++ has both statically and dynamically bound member functions. If you do a virtual function call, it's an indirect function call. If it's statically bound, it's a perfectly ordinary function call. An indirect function call is probably 25% more expensive these days. That's not such a big deal. But if it's a really small function that does something like a less-than operation on an integer, the relative cost of a function call is huge, because there's more code to be executed. You have to do the function preamble. You have to do the operation. You have to do the postamble, if there is such a thing. In the process of doing all that, you have to get more instructions loaded into the machine. You break the pipelines, especially if it's an indirect function call. So you get one of these factors of 10 to 30 for doing a less-than. If such a difference occurs in a critical inner loop, the difference becomes significant. That was how the C++ sort beat the C sort. The C sort passed a function to be called indirectly. The C++ version passed a function object, where you had a statically bound inline function that degenerated into a less-than.
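The two sorting styles can be sketched side by side. This is an illustration under assumptions, not a benchmark: the helper names (compare_ints, Less_than, sort_c_style, sort_cpp_style) are hypothetical, and whether the comparison is actually inlined depends on the compiler and its optimization settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: the comparison is passed as a function pointer, so qsort
// reaches it through an indirect call for every comparison.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_c_style(std::vector<int>& v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// C++ style: the comparison is a function object whose type is a template
// argument of std::sort, so the call is statically bound and the optimizer
// can inline it down to a plain less-than.
struct Less_than {
    bool operator()(int a, int b) const { return a < b; }
};

void sort_cpp_style(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), Less_than{});
    assert(std::is_sorted(v.begin(), v.end()));  // postcondition
}
```

Both functions produce the same ordering. The difference is only in how the comparison is bound, which is the source of the performance gap described above.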