The Artima Developer Community
The C++ Source | C++ Community News | Discuss | Print | Email | First Page | Previous | Next
Sponsored Link

C++ Source
On the Tension Between Object-Oriented and Generic Programming in C++
and What Type Erasure Can Do About It
by Thomas Becker
October 15, 2007

Page 1 of 2  >>

Advertisement

Summary
The author discusses how the use of generic programming in C++ can lead to conflicts with object-oriented design principles. He demonstrates how a technique known as type erasure can often be used to resolve these conflicts. An in-depth example is presented: any_iterator, a type-safe, heterogeneous C++ iterator.

In his glossary of terms[1], Bjarne Stroustrup has described the C++ programming language that he created as "a general-purpose programming language [...] that supports procedural programming, data abstraction, object-oriented programming, and generic programming." The fact that C++ supports these different programming paradigms makes it unique—and uniquely powerful—among today's programming languages. On the other hand, it should not come as a surprise that the close coexistence of such vastly different paradigms can cause considerable friction, especially in large software systems.

In this article, I will focus on the tension that can occur when object-oriented programming (classes, objects, and runtime polymorphism come to mind) meets generic programming (algorithms, templates, and compile time polymorphism come to mind).

The article consists of two parts. In the first part, I will demonstrate how the coexistence of OO and generic programming can cause serious friction in real-life software engineering. I will then explain how a technique known as type erasure can be used to alleviate these problems.

The second part explains how type erasure can be implemented in C++. Specifically, I will elaborate on an example used in the first part, namely, C++ iterator type erasure. I will discuss the design and implementation of a class template[2] any_iterator that provides type erasure for C++ iterators.

The Trouble with Object-Oriented and Generic Programming

A Little Trivia Quiz

Let us start with a little trivia quiz. Who said the following things about object-oriented programming?

"I find OOP technically unsound."

"I find OOP philosophically unsound."

"I find OOP methodologically wrong."

"I have yet to see an interesting piece of code that comes from these OO people."

"I think that object orientedness is almost as much of a hoax as artificial intelligence."

All the quotes above are from an interview with Alexander Stepanov[3], the inventor of the STL and elder statesman of generic programming. As a practicing software engineer who works on large commercial software projects, I know better than to hold such a negative view of OO programming. But when someone like Alexander Stepanov says such a thing, then I don't think it should be taken lightly.

My experience as a software engineer in the trenches has taught me that there is much more tension, if not contradiction or incompatibility, between OO programming and generic programming than many people care to admit. It is easy to dismiss Alexander Stepanov's rejection of OO programming as extreme and unrealistic. It is much harder to make the OO and generic programming paradigms coexist and cooperate in real-life software engineering.

In the next three sections, I will illustrate the problem with an example from the real world, and I will suggest a less radical remedy than to disavow OO programming as a tool in software design altogether.

An Example from the Real World

Suppose you have a class number_cruncher that calculates and stores a sequence of numbers. Further, suppose that number_cruncher wants to expose to its clients a way to iterate over the stored numbers and retrieve them.

In the old days, predating the STL, the interface of number_cruncher might have exposed a callback for retrieving the numbers, or perhaps some form of get_first()/get_next() idiom. In an evironment where the STL and generic programming are used, such an interface would raise more than a few eyebrows. Here, the interface of number_cruncher will expose a pair of iterators, like this:

class number_cruncher
{   
public:
  typedef std::vector<double>::const_iterator const_iterator;
  const_iterator begin() const;
  const_iterator end() const;
  [...]
};

Now suppose you change the type of the collection which internally holds the items from std::vector to some other kind of vector. (Yes, there are other kinds of vectors.) In the sense of OO programming, what have you just done? You have changed an implementation detail. And what happens? The world recompiles. By using a typedef for the iterator that you expose, you were able to ensure that your clients did not have to change their code. Nonetheless, their code depends on the type of the iterator, and thus on the type of the collection that you use to store your numbers. In the pre-STL design mentioned earlier, this dependency can easily be eliminated, e.g., by using the pimpl idiom, or by exposing the number cruncher's interface via an abstract base class. This option is now gone. By using the typedef for the iterators, you have introduced a compile-time dependency that cannot be eliminated. Depending on the context in which your class is used, that may or may not be a serious issue.

But it gets worse. Suppose that the implementation of number_cruncher changes in such a way that there are frequent insertions to the internal collection of numbers. Having anticipated a change of that kind, you have carefully specified number_cruncher's interface to state that it will expose a pair of input iterators, dereferencing to something that converts to a double, with incrementing and dereferencing both having O(1) complexity. Therefore, you are within your rights when you change the internal type of your collection from std::vector<double> to std::list<double>. Again, your clients will have to recompile. But it is also possible that before, when you were exposing vector iterators, some of your clients may have said, gee, I wonder how many elements there are in this number cruncher. Let me do:

size_t numItems = 
  my_number_cruncher.end() - my_number_cruncher.begin();

You did not mean for them to do that, but you could not prevent it either. Now, after you changed an implementation detail of number_cruncher, not only will the world recompile, the world will find that it has a broken build. By exposing functionality that you neither needed nor wanted to expose, you have broken encapsulation.

Well, that's not really an issue, you might say, because if your clients are STL savvy, as they should be, then they were probably using std::distance to get the number of elements in the number_cruncher:

size_t numItems = std::distance(
  my_number_cruncher.begin(),
  my_number_cruncher.end()
);

If they were doing that, then your implementation change from vector to list causes the semantics of your number cruncher to change: the cost of getting the number of elements changes from constant to linear. This is probably even worse than the broken build, because it may not be noticed until way, way later, when something goes horribly awry in the field. You never meant for anybody to get the number of elements in constant time, but you could not prevent it.

Perhaps you think that none of these problems so far are very serious or very likely to happen in practice. Well, ok, I will admit that I made all this up. I have never changed an std::vector to a different kind of vector, and changing an std::vector to an std::list is extremely rare and has never caused me the kind of grief that I described above. But here's something quite similar that has actually happened to me.

Being a scientific programmer, I find myself writing classes that calculate sequences of numbers, store them internally, and expose them to clients. Hence the number_cruncher example above. In this case, my number cruncher class had to calculate several sequences of numbers, and there were certain fairly simple relationships between these sequences. For example, sequence D was a constant multiple of sequence A, sequence E was the pointwise difference between sequences B and C, sequence F was the quotient of sequences D and E, and so on and so forth. It would have been a silly time-space trade-off for me to store each one of these sequences in a collection. Instead, I stored only sequences A, B, and C. Sequences D, E, and F were exposed via suitable combinations of boost::transform_iterator and boost::zip_iterator. These iterators calculate the respective value on the fly during dereferencing. The interface of the number_cruncher class ended up looking like this:

class number_cruncher
{
public:
  typedef std::vector<double>::const_iterator a_iterator;
  typedef a_iterator b_iterator;
  typedef a_iterator c_iterator;
  typedef boost::transform_iterator<
    boost::function<double (double)>,
    std::vector::const_iterator
  > d_iterator;
  [...]

  boost::range<a_iterator> get_a_sequence();
  boost::range<b_iterator> get_b_sequence();
  boost::range<c_iterator> get_c_sequence();
  boost::range<d_iterator> get_d_sequence();
  [...]
};

This design obviously suffers from compile-time dependencies just as the simpler examples that we considered earlier. Whenever I see fit to rearrange my sequences internally, e.g., by storing sequences C and E and have my iterators calculate B as the sum of C and E, then my clients have to recompile. But that wasn't really an issue. Still, I was faced with a full-scale revolt on the part of my clients, who accused me of delivering an abomination of an interface.

I should first point out that, as it always happens in real-life software, the number of sequences involved started out rather small but soon reached several dozen. Also, each of these sequences had a domain-related meaning. Therefore, there already was an enum to identify each sequence. Now let us analyze what kind of an interface my clients can rightfully expect of me from an object-oriented point of view. Please bear in mind that I am not talking about performance issues yet. I just want to analyze the situation from an OO perspective.

Remember, in the world of OO programming, an interface exposes what an object does for its clients, not what the object is, and not how it does what it does. What is my number cruncher supposed to do for its clients? They have an enum value, and they need the corresponding sequence of numbers in the form of a pair of input iterators that dereference to a double. There can be little doubt as to what kind of interface we should see on an object that provides this service:

class number_cruncher
{
public:
  boost::range<number_iterator> 
  get_sequence(sequence_type type);
};

Instead, my interface is bloated and cluttered with information that has a lot to do with how my number cruncher does what it does, and nothing with what it does for its clients. Focusing on the various iterator types in my interface, we see that they expose to the client what they are, when all that matters is what they do for the client. The standard solution in OO programming is to employ runtime polymorphism to treat these different types uniformly at runtime.

We see that from on OO perspective, my clients complaints are entirely justified: failure to employ runtime polymorphism has led to a poorly designed interface and strong compile-time dependencies.

There are, of course, many more examples for this kind of tension between OO programming and generic programming. The common deeper reason behind all those examples is the different role that type plays in OO programming and in generic programming. In OO programming, the judicious choice of user-defined types is a very important, if not the most important, design tool. Take one look at any high-level book on OO analysis and design, and you'll find that it's all about the proper choice of classes and the inheritance relationships between them. In generic programming, on the other hand, the type system has a tendency to get away from the software designer. Types often end up being unrelated despite the fact that the things they represent are very much related. In other words, the multitude of types is merely a side effect of some generic programming technique such as policy driven design or some such thing. Were it not for these techniques, one would have never designed the user-defined types in that way.

Now What?

So now we know that generic programming idioms and OO design principles have a tendency to conflict. However, I don't believe that throwing out one or the other is a very practical solution. In order to successfully design and implement large software systems, we must avail ourselves of OO principles as well as generic programming techniques. What we—and by we I mean all of us, C++ gurus, authors, speakers, consultants as well as software engineers in the trenches—must understand is this:

Good engineering involves compromise at every turn. A good, working, finished product is never pure by the standards of any one idiom or methodology. The art of good engineering is not the art of discovering and applying the one right idiom over all others. The art of good engineering is to know what your options are, and then to choose your trade-offs wisely rather than letting others choose them for you.

Stepping off my soapbox, what does this embrace of compromise imply for the examples of the previous section? Well, compromise is never easy. It's a pretty messy thing most of the time. That's why people love that pure, squeaky-clean, holier-than-thou absolute truth that is being preached at every street corner (and at many software development conferences as well, for that matter).

It is true that whenever you expose a pair of STL iterators such as std::vector<double>::const_iterator through a class interface, you are introducing a compile-time dependency. Moreover, as we saw in the previous section, you are getting precariously close to violating basic principles of OO programming such as encapsulation and the proper design of interfaces. But you also have an advantage on your side: when the compiler is through with its inlining, your vector iterators are quite likely to perform as efficiently as plain pointers. This performance gain may well justify the sacrifice of OO purity. The important thing to understand is that there is a trade-off to be made here. It is not uncommon in scientific programming that the performance gain afforded by generic programming techniques is indispensable. But it is equally common in large software systems that there are other performance bottlenecks that render this performance gain insignificant.

It seems to me that as far as the trade-off between OO purity and efficiency is concerned, the pendulum in the C++ community has swung to the efficiency extreme. In the 1990's, it was all about object-oriented analysis and design. Any mention of performance issues in connection with runtime polymorphism was dismissed as "the old C way of thinking." Today, it's the other way round: any proposal to replace or enhance a generic programming solution with something that involves an extra virtual function call will inevitably be met with the objection that it is "too slow." This kind of knee-jerk reaction is not useful in real-life software engineering. I always thought that the old adage concerning optimization and its relation to evil was enough guidance in this respect. But perhaps a more poignant one is needed:

Optimization whose effect no user ever notices is the root of many evils, among them the failure of software projects and the subsequent failure of businesses.

So assume that you are in a similar situation as I was with my bad number cruncher interface. Your clients demand a more OO-compliant design, and they are perfectly willing to pay a performance penalty on the order of magnitude of a virtual function call for each iterator operation. This is the time to consider type erasure as a solution. So what is type erasure, and how can it help us out here?

Type Erasure as the Glue between OO and Generic Programming

In their book on C++ template metaprogramming[4], Dave Abrahams and Aleksey Gurtovoy define type erasure as "the process of turning a wide variety of types with a common interface into one type with that same interface."

The most widely known and used examples of type erasure are boost::any[5] and boost::function[6]. I'll discuss boost::any in detail in the second part of this article. boost::function is a class template that takes one template argument, a function type. Choosing a function type amounts to choosing a return type and a list of argument types for a function. Suppose we instantiate boost::function as follows:

boost::function<int (int)> foo;

The variable foo can now hold anything that's callable with an int as its only argument, and whose return type is convertible to int. This could be a function pointer, a user-defined functor, the result of a boost::bind, or what have you. Clearly, this matches the above definition of type erasure.

The relevance of this in the context of object-oriented programming is that an interface can now say to the client programmer: "I need you to give me something that's callable as specified by this here function type. What it really is, I don't care. You can give me different things at run time. Also, you can change your client code so it gives me something other than it did before, and you won't have to recompile me." Or, referring to a return value rather than an argument, an interface could say: "I'll give you something that's callable as specified by this here function type. What it really is, you won't know. It could change at run time. I might also change it at compile time, but don't worry, you won't have to recompile because of that."

It should be clear now that our problem with iterator types could be solved by a type-erasing iterator class, that is, an iterator that can hold concrete iterators of all manner of type, as long as they have a suitable commonality. An interface such as the one of our number_cruncher could then say to its clients: "I will give you a pair of input iterators which dereference to something that converts to double. If you insist on knowing, I can tell you that what you're getting is the result of a type erasure. But that's irrelevant to you, because you will never know what the real type was before the erasure. In fact, before the erasure, the iterator category may have been better than 'input.' But you won't know that, and you will only ever see the input iterator interface. Any change that I make to the actual, un-erased type of the iterator, be it at runtime or at compile time, will go completely unnoticed by you."

In the second part of this article, I will discuss how type erasure can be implemented in C++. Specifically, I will present my any_iterator[2], a class template that provides type erasure for C++ iterators.

Page 1 of 2  >>

The C++ Source | C++ Community News | Discuss | Print | Email | First Page | Previous | Next

Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us