|
|
|
Sponsored Link •
|
|
Advertisement
|
any_iterator, a type-safe, heterogeneous C++
iterator.
In his glossary of terms[1], Bjarne Stroustrup has described the C++ programming language that he created as "a general-purpose programming language [...] that supports procedural programming, data abstraction, object-oriented programming, and generic programming." The fact that C++ supports these different programming paradigms makes it unique—and uniquely powerful—among today's programming languages. On the other hand, it should not come as a surprise that the close coexistence of such vastly different paradigms can cause considerable friction, especially in large software systems.
In this article, I will focus on the tension that can occur when object-oriented programming (classes, objects, and runtime polymorphism come to mind) meets generic programming (algorithms, templates, and compile time polymorphism come to mind).
The article consists of two parts. In the first part, I will demonstrate how the coexistence of OO and generic programming can cause serious friction in real-life software engineering. I will then explain how a technique known as type erasure can be used to alleviate these problems.
The second part explains how type erasure can be implemented in C++.
Specifically, I will elaborate on an example used in the first part,
namely, C++ iterator type erasure. I will discuss the design and
implementation of a class template[2]
any_iterator that provides type erasure for C++
iterators.
Let us start with a little trivia quiz. Who said the following things about object-oriented programming?
"I find OOP technically unsound."
"I find OOP philosophically unsound."
"I find OOP methodologically wrong."
"I have yet to see an interesting piece of code that comes from these OO people."
"I think that object orientedness is almost as much of a hoax as artificial intelligence."
All the quotes above are from an interview with Alexander Stepanov[3], the inventor of the STL and elder statesman of generic programming. As a practicing software engineer who works on large commercial software projects, I know better than to hold such a negative view of OO programming. But when someone like Alexander Stepanov says such a thing, then I don't think it should be taken lightly.
My experience as a software engineer in the trenches has taught me that there is much more tension, if not contradiction or incompatibility, between OO programming and generic programming than many people care to admit. It is easy to dismiss Alexander Stepanov's rejection of OO programming as extreme and unrealistic. It is much harder to make the OO and generic programming paradigms coexist and cooperate in real-life software engineering.
In the next three sections, I will illustrate the problem with an example from the real world, and I will suggest a less radical remedy than to disavow OO programming as a tool in software design altogether.
Suppose you have a class number_cruncher that calculates
and stores a sequence of numbers. Further, suppose
that number_cruncher wants to expose to its clients a way
to iterate over the stored numbers and retrieve them.
In the old days, predating the STL, the interface of
number_cruncher might have exposed a callback for
retrieving the numbers, or perhaps some form
of get_first()/get_next() idiom. In an
evironment where the STL and generic programming are used, such an
interface would raise more than a few eyebrows. Here, the interface
of number_cruncher will expose a pair of iterators,
like this:
class number_cruncher
{
public:
typedef std::vector<double>::const_iterator const_iterator;
const_iterator begin() const;
const_iterator end() const;
[...]
};
Now suppose you change the type of the collection which internally
holds the items from std::vector to some other kind of
vector. (Yes, there are other kinds of vectors.) In the sense of OO
programming, what have you just done? You have changed an
implementation detail. And what happens? The world recompiles. By
using a typedef for the iterator that you expose, you were able to
ensure that your clients did not have to change their code.
Nonetheless, their code depends on the type of the iterator, and thus
on the type of the collection that you use to store your numbers. In
the pre-STL design mentioned earlier, this dependency can easily be
eliminated, e.g., by using the pimpl idiom, or by exposing the number
cruncher's interface via an abstract base class. This option is now
gone. By using the typedef for the iterators, you have introduced
a compile-time dependency that cannot be eliminated. Depending on the
context in which your class is used, that may or may not be a serious
issue.
But it gets worse. Suppose that the implementation
of number_cruncher changes in such a way that there are
frequent insertions to the internal collection of numbers. Having
anticipated a change of that kind, you have carefully
specified number_cruncher's interface to state that it
will expose a pair of input iterators, dereferencing to
something that converts to a double, with incrementing and
dereferencing both having O(1) complexity. Therefore, you are within
your rights when you change the internal type of your collection
from std::vector<double> to
std::list<double>. Again, your clients will have to
recompile. But it is also possible that before, when you were exposing
vector iterators, some of your clients may have said, gee, I wonder how
many elements there are in this number cruncher. Let me do:
size_t numItems = my_number_cruncher.end() - my_number_cruncher.begin();
You did not mean for them to do that, but you could not prevent it
either. Now, after you changed an implementation detail
of number_cruncher, not only will the world recompile,
the world will find that it has a broken build. By exposing
functionality that you neither needed nor wanted to expose,
you have broken encapsulation.
Well, that's not really an issue, you might say, because if your
clients are STL savvy, as they should be, then they were probably
using std::distance to get the number of elements in
the number_cruncher:
size_t numItems = std::distance( my_number_cruncher.begin(), my_number_cruncher.end() );
If they were doing that, then your implementation change from vector to list causes the semantics of your number cruncher to change: the cost of getting the number of elements changes from constant to linear. This is probably even worse than the broken build, because it may not be noticed until way, way later, when something goes horribly awry in the field. You never meant for anybody to get the number of elements in constant time, but you could not prevent it.
Perhaps you think that none of these problems so far are very serious
or very likely to happen in practice. Well, ok, I will admit that I
made all this up. I have never changed an std::vector to
a different kind of vector, and changing an
std::vector to an std::list is extremely
rare and has never caused me the kind of grief that I described
above. But here's something quite similar that has actually happened
to me.
Being a scientific programmer, I find myself writing classes that
calculate sequences of numbers, store them internally, and expose them
to clients. Hence the number_cruncher example above. In
this case, my number cruncher class had to calculate several sequences
of numbers, and there were certain fairly simple relationships between
these sequences. For example, sequence D was a constant multiple of
sequence A, sequence E was the pointwise difference between sequences
B and C, sequence F was the quotient of sequences D and E, and so on
and so forth. It would have been a silly time-space trade-off for me
to store each one of these sequences in a collection. Instead, I
stored only sequences A, B, and C. Sequences D, E, and F were exposed
via suitable combinations of boost::transform_iterator
and boost::zip_iterator. These iterators calculate the
respective value on the fly during dereferencing. The interface of
the number_cruncher class ended up looking like this:
class number_cruncher
{
public:
typedef std::vector<double>::const_iterator a_iterator;
typedef a_iterator b_iterator;
typedef a_iterator c_iterator;
typedef boost::transform_iterator<
boost::function<double (double)>,
std::vector::const_iterator
> d_iterator;
[...]
boost::range<a_iterator> get_a_sequence();
boost::range<b_iterator> get_b_sequence();
boost::range<c_iterator> get_c_sequence();
boost::range<d_iterator> get_d_sequence();
[...]
};
This design obviously suffers from compile-time dependencies just as the simpler examples that we considered earlier. Whenever I see fit to rearrange my sequences internally, e.g., by storing sequences C and E and have my iterators calculate B as the sum of C and E, then my clients have to recompile. But that wasn't really an issue. Still, I was faced with a full-scale revolt on the part of my clients, who accused me of delivering an abomination of an interface.
I should first point out that, as it always happens in real-life software, the number of sequences involved started out rather small but soon reached several dozen. Also, each of these sequences had a domain-related meaning. Therefore, there already was an enum to identify each sequence. Now let us analyze what kind of an interface my clients can rightfully expect of me from an object-oriented point of view. Please bear in mind that I am not talking about performance issues yet. I just want to analyze the situation from an OO perspective.
Remember, in the world of OO programming, an interface exposes what an object does for its clients, not what the object is, and not how it does what it does. What is my number cruncher supposed to do for its clients? They have an enum value, and they need the corresponding sequence of numbers in the form of a pair of input iterators that dereference to a double. There can be little doubt as to what kind of interface we should see on an object that provides this service:
class number_cruncher
{
public:
boost::range<number_iterator>
get_sequence(sequence_type type);
};
Instead, my interface is bloated and cluttered with information that has a lot to do with how my number cruncher does what it does, and nothing with what it does for its clients. Focusing on the various iterator types in my interface, we see that they expose to the client what they are, when all that matters is what they do for the client. The standard solution in OO programming is to employ runtime polymorphism to treat these different types uniformly at runtime.
We see that from on OO perspective, my clients complaints are entirely justified: failure to employ runtime polymorphism has led to a poorly designed interface and strong compile-time dependencies.
There are, of course, many more examples for this kind of tension between OO programming and generic programming. The common deeper reason behind all those examples is the different role that type plays in OO programming and in generic programming. In OO programming, the judicious choice of user-defined types is a very important, if not the most important, design tool. Take one look at any high-level book on OO analysis and design, and you'll find that it's all about the proper choice of classes and the inheritance relationships between them. In generic programming, on the other hand, the type system has a tendency to get away from the software designer. Types often end up being unrelated despite the fact that the things they represent are very much related. In other words, the multitude of types is merely a side effect of some generic programming technique such as policy driven design or some such thing. Were it not for these techniques, one would have never designed the user-defined types in that way.
So now we know that generic programming idioms and OO design principles have a tendency to conflict. However, I don't believe that throwing out one or the other is a very practical solution. In order to successfully design and implement large software systems, we must avail ourselves of OO principles as well as generic programming techniques. What we—and by we I mean all of us, C++ gurus, authors, speakers, consultants as well as software engineers in the trenches—must understand is this:
| Good engineering involves compromise at every turn. A good, working, finished product is never pure by the standards of any one idiom or methodology. The art of good engineering is not the art of discovering and applying the one right idiom over all others. The art of good engineering is to know what your options are, and then to choose your trade-offs wisely rather than letting others choose them for you. |
Stepping off my soapbox, what does this embrace of compromise imply for the examples of the previous section? Well, compromise is never easy. It's a pretty messy thing most of the time. That's why people love that pure, squeaky-clean, holier-than-thou absolute truth that is being preached at every street corner (and at many software development conferences as well, for that matter).
It is true that whenever you expose a pair of STL iterators such
as std::vector<double>::const_iterator through a
class interface, you are introducing a compile-time
dependency. Moreover, as we saw in the previous section, you are
getting precariously close to violating basic principles of OO
programming such as encapsulation and the proper design of interfaces.
But you also have an advantage on your side: when the compiler is
through with its inlining, your vector iterators are quite likely to
perform as efficiently as plain pointers. This performance gain may
well justify the sacrifice of OO purity. The important thing to
understand is that there is a trade-off to be made here. It is not
uncommon in scientific programming that the performance gain afforded
by generic programming techniques is indispensable. But it is equally
common in large software systems that there are other performance
bottlenecks that render this performance gain insignificant.
It seems to me that as far as the trade-off between OO purity and efficiency is concerned, the pendulum in the C++ community has swung to the efficiency extreme. In the 1990's, it was all about object-oriented analysis and design. Any mention of performance issues in connection with runtime polymorphism was dismissed as "the old C way of thinking." Today, it's the other way round: any proposal to replace or enhance a generic programming solution with something that involves an extra virtual function call will inevitably be met with the objection that it is "too slow." This kind of knee-jerk reaction is not useful in real-life software engineering. I always thought that the old adage concerning optimization and its relation to evil was enough guidance in this respect. But perhaps a more poignant one is needed:
| Optimization whose effect no user ever notices is the root of many evils, among them the failure of software projects and the subsequent failure of businesses. |
So assume that you are in a similar situation as I was with my bad number cruncher interface. Your clients demand a more OO-compliant design, and they are perfectly willing to pay a performance penalty on the order of magnitude of a virtual function call for each iterator operation. This is the time to consider type erasure as a solution. So what is type erasure, and how can it help us out here?
In their book on C++ template metaprogramming[4], Dave Abrahams and Aleksey Gurtovoy define type erasure as "the process of turning a wide variety of types with a common interface into one type with that same interface."
The most widely known and used examples
of type erasure are boost::any[5]
and boost::function[6].
I'll discuss boost::any in detail in the second part of
this article. boost::function is a class template that
takes one template argument, a function type. Choosing a function type
amounts to choosing a return type and a list of argument types for a
function. Suppose we instantiate boost::function as
follows:
boost::function<int (int)> foo;
The variable foo can now hold anything that's callable with an
int as its only argument, and whose return type is
convertible to int. This could be a function pointer, a
user-defined functor, the result of a boost::bind, or
what have you. Clearly, this matches the above definition of type
erasure.
The relevance of this in the context of object-oriented programming is that an interface can now say to the client programmer: "I need you to give me something that's callable as specified by this here function type. What it really is, I don't care. You can give me different things at run time. Also, you can change your client code so it gives me something other than it did before, and you won't have to recompile me." Or, referring to a return value rather than an argument, an interface could say: "I'll give you something that's callable as specified by this here function type. What it really is, you won't know. It could change at run time. I might also change it at compile time, but don't worry, you won't have to recompile because of that."
It should be clear now that our problem with iterator types could be
solved by a type-erasing iterator class, that is, an iterator that can
hold concrete iterators of all manner of type, as long as they have a
suitable commonality. An interface such as the one of
our number_cruncher could then say to its clients: "I
will give you a pair of input iterators which dereference to something
that converts to double. If you insist on knowing, I can
tell you that what you're getting is the result of a type erasure.
But that's irrelevant to you, because you will never know what the
real type was before the erasure. In fact, before the erasure, the
iterator category may have been better than 'input.' But you won't
know that, and you will only ever see the input iterator
interface. Any change that I make to the actual, un-erased type of the
iterator, be it at runtime or at compile time, will go completely
unnoticed by you."
In the second part of this article, I will discuss how type erasure
can be implemented in C++. Specifically, I will present my any_iterator[2], a class template that provides type
erasure for C++ iterators.
|
Sponsored Links
|