The C++ Source
any_iterator, a type-safe, heterogeneous C++
iterator.
In his glossary of terms[1], Bjarne Stroustrup has described the C++ programming language that he created as "a general-purpose programming language [...] that supports procedural programming, data abstraction, object-oriented programming, and generic programming." The fact that C++ supports these different programming paradigms makes it unique—and uniquely powerful—among today's programming languages. On the other hand, it should not come as a surprise that the close coexistence of such vastly different paradigms can cause considerable friction, especially in large software systems.
In this article, I will focus on the tension that can occur when object-oriented programming (classes, objects, and runtime polymorphism come to mind) meets generic programming (algorithms, templates, and compile time polymorphism come to mind).
The article consists of two parts. In the first part, I will demonstrate how the coexistence of OO and generic programming can cause serious friction in real-life software engineering. I will then explain how a technique known as type erasure can be used to alleviate these problems.
The second part explains how type erasure can be implemented in C++.
Specifically, I will elaborate on an example used in the first part,
namely, C++ iterator type erasure. I will discuss the design and
implementation of a class template[2]
any_iterator that provides type erasure for C++
iterators.
Let us start with a little trivia quiz. Who said the following things about object-oriented programming?
"I find OOP technically unsound."
"I find OOP philosophically unsound."
"I find OOP methodologically wrong."
"I have yet to see an interesting piece of code that comes from these OO people."
"I think that object orientedness is almost as much of a hoax as artificial intelligence."
All the quotes above are from an interview with Alexander Stepanov[3], the inventor of the STL and elder statesman of generic programming. As a practicing software engineer who works on large commercial software projects, I know better than to hold such a negative view of OO programming. But when someone like Alexander Stepanov says such a thing, then I don't think it should be taken lightly.
My experience as a software engineer in the trenches has taught me that there is much more tension, if not contradiction or incompatibility, between OO programming and generic programming than many people care to admit. It is easy to dismiss Alexander Stepanov's rejection of OO programming as extreme and unrealistic. It is much harder to make the OO and generic programming paradigms coexist and cooperate in real-life software engineering.
In the next three sections, I will illustrate the problem with an example from the real world, and I will suggest a less radical remedy than to disavow OO programming as a tool in software design altogether.
Suppose you have a class number_cruncher that calculates
and stores a sequence of numbers. Further, suppose
that number_cruncher wants to expose to its clients a way
to iterate over the stored numbers and retrieve them.
In the old days, predating the STL, the interface of
number_cruncher might have exposed a callback for
retrieving the numbers, or perhaps some form
of get_first()/get_next() idiom. In an
environment where the STL and generic programming are used, such an
interface would raise more than a few eyebrows. Here, the interface
of number_cruncher will expose a pair of iterators,
like this:
class number_cruncher
{
public:
typedef std::vector<double>::const_iterator const_iterator;
const_iterator begin() const;
const_iterator end() const;
[...]
};
Now suppose you change the type of the collection which internally
holds the items from std::vector to some other kind of
vector. (Yes, there are other kinds of vectors.) In the sense of OO
programming, what have you just done? You have changed an
implementation detail. And what happens? The world recompiles. By
using a typedef for the iterator that you expose, you were able to
ensure that your clients did not have to change their code.
Nonetheless, their code depends on the type of the iterator, and thus
on the type of the collection that you use to store your numbers. In
the pre-STL design mentioned earlier, this dependency can easily be
eliminated, e.g., by using the pimpl idiom, or by exposing the number
cruncher's interface via an abstract base class. This option is now
gone. By using the typedef for the iterators, you have introduced
a compile-time dependency that cannot be eliminated. Depending on the
context in which your class is used, that may or may not be a serious
issue.
But it gets worse. Suppose that the implementation
of number_cruncher changes in such a way that there are
frequent insertions to the internal collection of numbers. Having
anticipated a change of that kind, you have carefully
specified number_cruncher's interface to state that it
will expose a pair of input iterators, dereferencing to
something that converts to a double, with incrementing and
dereferencing both having O(1) complexity. Therefore, you are within
your rights when you change the internal type of your collection
from std::vector<double> to
std::list<double>. Again, your clients will have to
recompile. But it is also possible that earlier, when you were exposing
vector iterators, some of your clients said, gee, I wonder how many
elements there are in this number cruncher. Let me do:
size_t numItems = my_number_cruncher.end() - my_number_cruncher.begin();
You did not mean for them to do that, but you could not prevent it
either. Now, after you changed an implementation detail
of number_cruncher, not only will the world recompile,
the world will find that it has a broken build. By exposing
functionality that you neither needed nor wanted to expose,
you have broken encapsulation.
Well, that's not really an issue, you might say, because if your
clients are STL savvy, as they should be, then they were probably
using std::distance to get the number of elements in
the number_cruncher:
size_t numItems = std::distance( my_number_cruncher.begin(), my_number_cruncher.end() );
If they were doing that, then your implementation change from vector to list causes the semantics of your number cruncher to change: the cost of getting the number of elements changes from constant to linear. This is probably even worse than the broken build, because it may not be noticed until way, way later, when something goes horribly awry in the field. You never meant for anybody to get the number of elements in constant time, but you could not prevent it.
Perhaps you think that none of these problems so far are very serious
or very likely to happen in practice. Well, ok, I will admit that I
made all this up. I have never changed an std::vector to
a different kind of vector, and changing an
std::vector to an std::list is extremely
rare and has never caused me the kind of grief that I described
above. But here's something quite similar that has actually happened
to me.
Being a scientific programmer, I find myself writing classes that
calculate sequences of numbers, store them internally, and expose them
to clients. Hence the number_cruncher example above. In
this case, my number cruncher class had to calculate several sequences
of numbers, and there were certain fairly simple relationships between
these sequences. For example, sequence D was a constant multiple of
sequence A, sequence E was the pointwise difference between sequences
B and C, sequence F was the quotient of sequences D and E, and so on
and so forth. It would have been a silly time-space trade-off for me
to store each one of these sequences in a collection. Instead, I
stored only sequences A, B, and C. Sequences D, E, and F were exposed
via suitable combinations of boost::transform_iterator
and boost::zip_iterator. These iterators calculate the
respective value on the fly during dereferencing. The interface of
the number_cruncher class ended up looking like this:
class number_cruncher
{
public:
typedef std::vector<double>::const_iterator a_iterator;
typedef a_iterator b_iterator;
typedef a_iterator c_iterator;
typedef boost::transform_iterator<
boost::function<double (double)>,
std::vector<double>::const_iterator
> d_iterator;
[...]
boost::range<a_iterator> get_a_sequence();
boost::range<b_iterator> get_b_sequence();
boost::range<c_iterator> get_c_sequence();
boost::range<d_iterator> get_d_sequence();
[...]
};
This design obviously suffers from compile-time dependencies just like the simpler examples that we considered earlier. Whenever I see fit to rearrange my sequences internally, e.g., by storing sequences C and E and having my iterators calculate B as the sum of C and E, my clients have to recompile. But that wasn't really an issue. Still, I was faced with a full-scale revolt on the part of my clients, who accused me of delivering an abomination of an interface.
I should first point out that, as it always happens in real-life software, the number of sequences involved started out rather small but soon reached several dozen. Also, each of these sequences had a domain-related meaning. Therefore, there already was an enum to identify each sequence. Now let us analyze what kind of an interface my clients can rightfully expect of me from an object-oriented point of view. Please bear in mind that I am not talking about performance issues yet. I just want to analyze the situation from an OO perspective.
Remember, in the world of OO programming, an interface exposes what an object does for its clients, not what the object is, and not how it does what it does. What is my number cruncher supposed to do for its clients? They have an enum value, and they need the corresponding sequence of numbers in the form of a pair of input iterators that dereference to a double. There can be little doubt as to what kind of interface we should see on an object that provides this service:
class number_cruncher
{
public:
boost::range<number_iterator>
get_sequence(sequence_type type);
};
Instead, my interface is bloated and cluttered with information that has a lot to do with how my number cruncher does what it does, and nothing to do with what it does for its clients. Focusing on the various iterator types in my interface, we see that they expose to the client what they are, when all that matters is what they do for the client. The standard solution in OO programming is to employ runtime polymorphism to treat these different types uniformly at runtime.
We see that from an OO perspective, my clients' complaints are entirely justified: failure to employ runtime polymorphism has led to a poorly designed interface and strong compile-time dependencies.
There are, of course, many more examples of this kind of tension between OO programming and generic programming. The common deeper reason behind all these examples is the different role that type plays in OO programming and in generic programming. In OO programming, the judicious choice of user-defined types is a very important, if not the most important, design tool. Take one look at any high-level book on OO analysis and design, and you'll find that it's all about the proper choice of classes and the inheritance relationships between them. In generic programming, on the other hand, the type system has a tendency to get away from the software designer. Types often end up being unrelated despite the fact that the things they represent are very much related. In other words, the multitude of types is merely a side effect of some generic programming technique such as policy-driven design. Were it not for these techniques, one would never have designed the user-defined types in that way.
So now we know that generic programming idioms and OO design principles have a tendency to conflict. However, I don't believe that throwing out one or the other is a very practical solution. In order to successfully design and implement large software systems, we must avail ourselves of OO principles as well as generic programming techniques. What we—and by we I mean all of us, C++ gurus, authors, speakers, consultants as well as software engineers in the trenches—must understand is this:
| Good engineering involves compromise at every turn. A good, working, finished product is never pure by the standards of any one idiom or methodology. The art of good engineering is not the art of discovering and applying the one right idiom over all others. The art of good engineering is to know what your options are, and then to choose your trade-offs wisely rather than letting others choose them for you. |
Stepping off my soapbox, what does this embrace of compromise imply for the examples of the previous section? Well, compromise is never easy. It's a pretty messy thing most of the time. That's why people love that pure, squeaky-clean, holier-than-thou absolute truth that is being preached at every street corner (and at many software development conferences as well, for that matter).
It is true that whenever you expose a pair of STL iterators such
as std::vector<double>::const_iterator through a
class interface, you are introducing a compile-time
dependency. Moreover, as we saw in the previous section, you are
getting precariously close to violating basic principles of OO
programming such as encapsulation and the proper design of interfaces.
But you also have an advantage on your side: when the compiler is
through with its inlining, your vector iterators are quite likely to
perform as efficiently as plain pointers. This performance gain may
well justify the sacrifice of OO purity. The important thing to
understand is that there is a trade-off to be made here. It is not
uncommon in scientific programming that the performance gain afforded
by generic programming techniques is indispensable. But it is equally
common in large software systems that there are other performance
bottlenecks that render this performance gain insignificant.
It seems to me that as far as the trade-off between OO purity and efficiency is concerned, the pendulum in the C++ community has swung to the efficiency extreme. In the 1990's, it was all about object-oriented analysis and design. Any mention of performance issues in connection with runtime polymorphism was dismissed as "the old C way of thinking." Today, it's the other way round: any proposal to replace or enhance a generic programming solution with something that involves an extra virtual function call will inevitably be met with the objection that it is "too slow." This kind of knee-jerk reaction is not useful in real-life software engineering. I always thought that the old adage concerning optimization and its relation to evil was enough guidance in this respect. But perhaps a more poignant one is needed:
| Optimization whose effect no user ever notices is the root of many evils, among them the failure of software projects and the subsequent failure of businesses. |
So assume that you are in a similar situation as I was with my bad number cruncher interface. Your clients demand a more OO-compliant design, and they are perfectly willing to pay a performance penalty on the order of magnitude of a virtual function call for each iterator operation. This is the time to consider type erasure as a solution. So what is type erasure, and how can it help us out here?
In their book on C++ template metaprogramming[4], Dave Abrahams and Aleksey Gurtovoy define type erasure as "the process of turning a wide variety of types with a common interface into one type with that same interface."
The most widely known and used examples
of type erasure are boost::any[5]
and boost::function[6].
I'll discuss boost::any in detail in the second part of
this article. boost::function is a class template that
takes one template argument, a function type. Choosing a function type
amounts to choosing a return type and a list of argument types for a
function. Suppose we instantiate boost::function as
follows:
boost::function<int (int)> foo;
The variable foo can now hold anything that's callable with an
int as its only argument, and whose return type is
convertible to int. This could be a function pointer, a
user-defined functor, the result of a boost::bind, or
what have you. Clearly, this matches the above definition of type
erasure.
The relevance of this in the context of object-oriented programming is that an interface can now say to the client programmer: "I need you to give me something that's callable as specified by this here function type. What it really is, I don't care. You can give me different things at run time. Also, you can change your client code so it gives me something other than it did before, and you won't have to recompile me." Or, referring to a return value rather than an argument, an interface could say: "I'll give you something that's callable as specified by this here function type. What it really is, you won't know. It could change at run time. I might also change it at compile time, but don't worry, you won't have to recompile because of that."
It should be clear now that our problem with iterator types could be
solved by a type-erasing iterator class, that is, an iterator that can
hold concrete iterators of all manner of type, as long as they have a
suitable commonality. An interface such as the one of
our number_cruncher could then say to its clients: "I
will give you a pair of input iterators which dereference to something
that converts to double. If you insist on knowing, I can
tell you that what you're getting is the result of a type erasure.
But that's irrelevant to you, because you will never know what the
real type was before the erasure. In fact, before the erasure, the
iterator category may have been better than 'input.' But you won't
know that, and you will only ever see the input iterator
interface. Any change that I make to the actual, un-erased type of the
iterator, be it at runtime or at compile time, will go completely
unnoticed by you."
In the second part of this article, I will discuss how type erasure
can be implemented in C++. Specifically, I will present my any_iterator[2], a class template that provides type
erasure for C++ iterators.
boost::any
Recall that Dave Abrahams and Aleksey Gurtovoy defined type erasure as "the process of turning a wide variety of types with a common interface into one type with that same interface." Now imagine you are designing a set of C++ classes with a common interface from scratch, and you want to be able to treat them as one type with that same interface. Talk about OO programming 101: you would derive your classes from a common base class. And that's exactly the idea behind implementing type erasure in C++. We would like the classes whose type we want to erase to be derived from a common base class. But since these classes already exist, and we must work non-intrusively, we can do this only indirectly, by first placing a wrapper class template around the existing classes. The wrapper class template is under our control, and therefore, we can derive it from a common base class.
A good first example to illustrate all this
is boost::any, which provides type erasure for the set of
all C++ types that are copiable and destructible. A variable of
type boost::any can hold objects of any such type. This
example is extreme, but it is also extremely simple, because the
common interface of the erased types is empty.
Here is the implementation of boost::any, abbreviated to
its bare essentials. The common abstract base class from which the
wrapper class template will be derived looks like this:
class placeholder
{
public:
virtual ~placeholder() {}
virtual placeholder* clone() const=0;
};
And this is the wrapper class template:
template<typename ValueType>
class holder : public placeholder
{
public:
holder(ValueType const & value) : held(value) {}
virtual placeholder* clone() const
{return new holder(held);}
private:
ValueType held;
};
The actual type erasing class any is a handle class that
holds a pointer to the abstract base class:
class any
{
public:
any() : content(0) {}
template<typename ValueType>
any(ValueType const & value) : content(new holder<ValueType>(value)) {}
any(any const & other) :
content(other.content ? other.content->clone() : 0) {}
~any()
{delete content;}
// Implement swap as swapping placeholder pointers, assignment
// as copy and swap.
private:
placeholder* content;
};
A typical use of boost::any is to hold objects of vastly
different types in an STL container.
Since the common interface of the types that boost::any
erases is empty, an object of type boost::any isn't worth
very much until one retrieves the held object with its original
type. To this end, the real boost::any provides a query
function to obtain the typeid of the held object, and safe casts to
retrieve the held object. In this one
respect, boost::any is not typical of type erasure in
general. In those cases where the type erasing class actually has an
interface, part of the purpose of the type erasure is to restrict
clients to that interface. One should therefore not provide ways to
retrieve the original objects with their original types.
It should be clear by now how a type erasing class will handle the implementation of the common interface of the erased types: it will forward its public member functions to the abstract base class, where they exist as pure virtuals. The wrapper class template will implement these pure virtuals by forwarding them to its held object.
As was to be expected, type erasure does not come free of charge. We just
saw that calling a member function on a type-erasing object incurs the extra
cost of a level of indirection and a virtual function call. Construction and
destruction of such an object involves newing and deleting an object. The heap
access can sometimes be eliminated if a small object optimization is employed,
as is the case with the poly library[7] (more about this very interesting
library shortly). If you agree with me that good engineering is not about
following dogmas but about choosing sensible trade-offs, then you will also
agree that this performance overhead is not per se a problem. It is an aspect
to be kept in mind when choosing your trade-offs.
Beyond boost::any
In his original documentation of boost::any[5], Kevlin Henney has
pointed out that his any class can be used as a blueprint
for more general instances of type erasure: "A similar design,
offering more appropriate operators, can be used for a generalized
function adaptor, any_function, a generalized iterator adaptor,
any_iterator, and other object types that need uniform runtime
treatment but support only compile-time template parameter
conformance."
The any_function class has since been implemented
as boost::function. In the next few sections, I will
discuss my attempt at the any_iterator[2]. A similar,
independent effort at C++ iterator type erasure is
the Adobe Public Library's any_iterator[8]. I will discuss the
relationship between their work and mine later.
Perhaps the most exciting recent work in the area of type erasure is
the poly library by Jaakko Jarvi, Mat Marcus, and Sean
Parent. What they do, in the best tradition of generic
programming, is to abstract out the commonality of C++ type erasure
implementations into a generic mechanism. Their poly
library is done in ConceptGCC, the fork of gcc that implements
concepts as envisioned by the C++ Standard Committee. Therefore, it
may be a while until their generic type erasure will be available to
the C++ community at large. Nonetheless, I believe that
the poly library will be of critical importance in
applied C++ programming, as it will provide the long sought-after
reconciliation between the object-oriented and generic programming
paradigms.
any_iterator
Let me begin by reminding you what the original motivation for
my any_iterator was: I had a number crunching class that
provided to its clients a large number of sequences of numbers. Some
of these sequences—the primary ones—were internally stored
in STL containers, others—the secondary ones—were
calculated from the primary ones on the fly by Boost iterator
adaptors. Making clients deal with this zoo of iterator types had
caused grief. Therefore, I wanted an iterator that was to be an
input iterator—and an input iterator
only—that would dereference to a double. This
iterator was to be such that it could hold any one of the multitude of
iterators that my class was exposing.
It would of course be silly to go for just this special input
iterator. Instead, I aimed for a full solution to the C++ iterator
type erasure problem. The first issue that one needs to address in a
situation like that is the granularity of the type erasure. Clearly,
it does not make sense to have a single any_iterator type
that can hold just any old C++ iterator.
So what kind of any_iterator types should there be, and
what "concrete" iterators should be assignable to them? In order to
answer these questions it is useful to do a fast forward and think
about the implementation of the any_iterator first. It
turns out that we have our work cut out for us here. On the one hand,
we know that the overall structure of our implementation will follow
the standard type erasure technique as set forth
by boost::any. On the other hand, it is clear that an
iterator class that one writes in C++ must be Standard conforming, and
the Standard conformance will of course be achieved by deriving from
boost::iterator_facade[9]. As usual, the devil is in the
details, but this high-level description is pretty much all
you need to know about the implementation
of any_iterator.
Now we are in a position to tackle the issue of the granularity of our iterator type erasure.
The boost::iterator_facade class template that we'll be
using to implement the any_iterator takes five template
arguments, like this:
template<
    class Derived,             // The derived iterator type being constructed
    class Value,
    class CategoryOrTraversal,
    class Reference  = Value&,
    class Difference = std::ptrdiff_t
>
class iterator_facade;
When we derive our any_iterator from this class template,
we'll have to decide on values for the second through fifth template
arguments. These four types make up the iterator traits in the sense
of the Boost iterator library. This suggests that we should have our
any_iterator class template take exactly those four
template arguments. In other words, there will be exactly
one any_iterator type for each set of iterator traits.
This design is in keeping with the spirit of the Boost iterator
library and it has worked well for us in practice.
The other question in connection with the granularity of the type
erasure is which "concrete" iterators should be assignable to an
instantiation of the any_iterator class
template. Technically speaking, this is the problem of correctly
enabling and disabling the templatized converting
constructor and assignment operator of the any_iterator
class template. To make the solution that I chose plausible to you,
let us look at an example of using the any_iterator.
Remember that earlier, we were looking for an input
iterator type that would dereference to a double. This iterator type
was to be capable of holding STL iterators such
as std::vector<double>::const_iterator and
std::list<double>::const_iterator. It should also
be able to hold all manner of Boost iterators such as transform
iterators and zip iterators, as long as they are input iterators and
dereference to something that converts to a double. To make the
example a bit more interesting, let us assume that we want to expose
an iterator with bidirectional traversal capabilities instead of just
an input iterator. Given that our any_iterator class
template looks like this:
template<
    class Value,
    class CategoryOrTraversal,
    class Reference  = Value&,
    class Difference = std::ptrdiff_t
>
class any_iterator;
Let's try this for our number iterator:
typedef any_iterator<
    double const,
    boost::bidirectional_traversal_tag
> number_iterator;
Notice that I have used the bidirectional traversal tag from the Boost
iterator library for the CategoryOrTraversal template
parameter. It is possible to use the old-style STL iterator categories
such as std::bidirectional_iterator_tag here, but you
should prefer the new traversal tags. See the
Boost iterator library documentation[10] for more details.
This will actually go a long way. The following will compile and work fine:
number_iterator number_it;

std::vector<double> number_vector(42, 43.0);
number_it = number_vector.begin();
double d = *number_it;

std::list<double> number_list(42, 44.0);
number_it = number_list.begin();
d = *number_it;
However, if we try to assign to our number iterator a transform iterator which, upon dereferencing, multiplies by 100.0, we run into trouble.
number_it = boost::make_transform_iterator(
    number_vector.begin(),
    boost::bind(std::multiplies<double>(), _1, 100)
);
My any_iterator's assignment operator is not enabled for
this assignment, and you will get an error message such as:
binary '=': no operator found which takes a right-hand operand of type...
Why is it that I disallow this? Our number iterator currently has a
reference type of double const&. The transform iterator's
reference type is double. Dereferencing the number
iterator is implemented by forwarding to the wrapped transform
iterator. The conversion would work fine: double converts
to double const&. But the number
iterator's operator* would be returning a reference to a
temporary local variable. If you follow the good practice of treating
compiler warnings as errors, you will most likely catch that, but I
don't want to be responsible for that. Therefore, I don't even let you
go there.
The solution is easy: our choice of traits for the number iterator was not quite right.
What we really want is this:
typedef any_iterator<
    double const,                        // Value
    boost::bidirectional_traversal_tag,  // CategoryOrTraversal
    double const                         // Reference
> number_iterator;
Now all of the assignments above will compile and work. Note that the old STL-style iterator categories would not have allowed us to do this: in the STL, the reference type of a bidirectional iterator must be a reference. The new Boost iterator traversal tags give us more degrees of freedom in this regard.
It is now also possible to start with a vector of floats, like this:
std::vector<float> float_vector(42, 43.0);
number_it = float_vector.begin();
This would not have been allowed either before, when
the any_iterator's reference type was double
const& instead of double const. The reason is that
somewhere deep inside of operator*, there was a
conversion from float const& to double
const&, which results in the creation of a temporary.
The any_iterator's operator* would have
returned a reference to that temporary. So the bad news is that these
references to local temporaries can creep up on you in unexpected
ways. The good news is that my any_iterator will catch
these situations for you and disallow the respective assignments and
copies.
It turns out that the exact rules by which
an any_iterator instantiation decides which iterator
types to accept for wrapping are quite subtle. But before we go into
the details, let me point out that in practice, there is very little
need to study these rules. One is not very likely to make mistakes in
this respect. For example, it is clear that
an any_iterator like our number_iterator,
having bidirectional traversal category, will not accept something that's a
forward iterator only, like an iterator into a singly linked
list. That would not make sense. Similarly,
our number_iterator would not accept anything that
dereferences to something that does not convert to
a double. There is no way for that to be meaningful. So
really, using the any_iterator is just as easy as it
looks in the examples above. But for completeness' sake, and for the
fun of it, let us study the rules anyway.
Suppose that some_any_iterator is an instantiation of
the any_iterator class template with value type,
traversal tag, reference type, and difference type equal
to AnyItValue, AnyItTraversal,
AnyItReference, and AnyItDifference,
respectively. Assume further that some_iterator is an
iterator type with value type, traversal tag, reference type, and
difference type equal
to ItValue, ItTraversal,
ItReference, and ItDifference, respectively.
Then a variable of type some_any_iterator will accept an
object of type some_iterator if and only if the following
four conditions are met:
1. ItValue converts to AnyItValue.

2. ItTraversal and AnyItTraversal are equal, or the former is derived from the latter. This means that some_iterator's traversal category is equal to or better than that of some_any_iterator.

3. ItReference converts to AnyItReference. If AnyItReference is a reference, then so is ItReference. If AnyItReference and ItReference are both references, then the following is true: after stripping const qualifiers and references from AnyItReference and ItReference, the two are either the same, or the former is a base class of the latter. These conditions ensure that some_any_iterator's operator* never returns a reference to a temporary.

4. If some_any_iterator's traversal category is random access, then ItDifference and AnyItDifference are convertible to each other both ways. Here, we need convertibility in both directions because the difference type occurs as an argument type as well as a result type of iterator operators.
This settles the issue of the granularity of our iterator type
erasure. The only thing that remains to be discussed is convertibility
between different any_iterator types.
Conversion between Different any_iterator Types
Instantiations of the any_iterator class template are
Standard conforming iterators. Therefore, according to what we have so
far, it would be possible to assign an object of
one any_iterator type to a variable of
another any_iterator type, as long as the traits of the
two any_iterator types are compatible for type
erasure:
std::vector<double> vect;
std::vector<double>::iterator vit;

any_iterator<double, std::random_access_iterator_tag> rit;
rit = vit;

any_iterator<double, std::forward_iterator_tag> fit;
fit = rit;
The iterator fit now holds a copy of rit,
which in turn holds the concrete vector iterator vit. All
operations will go through two levels of indirection. It would
certainly be a legitimate design decision to just leave it to that and
never think another thing about it. However, it bothered me that
clients could end up with nesting levels greater than 1 and perhaps
not even be aware of it. Therefore, I made it so that an assignment or
construction of one any_iterator from another is
possible only if the two are of the exact same type. Instead, I added
a few conversion operators which behave nicely, copying the wrapped
iterator rather than increasing the nesting depth. The upside of this
is that now you know that your nesting depth will never be greater
than 1, no matter what you do. The downside is that the rules for the
new conversions end up being a bit more restrictive than the rules for
assignment to an any_iterator. Here's how it works:
Let ait_source and ait_target be two
different instantiations of the any_iterator class
template. Then there is a conversion from ait_source
to ait_target if and only if one of the following two conditions is met:

1. The traversal category of ait_source is better than or equal to the traversal category of ait_target, and all other iterator traits are exactly the same, or

2. ait_target is a const iterator version of ait_source.
The deeper reason why I have to require the value types, reference
types, and difference types to be equal is that in C++, convertibility
is not transitive: If T1 converts to T2 and
T2 converts to T3, it does not follow
that T1 converts to T3. Therefore, if I were
to require only that the value type of ait_source
convert to the value type of ait_target, the wrapped
iterator of an ait_source object might not be suitable
for wrapping by an ait_target object.
An Everyday Use of the any_iterator
The any_iterator was originally invented for the purpose
of alleviating problems that arise when the OO and generic programming
paradigms coexist. I have since found numerous more mundane
applications in small, everyday programming tasks. Here's an
example.
I recently had to write a small piece of code that iterated over a vector of things and did some processing with each element. Moreover, there was a runtime condition that specified whether ordinary iteration or reverse iteration was to be used. Needless to say, I wasn't going to deal with the direction of the iteration myself. That's the whole point of having iterators and reverse iterators. Therefore, my code was going to look something like this:
// some_vector is an STL vector of type some_vector_type
if(!reverseIteration)
{
some_vector_type::iterator begin = some_vector.begin();
some_vector_type::iterator end = some_vector.end();
// Do what needs to be done from begin to end
}
else
{
some_vector_type::reverse_iterator begin = some_vector.rbegin();
some_vector_type::reverse_iterator end = some_vector.rend();
// Do what needs to be done from begin to end
}
The annoying part is that now, the "Do what needs to be done" part operates on iterators of unrelated types. Short of the abomination of duplicating the code, that forces me to use a templatized function to do what needs to be done. In many cases, that may be just fine and dandy. Still, I must say that I hate this kind of situation with a vengeance. The fact that the ordinary iterator and the reverse iterator have unrelated types, which never felt very natural in the first place, deprives me of the freedom to choose my function nesting level the way I see fit.
Iterator type erasure to the rescue. Below is how you can use
an any_iterator if you do not wish to use a function
template to implement the "Do what needs to be done" part. Here, you
can also see a little convenience metafunction
called make_any_iterator_type at work. This metafunction,
which comes with my any_iterator, takes an iterator type
as its argument and produces an instantiation of
the any_iterator class template with the same iterator
traits. In other words, it allows you to create
an any_iterator type "by example."
typedef
make_any_iterator_type<
some_vector_type::iterator
>::type local_iter_type;
local_iter_type begin;
local_iter_type end;
if(!reverseIteration)
{
begin = some_vector.begin();
end = some_vector.end();
}
else
{
begin = some_vector.rbegin();
end = some_vector.rend();
}
// Do what needs to be done from begin to end
Regardless of how often I am going to prefer the second solution over the first, I find it rather comforting to have the choice available to me.
As I mentioned earlier, there is an
any_iterator class template in the Adobe Public
Library, which was developed independently from
my any_iterator. It is clear from the documentation that
their effort is very similar to mine. However, they currently label
their work as "an experiment." This is probably the reason why the
documentation is not very specific. Therefore, I am not sure how
exactly our efforts relate when it comes to the more subtle design
decisions. It would of course be possible to find out more by studying
their source code or playing with their class. However, I am reluctant
to spend that kind of time and effort as long as their work is
considered experimental and may therefore change anytime.
My guess would be that the design decisions that the Adobe Library
people made are very similar to mine and possibly better. For now,
my any_iterator gives you the advantage of being
well-documented and being usable by just downloading a few header
files, rather than having to install a third party library. But if and
when the Adobe people upgrade their work from "experimental" to
"officially supported," you're probably better off going with
something that has the full support of a dedicated group of very smart
people.
The any_iterator and C++ Concepts
The implementation of class templates such
as any_iterator will benefit enormously from the
introduction of concepts into the C++ Standard. If you are
unfamiliar with this proposed language addition, a good place to start
is the article on the poly library by Jaakko Jarvi, Mat Marcus,
and Sean Parent. Section 2 of the article is a
lucid and succinct introduction to concepts. The poly
library that is described in the rest of their paper is in fact a great
case study in applying concepts. The paper also has an excellent list
of references on C++ concepts.
Please read on only if you are interested in how C++ concepts will replace and improve current workarounds that are based on the SFINAE principle. If you do not want to dig that deep—and there is nothing wrong with that attitude—skip right ahead to the acknowledgements, and you are done with reading this article!
Of all the additions to the C++ language that are currently in the
pipeline, C++ concepts is probably the most important and far-reaching
one. Concepts will add an enormous amount of structure, clarity,
expressiveness, and convenience to generic programming in C++.
The any_iterator is special in this respect because it is
a rare example of a class template where C++ concepts will do more
than just simplify and improve the code. There is actually something
fairly important that I cannot do right now, but will be able to do
with concepts. In other words, the point that I will be making in the
rest of this article is this: C++ concepts are not only a better and
more convenient alternative to the workarounds that we currently
use. Rather, some of today's workarounds are positively broken and
we need concepts to fix them.
We saw earlier that one of the main issues with
the any_iterator was to enable and disable the converting
constructor and assignment operator in such a way that only certain
compatible "concrete" iterators are allowed for construction and
assignment to the any_iterator. In today's C++, this is
achieved
with boost::enable_if[11],
which I will assume you are familiar with.
My any_iterator's assignment operator, for example,
currently looks like this:
template<class WrappedIterator>
typename boost::enable_if<
detail::is_iterator_type_erasure_compatible<
WrappedIterator,
any_iterator
>,
any_iterator
>::type &
operator=(WrappedIterator const & wrapped_iterator)
{
any_iterator tmp(wrapped_iterator);
swap(tmp);
return *this;
}
Here, is_iterator_type_erasure_compatible is a
fairly complex metafunction that implements the rules for assigning
"concrete" iterator types to any_iterator types. Thanks
to boost::enable_if, this is actually a rather elegant and
readable solution. Moreover, the error message that you get when you
try to assign an iterator of incompatible type to
an any_iterator is not one of those cryptic, endlessly
nested template error messages. Your compiler will simply say
something like:
binary '=': no operator found which takes a right-hand operand of type...
But the fact remains that this is a workaround. I had to ponder over a
lot of those awful template error messages until I had it all working
on each of the three compilers that I currently support. In the shiny
new world of concepts, I will be able to express my
condition is_iterator_type_erasure_compatible directly in
the language as what it is, namely, a property of types. I will be
defining a concept that will most likely look something like this:
auto concept IteratorTypeErasureCompatible<
typename SourceIterator,
typename TargetIterator
> : Iterator(SourceIterator) && Iterator(TargetIterator)
{
requires
Convertible(
SourceIterator::value_type,
TargetIterator::value_type
)
&&
Convertible(
SourceIterator::reference_type,
TargetIterator::reference_type
)
&&
(! Reference(TargetIterator::reference_type) ||
Reference(SourceIterator::reference_type)
)
&&
// ...
};
The any_iterator's assignment operator will then look
like this:
template<class WrappedIterator>
requires IteratorTypeErasureCompatible<
WrappedIterator,
any_iterator
>
any_iterator& operator=(WrappedIterator const & wrapped_iterator)
{
any_iterator tmp(wrapped_iterator);
swap(tmp);
return *this;
}
Rather obviously, this is an enormous step forward from the workaround
using SFINAE, traits, and template metaprogramming. Moreover, the
error message for a bad assignment is going to improve a bit. It will
now give a reason for the failure to assign, namely, the violation of
the concept IteratorTypeErasureCompatible.
Things get a lot more interesting if we try to do the same thing for the converting constructor from wrapped iterator. After looking at the assignment operator, you would expect that converting constructor to look something like this:
template<class WrappedIterator>
any_iterator(
WrappedIterator const & wrapped_iterator,
typename boost::enable_if<
detail::is_iterator_type_erasure_compatible<
WrappedIterator,
any_iterator
>
>::type* dummy = NULL
)
{
// construct an any_iterator that wraps wrapped_iterator
}
There is nothing wrong with that code as it stands. And yet, under some Standard-conforming compilers, this sets into motion a long, circuitous chain of events at the end of which we'll find disaster.
Suppose that with the above definition of the converting constructor,
we try to construct an any_iterator from something that
is not even an iterator, like an STL pair of two integers:
std::pair<int, int> p(42, 42);

typedef any_iterator<
  double,
  std::random_access_iterator_tag
> any_ra_iterator;

any_ra_iterator ra_it(p);
Needless to say, we want this construction to fail, and we're hoping
for a nice, simple error message from the compiler. So what's going to
happen? As the compiler encounters the boost::enable_if
in the constructor, it will evaluate the metafunction call:
is_iterator_type_erasure_compatible<
  std::pair<int, int>,
  any_ra_iterator
>
I never showed you the details of
is_iterator_type_erasure_compatible, but we know
that it is the conjunction of a bunch of conditions such as "the value
type of the first argument converts to the value type of the second
argument." Therefore, one of the things that the compiler encounters
deep, deep down in this metafunction evaluation is this:
std::iterator_traits<std::pair<int, int> >::value_type
This template instantiation will fail, because STL pairs are not
iterators and thus have no value type. So, overall, substituting the
type std::pair<int, int> for the template parameter
WrappedIterator of our any_iterator's
converting constructor failed. That's not an error, says SFINAE, and
hence, this constructor just isn't there. We would expect a one-line
error message like:
error: cannot convert from std::pair<_Ty1,_Ty2> to any_iterator<...
Under some compilers, that's exactly what happens. Other compilers will barf up a pile of horrible template errors at the center of which lies this:
error : 'value_type' : is not a member of 'std::pair<_Ty1,_Ty2>' with _Ty1 = ...
The reason is that the version of the SFINAE principle that the Standard requires is, in point of fact, rather lame. SFINAE is required to kick in only if the substitution failure occurs at the very top of the template instantiation stack. In our case, the substitution failure is way down on the stack. Some compilers voluntarily apply SFINAE in this case as well, but nobody is required to do so.
Ok, so what kind of a problem have we seen so far? Almost none at
all. The only bad thing is that when we try to construct
an any_iterator from something that is not even an
iterator in the first place, then some compilers give us an ugly error
message instead of a nice one. Big deal. Ugly error messages abound in
today's C++ programming with templates.
Everything would be quite alright if it weren't, of all things, a
converting constructor that barfs up the ugly error message instead of
politely disabling itself. Consider what happens if we ask
whether std::pair<int, int> is convertible to
the any_iterator:
typedef any_iterator<
  double,
  std::random_access_iterator_tag
> any_ra_iterator;

bool b = boost::is_convertible<
  std::pair<int, int>,
  any_ra_iterator
>::value;
One of the things that the
metafunction boost::is_convertible will do is to look for
a converting constructor of any_ra_iterator that takes
an std::pair<int, int> as its argument. In doing
so, it will encounter the compile error that we discussed
earlier. Therefore, the code above will not compile.
Now that's really bad. For a while, I tried to kid myself into
thinking that one could live with that, but no. A type which can cause
a compile error when it is used as the second argument
to boost::is_convertible is evil. In particular, if a
class has a non-explicit, templatized converting constructor, then the
mere act of substituting for the template argument must not cause a
compile error. The reason why this principle is important is this: if
a class violates the principle, then the compiler can encounter an
error while it is building a function overload set. Even though the
offending constructor may end up not being used in the situation for
which the overload set is built, just creating the overload set causes
a compile error. That is pure evil if I ever saw it. In fact, this is
the exact reason why the SFINAE principle exists in the first
place.
My problem here is that the substitution failure occurs deep down on
the template instantiation stack, where the SFINAE principle as
required by the Standard does not apply. It is not possible to fix
things so that the failure occurs at the top of the stack. To do that,
I would need a metafunction is_iterator,
or has_iterator_traits, or some such thing. Such a
metafunction does not exist, and it is impossible to write one (try
it).
Once we have concepts in C++, all this is going to go away. Just
take one look at the
concept IteratorTypeErasureCompatible that I outlined
earlier. The very first thing that it requires is
that WrappedIterator is an iterator at all. If that's not
the case, the concept is not modeled, and all is well.
For now, I could see no alternative but to make the constructor that
constructs an any_iterator from a wrapped iterator
explicit and replace the enable_if with a static
assert. I really hated to do that, but for the life of me, I can't see
any other way out until there will be concepts in C++. What this means
is that for now, a wrapped iterator never converts to
an any_iterator. There are only two ways to get a
"concrete" iterator object into an any_iterator variable:
either by ordinary assignment, as in:
std::vector<int> vect;
any_iterator<int, std::forward_iterator_tag> ait;
ait = vect.begin();
or using explicit construction:
std::vector<int> vect;

any_iterator<
  int,
  std::forward_iterator_tag
> ait_1(vect.begin()); // fine

any_iterator<
  int,
  std::forward_iterator_tag
> ait_2 = vect.begin(); // error, requires non-explicit ctor
Please, C++ Standard committee and compiler vendors, give us concepts! We're in a lot of pain here!
I am greatly indebted to Dave Abrahams, Christopher Baus, Fred
Bertsch, Don Harriman, and Thomas Witt, who have helped me in more
ways than I can recall. All remaining errors and inadequacies, both in
this article and in my any_iterator implementation, are
mine.
Have an opinion about Tension between Object-Oriented Programming and Generic Programming?
Discuss this article in the Articles Forum topic, On the Tension between Object-Oriented and Generic Programming in C++, and What Type Erasure Can Do about It.
[1] Bjarne Stroustrup's definition of C++
http://www.research.att.com/~bs/glossary.html#GC++
[2] Download my any_iterator (with documentation and regression tests)
http://thbecker.net/free_software_utilities/type_erasure_for_cpp_iterators/IteratorTypeErasure.zip
[3] An interview with Alexander Stepanov
http://www.stlport.org/resources/StepanovUSA.html
[4] Dave Abrahams' and Aleksey Gurtovoy's book on C++ template metaprogramming
http://boost-consulting.com/mplbook/
[5] boost::any
http://www.boost.org/doc/html/any.html
[6] boost::function
http://www.boost.org/doc/html/function.html
[7]
Article on the poly library by Jaakko Jarvi, Mat Marcus, and Sean Parent
http://homepages.fh-regensburg.de/~mpool/mpool07/proceedings/5.pdf
[8]
Adobe Public Library's any_iterator
http://opensource.adobe.com/classadobe_1_1any__iterator.html
[9]
boost::iterator_facade
http://www.boost.org/libs/iterator/doc/index.html#iterator-facade-and-adaptor
[10]
Boost new style iterators
http://www.boost.org/libs/iterator/doc/index.html#new-style-iterators
[11] boost::enable_if
http://www.boost.org/libs/utility/enable_if.html
Thomas Becker is a scientific software engineer at Zephyr Associates, Inc. in Lake Tahoe, Nevada, where he works on financial analytics software. He is a former columnist for the now defunct C/C++ Users Journal. You can reach him at thomas at styleadvisor dot com. To find out more about Thomas, visit his homepage at thbecker.net