The Artima Developer Community
The C++ Source | C++ Community News | Discuss | Print | Email | First Page | Previous | Next
Sponsored Link

The C++ Source
Conditional Love: FOREACH Redux
by Eric Niebler
February 17, 2005

Page 1 of 4  >>

Advertisement

Summary
Plowing through some devilish details of template argument deduction, the conditional operator and the macro preprocessor, Eric develops a robust FOR_EACH iterator. Whether you're using arrays, strings, or containers, this one does it all.

My CS201 "Software Development Methods" professor taught that all anyone would ever need to know about C++'s ternary conditional operator (?:) was that it was poorly understood and best avoided. He was right about one thing: the conditional operator is poorly understood, by me at least. Ten years later, I've only begun to appreciate the subtlety and the power of the conditional operator. But contrary to what my professor would have me believe, it is a unique and indispensable tool. In fact, the lowly conditional operator is the magic ingredient in BOOST_FOREACH, a macro that makes it simple to loop over STL collections, arrays, null-terminated strings, and iterator ranges, among other things. In this article, I'll zoom in on this operator and discover its unique properties. Using the conditional operator, you will see how to encode the type of an expression without evaluating it and how to detect whether an expression is an lvalue or an rvalue. Thus armed, I'll take aim at a very vexing problem: how to write a "safe" macro that does not reevaluate its arguments. Along the way, I will implement a toy FOREACH macro that allows iteration over any STL container.

This article takes a very close look at a very small part of the language—the conditional operator. I make occasional reference to the standard, and the analysis gets fairly technical. Consider yourself warned.

Ugly Macros

In the November 2003 issue of the C/C++ Users' Journal, Anson Tsao and I published an article entitled "FOREACH and LOCK" in which we described how to bring the functionality of C#'s foreach and lock keywords to C++. The FOREACH macro lets you iterate over STL containers with syntax like this:

vector< string > vect_str( /* ... */ );

FOREACH( string str, vect_str )
{
   cout << str << '\n';
}

Why would you want to do such a thing? It is more concise and less error-prone than using iterators directly, certainly, but wouldn't it be better to use the std::for_each algorithm? It depends. std::for_each is a good idea, but in practice it is often a clumsy tool. If C++ had native support for lambda expressions, std::for_each would make sense, but in the language we have today std::for_each forces you to move your loop body into a function (or function object) at namespace scope, far from where it will be used. Code that uses std::for_each extensively is littered with these small, one-off functions that make little sense in isolation. Also, you can't use break or continue once you are committed to using std::for_each. By comparison, FOREACH clearly expresses the programmer's intent, is easy to use correctly, and lets you put your loop body where it makes sense. If you think that macros are ugly, you'll get no argument from me. A foreach keyword and lambda functions sure would be nice, but that's not the language we have to work with, unfortunately. And as we'll see, the conditional operator is just the tool we need to teach those evil, unruly macros some manners.

Listing 1 in file broken_foreach.cpp shows a bare-bones, stripped down version of the FOREACH macro as it was implemented at the time. Don't sweat the details yet; we'll get to them in short order. This version has problems—it evaluates the container expression multiple times, and it doesn't work if the container is an unnamed temporary object. The following code illustrates the reevaluation problem. It uses FOREACH to loop over a container of doubles, but the container expression, get_vect_dbl(), is a function, which leads to some surprising behavior.

vector<double> vect_dbl( 3, 1.0 );

vector<double> const & get_vect_dbl()
{
   cout << "here!\n";
   return vect_dbl;
}

// elsewhere ....
FOREACH( double d, get_vect_dbl() )
{
   cout << d << '\n';
}
You might expect to see "here!" displayed only once, but what you actually see is:
     here!
     here!
     here!
     here!
     1
     here!
     here!
     here!
     1
     here!
     here!
     here!
     1
     here!
     here!
This is because get_vect_dbl() is being evaluated multiple times by the FOREACH macro. Surprise! This is a typical macro bug-a-boo, and one of the reasons why macros have such a bad reputation. Squashing this bug is our first order of business, but first we'll need to understand a bit about what the FOREACH macro is doing, and why.

How It Works, And Why It Doesn't Quite

Imagine that it is your job to implement the FOREACH macro. "Well, I'll probably need some iterators," you think to yourself, and you write something like this:

#define FOREACH( item, container )      \
   ??? iter = (container).begin();
Now you're stuck. What type do you use to declare your iterator? You don't know the type of the container, and you certainly don't know the type of its iterators. Somehow you must be able to declare variables for the iterators without knowing the iterators' type! It's not impossible, but it isn't obvious, either. Let's see how.

First, we use the ScopeGuard trick [1]: temporary objects of derived type can have their lifetime extended by binding them to a const reference to the base type. This is a magic property of const references and temporary objects. If you're surprised, don't worry—it surprises everyone at first. The following code illustrates the trick. It defines an empty base class, a derived class which wraps some piece of data, and a function which returns an iterator, suitably wrapped.

struct auto_any_base {};

template< class T > struct auto_any :
auto_any_base
{
   auto_any( T const & t ) : item( t ) {}
   mutable T item;
};

template< class Container >
auto_any< typename Container::const_iterator >
begin( Container const & c )
{
   return c.begin();
}
Now, in our FOREACH macro we can say:
#define BOOST_FOREACH( item, container ) \
   auto_any_base const & iter = begin( container ); \
   ...
begin( container ) returns the begin iterator, wrapped in an auto_any temporary object. That temporary object gets bound to iter, which is a const reference to auto_any_base, the empty base class. This gives the temporary object a reprieve of sorts—its lifetime is extended, along with the iterator stored inside.[2] As the name "auto_any" suggests, this is a general mechanism for putting an object of unknown type in automatic storage (i.e. it is not dynamically allocated using new).

We have succeeded in storing an iterator, but we have lost its type information. iter is a const reference to an auto_any_base—that tells us nothing about what is actually stored in there. It could be a vector iterator, or it could be your Auntie May's Christmas fruit cake. If we want to be able to increment the iterator, for instance, we need to recover the lost type information.

Keep in mind that we're writing a macro, so we are free to reuse any of the macro arguments as many times as we like. In particular, we can reuse the container expression to infer the type of the iterator. The function next() below does just that.
// extract a reference to the internally stored item
template< class T >
T & auto_any_cast( auto_any_base const & base )
{
  return static_cast< auto_any< T > const & >( base ).item;
}

// infer the type of the iterator
template< class Container >
void next( auto_any_base const & iter, Container const& )
{
  ++auto_any_cast< typename Container::const_iterator >( iter );
}
The function auto_any_cast() takes a const reference to an auto_any_base and casts it to the requested derived type. It uses a static_cast for this; this is kosher because we are allowed to static_cast from a base to a derived type. It returns a reference to the contained piece of data. The function next() uses auto_any_cast() to access the iterator and increment it. We use next() by passing it both iter and the container, so that next() can deduce the type of the container and infer the type of the iterator. If you recall, the begin() function defined above puts a Container::const_iterator into the auto_any. The function next() must follow suit by casting back to a Container::const_iterator. Notice that the container is not actually used by next() for anything besides template argument deduction. The snippet below shows how this all hangs together.
#define FOREACH( item, container )                \
   auto_any_base const & iter = begin( container ); \
   ...                                     \
   next( iter, container );                     \
   ...
This works very well as long as container is something simple, like a variable of type vector. But imagine what happens if container is a function call. Since container appears twice, it will get called twice. And if the function has side-effects, all heck breaks loose. This is the crux of the problem. It is especially galling considering that in the call to next(), we don't even need the container; we just need its type. There's gotta be a better way!

Page 1 of 4  >>

The C++ Source | C++ Community News | Discuss | Print | Email | First Page | Previous | Next

Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us