The Artima Developer Community
Sponsored Link

The C++ Source
Conditional Love: FOREACH Redux
by Eric Niebler
February 17, 2005

<<  Page 3 of 4  >>

Advertisement

Evil Dead Code II

Now that we have a way of ensuring that the container expression only gets evaluated once, we can tackle the other problem with the FOREACH macro: making it work with rvalue container expressions. First, some terminology. The C++ standard uses the terms lvalue and rvalue to discriminate between objects that have names (lvalues) and those that don't (rvalues)[3]. Every variable you explicitly declare is an lvalue. rvalues are more elusive; they are the unnamed temporary objects that flit into and out of existence as a result of operations, conversions and function calls.

Consider the following invocation of FOREACH:

FOREACH( char ch, string("Hello") )
{
   cout << ch;
}
This should write every character in the string to cout, but making it do that is no mean trick. The problem is that string("Hello") is a temporary object - an rvalue. It winks out of existence before we have a chance to iterate over it, unless we pin it down. The problem is obvious once we expand the macro a bit:
auto_any_base const & iter = begin( string("Hello") );
...
Here, we see that begin() is going to return an iterator to a temporary string object that is going to die at the semicolon. An iterator into a deceased container isn't very useful, so clearly something must be done. Let's make a copy of the container and iterate over that! We need to store the copy somewhere, but we don't know the container's type (in general). But hey, we already know how to solve this problem! Just wrap the copy in an auto_any and bind it to a const reference to auto_any_base. That will ensure that the container sticks around long enough to iterate over it. Of course, making a copy of an STL container is an expensive proposition, and we don't want to have to pay that price when we don't have to. If the container is an lvalue (that is, a named object), we don't have to make a copy because it is not in danger of going out of scope while we iterate over it. Only when the container is an rvalue does FOREACH actually need to make a copy. FOREACH should automatically detect whether the container is an lvalue or an rvalue, and store a copy only if it is needed.

When mathematicians have a problem, they sometimes start by assuming the solution. It has always seemed like cheating to me, but let's use the same trick here. Let's assume (for now) that we can detect the lvalue-ness or rvalue-ness of an expression and write the result into a Boolean flag called is_rvalue. Then, we could pass the container and is_rvalue to a helper function, which returned either a pointer to the container (for lvalues) or a copy (for rvalues). Since we need to return a thingy or a pointer-to-thingy, we need something like a union for the return type. boost::variant [4] is a perfect fit. The helper function looks like this:

// contain() - returns a Container or a Container*
template< class Container >
auto_any< boost::variant<Container const*,Container> >
contain( Container const & t, bool const & is_rvalue )
{
   // Return either a Container or a Container* depending on whether
   // the container is an rvalue or not.
   typedef boost::variant<Container const*,Container> variant_t;
   return is_rvalue ? variant_t(t) : variant_t(&t);
}
(In case you are wondering why contain() takes is_rvalue by reference-to-const instead of by value, the reason becomes clear later.) Now that the container is wrapped in a variant, which is wrapped in an auto_any, we need to make our begin() function privy to this subterfuge. begin() will take the container all wrapped up, the Boolean is_rvalue flag, and the encoded type of the container, and it will return the container's begin iterator, like this:
// begin() - returns the begin iterator
template< class Container >
auto_any< typename Container::const_iterator >
begin( auto_any_base const & container, bool is_rvalue,type2type<Container> )
{
   typedef boost::variant<Container const*,Container> variant_t;
   variant_t & var = auto_any_cast< variant_t>(container);
   // Extract either a Container or a Container* depending on whether
   // the container is an rvalue or not.
   Container const & c = is_rvalue ? boost::get<Container>(var)
                                   : *boost::get<Container const *>(var);
   return c.begin();
}
We've assumed that there is some way to detect the lvalue- ness or rvalue-ness of an expression. Let's call it EVAL and further assume that it is a macro that takes an expression and a Boolean flag. It evaluates the expression and returns it. As a side-effect, it writes into the flag whether the expression is an rvalue or not. Using the hypothetical EVAL macro, we can make FOREACH work with rvalues like this:
#define FOREACH( item, container ) \
   bool is_rvalue; \
   auto_any_base const & cont = contain(EVAL(container,is_rvalue), is_rvalue ); \
   auto_any_base const & iter = begin( cont, is_rvalue, ENCODED_TYPEOF(container) ); \
   ...
The function contain() takes two arguments: EVAL(container,is_rvalue) and is_rvalue. The first argument writes to is_rvalue as a side-effect, and the second argument passes is_rvalue to contain(). If this strikes you as dodgy, you're right. In C++, argument evaluation order is unspecified. What if the second argument (is_rvalue) is evaluated before the first argument writes to it? Will the contain() function see garbage for the value of is_rvalue? No, and this is why contain() takes is_rvalue by reference-to-const instead of by value. Since we are passing a reference, the value of is_rvalue is not read until we are inside the contain() function, at which point all of its function arguments have been evaluated and all their side-effects are visible. Had we just passed is_rvalue by value, all bets would be off. The moral is: don't rely on the evaluation order of function arguments!

If we merely assume the existence of EVAL, our C++ compiler will complain. (Assumed solutions are good enough for mathematicians, but not C++ compilers.) We need to be more explicit, but detecting whether an expression is an lvalue or an rvalue presents a challenge reminiscent of the Heisenberg Uncertainty Principle—the act of measuring it tends to change the very property you are trying to measure [5]. Once again, the lowly conditional operator saves the day. The following code illustrates the principle by defining a macro RVALUE_TEST that displays "rvalue" for rvalue expressions, and "lvalue" for lvalue expressions. We'll get to the details in a minute.

struct rvalue_probe
{
   template< class R > operator R ()
   {
        throw "rvalue";
   }
   template< class L > operator L & () const
   {
        throw "lvalue";
   }
};

#define RVALUE_TEST(container) \
   try { ( true ? rvalue_probe() : (container) ); } \
   catch( char const * result ) { cout <<result << '\n'; }
Now you can use RVALUE_TEST(expression) to test the rvalue-ness or lvalue-ness of an expression. For instance, RVALUE_TEST(cout) will display "lvalue" since cout is a named object. But RVALUE_TEST(make_pair(1,2)) will display "rvalue". It's not immediately obvious how this works[6], so we'll need to consult the Good Book (the C++ standard) to make sense of it. According to section 5.16/3:

5.16/3
... if the second and third operand have different types, and either has (possibly cv-qualified) class type, an attempt is made to convert each of those operands to the type of the other. The process for determining whether an operand expression E1 of type T1 can be converted to match an operand expression E2 of type T2 is defined as follows:

This section is describing the case of RVALUE_TEST(cout). Since cout is an lvalue of type ostream, the compiler first tries to convert the expression rvalue_probe() to type "reference to ostream". This succeeds, because rvalue_probe defines a conversion operator to L & for any L. This conversion operator gets called, and it throws the string "lvalue", which gets caught and printed.

The other case is more interesting. If we read further in the standard, it says (deep breath):

5.16/3 (continued)

This is actually much simpler than it sounds. If the expression is an rvalue of type R, then the compiler will try to convert rvalue_probe() to an rvalue of type R. But there's a catch. (You knew it couldn't be that simple, right?) There are two ways for this conversion to happen. The compiler could chose to use rvalue_probe::operator R() and be done, or it could use rvalue_probe::operator L &() const, and then use the standard lvalue-to-rvalue conversion (4.1) to achieve the same result.

Two possible conversions! Rather than throw up its hands, the compiler performs overload resolution to pick the best conversion sequence. Let's not go into the subtleties of overload resolution (Heaven help us!), but the end result is that operator R() gets picked over operator L & () const.

If you've made it this far, congratulations! There's no doubt that this is arcane stuff, but we are rewarded with a robust way to detect the rvalue-ness and lvalue-ness of any expression. We can use this technique to finally implement the EVAL macro assumed above. Just a few small tweaks to the rvalue_probe struct give it the ability to pass an object through unchanged while recording whether it is an rvalue or lvalue. It is explained in detail below.

struct rvalue_probe
{
   template< class T >
   rvalue_probe( T const & t, bool & b )
        : ptemp( const_cast< T * >( &t ) ), is_rvalue( b )
   {}

   template< class R > operator R()
   {
        is_rvalue = true;
        return *static_cast< R * >( ptemp );
   }

   template< class L > operator L &() const
   {
        is_rvalue = false;
        return *static_cast< L * >( ptemp );
   }

   void * ptemp;
   bool & is_rvalue;
};
And now at last we can define the EVAL macro as:
     #define EVAL( container, is_rvalue )    \
          ( true ? rvalue_probe( (container), is_rvalue ) : (container) )
The rvalue_probe object stores a pointer to the container expression container and a reference to a Boolean flag. The other operand of the condition operator is container, raw and unadorned. As before, the second operand is "dead code" that will never execute, but its rvalue-ness or lvalue-ness will influence which conversion operator gets called on the rvalue_probe object. Both conversion operators do essentially the same thing: dereference the pointer to the container expression and return the result. As a side-effect, it records in the Boolean flag which conversion operator was called. The upshot is that the is_rvalue flag can be used by the rest of the FOREACH macro to either make a copy of the container or not, accordingly. Listing 2 (see fixed_foreach.cpp) uses this trick to improve upon the code in Listing 1—it does not reevaluate the container expression, and it works with rvalue containers.

<<  Page 3 of 4  >>


Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us