"Programmers who overload unary operator& should be sentenced to writing libraries that need to operate properly when fed such classes." - Peter Dimov, esteemed Boost member, Boost newsgroup June 2002
Although that's very funny, it's also a pretty strong statement. Why is there such antipathy to the use of this operator?
In this chapter we'll look at some of the problems that overloading this operator can cause. Our solution is an unexciting one: follow Peter's implicit advice and forswear any use of this overload, giving up shortsighted gains to avoid much grief down the line.
One thing I should make clear: any time in this chapter that I
refer to operator &() it will be the unary form,
which is the address-of operator. The binary form, which is the
bitwise AND operator, is an entirely different beast.
class Int
{
Int operator &(Int const &); // Bitwise operator
void *operator &(); // Address-of operator
. . .
Like most C++ operators, you're free to return anything you like
from operator &(). This means that you can alter the
value, or the type, or both. This can be a powerful aid in rare
circumstances, but it can also cause you a world of trouble.
STL allocators provide construct()
and destroy() methods, whose canonical definitions
are as follows:
template <typename T>
struct some_allocator
{
. . .
void construct(T* p, T const &x)
{
new(p) T(x);
}
void destroy(T* p)
{
p->~T();
}
. . .
The construct() method is used by containers to in-
place construct elements, as in
template <typename T, . . . >
void list::insert(. . ., T const &x)
{
. . .
Node *node = . . .
get_allocator().construct(&node->value, x);
If the type you're storing in the list overloads operator
&() to return a value that cannot be converted to
T*, then this line will fail to compile.
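The failure can be seen in isolation with a minimal sketch. Awkward and Plain are hypothetical types, not taken from any library; Awkward's operator &() returns void*, much like the Int class above. The compile-time checks show that the expression &x no longer has the type that construct(T*, T const &) expects.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical element type: its address-of operator returns void*.
struct Awkward
{
    int value;
    void *operator &() { return this; }
};

// An ordinary type, for comparison.
struct Plain
{
    int value;
};

// The type of the expression &x, where x is a non-const lvalue of type T.
template <typename T>
using address_expr_t = decltype(&std::declval<T &>());

// For Plain, &x is a Plain*, so it can be handed to construct().
static_assert(std::is_same<address_expr_t<Plain>, Plain *>::value,
              "built-in address-of yields T*");

// For Awkward, &x is a void*, which does not convert implicitly to Awkward*.
static_assert(std::is_same<address_expr_t<Awkward>, void *>::value,
              "the overload changes the type of &x");
static_assert(!std::is_convertible<address_expr_t<Awkward>, Awkward *>::value,
              "void* cannot be passed where Awkward* is expected");
```

Any container code written as construct(&node->value, x) would be rejected for Awkward for the same reason the last check holds.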
ATL has a large number of classes that overload operator
&(), including CComBSTR, CComPtr and
CComVariant.
To account for the incompatibility between ATL types and STL
containers, the designers of ATL introduced the
CAdapt template, which attempts to solve the problem
by containing an instance of its parameterising type. It then
provides implicit conversion operators and comparison operations
to allow it to be used in place of its parameterising type.
Because CAdapt<T> does not overload operator
&(), it can be used to mask the overload for any
T that does.
template <typename T>
class CAdapt
{
public:
CAdapt();
CAdapt(const T& rSrc);
CAdapt(const CAdapt& rSrCA);
CAdapt &operator =(const T& rSrc);
bool operator <(const T& rSrc) const
{
return m_T < rSrc;
}
bool operator ==(const T& rSrc) const;
operator T&()
{
return m_T;
}
operator const T&() const;
T m_T;
};
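The masking effect can be sketched with a cut-down adapter. Wrapper and Adapter below are hypothetical stand-ins written for illustration; they are not the ATL classes, and the sketch assumes only what the listing above shows: containment, implicit conversions, and no operator &() overload in the adapter.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an ATL-style wrapper: unary & yields int*,
// not Wrapper*.
class Wrapper
{
public:
    explicit Wrapper(int v = 0) : m_value(v) {}
    int *operator &() { return &m_value; }
    int get() const { return m_value; }
private:
    int m_value;
};

// Cut-down adapter in the spirit of CAdapt: holds a T, converts to T&,
// and leaves unary & alone.
template <typename T>
class Adapter
{
public:
    Adapter() : m_T() {}
    Adapter(T const &src) : m_T(src) {}
    operator T &() { return m_T; }
    operator T const &() const { return m_T; }
    T m_T;
};

// Taking the address of the adapter uses the built-in operator.
static_assert(std::is_same<decltype(&std::declval<Adapter<Wrapper> &>()),
                           Adapter<Wrapper> *>::value,
              "Adapter<T> has an ordinary address");

// A function written against the wrapped type.
int read(Wrapper const &w) { return w.get(); }

bool adapter_demo()
{
    Adapter<Wrapper> a(Wrapper(42));
    Adapter<Wrapper> *p = &a;   // built-in address-of, no overload involved
    // read(a) relies on the implicit conversion to Wrapper.
    return p->m_T.get() == 42 && read(a) == 42;
}
```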
Unfortunately, this is just a sticking plaster on a broken arm.
As we saw in Chapter 23, templates that inherit from their
parameterising type have a deal of trouble in unambiguously
providing access to the requisite constructors of their parent
class. The same problem exists for types such as
CAdapt, which enhance their parameterising type via
containment rather than inheritance. All the constructors of
T, except the default and copy constructors, are
inaccessible. This clutters your code, reduces the applicability
of generic algorithms, and prevents the use of RAII (see Section
3.5).
template<typename T>
T *get_real_address(T &t)
{
return reinterpret_cast<T*>(&reinterpret_cast<byte_t &>(t));
}
There are other complications, to account for
const and/or volatile, but that's the
essence of it. The Boost libraries have a nifty
addressof() function, which takes account of all the
issues.
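Since C++11 the standard library has offered std::addressof() in <memory>, which does the same job as the Boost function. The sketch below uses a hypothetical type, Sneaky, whose overload deliberately lies about its address, to show the difference between the two ways of asking.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type with a misleading address-of operator.
struct Sneaky
{
    int payload;
    Sneaky *operator &() { return nullptr; }   // lies about its address
};

bool addressof_demo()
{
    Sneaky s = { 7 };
    Sneaky *fake = &s;                  // calls the overload
    Sneaky *real = std::addressof(s);   // bypasses the overload
    return fake == nullptr && real != nullptr && real->payload == 7;
}
```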
But the use of reinterpret_cast is cause for some
concern. The standard (C++-98: 5.2.10;3) says: "the mapping
performed . . . is implementation-defined. [Note: it is intended to be
unsurprising to those who know the addressing structure of the
underlying machine]". Since the result may conceivably not be
valid, it's not possible to claim that this technique is truly
portable. However, it's also pretty hard to imagine a compiler
that would not perform the expected conversion.
We can now sidestep types with pathological operator
&() overloads, but doing so would require peppering all our
code with calls to the real address shim. That's ugly, and its
correctness is implementation-defined. Do you want to use a
standard library with myriad reinterpret_casts?
Since it's a function like any other, the operator
&() overload can do things other than simply return a
converted value. This has serious consequences.
Overloading operator &() breaks encapsulation.
That's a bold statement. Let me illustrate why it is so.
As I've mentioned already, ATL has a large number of wrapper
classes that overload operator &(). Unfortunately,
there are different semantics to their implementations. The types
shown in Table 26.1 all have an assertion in the operator method
to ensure that the current value is NULL.
| Wrapper Classes | operator&() Return Type |
|---|---|
| CComTypeAttr | TYPEATTR** |
| CComVarDesc | VARDESC** |
| CComFuncDesc | FUNCDESC** |
| CComPtr / CComQIPtr | T** |
| CHeapPtr | T** |

Table 26.1
Don't worry about the specifics of the types
TYPEATTR, VARDESC and
FUNCDESC: they're POD Open type structures (see
Section 4.4) used for manipulating COM metadata. The important
thing to note is that they have allocated resources associated
with them but they do not provide value semantics, which means
that they must be managed carefully in order to prevent resource
leaks or use of dangling pointers.
The operator is overloaded in the wrapper classes to allow
these types to be used with COM API functions that manipulate the
underlying types, and to be thus initialised. Of course, it's not
an initialisation as we RAII-phile C++ types know and love it, but
it is initialisation, because the assertion means that any
subsequent attempt to repeat the process will result in an error,
in debug mode at least. I'll leave it up to you to decide whether
that, in and of itself, is a good way to design wrapper classes,
but you can see that you are required to look inside the library
to see what is going on. After all, it's using an overloaded
operator, not calling a function named
get_one_time_content_pointer()[1].
The widely used CComBSTR class, which wraps the
COM BSTR type, also overloads operator
&() to return BSTR*, but it does not
have an assertion. By contra-implication, we assume that this
means that it's OK to take the address of a CComBSTR
multiple times, and, since the operator is non-const, that we can
make multiple modifying manipulations to the encapsulated
BSTR without ill-effect. Alas, this is not the case.
CComBSTR can be made to leak memory with ease:
void SetBSTR(char const *str, BSTR *pbstr);
CComBSTR bstr;
SetBSTR("Doctor", &bstr); // All ok so far
SetBSTR("Proctor", &bstr); // "Doctor" is now lost forever!
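The same leak can be reproduced without COM. The sketch below is a portable analogue under stated assumptions: StringHolder, set_string() and the allocation counter are hypothetical names invented for illustration, standing in for CComBSTR, SetBSTR() and the BSTR allocator respectively.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

static int g_live_strings = 0;   // allocations not yet freed

char *dup_string(char const *s)
{
    char *copy = static_cast<char *>(std::malloc(std::strlen(s) + 1));
    if (copy == nullptr) std::abort();
    std::strcpy(copy, s);
    ++g_live_strings;
    return copy;
}

void free_string(char *s)
{
    if (s != nullptr) { std::free(s); --g_live_strings; }
}

// Hypothetical wrapper: exposes its slot through unary & without
// asserting on, or releasing, the current contents.
class StringHolder
{
public:
    StringHolder() : m_str(nullptr) {}
    ~StringHolder() { free_string(m_str); }
    char **operator &() { return &m_str; }
private:
    StringHolder(StringHolder const &);             // not copyable
    StringHolder &operator =(StringHolder const &);
    char *m_str;
};

// Output-parameter API: overwrites *out with a fresh allocation.
void set_string(char const *s, char **out) { *out = dup_string(s); }

int strings_leaked_by_second_set()
{
    int const before = g_live_strings;
    char *orphan = nullptr;
    {
        StringHolder h;
        set_string("Doctor", &h);
        char **slot = &h;
        orphan = *slot;               // remembered only so the demo can tidy up
        set_string("Proctor", &h);    // the pointer to "Doctor" is overwritten
    }                                 // the destructor frees only "Proctor"
    int const leaked = g_live_strings - before;
    free_string(orphan);              // real client code has no such pointer
    return leaked;
}
```

The function reports one string still allocated after the wrapper has been destroyed, which is the "Doctor" allocation.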
We can surmise that the reason CComBSTR does not
assert is that it proved too inconvenient. For example, it is not
uncommon to see in COM an API function or interface method that
will take an array of BSTR. Putting aside the issue
of passing arrays of derived types (see Sections 14.5; 33.4), we
might wish to use our CComBSTR when we're only
passing one string.
An alternative strategy is to release the encapsulated resource
within the operator &() method. This is the approach
of another popular Microsoft COM wrapper class, the Visual C++
_com_ptr_t template. The downside of this approach is
that the wrapper is subject to premature release on those
occasions when you need to pass a pointer to the encapsulated
resource to a function that will merely be using it, rather than
destroying it or removing it from your wrapper. You may think that
you can solve this by declaring const and non-
const overloads of operator &(), as in
Listing 26.2.
template <typename T>
class X
{
. . .
T const *operator &() const
{
return &m_t;
}
T *operator &()
{
Release(m_t);
m_t = T();
return &m_t;
}
Unfortunately, this won't help, because the compiler selects
the overload appropriate to the const-ness of the
instance on which it's to be called, rather than on the use one
might be making of the returned value. Even if you pass the
address of a non-const X<T> instance to a function
that takes T const *, the non-const
overload will be called.
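A small sketch makes the selection rule concrete. The class follows the shape of Listing 26.2, but the reset counter, the constructor and the peek() function are assumptions added for illustration, and assigning T() stands in for a real Release().

```cpp
#include <cassert>

template <typename T>
class X
{
public:
    explicit X(T const &t) : m_t(t), m_resets(0) {}
    T const *operator &() const { return &m_t; }
    T *operator &()
    {
        m_t = T();      // stands in for Release(m_t)
        ++m_resets;
        return &m_t;
    }
    int resets() const { return m_resets; }
private:
    T   m_t;
    int m_resets;
};

// A function that only reads through the pointer.
int peek(int const *p) { return *p; }

bool overload_demo()
{
    X<int> x(42);
    int const seen = peek(&x);      // x is non-const: destructive overload runs

    X<int> y(42);
    X<int> const &cy = y;
    int const kept = peek(&cy);     // const view: non-destructive overload runs

    return seen == 0 && x.resets() == 1
        && kept == 42 && y.resets() == 0;
}
```

The caller has to remember to go through a const reference to get the harmless overload, which is hardly an improvement.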
To me, all this stuff is so overwhelmingly nasty that I stopped using any
such classes a long time ago. Now I like to use explicitly named methods
and/or shims to save me from all the uncertainty. For example, I use the
sublimely named[2] BStr
class to wrap BSTR. It provides the
DestructiveAddress() and NonDestructiveAddress()
methods, which, though profoundly ugly, don't leave anyone guessing as to
what's going on.
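As a rough illustration of the named-method approach, here is a minimal sketch. IntSlot is a hypothetical class, not the BStr class mentioned above, and its method names and signatures are assumptions; resetting an int stands in for releasing a real resource.

```cpp
#include <cassert>

class IntSlot
{
public:
    explicit IntSlot(int v = 0) : m_value(v) {}

    // Resets the held value, then exposes the slot for an API to fill.
    int *destructive_address()
    {
        m_value = 0;    // stands in for releasing a real resource
        return &m_value;
    }

    // Exposes the held value read-only, leaving it intact.
    int const *non_destructive_address() const { return &m_value; }

private:
    int m_value;
};

void fill(int *out) { *out = 7; }
int peek(int const *p) { return *p; }

bool named_methods_demo()
{
    IntSlot s(42);
    bool const kept = peek(s.non_destructive_address()) == 42;
    fill(s.destructive_address());
    return kept && peek(s.non_destructive_address()) == 7;
}
```

The call sites are longer, but each one states which behaviour it is asking for.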
Another source of abuse in overloading operator &() is
in the type it returns. Since we can make it return anything, it's
easy to have it return something bad; naturally, this is the case
for any operator.
We saw in Chapter 14 some of the problems attendant in passing
arrays of inherited types with functions that take pointers to the
base type. There's another dimension to that nasty problem when
overloading operator &(). Consider the following
types:
struct THING
{
int i;
int j;
};
struct Thing
{
THING thing;
int k;
THING *operator &()
{
return &thing;
}
THING const *operator &() const;
};
Now we're in the same position we would be if
Thing inherited publicly from THING.
void func(THING *things, size_t cThings);
Thing things[10];
func(&things[0], dimensionof(things)); // Oop!!
By providing the operator &() overloads for
"convenience", we've exposed ourselves to abuse of the
Thing type. I'm not going to suggest the application
of any of the measures described in Chapter 14 here, because I
think overloading operator &() is just a big no-no.
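The mismatch can be made visible by comparing addresses. The sketch below repeats the two types (without the const overload) and assumes nothing beyond them; stride_demo() is a hypothetical helper that only compares pointers and never reads through them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct THING
{
    int i;
    int j;
};

struct Thing
{
    THING thing;
    int   k;
    THING *operator &() { return &thing; }
};

static_assert(sizeof(Thing) > sizeof(THING), "Thing carries an extra member");

bool stride_demo()
{
    Thing things[2] = {};
    THING *p = &things[0];   // overload: points at things[0].thing

    // Where func() would look for its second element ...
    char const *second_seen = reinterpret_cast<char const *>(p + 1);
    // ... and where the second Thing really starts.
    char const *second_real =
        reinterpret_cast<char const *>(std::addressof(things[1]));

    return second_seen != second_real
        && second_real - second_seen ==
               static_cast<std::ptrdiff_t>(sizeof(Thing) - sizeof(THING));
}
```

From the second element onwards, func() would be stepping through memory in THING-sized strides that do not line up with the Thing objects.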
A truly bizarre confluence of factors is the case where the operator is destructive, in that it releases the resources, and you are passing an array of (even correctly sized) wrapper class instances to a function, as in Listing 26.4.
struct ANOTHER
{
. . .
};
void func(ANOTHER *things, size_t cThings);
inline void func(array_proxy<ANOTHER> const &things)
{
func(things.base(), things.size());
}
class Another
{
ANOTHER *operator &()
{
ReleaseAndReset(m_another);
return &m_another;
}
private:
ANOTHER m_another;
};
Let's assume you're on your best behaviour, and are using an
array_proxy (see Section 14.5.5) and translator
method to ensure that ANOTHER and
Another can be used together.
Another things[5];
. . . // Modify things
func(things); // sizeof(ANOTHER) must == sizeof(Another)
Irrespective of the semantics of func(), in
calling the function things[0] will be reset and
things[1] - things[4] will not be
affected. This is because the array constructor of
array_proxy uses explicit array subscript syntax, as
all good array manipulation code should. If you were to do it
manually, you'd still need to apply the operator, unless
Another inherited publicly from ANOTHER
and you called the two parameter version of func()
and relied on array decay.
If func() does not change the contents of the
array passed to it, then this supposedly benign call has the nasty
side effect of destroying the first element passed to it. If
func() modifies the contents of the array, then
things[1] - things[4] are subject to
resource leaks, as their contents prior to the call are simply
overwritten by func().
I hope I've managed to convince you that Peter was spot on. Overloading
operator &() is just far too much trouble. Considering the
amount of coding time, thinking time and debugging time that is expended
trying to understand and work with libraries that use it, I struggle to
imagine how using it helps the software engineering community[3].
In short, don't do it. In grepping through my source databases at the time of writing, I found eleven uses of it. Of the three that were used in "proper" classes—i.e. those that are not in utility or meta-programming classes—I can probably truly justify only one of them. I removed two immediately[4]. The third I cannot justify, but I'm keeping it for reasons of expediency. For grins, I'll describe this in the following sub-section.
operator &().
The Win32 API defines many non-standard basic structures,
oftentimes for closely related types. Further, since many Win32
compilers did not provide 64-bit integers in the early years of
the operating system, there are several 64-bit structures that
filled in the gap. Two such structures are
ULARGE_INTEGER and FILETIME. Their
structures are as follows:
struct FILETIME
{
uint32_t dwLowDateTime;
uint32_t dwHighDateTime;
};
union ULARGE_INTEGER
{
struct
{
uint32_t LowPart;
uint32_t HighPart;
};
uint64_t QuadPart;
};
Performing arithmetic using the FILETIME structure
is tiresome, to say the least. On little-endian systems, the
layout is identical to that of ULARGE_INTEGER, so
one can cast instances of one type to the other. Hence one
can subtract two FILETIME structures by
casting them to ULARGE_INTEGER and subtracting the
QuadPart members.
FILETIME ft1 = . . .
FILETIME ft2 = . . .
FILETIME ft3;
GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);
// Subtract them - yuck!
reinterpret_cast<ULARGE_INTEGER&>(ft3).QuadPart =
reinterpret_cast<ULARGE_INTEGER&>(ft1).QuadPart -
reinterpret_cast<ULARGE_INTEGER&>(ft2).QuadPart;
This also is pretty tiresome, so I concocted the
ULargeInteger class. It supplies various arithmetic
operations (see Chapter 29), has a compatible layout with the two
structures, and provides an operator &() overload.
The operator returns an instance of Address_proxy,
whose definition is shown in Listing 26.5:
union ULargeInteger
{
private:
struct Address_proxy
{
Address_proxy(void *p)
: m_p(p)
{}
operator LPFILETIME ()
{
return reinterpret_cast<LPFILETIME>(m_p);
}
operator LPCFILETIME () const;
operator ULARGE_INTEGER *()
{
return reinterpret_cast<ULARGE_INTEGER*>(m_p);
}
operator ULARGE_INTEGER const *() const;
private:
void *m_p;
// Not to be implemented
private:
Address_proxy &operator =(Address_proxy const&);
};
public:
Address_proxy operator &()
{
return Address_proxy(this);
}
Address_proxy const operator &() const;
. . .
It holds a pointer to the ULargeInteger instance
for which it acts, and it provides implicit conversions to both
FILETIME* and ULARGE_INTEGER*. Since the
proxy class is private, and instances of it are only
returned from the ULargeInteger's address-of
operators, it is relatively safe from abuse, though you'd be
stuck if you tried to put it in an STL container. But it
considerably eases the burden of using these Win32 structures:
ULargeInteger ft1 = . . .
ULargeInteger ft2 = . . .
GetFileTime(h1, NULL, NULL, &ft1);
GetFileTime(h2, NULL, NULL, &ft2);
// Subtract them - nice syntax now
ULargeInteger ft3 = ft1 - ft2;
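For readers who would rather avoid both the casts and the overload, the subtraction can also be done with plain helper functions. This is a sketch of an alternative, not the ULargeInteger class: FileTimeLike is a hypothetical stand-in that mirrors the FILETIME layout shown earlier (real code would use FILETIME from the Windows headers), and the helper names are invented for illustration. Because it shifts rather than reinterprets, it does not depend on byte order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the FILETIME layout.
struct FileTimeLike
{
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Combine the two halves into one 64-bit value.
std::uint64_t to_uint64(FileTimeLike const &ft)
{
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32)
         | ft.dwLowDateTime;
}

// Split a 64-bit value back into the two halves.
FileTimeLike from_uint64(std::uint64_t v)
{
    FileTimeLike ft;
    ft.dwLowDateTime  = static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
    ft.dwHighDateTime = static_cast<std::uint32_t>(v >> 32);
    return ft;
}

FileTimeLike subtract(FileTimeLike const &lhs, FileTimeLike const &rhs)
{
    return from_uint64(to_uint64(lhs) - to_uint64(rhs));
}

bool subtract_demo()
{
    FileTimeLike const a = { 5u, 2u };   // 0x0000000200000005
    FileTimeLike const b = { 7u, 1u };   // 0x0000000100000007
    FileTimeLike const d = subtract(a, b);
    // The difference is 0x00000000FFFFFFFE: the low half borrows from the high.
    return d.dwLowDateTime == 0xFFFFFFFEu && d.dwHighDateTime == 0u;
}
```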
[2] BSTR and BStr are far too alike, and have caused
me no end of bother.
Matthew Wilson is author of Imperfect C++, which
is available on Amazon.com at:
http://www.amazon.com/exec/obidos/ASIN/0321228774/
Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/. Matthew is co-author with Bjorn Karlsson of The C++ Source column, Smart Pointers.