The Artima Developer Community

The C++ Source
Reducing Preprocessor Namespace Pollution
by Bjorn Karlsson and Matthew Wilson
November 6, 2004



Get your size-12s off my Daffodils!

The problem is that BaRBSoftStuff.h, at the point of its inclusion, introduces the namespace BaRBSoft and the function BaRBSoft::TheFunc(). That's the correct thing. Unfortunately, when AcmeThreadingStuff.h is subsequently included, it defines TheFunc to be TheFuncST (or TheFuncMT for multithreaded builds) for the remainder of the compilation unit. So where you see BaRBSoft::TheFunc() in the body of main(), the compiler actually sees BaRBSoft::TheFuncST(). Not happy, Bjarne! (You won't have to study much of Bjarne's writings to discover his antipathy to macros, as in [2, 3, 4]. Where the master leads, so shall we happy grasshoppers follow ...)
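To see the substitution for yourself, you can ask the preprocessor to quote what it hands to the compiler. The following is a minimal sketch, not the real headers: the two libraries are replaced by inline stand-ins, and the SHOW_TOKENS helper macros and the function what_the_compiler_sees() are hypothetical names introduced only for this illustration.

```cpp
#include <string>

// Stand-in for BaRBSoftStuff.h (the real library is assumed, not shown)
namespace BaRBSoft
{
  inline int TheFunc(char const * /* regId */, int *regIndex)
  {
    *regIndex = 0;
    return 0;
  }
}

// Stand-in for AcmeThreadingStuff.h in a single-threaded build
inline void TheFuncST() {}
#define TheFunc TheFuncST

// Two-level stringizing: the argument is macro-expanded first, then quoted
#define SHOW_TOKENS_(x) #x
#define SHOW_TOKENS(x)  SHOW_TOKENS_(x)

// Returns the tokens the compiler receives for the text BaRBSoft::TheFunc
inline std::string what_the_compiler_sees()
{
  return SHOW_TOKENS(BaRBSoft::TheFunc);
}
```

Calling what_the_compiler_sees() yields "BaRBSoft::TheFuncST", a name that the BaRBSoft namespace never declared.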

You might wonder whether this can be fixed by reversing the order of inclusion. Alas, that just shifts the problem.

 #include "AcmeThreadingStuff.h"
 #include "BaRBSoftStuff.h"
 
 int main()
 {
   TheFunc();
 
   int regIndex;
 
   BaRBSoft::TheFunc("Billy Kriesel", &regIndex);
   return 0;
 }

Now the compiler is perfectly happy, but the linker gets the hump. The reason is that the declaration of BaRBSoft::TheFunc() inside BaRBSoftStuff.h is translated by the preprocessor to BaRBSoft::TheFuncST(). The same thing happens, as before, in the body of main(), so the compiler sees both the declaration and the use of the same symbol. However, because BaRBSoft are jealous guarders of their intellectual property and supplied only a static library, containing BaRBSoft::TheFunc(), against which to link, the linker fails to find BaRBSoft::TheFuncST().
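The renaming of the declaration can be observed directly. In this sketch, inline stand-ins replace the two headers, and __func__ reports the name under which the compiler knows the function. The exact form of __func__ is implementation-defined, though common compilers give the bare function name. Unlike the real BaRBSoft library, the stand-in is compiled under the same macro, which is the only reason it links; declared_name() is a hypothetical helper.

```cpp
#include <string>

// Stand-in for AcmeThreadingStuff.h, now included first
inline void TheFuncST() {}
#define TheFunc TheFuncST

// Stand-in for BaRBSoftStuff.h: written as TheFunc, preprocessed to TheFuncST
namespace BaRBSoft
{
  inline std::string TheFunc()
  {
    return __func__;  // the function's name as the compiler knows it
  }
}

// The call is rewritten in the same way, so declaration and use still agree
inline std::string declared_name()
{
  return BaRBSoft::TheFunc();
}
```

With a typical compiler, declared_name() reports TheFuncST rather than TheFunc. A library built without the macro exports only BaRBSoft::TheFunc(), hence the unresolved symbol.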

So, whichever way you cut it, the #define of TheFunc() in AcmeThreadingStuff.h has trampled over our code, and broken it.

(For further reading on this issue, or many other important ones, we think it's worth pointing you to the latest in Herb Sutter's excellent Exceptional C++ series, Exceptional C++ Style [5]. Item 31 explains the problem.)

War Story

Several years ago, Matthew worked for a software company writing cross-platform software for network administration and statistical gathering. The software used its own messaging system, and one of the methods in the messaging API was called GetMessage(). It all worked tickety-boo. Then they had to port their nice working system to Windows.

I'm sure you can guess the rest. Lots of compiler/linker problems complaining that SuperDuperNetworkMgr::GetMessageA() could not be found. No doubt many of you are groaning in recognition of the problem, and have experienced first-hand the Windows headers' #definition of GetMessage to either GetMessageA or GetMessageW, among myriad similar. Needless to say, this didn't endear Windows to the development team's Tandem/UNIX-heads.

They weren't in a position to sit back and pontificate on the abstract problem. A solution had to be found, and fast. The choices in this case were all unpleasant:

  1. Compile the entire Windows version of the system in the presence of the Windows headers. For those of you that are familiar with this notion, you can imagine the deleterious effect on build times.
  2. Put in #defines in their root headers, for Windows builds only, to emulate the perversions of the function names done by the Windows headers in those compilation units that include them.
  3. Create a header to be included by all Windows-specific compilation units, which #included windows.h, and then immediately added the requisite #undefs to render the system "whole" again.

For reasons of both speed and "purity of soul", Option 1 was ruled out. Option 3 was the one selected, but the team subsequently "evolved" to Option 2.
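As a rough sketch of what Option 3 might look like (the team's actual header is not shown in this article), a wrapper header includes windows.h and then removes the offending macros. So that the sketch also builds away from Windows, a stand-in #define plays the part of the Windows headers there; that stand-in is an assumption of this example, as is the body given to SuperDuperNetworkMgr::GetMessage().

```cpp
#include <string>

#ifdef _WIN32
# include <windows.h>            // defines GetMessage as GetMessageA or GetMessageW
#else
# define GetMessage GetMessageA  // stand-in for that macro, so the sketch builds elsewhere
#endif

// Option 3: immediately undo the renaming for the names the system uses itself
#ifdef GetMessage
# undef GetMessage
#endif

class SuperDuperNetworkMgr
{
public:
  std::string GetMessage() const
  {
    return __func__;  // typically the bare function name
  }
};
```

The price is that code in such compilation units can no longer use the generic Windows name, and must call GetMessageA() or GetMessageW() explicitly.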

You might think that, ugly as it is, this problem is at least discoverable at compile/link time. For the networking product at that stage of its development, that was so, and any of the three options above would yield "correctness", once compile and link stages were complete and error free. But consider what happens if you're using dynamic libraries, and are loading functions explicitly by name, via dlopen()/dlsym() (UNIX) or LoadLibrary()/GetProcAddress() (Windows). Just because the preprocessor will merrily change your GetMessage() to GetMessageA() does not mean it will also examine your string literals and do the same thing. Hence, you can have lurking problems in a code-base that was thoroughly tested and working on another operating environment, and such lurkers can be extremely hard to find. That is the case for any of the three options. (The only times such problems become easy to find are when you're doing a demonstration for your boss the day before he does your salary review, or when you've shipped the product to a client that has placed exacting downtime fines on your company. :-)
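The mismatch between identifiers and string literals can be simulated without any dynamic library. In the sketch below, a std::map stands in for the export table that dlsym() or GetProcAddress() would search; find_symbol(), MessageFn, the NAME_AS_COMPILED macros, and the return value 42 are all hypothetical, chosen only for illustration.

```cpp
#include <map>
#include <string>

#define GetMessage GetMessageA   // stand-in for the platform header's macro

// Two-level stringizing, used only to record the name the compiler really saw
#define NAME_AS_COMPILED_(x) #x
#define NAME_AS_COMPILED(x)  NAME_AS_COMPILED_(x)

inline int GetMessage() { return 42; }   // actually defines GetMessageA()

typedef int (*MessageFn)();

// Hypothetical stand-in for looking up a name in a library's export table
inline MessageFn find_symbol(std::string const &name)
{
  static std::map<std::string, MessageFn> const exports = {
    { NAME_AS_COMPILED(GetMessage), &GetMessage }
  };
  std::map<std::string, MessageFn>::const_iterator it = exports.find(name);
  return (it == exports.end()) ? nullptr : it->second;
}
```

Here find_symbol("GetMessage") returns a null pointer, because the preprocessor never touches the literal, while find_symbol("GetMessageA") succeeds. Nothing complains until the lookup runs.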

Can good C-itizens still get caught?

Clearly this problem is composed of two aspects, which combine to give the killer effect. There's the need to map one name to another, and also the potential wider (than intended) name correspondence on which the mapping may act. In principle, if either of these can be obviated, the problem goes away.

In C, the macro-preprocessor is all we have, and there's no alternative for providing the name mapping, so good authors of C libraries attempt to address the second aspect, the name correspondence. This is usually addressed by prefixing the names with an appropriately unique symbol, to give "safe(r) macros". For example, Matthew's recls library [6] (implemented in C++, but presenting a C-API) uses the prefix Recls_, as in Recls_CalcDirectorySize(). While not being a theoretical guarantee, this technique usually suffices in practice.
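Applied to our running example, the prefixing technique might look something like the following sketch. The AcmeLib_ prefix is hypothetical (the article's AcmeLib does not use one), and the stand-in functions return strings purely so that the effect can be observed.

```cpp
#include <string>

// Hypothetical prefixed names for the AcmeLib functions
inline std::string AcmeLib_TheFuncST() { return "single-threaded"; }
inline std::string AcmeLib_TheFuncMT() { return "multi-threaded"; }

// The mapping is still a macro, but its name is now unlikely to collide
#ifdef ACMELIB_MULTI_THREADING
# define AcmeLib_TheFunc AcmeLib_TheFuncMT
#else /* ? ACMELIB_MULTI_THREADING */
# define AcmeLib_TheFunc AcmeLib_TheFuncST
#endif /* ACMELIB_MULTI_THREADING */

// Stand-in for the BaRBSoft library, declared after the macro
namespace BaRBSoft
{
  inline std::string TheFunc()
  {
    return __func__;  // typically the bare function name
  }
}
```

BaRBSoft::TheFunc() keeps its own name, because no token in it matches AcmeLib_TheFunc. Any other library that happened to use that exact identifier would still be trampled, which is why the guarantee is practical rather than theoretical.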

A Better Approach for C++

One of the basic tenets of C++, as espoused by Bjarne Stroustrup himself [7], is that the preprocessor should be, at worst, relegated to the bench, and only brought onto the pitch when facing a particularly feisty opponent. Maybe we can follow that intent a little in this case?

Many years ago, Matthew used his one-good-idea-per-year quota and applied some common sense to the problem. As many of you will know, C++ compilers are required to define the preprocessor symbol __cplusplus when processing a C++ compilation unit; in other words, when compiling a C++ source file. We can leverage this just as readily as we can the presence of UNICODE, or ACMELIB_MULTI_THREADING, or any other symbol, in order to know when we're in C or in C++. Remember, in C we must accept the status quo and merrily trample away. However, in C++ we have a better alternative to macros, however unique we've attempted to make them: namespaces and inline functions.

(Note: C99 defines the inline keyword for C code, and other compilers have proprietary extensions to do the same thing, so it's possible to take the C++ approach for C, as long as your compiler supports it.)

Let's look at how this might work in practice, by rewriting our AcmeThreadingStuff.h header:

 /* AcmeThreadingStuff.h */
 
 ACMELIB_EXTERNC void TheFuncST(void);  /* This does single-threaded stuff */
 #ifdef ACMELIB_MULTI_THREADING_SUPPORTED
 ACMELIB_EXTERNC void TheFuncMT(void);  /* This does multi-threaded stuff */
 #endif /* ACMELIB_MULTI_THREADING_SUPPORTED */
 
 #ifdef __cplusplus
 
 # ifdef ACMELIB_MULTI_THREADING
 inline void TheFunc() { TheFuncMT(); }
 # else /* ? ACMELIB_MULTI_THREADING */
 inline void TheFunc() { TheFuncST(); }
 # endif /* ACMELIB_MULTI_THREADING */
    
 #else /* ? __cplusplus */
 # ifdef ACMELIB_MULTI_THREADING
 #  define TheFunc      TheFuncMT
 # else /* ? ACMELIB_MULTI_THREADING */
 #  define TheFunc      TheFuncST
 # endif /* ACMELIB_MULTI_THREADING */
 #endif /* __cplusplus */

Now, in C++ compilation, there is no TheFunc preprocessor symbol definition; there is only the bona fide function TheFunc(). This means that TheFunc() no longer trespasses over other namespaces. In our mixed (AcmeLib + BaRBSoft) example, the symbol TheFunc from the BaRBSoft namespace is now thoroughly unaffected by the definition of the AcmeLib version in the global namespace.
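A condensed, self-contained sketch of the C++ branch shows the two names coexisting. The function bodies are stand-ins invented for this illustration (the AcmeLib functions return strings here, rather than void, so the result is observable), and the_func_is_a_macro() is a hypothetical helper.

```cpp
#include <string>

// Stand-ins for the AcmeLib functions
inline std::string TheFuncST() { return "single-threaded"; }
inline std::string TheFuncMT() { return "multi-threaded"; }

// C++ branch of the header: an inline function instead of a macro
#ifdef ACMELIB_MULTI_THREADING
inline std::string TheFunc() { return TheFuncMT(); }
#else /* ? ACMELIB_MULTI_THREADING */
inline std::string TheFunc() { return TheFuncST(); }
#endif /* ACMELIB_MULTI_THREADING */

// Stand-in for the BaRBSoft library, declared after AcmeLib's TheFunc
namespace BaRBSoft
{
  inline int TheFunc(char const *regId, int *regIndex)
  {
    *regIndex = 0;
    return (regId != nullptr) ? 0 : -1;
  }
}

// Reports whether the preprocessor knows anything called TheFunc
inline bool the_func_is_a_macro()
{
#ifdef TheFunc
  return true;
#else
  return false;
#endif
}
```

Both ::TheFunc() and BaRBSoft::TheFunc("Billy Kriesel", &regIndex) can now be called from the same compilation unit, in either order of inclusion, and the_func_is_a_macro() returns false.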

Indeed, a future evolution of the BaRBSoft library might result in a similarly conditionally defined nature to its TheFunc, perhaps according to the ambient character encoding, as follows:

 namespace BaRBSoft
 {
 #ifdef BARBSOFT_UNICODE
 typedef wchar_t     char_type;
 #else /* ? BARBSOFT_UNICODE */
 typedef char        char_type;
 #endif /* BARBSOFT_UNICODE */
 
   int TheFuncA(char const *regId, int *regIndex);
   int TheFuncW(wchar_t const *regId, int *regIndex);
 
   inline int TheFunc(char_type const *regId, int *regIndex)
   {
 #ifdef BARBSOFT_UNICODE
     return TheFuncW(regId, regIndex);
 #else /* ? BARBSOFT_UNICODE */
     return TheFuncA(regId, regIndex);
 #endif /* BARBSOFT_UNICODE */
   }
 
 } // namespace BaRBSoft

Because we've used inline functions, rather than macros, the name mapping in the BaRBSoft namespace does not leak out and pollute any other namespace, including the global namespace within which AcmeLib's TheFunc is defined. Now we can kiss goodbye to compile errors and missing symbols.


