The Artima Developer Community
Sponsored Link

The C++ Source
Introducing the Catenator
by Adam Sanitt
September 30, 2005

<<  Page 3 of 3

Advertisement

Don't Go Home Yet

If only it were that simple. It's clear that the Catenator can never be a fully-STL compliant container. As it doesn't actually contain its contents, it can't give a useful response to the address-of operator for any of its members [9]. It is an extreme example of a proxied container, such as vector<bool>. Rather than storing its contents on disk, in a compressed format, or in a cache, it doesn't store its contents at all. It's non-compliance isn't an artefact of inadequate design -- it's a fundamental part of what makes the container work. Even if you could work around the address-of operator (using, for instance, a proxy reference class), you shouldn't. As each item in the Catenator is a combination of sub-parts that are also used to make other items, it doesn't make sense to try and change an individual item.

Do you think of vectors, deques and the other STL containers as arrays on steroids? You were probably taught it that way. You may have been told that officially these containers have great freedom in how they are implemented internally, that an iterator is free to be a big, grown-up class, and certainly not just a humble pointer, and that you shouldn't rely on internal implementation details [10]. But the array paradigm is just so hard to resist.....

This isn't necessarily bad. It breaks encapsulation, in that you are making an assumption about the internal implementation of a container, rather than relying on the bare information in its external interface, but it does allow you to make use of your intuitive understanding of arrays and pointers. It is an example of structural conformance -- the container presents a particular interface that we recognise from other contexts. It looks and smells like an array—and we already understand arrays. Provided that it performs in the way people think it should, nothing goes wrong. This isn't really a software design issue -- it is more a matter of philosophy as to the relative priorities of strict encapsulation and structural conformance. How useful is it to pander to people's expectations, if it might potentially lead them astray?

If all proxied containers were as redundant as vector<bool> [11], and all useful containers were as STL-compliant as vector<string>, we would demand absolute structural conformance and sweep aside the very occasional corner-case. We would not countenance the risk of anyone being deceived by a familiar interface. However, the Catenator is a perfect counter-example. It is intuitively a container with some structural conformance to a vector, but if you think of it as just like a vector, or an array on steroids, you might occasionally be led astray. It is up to the programmer to remember that he is dealing with impossibly large lists and to act accordingly. You could instead provide a completely bespoke interface:

Catenator<std::string> catenator1(vectorstring);
for (std::size_t i = 0; i < catenator1.how_many_items_there_are(); ++i) {
    std::cout << catenator1.provide_checked_data_at_index(i);
}
But the cost in terms of reuse is surely too high. Most of the time, structural conformance is too useful to be ignored. Reconciliation is not a matter of rules but of broad principles. Here are a couple you might like to try out:

Principle 1

Where you provide an operation that appears to follow well-known semantics, it should, both in terms of result and cost.

So catenator.size() should tell you how many items there are in the catenator and it should do it cheaply and not, say, by iterating through all the underlying items. Time to burn all those operator[] implementations that use a linked list.

Principle 2

Where you depart from the well-known semantics, any attempt to follow those semantics should fail to compile

For instance, in the Catenator class we provide an operator[]. The benefits of providing array-style access are balanced by a responsibility to prevent the programmer from misusing it. So a naive implementation might look like this:

value_type operator[](std::size_t index);
But this is nasty to any hapless programmer. The following code will compile and run flawlessly:
value_type t = catenator[1];
catenator[2] = t;
The first line is blameless. The second line will faithfully copy t into an unnamed temporary before casting it into the abyss, along with your career. One simple solution is to define operator[] as:
const value_type operator[](std::size_t index) const; 
And this is the solution that is actually adopted within the Catenator. If you want finer control over the uses of the result of operator[], replace the return type with a nested class within Catenator:
class Catenator { 
....
    class ArrayReturn {
        value_type t_;
    public:
        ArrayReturn(value_type tt) : t_(tt) { }
        value_type& operator=(value_type tt) { 
            //put lvalue use here
        }
        operator value_type( ) {
            //put rvalue use here, eg
            return t_;
        }
    };
    ArrayReturn operator[](std::size_t index) {
        ...
        return ArrayReturn(result);
    }
};
But note that (unless implemented as a bolt-in class inheriting from value_type) this will not allow constructions such as:
catenator[1].const_mem_function();

Policy and Traits

Policy classes and traits classes are two of the boons of modern programming, efficiently expressed through templates. Policy allows classes to be templatized on different algorithms; traits allow classes to describe their behaviour to other classes. The Catenator class demonstrates that these two concepts are not as different as they appear. The boundary is not set by software science or programming logic, but by programming philosophy or even programming taste.

The Catenator class is templatized on the underlying item together with a set of traits describing how to manipulate that item. For instance, the length of an item, reversing the item, cutting the last part of the item and inserting new bits in the item are all traits of the item. Another one of these manipulations allows the creation of new empty items:

template<typename T> CatenatorTraits {
    ....
    static T createempty() { return T(); }
    ....
};

This trait is included to allow the Catenator to contain items that do not have a default constructor—whether a class can be default constructed or requires some other method of creation is surely a trait of that class.

But now consider a customised container aimed at improving the memory handling characteristics of existing containers. It would still be templatized on the contained item and also on a set of traits for that item. But the allocation procedure for new items would be a key policy decision for that container:

template<
    typename T, 
    typename Tr = MemTraits<T>, 
    typename M = MemAllocator<T>
    > MemContainer {
    ....
};
So how has the same operation transformed itself from a trait carried along with the underlying item into a policy of the containing class? Clearly, the distinction depends on the purpose of the container. There is no objective difference. This is a continuation of the philosophy of C++: providing programmers with sufficient flexibility to extend the language as they see fit. To determine whether an operation is a policy or a trait, you must first know the purpose of the enclosing class: a shift in perspective is all it takes to transform one into the other.

There is another example of this in the Catenator. The search routines provide scope for the use of wildcards or variable matches, such as matching upper and lower case. Is the way you match the underlying item a trait of that item or a policy of the Catenator? For instance, considering upper and lower case matching, are the strings themselves case-insensitive or is the Catenator simply choosing to ignore case when comparing the strings? This is a rare example where it can be argued either way. The actual Catenator implementation makes it a trait rather than a policy of the Catenator. This avoids adding another template argument to the Catenator itself (or using virtual functions to vary the matching operation). It also allows all the variation in behaviour depending on the member class to be determined in a single place (the traits class) and at compile time. This decision is based, once again, on how the Catenator is to be used in practice. It is assumed that many different types might be contained within a Catenator and that how you match those types is unlikely to vary much. If Catenators contained only one or two types, but a wide variety of behaviour in how those types matched was required, then it might be more efficient—and require less typing—if the matching algorithm were implemented instead as a separate policy.

Conclusion

The benefits of structural conformance are highly tempting to a new container seeking to make its mark in the world. It is up to the original designer to prevent clients from slipping over semantic banana skins caused by structural conformance; but it is up to the client to learn what the new container is and to use it sensibly. Policy and traits provide flexibility and extensibility in ostensibly different ways, but sometimes the distinction between them arises only from the purpose of the containing class—it is simply a matter of perspective.

C++ is variously applauded or criticised for leaving the programmer in charge. It does not take design decisions for you. The Catenator class, apart from being a useful solution to a set of common problems, shows when you have to take these decisions yourself. Those decisions are not always determined by logic, but also by philosophy, good taste and etiquette.

Acknowledgements

I would like to thank the Editorial Board for their perceptive comments and my wife, Julia, and sons, Ethan and Leo, for their love and support.

Notes and References

  1. If we were simply dealing with a static list of completely arbitrary strings, then a directed acyclic word graph ('dawg') or trie would be the answer, see The World's Fastest Scrabble Programme, Appel & Jacobson, Comm. ACM v.31 n. 5, pp. 572-578 (May 1988). But here, we are generating the final list of strings dynamically and we are generating them from combinations of lists of sub-strings, so a dawg is neither necessary nor appropriate.
  2. For cryptic crossword enthusiasts—or for the merely curious—the synonym for 'spring' is 'well' which is placed inside 'sing', the synonym for 'voice', to give 'swelling', a synonym for 'growth'.
  3. The matching algorithm will perform a maximum of 3 * 10 * 20 comparisons and, subject to certain assumptions, achieves O(n) performance. See the Appendix for more details.
  4. A problem deferred is a problem solved, as any decent tax accountant will tell you.
  5. This is such an ingrained habit that the best examples are so obvious that they barely seem like examples at all. For instance, numbers are commonly represented as strings of binary digits corresponding to the numerical value of the number. An alternative representation would be as a series of ASCII characters corresponding to the number. While this would make printing out the number easier, algorithms for, say, adding and subtracting numbers would become far more difficult.
  6. See the Appendix for more details of complexity analysis.
  7. In pattern terminology, this part of the Catenator instantiates the View pattern.
  8. For further discussion of complexity of the algorithms used in the Catenator, see the Appendix.
  9. Table 68 in section 23.1.1. of the C++ Standard requires operator[] to return a member of type reference. In Table 32 in section 20.1.5., reference is required to be a T&. As the elements of the Catenator returned by operator[] are constructed when needed and not stored anywhere, they don't have an address and so can't provide a T&.
  10. The original C++ standard did not require a vector to provide contiguous storage for its members, although it was universally implemented in this way. This was amended by the 2003 standard, which did make contiguous storage a requirement. This is a rare example of the structural conformance tail wagging the semantic conformance dog. The array-like interface of the vector eventually leaked into the formal description of its implementation. However, the deque and other STL containers are still not restricted in this way: their storage need not be, and is generally not, contiguous.
  11. See Item 6: "Containers, Pointers and Containers That Aren't" in More Exceptional C++ by Herb Sutter for a discussion of the non-compliance with the C++ Standard of vector<bool>, together with a list of its other shortcomings.

Appendix

Click here for the Appendix.

About the Author

Adam Sanitt works for international law firm Slaughter and May, combining roles as a lawyer and a software developer. He has compiled cryptic crosswords for the British newspapers The Financial Times and The Times. The latest incarnation of his cryptic crossword solving software can be found at www.solvecrosswords.com.

<<  Page 3 of 3


Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us