In the Spirit of C

Opinion

by Greg Colvin
June 21, 2004

Summary
Veteran developer Greg Colvin traces the evolution of C, C++ and Java with an eye to a better future.

In what sense can C, C++, and Java be described as sharing a spirit? At the most superficial level, they look similar. For instance, the following code fragment, placed in an appropriate context, will compute the greatest common divisor using Euclid's algorithm in all three languages:

int gcd(int m, int n) {
   while( m > 0 ) {
      if( n > m ) {
         int t = m; m = n; n = t;
      }
      m -= n;
   }
   return n;
}

But when we speak of sharing a spirit we are speaking of some essential commonality, beyond mere syntax. In search of that essence, let's turn to the Rationale for the first ANSI C standard:

The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like
  1. Trust the programmer.
  2. Don't prevent the programmer from doing what needs to be done.
  3. Keep the language small and simple.
  4. Provide only one way to do an operation.
  5. Make it fast, even if it is not guaranteed to be portable.
The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule.

In many ways the B programming language is the truest embodiment of this Spirit, and the evolution since then can be seen as trading off among the five facets above. More on this later.

"Twenty years of schoolin' and they put you on the day shift" (Bob Dylan)

First, a little history of my own love affair with C. Back in 1983, armed with a fresh Ph.D. in psychology, I joined my mentor Peter Ossorio at an artificial intelligence startup. I was hired to do linguistic analysis, but when I arrived it rapidly became clear that the work that most needed doing was programming. The company had a contract to deliver a text analysis and retrieval system to run on the latest VAX computers, and had only a prototype system coded in FORTRAN. And not the fairly standard FORTRAN then accepted by the VMS compiler, but an older, heavily extended dialect that was no longer supported. Our resident mathematician, having debugged the prototype code when he arrived, decided it was time to recode the software in C. "Why C?" I asked. "Because," he said, laconic as ever, and gave me a copy of Kernighan & Ritchie.

We were too poor to lease time on a VAX, so we had to share a PC clone. It had 128K of RAM and two 320K floppy drives, one to hold the Lattice C compiler and libraries, the other to hold all our code, which had to run in 64K anyway, since MSDOS used the other 64K. As I recall the 8088 CPU was clocked at a screaming 4.77 MHz. Looking back it is clear that nothing but a small and simple language could have worked in that environment.

So, with K&R in hand and a C compiler on floppy, we set to work. The notational power of C quickly became clear, as I easily implemented my matrix algebra codes (real programmers can write FORTRAN in any language) and our mathematician turned his pattern matching theorems into recursive gems of dense pointer arithmetic. The efficiency of C then became obvious, as all our code fit into memory and ran faster than we had any reason to hope. When we were confident we had the code right we scheduled two weeks to port to VMS, using borrowed time on the nearby university's VAX. It took only two days, with just two bugs, caused by my thinking that ints would hold only 16 bits. Near the end of my second day at the university computer center I recall a student looking over my shoulder and asking "What's that?" "C code," said I. "What's C?" asked he. "Portable structured assembly language," said I. "Just what the world needs," said he, and I'm still not sure if he was being serious or sarcastic. But I was in love, and there was no looking back.

Necessity is a Mother (Anonymous)

So where did this remarkable tool called C come from? In 1969 Ken Thompson set out to write a FORTRAN compiler for the nascent Unix operating system being created on his PDP-7 with Dennis Ritchie, Doug McIlroy and others. As Ritchie relates:

As I recall, the intent to handle Fortran lasted about a week. What he produced instead was a definition of and a compiler for the new language B. B was much influenced by the BCPL language; other influences were Thompson's taste for spartan syntax, and the very small space into which the compiler had to fit.

Like BCPL, B was a typeless language with a rich set of operations on machine words, which could hold integers, bit patterns, characters, data addresses, and function addresses. Our GCD example is easily translated to B code:

gcd(m, n) {
   while( m > 0 ) {
      if( n > m ) {
         auto t = m; m = n; n = t;
      }
      m = m - n;
   }
   return n;
}

How well does B embody The Spirit? Almost perfectly, in my opinion. "Trust the programmer" and "Don't prevent the programmer from doing what needs to be done" are obvious characteristics of a typeless language. An 8 kilobyte space limit and Thompson's spartan tastes conspired to "Keep the language small and simple" -- the auto specifier is just about the only syntactic cruft. With so little syntax one is lucky to find even "one way to do an operation", and by operating only on native machine words there is no impediment to a fast implementation.

So has evolution since B been just a Fall from Paradise?

"But of the tree of the knowledge of good and evil, thou shalt not eat..." (Genesis 2:17)

The PDP-7 was a word-addressed machine, but the PDP-11 was byte-addressed. As a result the clumsy handling of characters in B, packing and unpacking bytes to and from machine words, became an obstacle to performance. Also, the PDP-11 was promised to soon have a floating point unit, but the 16-bit machine word would not suffice to hold a floating point value. To achieve better performance Dennis Ritchie decided to bite the apple and add char and float data types to B. This marked the first compromise with Proverbs 2 and 3 of the Spirit of C, trading the simplicity of typeless programming for maximal performance.

Having bitten, Ritchie improved other aspects of B by adding user-defined struct and union types and introducing the rule that the name of an array is converted to the address of its first member when used in expressions. A syntax for declaring types was also provided, and the evolution from B to C was well under way.

The rule that int was the default type and the fact that pointer values would fit in an int type on the PDP-11 made it possible for the earliest C compilers to accept most B code, and most of Unix, originally coded in assembly and B, was rewritten in a nearly typeless style of C. So we have the first appearance of the rule that "programmers don't pay for what they don't use," and we also see a policy of backwards-compatible changes, both of which would have a great impact on the future evolution of C and C++.

As ever, the fruit of the Tree has its price. C remained a remarkably simple language, but with the more complex type system came more opportunities for error, and trusting the programmer to avoid error became more difficult. Over time tools like lint were provided to check for possible type errors, and compilers became more strict about the code they would accept. So Proverb 1 was also compromised as more trust was placed in the tools and less in the programmer.

Were you wondering was the gamble worth the price? (Joni Mitchell)

If Dennis Ritchie bit the apple, Bjarne Stroustrup went on to become a veritable Johnny Appleseed. The C++ language is very nearly completely type-safe, which is a good thing, as its type system is arguably the most complex of any language. In addition to the fundamental and derived types of C we have references, inheritance, multiple inheritance, virtual inheritance, virtual and pure virtual member functions, runtime type information, function templates, class templates, type deduction, and more. Proverb 3 was left by the wayside, as the language gained expressive power at the expense of simplicity.

Although object-orientation was the initial motivation for extending C to C++, the most powerful extension has turned out to be the generic programming facility provided by templates. Templates were introduced to allow for type-safe containers, so that one could define a class like list<T> just once and then use it for any kind of list element. But in 1994 Erwin Unruh brought an innocent-looking little program to Santa Cruz that failed to compile, but caused the compiler to generate a sequence of prime numbers in its diagnostic output. I recall being mystified, then amused, then horrified. By introducing templates we had inadvertently added a Turing-complete meta-language to C++. At that point we could have restricted the template facility to prevent such meta-programming, but instead we took a gamble and embraced it, which cost us no end of pain as the impact of templates rumbled through the language and library.

Does type safety mean the programmer need no longer be trusted? Hardly. All the undefined behavior possible in C remains possible in C++, along with new ways to go wrong. In my experience there is almost no limit to the damage that a sufficiently ingenious fool can do with C++. But there is also almost no limit to the degree of complexity that a skillful library designer can hide behind a simple, safe, and elegant C++ interface. Generic meta-programming in particular is proving to be an amazing tool for creating simple interfaces to maximally efficient implementations of complex facilities, so in my opinion the gamble has been worth the price.

The Simple Oak

Because the circle of chaos was closing in on the realm, the hero went to the troll and, forcibly subduing him, demanded to know the secret of drawing order out of chaos. The troll replied, Give me your left eye and I'll tell you. Because the hero loved his threatened people so much, he did not hesitate. He gouged out his own left eye and gave it to the troll, who then said, The secret of order over chaos is: Watch with both eyes. (John Gardner)

Java was created in the heat of battle during the Great Browser Wars. The business requirement was for a portable language whose compiled code could be run safely on unsecured Web browsers, and whose syntax and semantics were familiar enough to be easily learned by existing programmers. The hitherto obscure Oak project, originally targeted at devices like set-top boxes for cable TV, was rechristened Java and the substantial resources of Sun Microsystems were applied to bringing Java to market. As it happened, Java had little impact as a Web browser plug-in, but found a welcome audience among engineers creating server-side systems for Web-based commerce, and among educators seeking cheap, simple tools for teaching object-oriented programming.

The design of Java emphasizes Proverbs 3 and 4, sacrificing power and speed for safety and simplicity. Not only is Java completely type-safe, there is no undefined behavior at all. This safety is enforced both by the language and by the virtual machine, so that not even malicious object code can damage the host machine. In light of this, Java has done a remarkable job of providing an expressive language that is more than adequate for most programming tasks. The exception is code which needs full control of the hardware or which must make the most efficient possible use of machine resources, including the Java virtual machine itself. Such code must still be written in a lower level language like C++, C, or assembly.

Does this admirable degree of safety mean that the programmer need not be trusted? After all, Java's automatic memory management and lack of pointer arithmetic mean that stray and dangling pointers are impossible. In a word, no. Here are three examples.

First, depending on the situation, automatic memory management can actually make it more difficult to control the memory requirements of a program. Probably the most frequent bug report against our Java implementation is that the garbage collector is leaking. In almost every case these reports are false: it is the Java code itself that is holding onto unneeded objects. And fixing unnecessary object retention can create performance problems of its own, as expensive objects get recreated as needed rather than retained in memory. So various caching strategies are used to prevent the premature destruction of valuable objects, leading to such system classes as WeakReference, SoftReference, PhantomReference, ReferenceQueue, and WeakHashMap.
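
As an illustration of the kind of caching strategy just described, here is a minimal sketch built on SoftReference; the BlobCache class and its load() helper are hypothetical names standing in for whatever expensive objects a real program caches:

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A memory-sensitive cache: entries are held through SoftReferences, so the
// collector may reclaim them under memory pressure and they are simply
// rebuilt on the next request.
class BlobCache {
   private final Map<String, SoftReference<byte[]>> cache =
         new HashMap<String, SoftReference<byte[]>>();

   byte[] get(String name) {
      SoftReference<byte[]> ref = cache.get(name);
      byte[] blob = (ref != null) ? ref.get() : null;   // null once the referent has been collected
      if (blob == null) {
         blob = load(name);                             // recreate the expensive object on demand
         cache.put(name, new SoftReference<byte[]>(blob));
      }
      return blob;
   }

   // Stand-in for whatever expensive construction the real program performs.
   private byte[] load(String name) {
      return new byte[1024 * 1024];
   }
}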

Second, the lack of stack-allocated objects can make it more difficult to manage resources other than memory. For instance, my first Java program rapidly ran out of file descriptors:

void scanFile(String name, Filter filter) throws IOException {
   FileInputStream file = new FileInputStream(name);  // opens an operating-system file descriptor...
   filter.scan(file);                                 // ...which is never closed
}

Embarrassingly obvious, of course, but embarrassingly common as well. I soon learned to use finally blocks for lack of destructors.
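
With a finally block the descriptor is released however scan() exits; here is a minimal sketch of the repaired routine, still using the Filter class from the fragment above:

void scanFile(String name, Filter filter) throws IOException {
   FileInputStream file = new FileInputStream(name);
   try {
      filter.scan(file);
   } finally {
      file.close();   // runs even if scan() throws, so the descriptor is always released
   }
}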

Third, although undefined behavior is impossible in Java, deadlock and livelock are shamefully easy. Rather than provide deadlock-free language mechanisms for concurrent execution, Java provides only low-level synchronization primitives, leaving it to the programmer to manage contention. Even a simple monitor facility such as provided in Per Brinch Hansen's Concurrent Pascal would have been an improvement, and the thirty years since Hansen's work have seen considerable progress in safe approaches to program concurrency. But for whatever reason Java ignored this work.
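
To make the point concrete, here is a minimal sketch of the classic lock-ordering mistake; the two lock objects and the sleep are arbitrary and exist only to make the bad interleaving reliable:

// Two threads take the same two locks in opposite orders. Run it and the
// program simply hangs -- no error, no exception, no diagnostic.
class DeadlockDemo {
   private static final Object lockA = new Object();
   private static final Object lockB = new Object();

   public static void main(String[] args) {
      new Thread(new Runnable() {
         public void run() {
            synchronized (lockA) {
               pause();                    // give the other thread time to take lockB
               synchronized (lockB) { }    // blocks forever while the other thread holds lockB
            }
         }
      }).start();
      new Thread(new Runnable() {
         public void run() {
            synchronized (lockB) {
               pause();                    // give the other thread time to take lockA
               synchronized (lockA) { }    // blocks forever while the other thread holds lockA
            }
         }
      }).start();
   }

   private static void pause() {
      try { Thread.sleep(100); } catch (InterruptedException e) { }
   }
}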

The first two examples argue that it is a myth that managing the life cycle of objects is intrinsically easier in Java than in C++. And if anything Java threads are more difficult to use correctly than Unix processes. Despite these caveats, Java has proved a successful tool. It has delivered, to a surprising extent, on its promise of safe, portable programs.

Be careful what you wish for (Anonymous)

Java, as a completely new language, was able to make a much cleaner break from C than C++ could. Even so, it bought greater simplicity at a cost in power and speed that C++ cannot afford to pay. Is there some hope that C++ can become simpler?

I believe our biggest loss in the Fall from the Grace of B was the simplicity of typeless programming. The loss was necessary, and for C the loss was manageable, but for C++ complexity has become a daunting obstacle. Much of that complexity is in support of generic programming, where advanced practitioners have pushed the template syntax far beyond its original purpose. Paradoxically, it may be template semantics that point the way towards a radical simplification of syntax.

The power of generic programming comes from the ability of the compiler to deduce types in context. At the limit, that ability could eliminate the need to declare types. For example, to write a version of GCD that is agnostic as to the types of its arguments we currently write something like

template<typename T>
T gcd(T m, T n) {
   while( m > 0 ) {
      if( n > m )
         swap(m, n);
      m = m - n;
   }
   return n;
}

which doesn't look too bad, but gets rapidly worse as the number of types increases, and is nearly impossible if the result type depends on more than one argument type. So I would like to see the template syntax made optional, and let the compiler do the work:

gcd(m, n) {
   while( m > 0 ) {
      if( n > m )
         swap(m, n);
      m = m - n;
   }
   return n;
}

Very clean and simple, just like B. And yet very safe, as the compiler can ensure that all the deduced types are compatible. [Editor's note: This is essentially how templates work in the D language!]

It is tempting here to continue on with more "wish list" items, but that's the topic of my next article.

Acknowledgement

Thanks to Angelika Langer for her comments on the first draft of this article.
