Bertrand Meyer talks with Bill Venners about strategies for dealing with failure, where to check preconditions, and when it's appropriate to design for reuse.
Bertrand Meyer is a software pioneer whose activities have spanned both the academic and business worlds. He is currently the Chair of Software Engineering at ETH, the Swiss Federal Institute of Technology. He is the author of numerous papers and many books, including the classic Object-Oriented Software Construction (Prentice Hall, 1994, 2000). In 1985, he founded Interactive Software Engineering, Inc., now called Eiffel Software, Inc., a company that offers Eiffel-based software tools, training, and consulting.
On September 28, 2003, Bill Venners conducted a phone interview with Bertrand Meyer. In this interview, which is being published in multiple installments on Artima.com, Meyer gives insights into many software-related topics, including quality, complexity, design by contract, and test-driven development.
Bill Venners: The aim of test-driven development is to help programmers avoid bugs, to get systems that are robust. But one aspect of a system that is not just robust, but also reliable, is that when things do go wrong, either because of a bug or because of circumstances outside of the system, the system can deal with the problem without requiring an administrator or user to solve the problem. How can we create systems that deal with failure autonomously so that humans don't have to step in? We have the tool of exceptions, but what do we do with them? And what do we do when a contract assertion is false?
Bertrand Meyer: The really deep and final answer is: it depends. There are really two approaches. One approach is to say, this problem simply shouldn't happen. If it ever does happen, the best you can do is shut your system down, fix the bug, and restart it. Some people take this approach, but it is probably not sustainable for a telephone system. If you're AT&T and you're handling millions of telephone calls in your system and suddenly an invariant is violated, you're not going to shut off the AT&T network. For other kinds of systems, however, it is probably the most reasonable thing to do. Some problem has not been caught in debugging. It should never have happened, so they just stop the whole system, correct the defect, and restart. That is one approach. It's rather extreme. The other approach is to do essentially fault-tolerant computing. Not hardware fault tolerance, but software fault tolerance, which is a relatively new option. The term has been around a long time, but the approach hasn't been practiced that much.
The first thing to do if you have a problem is obviously to log it. Typically when you have an assertion violation during operation, not during debugging, but during operation, it's an indication that something needs to be fixed in the software. The defect cannot be left to stand in the long term. What you do in the short term is try to recover as reasonably as you can. And indeed, that's where exception handling comes in. The approach I suggest for exception handling is more low profile than the approach that seems to have become popular these days. In most recent programming languages, exceptions are a normal part of life. For example, exceptions figure prominently in the specification of operations. The exception handling strategy that I've pushed for is more low profile in the sense that it views exceptions really as what happens when everything else has failed, and you don't have much of a clue as to what is going on except that something is seriously wrong. The best you can really do is to try to restart in a clean state.
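As a rough illustration of the "log it, then restart in a clean state" strategy described above, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Account` class, its invariant, the `handle_withdrawal` function, and the logger name are assumptions, not anything from the interview or from Eiffel's libraries.

```python
import logging

logger = logging.getLogger("service")  # hypothetical logger name


class Account:
    """Hypothetical component whose invariant is: balance is never negative."""

    def __init__(self):
        self.reset()

    def reset(self):
        # The known clean state to fall back to after a violation.
        self.balance = 0

    def withdraw(self, amount):
        self.balance -= amount
        # Contract check left enabled during operation (an assumption of this sketch).
        if self.balance < 0:
            raise AssertionError("invariant violated: balance < 0")


def handle_withdrawal(account, amount):
    """Return True on success; on a contract violation, log it and restore a clean state."""
    try:
        account.withdraw(amount)
        return True
    except AssertionError:
        # First, record the violation so the underlying defect can be fixed later.
        logger.exception("contract violation while withdrawing %s", amount)
        # Then recover in the short term by restarting from a clean state.
        account.reset()
        return False
```

The log entry is what lets the defect be corrected in the long term. The reset is only the short-term recovery, and whether discarding state like this is acceptable depends on the system.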
The exception mechanism in Eiffel is quite different from those that exist in other languages, and it surprises many people. When you have an exception in Eiffel, you only have two ways of reacting to it. The reason that you only have two ways is that exceptions are something that you don't want to see happening. So it's quite different from the approach that says exceptions are special cases that we are going to expect and process in a special way. In Eiffel, exceptions are really a sign that something quite wrong has happened, so you have only these two ways of reacting. One is to accept that you cannot do anything better, and to just pass the problem up to the next routine in the call chain, which of course will be faced with the same dilemma. That is often, especially for operations deep down in the call chain, the only real-world reaction, because the operation does not have enough context to do anything smarter. The other reaction is, if you actually do have some elements that enable you to think you're smarter, to attempt to fix the condition that led to the exception and try the same operation again after correcting the context. This is the kind of mechanism that provides direct support for what I was calling software fault tolerance, and I think it can be quite effective provided it's used with reason, that is to say, not as another algorithmic mechanism, but as a mechanism of last resort.
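The two reactions described above (pass the failure up the call chain, or fix the context and retry the same operation) can be approximated in Python. This is only an analogy to the Eiffel mechanism, not Eiffel code. The names `TransmissionError`, `attempt_with_retry`, `Line`, and the `max_attempts` limit are all assumptions introduced for this sketch.

```python
class TransmissionError(Exception):
    """Hypothetical failure raised by an unreliable operation."""


def attempt_with_retry(operation, repair, max_attempts=3):
    """Run `operation`; on failure either repair the context and retry, or propagate."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except TransmissionError:
            if attempts >= max_attempts:
                # Reaction 1: nothing smarter can be done here, so the
                # failure is passed up to the caller.
                raise
            # Reaction 2: correct the condition that led to the exception,
            # then try the same operation again.
            repair()


class Line:
    """Hypothetical resource with a failing primary channel and a working backup."""

    def __init__(self):
        self.channel = "primary"

    def send(self):
        if self.channel == "primary":
            raise TransmissionError("primary channel down")
        return "sent via " + self.channel

    def switch_to_backup(self):
        self.channel = "backup"


line = Line()
result = attempt_with_retry(line.send, line.switch_to_backup)
print(result)  # sent via backup
```

The handler does not compute an alternative result. It either restores conditions under which the original operation can succeed or gives up, which keeps the retry a mechanism of last resort and not another branch of the algorithm.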