Re: The Bar is Higher Now
Posted: Apr 5, 2004 4:17 PM
> Another result of the unforgiving nature of the physical
> world is that it is widely understood and accepted that it
> takes time, skill and patience to become a good
> electronics engineer. You don't get 1st year graduate
> students building critical parts of electronic systems,
> they are mentored and supervised very closely. The same,
> sadly, cannot be said of software.
What I see is that many people are assigned to critical systems development without a sound engineering education. You seem to be suggesting that practice alone makes perfect. In fact, nothing can make a human's actions perfect; we are imperfect. Software design has to start from this basic premise: you have to design the system on the assumption that the software might be wrong.
The space shuttle's control system uses voting among redundant computers to make sure they all agree. This can largely be considered hardware validation, but it catches software-created error conditions as well. We typically assume that, because every computer in such a system runs the same software, they will all make the same decision. That is not always the case, because the software's environment is not always sane.
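To make the voting idea concrete, here is a minimal Java sketch of majority voting over the outputs of redundant channels. The class name MajorityVoter and the strict-majority rule are my own assumptions for illustration; the shuttle's actual redundancy management is considerably more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: majority voting over the outputs of redundant
// channels. This is an illustration, not the shuttle's actual algorithm.
final class MajorityVoter {
    private MajorityVoter() {}

    /** Returns the value reported by a strict majority of (non-null) outputs, if any. */
    static <T> Optional<T> vote(List<T> outputs) {
        Map<T, Integer> counts = new HashMap<>();
        for (T out : outputs) {
            counts.merge(out, 1, Integer::sum);
        }
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > outputs.size()) {
                return Optional.of(e.getKey());
            }
        }
        // No strict majority: the caller must treat this as a fault.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Three channels agree and one disagrees, so the majority value wins.
        System.out.println(vote(List.of(42, 42, 41, 42)));
        // Two against two: no decision can be made.
        System.out.println(vote(List.of(1, 1, 2, 2)));
    }
}
```

Returning an empty Optional instead of picking a winner forces the caller to decide what a split vote means, which is exactly the "the software might be wrong" case.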
Telephone switching systems continually deal with environmental conditions that their developers could not see in the lab. So the software is designed to retreat to a safe state and keep functioning without disabling the entire switch. There are levels of escalation that allow more severe measures to be taken, up to and including switching hardware configurations.
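Here is a minimal sketch of escalating recovery, assuming three hypothetical levels (retry, reset to a safe state, switch to standby hardware). The names EscalatingSupervisor and Level are made up for illustration and aren't taken from any real switching system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: escalating recovery in the spirit of a telephone
// switch. The three levels are illustrative assumptions only.
final class EscalatingSupervisor {
    enum Level { RETRY, RESET_TO_SAFE_STATE, SWITCH_TO_STANDBY }

    private final List<Level> escalations = new ArrayList<>();

    /**
     * Runs the task. After each failure, applies the next, more severe
     * recovery action and tries again. Throws only when every level
     * has been used up.
     */
    <T> T run(Supplier<T> task, Runnable resetToSafeState, Runnable switchToStandby) {
        Level[] levels = Level.values();
        RuntimeException lastFailure = null;
        // Attempt 0 is the normal call; attempt i follows recovery step i - 1.
        for (int attempt = 0; attempt <= levels.length; attempt++) {
            if (attempt > 0) {
                Level level = levels[attempt - 1];
                escalations.add(level);
                switch (level) {
                    case RETRY:
                        break; // nothing to undo, just try again
                    case RESET_TO_SAFE_STATE:
                        resetToSafeState.run();
                        break;
                    case SWITCH_TO_STANDBY:
                        switchToStandby.run();
                        break;
                }
            }
            try {
                return task.get();
            } catch (RuntimeException e) {
                lastFailure = e;
            }
        }
        throw new IllegalStateException("all recovery levels exhausted", lastFailure);
    }

    /** The recovery levels applied so far, in order. */
    List<Level> escalations() {
        return Collections.unmodifiableList(escalations);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        EscalatingSupervisor supervisor = new EscalatingSupervisor();
        // This task fails twice with a transient fault, then succeeds.
        String result = supervisor.run(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient fault");
            }
            return "call completed";
        },
        () -> System.out.println("resetting to safe state"),
        () -> System.out.println("switching to standby"));
        System.out.println(result + " after " + supervisor.escalations());
    }
}
```

Each failure moves one step up the ladder, so the cheap measures are always tried before the drastic ones, and the switch only gives up when every level has been used.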
In today's general-purpose servers and desktops, there is usually no duplicate hardware to switch to, so all of these hardware management functions are missing in all but the most sophisticated multiprocessor systems. We also don't usually consider software errors, except where the user might lose the files they are working on.
The development of large-scale, for-profit web services has made this issue visible to the many developers who haven't had to deal with it before. The financial sector, the medical industry, and other fields where human life and welfare are at risk have already been dealing with these issues.
The collapse of the telecommunications industry, and in particular the continued hard times at AT&T and Lucent, has put many people on the job market who know that industry's practices for dealing with software reliability.
I'd bet that we'll slowly see better software design over the next decade as old systems are put to rest.
Java provides an opportunity, with checked exceptions, to make it apparent to the users of an API exactly which things might cause the system to stop functioning normally. But remember that RuntimeExceptions can still occur anywhere, at any time. So you really need to understand how exceptions work in Java (and other languages that support them), and you need to write code that always protects its own lifecycle. That way the user can keep making progress with the application in any case where it would be possible to proceed if the user were in explicit control of every statement executed.
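Here is a small sketch of what protecting a component's own lifecycle can look like. RecordProcessor is a hypothetical class: its checked IOException is advertised in the signature, while Integer.parseInt can throw an unchecked NumberFormatException that the signature doesn't mention. The finally block restores the component's state either way, and the caller's loop keeps making progress.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a checked exception visible in the API, plus
// lifecycle protection that also covers unchecked exceptions.
final class RecordProcessor {
    private boolean busy = false;
    private final List<String> committed = new ArrayList<>();

    /** Checked: the compiler tells callers that this may fail with IOException. */
    void process(String input) throws IOException {
        if (busy) {
            throw new IllegalStateException("re-entered while busy");
        }
        busy = true;
        try {
            if (input.isEmpty()) {
                throw new IOException("empty record");
            }
            // Unchecked: parseInt may throw NumberFormatException, which
            // the signature above does not mention.
            int value = Integer.parseInt(input);
            committed.add(input + "=" + value);
        } finally {
            // Runs on normal, checked, and unchecked exits alike, so the
            // processor is never left stuck in the busy state.
            busy = false;
        }
    }

    boolean isBusy() { return busy; }

    List<String> committed() { return committed; }

    public static void main(String[] args) {
        RecordProcessor p = new RecordProcessor();
        for (String r : new String[] {"7", "", "x", "9"}) {
            try {
                p.process(r);
            } catch (IOException e) {
                System.out.println("declared failure: " + e.getMessage());
            } catch (RuntimeException e) {
                System.out.println("undeclared failure: " + e);
            }
        }
        // Both bad records were reported, and the good ones still went through.
        System.out.println(p.committed() + " busy=" + p.isBusy());
    }
}
```

Without the finally block, the NumberFormatException on "x" would leave busy set, and every later call would be refused even though the application could otherwise have continued.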
This can add considerable complexity. But you can also plan for it up front and design failure modes into the application's architecture, which greatly limits the impact of failures on the application.