With well over five million lines of code used on the latest jetliners, versus fewer than a million on older planes, it's increasingly difficult to detect and fix embedded problems before they surprise pilots.
The article contrasts the discipline of materials science, which is responsible for ensuring the safety of materials, with the less well-understood task of creating very complex software in a fault-tolerant manner:
Mechanical components such as jet engines are just as complex in their own way as computers. But aviation engineers now have an exhaustive understanding of the physical properties of metals, plastics and other materials and they know how to test them together as a system. That helps the industry produce parts that can handle the stresses of wind, turbulence and landing. Such parts almost never fail so long as they're properly maintained and operated.
However, engineers can't predict as easily what kind of stresses might cause a computer program to go haywire. "Software is different," says Gérard Ladier, the senior manager of software engineering at Airbus.
Part of the problem stems not from bugs in individual software modules, but from the subtle interaction of many such modules in an airplane:
Specialists say the biggest problems in aviation software don't stem from bugs in the code of a single program but rather from the interaction between two different parts of a plane's computer system. In extreme cases, foul-ups can lead to sudden loss of control, sometimes not showing up until years after aircraft are introduced into service.
One such example, quoted in the article, was a Malaysia Airlines flight whose automation went completely haywire for about 45 seconds, without giving the pilots a chance to override the automated flight control systems. The incident was caused by the way messages from a recently upgraded flight control component were interpreted:
Boeing's 777 jets started service in 1995 and had never experienced a similar emergency before. According to Boeing and Honeywell, the source of the problem was a revised computer program that had recently been installed on all 777s to fix a minor navigation flaw.
As this example illustrates, the problem with a complex system is that its multiple components evolve independently, subtly altering the behavior of the system as a whole. What are your approaches to ensuring reliability in complex systems that consist of many independently evolving parts?
Software reliability will not come of age until the software industry realizes that values are types.
Inconsistency in a program is simply caused when a set of instructions is invoked with the wrong set of values.
Furthermore, inconsistency is 'encouraged' by using the wrong programming model. What most applications need is a change-driven model where code is invoked as a result of a state change. What most applications get is a poor attempt using object orientation or functional programming, with mediocre results.
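A minimal sketch of the "values are types" idea, in Python. The `Altitude` class and its limits are my own invented illustration: the valid range is checked at construction, so a set of instructions can never be invoked with an out-of-range value.

```python
class Altitude:
    """Altitude in feet; the valid range is part of the type itself."""
    MIN_FT, MAX_FT = -1_000, 60_000  # assumed limits, for illustration only

    def __init__(self, feet: float):
        if not (self.MIN_FT <= feet <= self.MAX_FT):
            raise ValueError(
                f"altitude {feet} ft outside [{self.MIN_FT}, {self.MAX_FT}]")
        self.feet = feet

def set_target_altitude(target: Altitude) -> str:
    # By the time we get here, the value is known to be consistent;
    # no instruction can be invoked with a "wrong" altitude.
    return f"climbing to {target.feet} ft"
```

The design choice is that the constraint lives with the value, not with every caller that happens to use it.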
> Software reliability will not come of age until the software industry realizes that values are types.
>
> Inconsistency in a program is simply caused when a set of instructions is invoked with the wrong set of values.
This assumes that the ranges or constraints of types are always easy to define. But that is not so.
Also, faults can stem from incorrect translation of requirements. The form of expression of the software is not involved.
> This assumes that the ranges or constraints of types are always easy to define. But that is not so.
> Also, faults can stem from incorrect translation of requirements. The form of expression of the software is not involved.
Of course, it goes without saying. And the halting problem has not been solved yet. But a better job can be done, methinks.
> Software reliability will not come of age until the software industry realizes that values are types.
>
> Inconsistency in a program is simply caused when a set of instructions is invoked with the wrong set of values.
>
> Furthermore, inconsistency is 'encouraged' by using the wrong programming model. What most applications need is a change-driven model where code is invoked as a result of a state change.
> What most applications get is a poor attempt using object orientation or functional programming, with mediocre results.
I think these kinds of systems need to work more like Bertrand Meyer proposed in "Object-Oriented Software Construction", with Design by Contract.
The contracts should of course also be used for APIs between components in a system. And I realise that it may sometimes be hard to know which values are ok, or to specify time-dependent rules, but it is a very good start that seems to have been forgotten by many people and projects in the IT industry.
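A rough sketch of Design by Contract in Python (Eiffel supports contracts natively; here they are emulated with assertions). The fuel-transfer scenario and function name are my own invented example.

```python
def transfer_fuel(tank_a: float, tank_b: float, amount: float):
    """Move `amount` of fuel from tank A to tank B, contract-style."""
    # Preconditions: the caller's obligations.
    assert amount >= 0, "amount must be non-negative"
    assert tank_a >= amount, "source tank must hold enough fuel"

    new_a, new_b = tank_a - amount, tank_b + amount

    # Postcondition: the supplier's guarantee - total fuel is conserved.
    assert abs((new_a + new_b) - (tank_a + tank_b)) < 1e-9
    return new_a, new_b
```

A contract violation fails loudly at the component boundary, which is exactly where the article says interaction bugs live. (Note that Python strips `assert` under the `-O` flag, so a production system would use explicit checks instead.)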
> The contracts should of course also be used for APIs between components in a system. And I realise that it may sometimes be hard to know which values are ok, or to specify time-dependent rules, but it is a very good start that seems to have been forgotten by many people and projects in the IT industry.
Oh, I don't think it's been forgotten. I've seen plenty of references out there. On the other hand, I think it's largely been superseded by Test-Driven Development.
Anyway, I think this article demonstrates exactly what's wrong with Test-Driven Development as it's practiced today. The TDD focus is always on unit testing, rather than acceptance testing. Lip service is always paid to acceptance testing, but the focus is *always* on the unit tests. (Unit tests are a good thing, of course. Just incomplete.)
We must find ways to test the whole system, not just individual parts.
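A toy illustration (entirely my own, echoing unit-mismatch accidents like the Mars Climate Orbiter) of why unit tests alone miss interaction bugs: each module below passes its own tests, yet the composed system is wrong because the two modules disagree about units.

```python
def nav_altitude_m() -> float:
    # Navigation module reports altitude in metres.
    return 12_000.0

def autopilot_should_descend(altitude_ft: float) -> bool:
    # Autopilot expects feet; descend when below 35,000 ft.
    return altitude_ft < 35_000

def system_decides() -> bool:
    # Whole-system wiring: metres are fed where feet are expected.
    # Each function is individually correct; only a system-level test
    # exercising this path can catch the unit mismatch.
    return autopilot_should_descend(nav_altitude_m())
```

At 12,000 m (about 39,370 ft) the correct decision is *not* to descend, but `system_decides()` says descend, because 12,000 read as feet is below 35,000. Unit tests on each function in isolation would never notice.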
> Anyway, I think this article demonstrates exactly what's wrong with Test-Driven Development as it's practiced today. The TDD focus is always on unit testing, rather than acceptance testing. Lip service is always paid to acceptance testing, but the focus is *always* on the unit tests. (Unit tests are a good thing, of course. Just incomplete.)
>
> We must find ways to test the whole system, not just individual parts.
One point of the article, as I understood it, was that some systems are just inherently hard to test, especially when the system has a long life. An aircraft may be in service for several decades, and during that time its on-board systems are upgraded independently of each other. The example they mentioned was actually a minor bugfix upgrade to one system that caused a problem with another system.
Even if the contracts are well-specified, and I'd imagine they are in an aircraft system, testing the whole system is still hard.
Another point they brought up is that such bugs are very rare - in fact, they pointed out just how safe air travel has become, partly as a result of better avionics in planes. The problems that do crop up from time to time are very hard to reproduce. The problem is that when such bugs do manifest, they often result in spectacular system failures.
> Anyway, I think this article demonstrates exactly what's wrong with Test-Driven Development as it's practiced today. The TDD focus is always on unit testing, rather than acceptance testing. Lip service is always paid to acceptance testing, but the focus is *always* on the unit tests. (Unit tests are a good thing, of course. Just incomplete.)
>
> We must find ways to test the whole system, not just individual parts.
I entirely agree. I think one reason that unit testing is so popular with developers is that independent testers can be a PITA. That's why they're so useful, of course.
My opinion is that agile methods are not ideal for complex hardware/software system development. There's a greater need to understand the requirements up-front than there is for a software-only system.
/* One point of the article, as I understood it, was that some systems are just inherently hard to test, especially when the system has a long life. An aircraft may be in service for several decades, and during that time its on-board systems are upgraded independently of each other. The example they mentioned was actually a minor bugfix upgrade to one system that caused a problem with another system. */
Ah, they just need to use my particular silver bullet to solve their problems.
When comparing software engineering with electronic engineering during a certain class I was taking, we found that individual electronic components are not fail-safe. Capacitors, for instance, have no mechanism to protect them from over-voltage: send too much voltage in and they blow up. The same goes for parameters like temperature, humidity, polarity, current, tolerance, etc. Individual components are rated for certain parameters; use them outside those ratings and you will most likely break them.
The question then is: How do electronic engineers build reliable circuits out of cheap unreliable components? We concluded that those components are well known and documented and have not changed over the years. The experienced electronic engineer knows what parts can be combined safely.
So maybe you can build reliable systems from unreliable parts and also build unreliable systems from reliable parts. Maybe the key is not if a part is reliable or not but to be aware of its limitations.
Unfortunately, in software engineering most parts are custom and in constant flux, and documentation is very tricky. Too little specification and you will miss some situation. Too much specification and it will start sounding like the instructions for the Holy Hand Grenade of Antioch. I think the most practical solution might be pretty-printing summary results of automated tests, from unit tests all the way to system tests. It would be a kind of "white list" of parameters. That way I can say: as long as you run these parts with these parameters, these are the known results; if not, you are on your own.
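The "white list" idea could be sketched like this: record the parameter values each automated test actually exercised, then summarise the verified envelope per parameter. All class and parameter names here are invented for illustration.

```python
from collections import defaultdict

class TestedRanges:
    """Collects parameter values seen during test runs and reports
    the envelope in which the component's behaviour is verified."""

    def __init__(self):
        self._seen = defaultdict(list)

    def record(self, param: str, value: float):
        self._seen[param].append(value)

    def summary(self) -> dict:
        # For each parameter: (min tested, max tested, sample count).
        return {p: (min(v), max(v), len(v)) for p, v in self._seen.items()}

# Usage: inside a test suite, log every parameter value a test covers.
ranges = TestedRanges()
for voltage in (3.0, 3.3, 3.6):
    ranges.record("supply_voltage", voltage)
```

The summary then doubles as the component's "rating sheet": inside these ranges the results are known; outside them, you are on your own.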