The Artima Developer Community
Sponsored Link

Python Buzz Forum
On software reliability

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Dmitry Dvoinikov

Posts: 253
Nickname: targeted
Registered: Mar, 2006

Dmitry Dvoinikov is a software developer who believes that common sense is the best design guide
On software reliability Posted: Nov 19, 2007 5:39 AM
Reply to this message Reply

This post originated from an RSS feed registered with Python Buzz by Dmitry Dvoinikov.
Original Post: On software reliability
Feed Title: Things That Require Further Thinking
Feed URL: http://feeds.feedburner.com/ThingsThatRequireFurtherThinking
Feed Description: Once your species has evolved language, and you have learned language, [...] and you have something to say, [...] it doesn't take much time, energy and effort to say it. The hard part of course is having something interesting to say. -- Geoffrey Miller
Latest Python Buzz Posts
Latest Python Buzz Posts by Dmitry Dvoinikov
Latest Posts From Things That Require Further Thinking

Advertisement
Despite the common prejudice and the name, for a system to be reliable is not the same as to have no errors or never crash (noone should ever be promising that). The way I see it, reliabilty is more of a predictability. A system is reliable if it behaves in a predictable fashion - so much that you can rely on that. Even if all it does reliably is crashing.

The systems we build, they don't exist in isolation. Again, despite a popular myth, programmers don't pull things out of a thin air. We base our work on the work of others - hardware, operating systems, servers, frameworks, libraries, compilers, you name it. Reuse is the boon and the bane of the industry. Client-server pairs, interfaces and contracts are not just about OOP, they are everywhere. Any interaction between software components is about grabbing something somebody else has.

I therefore take it for granted, that there are parts of the system not written by us or not belonging to us. This implies that they cannot be controlled in any way (actually, I mostly work on integration middleware, which makes my experience worse). Then this inability to control leads to inability to fix. More often than not, you cannot fix problems in systems you have to depend on.

What do you do when you encounter an unexpected behaviour from somebody else's system you have to depend on ? I do this: first - fetch a cookie for being lucky, second - understand the cause, third - find a workaround.

The point here is - a problem known is not a problem. Because if you know the exact circumstances under which it hits, you can work around. Figuratively, you have to walk the minefield, but every previously found mine can be sidestepped. Therefore,

For a component to be reliable, you must be able to work around any problem encountered, and for a system to be reliable, you need to know all the problems it can cause.

To conclude, here is a few examples of something which is broken and reliable at the same time, because there is a workaround:
  • Reliable is a broken CPU which always fails upon certain floating point division. The numeric libraries fall back to software emulation when encountering this particular kind of chip.
  • Reliable is the XML parsing library which crashes when you attempt to set attribute value to something with letter "Ё" in it. I replace "Ё" with "Е", which is a good enough if slightly ambiguous substitute.
  • Reliable is the compiler which chokes and dies upon too complex a template. I rearrange the angle brackets and it goes on fine.
  • Reliable is the provider's server which crashes every other Friday at 13:00. I schedule an offline gap at that time and noone complains.
  • Moreover, reliable is the provider's server which crashes sporadically, but will gracefully handle repeated access attempts. As you might guess, this last example is the warmest to my heart and is one of the bases for Pythomnic - the platform for developing network services I'm developing and using.

Read: On software reliability

Topic: Another ridiculous sentence Previous Topic   Next Topic Topic: November’s been a crazy month for JumpBox

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use