The Artima Developer Community
Artima Developer Spotlight Forum
A Taxonomy of Error Types and Error Handling

9 replies on 1 page. Most recent reply: Jul 6, 2007 12:02 AM by Scala Enthusiast

John Bayko

Posts: 13
Nickname: tau
Registered: Mar, 2005

A Taxonomy of Error Types and Error Handling Posted: Jun 21, 2007 8:13 PM

Most of what is written about error and exception handling is fairly abstract and vague. When specific recommendations are made, those usually consist of examples given for specific circumstances. Yet, error handling is a fundamental task for developers. This brief review attempts to categorize the different types of error conditions a program can encounter. By describing these categories, I also provide suggestions on how to handle each type of error condition.

In general, errors a program can encounter tend to be the result of one of three things:

  • Restrictions: Arguments to a routine that can never work, and always result in an error, define a restriction on the use of the routine. Ensuring that only correct arguments are passed to a routine is the type of thing that programming by contract is meant to address.
  • Inconsistencies: When values or resources are not what they are expected to be, or are missing, that creates an inconsistency between the expected state of the environment and the actual state. This may be the internal environment, such as a null pointer, or the external environment, such as a corrupt file. It doesn't encompass inconsistencies in the data model, which often needs to be temporarily inconsistent during an operation (e.g. adding a node to a linked list).
  • Failures: When an operation simply does not work, and it's out of the program's control, this is a failure. For example, a pulled network cable.

These types of errors overlap to an extent (say, a working network could be considered part of the expected state, making it an inconsistency error). In general, though, most errors can fall into one of these categories.

Sometimes failures are not errors, and are just a way of detecting the current external state. For example, opening a file that doesn't exist may fail, but results in an error only when that file is actually needed.
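
As a rough illustration of a failure that only becomes an error when the file is actually needed, here is a minimal Java sketch. The class and method names (OptionalFileDemo, readIfPresent, readRequired) are made up for this example and are not from the article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class OptionalFileDemo {
    // Hypothetical helper: a missing file is just information about the
    // external state, so it is reported as "absent" rather than as an error.
    static Optional<String> readIfPresent(Path path) throws IOException {
        try {
            return Optional.of(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
        } catch (NoSuchFileException e) {
            return Optional.empty();
        }
    }

    // Hypothetical helper: here the file is required, so the same failure
    // is turned into an error.
    static String readRequired(Path path) throws IOException {
        return readIfPresent(path)
                .orElseThrow(() -> new IOException("required file missing: " + path));
    }

    public static void main(String[] args) throws IOException {
        Path settings = Paths.get("settings.cfg"); // assumed file name
        System.out.println(readIfPresent(settings).orElse("(using defaults)"));
    }
}
```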

Error Handling Responsibilities

Program code is responsible for the consistency of the internal program state. Generally certain code has primary (ideally, exclusive) responsibility for parts of the internal state. Inconsistency errors that occur within the code responsible for that state are bugs.

Sometimes the state management responsibility is shared between different sections of code. This is a bad idea, because it makes assigning responsibility for an inconsistency error harder, but it does happen in practice.

It's important to make a distinction between error detection and debugging. Often, data generated in the process of error handling is mixed together with diagnostic information. If possible, these types of information should be kept completely separate—at least conceptually, even if combined in a single data structure.

Safe Zones

Restrictions can be checked before calling a routine, or within a routine. It seems a waste of time to check arguments every time a routine is called when you already know those arguments are correct. One strategy is to separate parameter checking from parameter usage. This doesn't work reliably for library code, where anything can happen between the check and the use of the parameters, but within a base of code for a particular application or within a library, you can restrict the code to not change a value known to be safe.

The code between a parameter check and the next change to a parameter variable is a safe zone, where parameters don't have to be re-checked. This is only valid for restriction errors, because inconsistency and failure errors can be caused by things outside the code's safe zone. Things like critical sections (in multithreaded environments), semaphores and file locks are meant to create a very limited kind of safe zone for inconsistency and failure errors.

The code safe zones for parameters can overlap with others, and may not be well defined. One way to deal with this is to assign known safe values to variables which indicate this safety. Joel Spolsky wrote about one way to do this using variable naming conventions in Making Wrong Code Look Wrong. Safe values should be assigned to variables declared constant.
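
One possible reading of this in Java is sketched below: the parameter is checked once, the checked value is stored in a final variable, and the code that follows can treat it as safe. The names (SafeZoneDemo, checkedPercent, safePercent) are illustrative assumptions, and the "safe" prefix only loosely follows the naming idea mentioned above.

```java
final class SafeZoneDemo {
    // Hypothetical checker: rejects percentages outside 0..100 (a restriction error).
    static int checkedPercent(int raw) {
        if (raw < 0 || raw > 100) {
            throw new IllegalArgumentException("percent out of range: " + raw);
        }
        return raw;
    }

    // Private worker: relies on the caller having checked the percentage already.
    private static int applyDiscount(int priceCents, int safePercent) {
        return priceCents - (priceCents * safePercent) / 100;
    }

    static int discountedTotal(int[] pricesCents, int rawPercent) {
        // The check happens once; 'final' means the safe value cannot be reassigned,
        // so everything below this line is inside the safe zone.
        final int safePercent = checkedPercent(rawPercent);
        int total = 0;
        for (int price : pricesCents) {
            total += applyDiscount(price, safePercent); // no re-check needed here
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(discountedTotal(new int[] {1000, 500}, 10));
    }
}
```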

Reporting Errors

Code calling a routine needs to know three things to decide how to proceed: first, whether the call succeeded and what data, if any, was returned; second, whether an error occurred; and third, whether the error is permanent or transitory. This defines the following possible error states returned from a routine:

  1. Successful
  2. Restriction error (always permanent)
  3. Permanent (bug) inconsistency
  4. Transitory (detected) inconsistency
  5. Failure (transitory for all we know)
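
The five states above could be written down as a Java enum along the following lines. This is only a sketch; the enum name, the constant names and the isWorthRetrying helper are assumptions made for illustration.

```java
enum ErrorState {
    SUCCESSFUL(false, false),
    RESTRICTION(true, true),             // always permanent
    BUG_INCONSISTENCY(true, true),       // permanent until the code is fixed
    DETECTED_INCONSISTENCY(true, false), // transitory
    FAILURE(true, false);                // transitory for all we know

    private final boolean error;
    private final boolean permanent;

    ErrorState(boolean error, boolean permanent) {
        this.error = error;
        this.permanent = permanent;
    }

    boolean isError() { return error; }

    boolean isPermanent() { return permanent; }

    // Hypothetical convenience: only transitory errors are candidates for a retry.
    boolean isWorthRetrying() { return error && !permanent; }

    public static void main(String[] args) {
        for (ErrorState s : values()) {
            System.out.println(s + " retry=" + s.isWorthRetrying());
        }
    }
}
```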

It's often a bad idea to mix an error code with a return value, such as designating specific values (say, 0 or -1) as invalid. Some languages, like Python, allow multiple values to be returned from a method as tuples. A tuple is basically an anonymous class, and can be implemented in a language like Java by defining a class for objects returned by a method, or in C by defining a struct which is passed as a parameter and is updated by the function. But in many cases, exceptions are a much better way to separate error information from return values.
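
To make the "class for objects returned by a method" idea concrete, here is a small Java sketch. ParseResult and parsePort are hypothetical names chosen for the example; the point is only that 0 stays a legal value because success is reported separately.

```java
// Hypothetical result type: keeps the status apart from the value.
final class ParseResult {
    final boolean ok;
    final int value;     // meaningful only when ok is true
    final String error;  // meaningful only when ok is false

    private ParseResult(boolean ok, int value, String error) {
        this.ok = ok;
        this.value = value;
        this.error = error;
    }

    static ParseResult success(int value) { return new ParseResult(true, value, null); }

    static ParseResult failure(String error) { return new ParseResult(false, 0, error); }
}

final class ResultDemo {
    // Hypothetical routine: parses a TCP port number without using a sentinel value.
    static ParseResult parsePort(String text) {
        try {
            int port = Integer.parseInt(text.trim());
            if (port < 0 || port > 65535) {
                return ParseResult.failure("port out of range: " + port);
            }
            return ParseResult.success(port);
        } catch (NumberFormatException e) {
            return ParseResult.failure("not a number: " + text);
        }
    }

    public static void main(String[] args) {
        ParseResult r = parsePort("8080");
        System.out.println(r.ok ? "port " + r.value : "error: " + r.error);
    }
}
```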

Exceptions transmit an object from the exception location to a handler in a scope surrounding it, or surrounding the point where the routine was called. Exception objects carry information in their type and class, and debugging information in the data contained within that type. Exceptions by themselves don't indicate the error state, so that must be included as an attribute of the exception object, or the error state must be deduced from the debugging information (object type and data).
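
A minimal sketch of an exception that carries the error state as an attribute, kept separate from the diagnostic text, might look like this in Java. StatefulException and its nested State enum are invented for the example.

```java
// Hypothetical exception type: the error state is an explicit attribute,
// while the message and cause remain purely diagnostic.
class StatefulException extends Exception {
    enum State { RESTRICTION, BUG_INCONSISTENCY, DETECTED_INCONSISTENCY, FAILURE }

    private final State state;

    StatefulException(State state, String diagnostic, Throwable cause) {
        super(diagnostic, cause);
        this.state = state;
    }

    State state() { return state; }

    // Handlers can decide what to do from the state alone, without parsing messages.
    boolean isTransitory() {
        return state == State.DETECTED_INCONSISTENCY || state == State.FAILURE;
    }

    public static void main(String[] args) {
        try {
            throw new StatefulException(State.FAILURE, "connection reset by peer", null);
        } catch (StatefulException e) {
            System.out.println(e.isTransitory() ? "retry later" : "give up");
            System.err.println("diagnostic: " + e.getMessage());
        }
    }
}
```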

Java introduced the controversial notion of checked exceptions, which must either be caught or declared to be thrown by a method in order to compile, while unchecked (or runtime) exceptions behave like exceptions in other languages. The main cause of the controversy is that there has been no good definition of why there should be a difference and, as a result, no consistent strategy in the implementation in various libraries, including standard parts of the different Java runtime libraries.

In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException. Checked exceptions are for detected inconsistency and failure errors, where the program may have a strategy of handling the error. An example is an I/O error.
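
The split can be sketched in a few lines of Java. The class and method names are made up; the library calls (Objects.requireNonNull, Reader.read) are standard.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Objects;

final class CheckedVsUncheckedDemo {
    static int firstChar(Reader in) throws IOException {
        // Unchecked: a null reader is a bug in the calling code, so nothing is declared.
        Objects.requireNonNull(in, "in must not be null");
        // Checked: reading can fail for reasons outside the program, so IOException is declared.
        return in.read();
    }

    static String describe(Reader in) {
        try {
            int c = firstChar(in);
            return c < 0 ? "empty" : "first char: " + (char) c;
        } catch (IOException e) {
            // The caller has a strategy for this case (report, retry, fall back).
            return "I/O problem: " + e.getMessage();
        }
        // NullPointerException is deliberately not caught: the fix belongs in the code.
    }

    public static void main(String[] args) {
        System.out.println(describe(new StringReader("hello")));
    }
}
```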

Transactional Operations

One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way to implement this is to define an "undo" operation for every change:

[Figure: Ideal transactional function]
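
The original figure is not available here, so the following is only a guess at the shape such a function might take in Java, using nested try/catch blocks. The Account class, the journal and the transfer method are all assumptions; each step either applies fully or throws without changing anything, and the undo steps are assumed never to fail.

```java
import java.util.ArrayList;
import java.util.List;

final class IdealTransactionDemo {
    static final class Account {
        private int balance;
        private final int limit;

        Account(int balance, int limit) { this.balance = balance; this.limit = limit; }

        int balance() { return balance; }

        void withdraw(int amount) {
            if (amount > balance) throw new IllegalStateException("insufficient funds");
            balance -= amount;
        }

        void deposit(int amount) {
            if (balance + amount > limit) throw new IllegalStateException("limit exceeded");
            balance += amount;
        }
    }

    static void transfer(Account from, Account to, int amount,
                         List<String> journal, int journalCapacity) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        from.withdraw(amount);                          // step 1
        try {
            to.deposit(amount);                         // step 2
            try {
                if (journal.size() >= journalCapacity) {
                    throw new IllegalStateException("journal full");
                }
                journal.add("transfer " + amount);      // step 3
            } catch (RuntimeException e) {
                to.withdraw(amount);                    // undo step 2
                throw e;
            }
        } catch (RuntimeException e) {
            from.deposit(amount);                       // undo step 1
            throw e;
        }
    }

    public static void main(String[] args) {
        Account a = new Account(100, 1000);
        Account b = new Account(0, 1000);
        transfer(a, b, 30, new ArrayList<>(), 10);
        System.out.println(a.balance() + " / " + b.balance());
    }
}
```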

In this example, the functions are also transactional, and thus don't need to be rolled back if they fail. This can be done with nested if/else blocks, or with nested try/catch blocks. If the "undo" operations themselves have errors, the result looks more like this:

[Figure: Realistic transactional function]
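
Again without the original figure, here is one way the more realistic case could be sketched in Java: completed steps are undone in reverse order, and a failing undo is recorded rather than hidden. Step, step(), runAll and RollbackFailedException are hypothetical names; Throwable.addSuppressed is the standard library call used to keep the undo error attached to the original one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class RealisticTransactionDemo {
    interface Step {
        void apply() throws Exception;
        void undo() throws Exception;
    }

    // Signals that the rollback itself failed, so the state may be inconsistent.
    static final class RollbackFailedException extends Exception {
        RollbackFailedException(String message, Throwable cause) { super(message, cause); }
    }

    // Convenience factory for building steps from two actions.
    static Step step(Runnable doIt, Runnable undoIt) {
        return new Step() {
            public void apply() { doIt.run(); }
            public void undo() { undoIt.run(); }
        };
    }

    static void runAll(List<Step> steps) throws Exception {
        Deque<Step> done = new ArrayDeque<>();
        try {
            for (Step s : steps) {
                s.apply();
                done.push(s);
            }
        } catch (Exception original) {
            boolean undoFailed = false;
            while (!done.isEmpty()) {
                try {
                    done.pop().undo();                  // most recent step first
                } catch (Exception undoError) {
                    original.addSuppressed(undoError);  // keep both errors visible
                    undoFailed = true;
                }
            }
            if (undoFailed) {
                throw new RollbackFailedException("rollback incomplete", original);
            }
            throw original;
        }
    }
}
```

A caller that catches RollbackFailedException knows the "as if never tried" promise was broken, which is a different situation from an ordinary failed operation.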

One way of dealing with this is to modify a copy of the program state, and if all operations succeed, only then commit the changes. The commit may fail, but this isolates the possible state changing errors to one point, and is similar to how databases implement transactions. Another way to implement transactional operations is to make a copy of the state before any of it is changed, and use that copy to restore the expected state in case of an error.
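
The first approach (work on a copy, then commit) can be sketched in Java as follows. The class and its readings list are invented for the example; the commit here is a single reference assignment, which is the one point where the live state changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CopyThenCommitDemo {
    private List<Integer> readings = new ArrayList<>();

    List<Integer> readings() { return Collections.unmodifiableList(readings); }

    // Adds all values or none: the work happens on a copy, and the live
    // state is replaced only after every value has been accepted.
    void addAll(List<Integer> values) {
        List<Integer> working = new ArrayList<>(readings);   // copy of current state
        for (Integer v : values) {
            if (v == null || v < 0) {
                throw new IllegalArgumentException("bad reading: " + v);
            }
            working.add(v);                                   // modifies the copy only
        }
        readings = working;                                   // commit
    }

    public static void main(String[] args) {
        CopyThenCommitDemo d = new CopyThenCommitDemo();
        d.addAll(Arrays.asList(1, 2));
        try {
            d.addAll(Arrays.asList(3, -1, 4));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(d.readings());
    }
}
```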

In summary, having a clear taxonomy of error conditions that code may encounter helps develop better strategies for dealing with, and possibly recovering from, those errors.


David Gladfelter

Posts: 1
Nickname: tattva
Registered: Mar, 2006

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 22, 2007 11:00 AM
A deep analysis of error handling and exception safety can be found at http://www.boost.org/more/generic_exception_safety.html. Understanding your goal in error handling for any operation is the first and necessary step to designing that operation properly. From the Abrahams article:

* The basic guarantee: that the invariants of the component are preserved, and no resources are leaked.
* The strong guarantee: that the operation has either completed successfully or thrown an exception, leaving the program state exactly as it was before the operation started.
* The no-throw guarantee: that the operation will not throw an exception.

Your article talked mostly about the strong guarantee, with some interesting stuff about dealing with the states of multiple objects, but there is a place for the no-throw and the basic guarantee in any non-trivial application. For example, a Release()-type method, that frees a resource, should always be no-throw. If you can't rely on cleanup code to actually clean up, then it is not possible to write exception-safe code at all. The basic guarantee is necessary for just about any operation that interacts with serial I/O or anything other than thread-safe random-access memory. You can't in general roll back serial operations.
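
The Abrahams article is about C++, but a rough Java analogue of a no-throw release might look like the sketch below. releaseQuietly is a hypothetical helper, not a standard API; it reports cleanup problems instead of throwing them, so a finally block that calls it cannot replace the exception already in flight.

```java
import java.io.Closeable;
import java.io.IOException;

final class NoThrowReleaseDemo {
    // Hypothetical helper: never throws, whatever close() does.
    static void releaseQuietly(Closeable resource) {
        if (resource == null) {
            return;
        }
        try {
            resource.close();
        } catch (IOException | RuntimeException e) {
            // Diagnostic only; the caller's control flow is not affected.
            System.err.println("cleanup problem ignored: " + e);
        }
    }

    public static void main(String[] args) {
        Closeable flaky = () -> { throw new IOException("device went away"); };
        try {
            System.out.println("working with the resource");
        } finally {
            releaseQuietly(flaky); // safe to call from cleanup code
        }
    }
}
```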

The Abrahams article was written from the perspective of a class library designer in C++, but the concepts are broadly applicable. I personally found it tremendously enlightening and useful.

Dave

Raoul Duke

Posts: 127
Nickname: raoulduke
Registered: Apr, 2006

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 22, 2007 3:54 PM
I don't have anything to add, other than to say thanks for starting this topic and posting your information; this is something I've been thinking about on and off over the years, and would really like to understand better. (I wish all developers would ;-)

Werner Schulz

Posts: 18
Nickname: werner
Registered: May, 2005

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 22, 2007 8:51 PM
Also worthwhile to read the recent articles by Rebecca Wirfs-Brock in IEEE Software (copies available at http://www.wirfs-brock.com/Resources.html#IEEE%20Design%20Column).

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 23, 2007 12:46 AM
When reporting a restriction error, why is it always permanent? You did not provide an argument why this error state is always permanent. I think always is a bit too much, especially since, as you say, "these types of errors overlap to an extent". I might be nitpicky here, so let's move on to non-nitpicky stuff:

> One strategy to handle errors is to make all operations transactional, so that if they fail, it's as if the operation was never tried. One way to implement this is to define an "undo" operation for every change:

When I read the introduction to your article, I was under the impression you believed it was unnecessary to treat all errors the same way. While "let's make all operations transactional" certainly isn't your thesis, you do not say anything about why you shouldn't make all operations transactional. Also, what does it really mean to treat a failed operation "as if the operation was never tried"? I feel this is the hard question.

Come to that, in my experience, one of the greatest fouls Java textbook writers make is poorly explaining the convenience of the finally clause. They give more attention to detail to the try-catch idiom, but lip service to the finally clause.

Finally, you say, "In general, unchecked exceptions are meant for bugs, where an error indicates that the code is simply wrong and must be fixed (restriction and bug inconsistency errors). An example is a NullPointerException." The NullPointerException is the crudest example imaginable and should be avoided at all costs as a teaching tool. NullPointerException examples are not even necessarily transferable to other languages, because it's not necessary for a language to provide null pointers. Really, at the implementation level, we should be concerned about whether or not something is defined and the constraints on that resource's definition. I.e., is it a Singleton? That's logistics. NullPointerExceptions are physical design, not logical design.

Most people seem to make logical design mistakes, which also seems to be the great commotion over Checked Exceptions in Java and the constant debate over their inclusion. Any feature in any programming language should factor out accidental complexity. Most imperative programming languages allow programmers to factor out type information when declaring variables.

I want a programming language that can help me describe logistics, not tie me up in error and exception handling. I think in terms of logistics. Sometimes, when I write code in C, I want to "just say something" in code, but there is no succinct way to say it because of physical design constraints. That is a trade-off for writing code in C, and in the right circumstances, I will gladly accept that trade-off.

John Bayko

Posts: 13
Nickname: tau
Registered: Mar, 2005

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 25, 2007 10:33 AM
Thanks for the link. I do remember reading that, but when I was writing this I couldn't find it again and had to rely on my fuzzy memory. I'm glad someone else had the link.

John Bayko

Posts: 13
Nickname: tau
Registered: Mar, 2005

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 25, 2007 11:09 AM
> When reporting a restriction error, why is it always
> permanent? You did not provide an argument why this error
> state is always permanent. I think always is a bit
> too much, especially since, as you say, "these types of
> errors overlap to an extent".

By its nature a restriction error is the result of checking parameters for validity, normally before they are used. If a parameter is invalid once, the same value will be invalid again. If a parameter is invalid because it conflicts with some other state (e.g. no elements in buffer) then that's a consistency error.

> One strategy to handle errors is to make all
> operations transactional, so that if they fail, it's as if
> the operation was never tried. One way to implement this is
> to define an "undo" operation for every change:
>
> When I read the introduction to your article, I was under
> the impression you believed it was unnecessary to treat
> all errors the same way. While "let's make all operations
> transactional" certainly isn't your thesis, you do not say
> anything about why you shouldn't make all operations
> transactional. Also, what does it really mean to treat a
> failed operation "as if the operation was never tried"? I
> feel this is the hard question.

Ultimately, you want your software to be safe to use in the widest range of situations, and this means that it will not destroy your data or environment. In this sense you want all errors to be handled at some level in a way that ensures that they do no harm. However this may be at a program level rather than a subroutine level, which I guess is my real point - there are less fine-grained ways of preserving safety if it's more convenient.

> Finally, you say, "In general, unchecked exceptions are
> meant for bugs, where an error indicates that the code is
> simply wrong and must be fixed (restriction and bug
> inconsistency errors). An example is a
> NullPointerException." The NullPointerException is the
> crudest example imaginable and should be avoided at all
> costs as a teaching tool.

I mentioned null pointer exception because it's widely understood, and it's an error that cannot happen due to anything other than incorrect program code (providing the language throws an exception when new fails rather than returning null).

> Most people seem to make logical design mistakes, which
> also seems to be the great commotion over Checked
> Exceptions in Java and the constant debate over their
> inclusion. Any feature in any programming language should
> factor out accidental complexity. Most imperative
> programming languages allow programmers to factor out type
> information when declaring variables.

I was trying to explain checked exceptions rather than defend them, but in their defence they are part of a method's expected channel of communication to the calling code, so it's legitimate to need to declare them. Unchecked exceptions are unexpected because they are bugs, not status information, and calling code should never need to know about them. An alternative for returning expected status information (as I pointed out) would be to use output parameters (not supported in Java) or multiple return parameters (like tuples in Python or Ruby, also not supported in Java). Either one of those would also have to be declared in a statically typed language.

> I want a programming language that can help me describe
> logistics, not tie me up in error and exception handling.
> I think in terms of logistics. Sometimes, when I write
> code in C, I want to "just say something" in code, but
> there is no succinct way to say it because of physical
> design constraints. That is a trade-off for writing code
> in C, and in the right circumstances, I will gladly accept
> that trade-off.

One way I think that can be improved is to recognize that what are all called "errors" are sometimes different things. Ideally, how to handle some types of errors could be simplified or automated by a language or library design, which would reduce the number of "errors" that the developer needs to worry about manually.

Morgan Conrad

Posts: 307
Nickname: miata71
Registered: Mar, 2006

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 25, 2007 11:21 AM
It's not clear to me what's the real, effective difference between a Restriction and an Inconsistency. Keeping your null example, if foo is null, and I go

foo.hashCode(); that's an inconsistency

but if I go

UtilityMethod.hashCode(foo) that's a restriction error.

Yet, to my mind, it's the same error - somebody forgot or failed to initialize foo.

John Bayko

Posts: 13
Nickname: tau
Registered: Mar, 2005

Re: A Taxonomy of Error Types and Error Handling Posted: Jun 25, 2007 2:19 PM
> It's not clear to me what's the real, effective difference
> between a Restriction and an Inconsistency. Keeping your
> null example, if foo is null, and I go
>
> foo.hashCode(); that's an inconsistency
>
> but if I go
>
> UtilityMethod.hashCode(foo) that's a restriction error.
>
> Yet, to my mind, it's the same error - somebody forgot or
> failed to initialize foo.

The latter is a restriction error because it's up to UtilityMethod.hashCode(foo) to decide whether null is an error or not (maybe just a no-op) - that is, the routine is placing restrictions on what it will accept. But from the view of the calling code, if it is an error, it's an inconsistency error.

The distinction isn't useful farther away from the point of the error, it's mostly useful for deciding immediately what to do next (e.g. can it be fixed and retried or not?), if you want to make the decision there.
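
To illustrate the point that the routine decides whether null is an error, here is a short Java sketch. strictHash and lenientHash are hypothetical stand-ins for UtilityMethod.hashCode(foo); the standard java.util.Objects.hashCode does return 0 for a null argument, while Objects.requireNonNull rejects it.

```java
import java.util.Objects;

final class NullRestrictionDemo {
    // This routine chooses to reject null: for it, null is a restriction error.
    static int strictHash(Object o) {
        return Objects.requireNonNull(o, "o must not be null").hashCode();
    }

    // This routine chooses to accept null: no restriction, so no error.
    static int lenientHash(Object o) {
        return Objects.hashCode(o);
    }

    public static void main(String[] args) {
        Object foo = null;
        System.out.println(lenientHash(foo)); // accepted
        try {
            strictHash(foo);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```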

Scala Enthusiast

Posts: 7
Nickname: sashao
Registered: Jun, 2005

Re: A Taxonomy of Error Types and Error Handling Posted: Jul 6, 2007 12:02 AM
Very nice and thorough article, thanks. Sorry I discovered the article late but it kinda strikes a nerve so I would like to follow up anyway, with a little bit of advice.

Exception handling is over-engineered more often than not.

Resist the temptation and keep it simple. If the exception happens, there is that handler somewhere far up the execution stack, it knows how to print stack traces and there is not much else you can do to help. Don't write long explanations nor collect tons of diagnostic information -- the logs will boomerang to you anyway and you will be helplessly looking at the code trying to guess what has happened, with or without megabytes of irrelevant diagnostic gibberish.

Same on the receiving end. Of course there are valid cases where you want to process a specific type of exception or bail out in an unusual way but these happen far, far more seldom than most people think (once every 50,000 lines if I have to throw a number). Your gut will tell when this is the case.

In all other cases, just relax and make sure you don't leak.
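
In Java terms, the advice might come down to something like this sketch: no local catch blocks, a try-with-resources statement so nothing leaks, and one handler at the top that prints the stack trace. The class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class DontLeakDemo {
    // No catch block here: errors simply propagate, and the reader is
    // closed on every exit path by try-with-resources.
    static int countLines(BufferedReader source) throws IOException {
        try (BufferedReader in = source) {
            int n = 0;
            while (in.readLine() != null) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(countLines(new BufferedReader(new StringReader("a\nb\n"))));
        } catch (Exception e) {
            e.printStackTrace(); // the one handler far up the stack
        }
    }
}
```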

Copyright © 1996-2017 Artima, Inc. All Rights Reserved.