Small Bugs with Big Implications

An Interview with IBM's Mark Thomas from JavaOne 2007

by Frank Sommers and Bill Venners
August 10, 2007

Summary
Small bugs can sometimes lead to significant failures, especially when those bugs manifest themselves in production code. In this interview with Artima, Mark Thomas, Director of Java Technologies at IBM, talks about the potentially big implications of defects that are discovered only in a production environment, and what developers can do to discover and mitigate such problems.

In spite of thorough testing and following the best development practices, bugs can still seep into production applications. Such bugs are not only very difficult to detect, but can also cause downtime with significant consequences to a business.

Artima spoke with Mark Thomas, Director of Java Technologies at IBM, about the challenges of detecting relatively small problems whose implications are magnified by a chain of interconnected businesses and customers. In the interview, Thomas also talks about IBM's tools and technologies for managing production applications.

Sometimes there could be huge downstream implications and issues that result from quite small failures.

Recently, I was in China, [visiting] a client that was facing very occasional system failures, perhaps once a month, not much more often than that. But because of the customer's business and the nature of their customers' interactions, the implications could affect tens of thousands or hundreds of thousands of people, for days, even though, due to the magic of high availability and failover and all those kinds of things, the system would only be down a few minutes... So these could be very big events, even though they result from very small bugs or very small issues...

The common factor that comes out of all of that is that these are bugs that are found after what one would hope was extensive testing. Many good developers will do proper testing: they'll have unit tests, and the company or the environment in which they work will also have integration tests, fairly good user tests, very comprehensive sets of testing. But we still find failures that occur once you go into production.

Those [failures] might be due to an environment change or might be due to some latent defect in some part of the stack. But usually the [common] characteristic is that you won't find [those defects] by single-stepping, and it will come through an application that a lot of good developers [tested] before releasing their code.

Once you're in production, nobody is going to let you run a debugger in that production environment. So it comes down to: How can you make it easy to find the root causes of those problems?
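One common way to approach that question, offered here only as a minimal sketch and not as a technique Thomas describes in the interview, is to have the application record its own diagnostic context at the moment of failure, since no debugger will be attached. The example below uses the standard `java.lang.management` API to capture a thread dump from an uncaught-exception handler. The class name `ProductionDiagnostics` and its methods are hypothetical, and writing to `System.err` stands in for whatever logging a real deployment would use.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: names and structure are assumptions for illustration.
public class ProductionDiagnostics {

    /** Builds a text snapshot of every live thread and its stack. */
    static String captureThreadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // false, false: skip monitor and synchronizer details to keep the call cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip any missing entry
            }
            dump.append('"').append(info.getThreadName()).append("\" ")
                .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                dump.append("    at ").append(frame).append('\n');
            }
        }
        return dump.toString();
    }

    /** Combines the failing thread, the exception, and a thread dump into one report. */
    static String describeFailure(Thread thread, Throwable error) {
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace, true));
        return "Uncaught failure in thread \"" + thread.getName() + "\"\n"
            + trace + "\n"
            + "Thread dump at time of failure:\n"
            + captureThreadDump();
    }

    /** Registers a JVM-wide handler so unexpected failures leave evidence behind. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
            (thread, error) -> System.err.println(describeFailure(thread, error)));
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated production failure");
        }, "worker-1");
        worker.start();
        worker.join(); // the handler prints its report when the worker dies
    }
}
```

Calling `install()` once at startup is enough for this sketch. The design choice is that the report is assembled at the point of failure, so an intermittent problem that appears perhaps once a month still leaves a record that can be examined afterward.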

Audio: Mark Thomas, Director of Java Technologies at IBM, talks about small defects that can have big implications. (17 minutes 24 seconds)

What techniques have worked for you for finding bugs in code already in production?

About the authors

Frank Sommers is Editor-in-Chief of Artima Developer. He also serves as chief editor of the IEEE Technical Committee on Scalable Computing's newsletter, and is an elected member of the Jini Community's Technical Advisory Committee. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld.

Bill Venners is president of Artima, Inc. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project, whose ServiceUI API became the de facto standard way to associate user interfaces with Jini services. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community.