The Artima Developer Community

Weblogs Forum
Is Complete Test Coverage Desirable - or Even Attainable?

21 replies. Most recent reply: Feb 20, 2005 8:55 PM by Curt Sampson

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Is Complete Test Coverage Desirable - or Even Attainable? (View in Weblogs) Posted: Feb 15, 2005 5:41 PM
Summary
Testing code is different from testing a system. Code in real-world, production systems has to contend with an ever-changing, often unpredictable environment that renders unit tests an unreliable predictor of system behavior. In the real world, system robustness matters, but writing more tests can produce diminishing returns. Do unit tests instill in us a false sense of certainty?

That's how I felt the other night. Bill Venners (Artima's chief editor) and I were helping a group of developers at a Silicon Valley Patterns Group meeting. Our goal that evening was to build a Jini and JavaSpaces compute grid. Before everyone could go to work on grid code, we needed to get Jini lookup services up and running on an impromptu wireless network assembled in the back room of Hobee's restaurant in Cupertino.

The Jini Starter Kit, which Sun will soon open-source, is as high-quality and thoroughly tested a piece of code as code gets. Indeed, the JSK is used to run high-volume, business-critical transactions at a handful of large corporations. Starting up Jini lookup services with the JSK is typically a snap.

But it wasn't so that night. We struggled for an hour with this normally simple step, having to adjust several aspects of users' local environments: moving files around, deleting directories, checking for multicast capability on network interfaces, etc. The exercise was frustrating to those who, just a few short hours prior to our meeting, were able to run Jini lookup services on the very same laptops they brought with them to the meeting. The rigorous testing and QA processes followed by the Jini developers predicted nothing about how well the system would work on our impromptu network that night.

A few days later, Bill and I were sitting just a few yards away from Hobee's, trying to start up a new version of Artima code. Before checking in that code, I had made sure that all of the more than one hundred unit tests for that module passed. Yet, when Bill checked that code out and started up the Web app on his laptop, a subtle configuration issue prevented the app from working as intended. While the code itself was tested, the system relied on configuration options that were defined partially outside the code. The unit tests, again, were no indication of whether the code would run at all in a real-world environment.

Were our tests, or the Jini JSK's tests, flawed? How could we account for environmental exigencies in those tests? How deep should our test coverage go? Should we strive to cover all the permutations of code and its environment in our test suites? Is such complete test coverage of code even attainable?

System Matters

These experiences made me appreciate the distinction between testing code and testing a system. The real world only cares about the system - the actual interaction of all the code in a given piece of software with its environment. Unit tests, on the other hand, mostly test code: Unit tests are proof that a given method, or set of methods, acts in accord with a given set of assertions.

Unit tests are also code. When running a set of unit tests, the code that's being tested and the test code itself become part of the same environment - they are part and parcel of the same system. But if unit tests are part of the system that's being tested, can unit tests prove anything about the system itself?

No less a logician than Kurt Gödel had something to say about this. To be sure, Gödel's concern was formal mathematical proof, not unit testing. But in addressing the false sense of certainty implied in Bertrand Russell's Principia Mathematica, Gödel demonstrated that it is not possible to prove all aspects of a system from within the system itself. In every logical system, there must be axioms - truths that must be taken for granted, and that can be demonstrated true or false only by stepping outside the system.

Such axioms are present in any software system: We must assume that the CPU works as advertised, that the file system behaves as intended, that the compiler produces correct code. Not only can we not test for those assumptions from within the system, we also cannot recover from situations where the axiomatic aspects of the system turn out to be invalid. If any of a system's axioms turn out to be wrong, the system suffers catastrophic failure - failure from which no recovery is possible from within the system itself. In practical terms, you will just have to reboot.

A cardinal aspect of a test plan, then, is to determine a system's axioms, or aspects (not in the AOP sense) that cannot be proven true or false from within the system. Apart from those system axioms, all other aspects of the system can, and should, be covered in the test plan.

The fewer the axioms, the more testable the system. Fewer axioms also mean fewer possibilities for catastrophic failure. But in any system, there will always be conditions that cause complete system failure - CTRL-ALT-DEL will be with us for good. Fully autonomous, infallible systems truly belong in the realms of science fiction and fantasy.

Degrees of Belief

If we accept that there will always be a few aspects of a system that we cannot write tests for, aspects whose correctness we must take for granted, how do we decide on those "axioms"?

Do you write test methods for simple property accessor methods, or do you just assume that the JVM does what it's supposed to do? Do you write a test to ensure that a database, indeed, inserts a record, or do you decide to take that operation for granted? Do you just assume that a network connection can be opened to a host - is that operation a system "axiom"? And do you just assume that a remote RMI call will return as intended, or do you write tests for all sorts of network failures, along with possible recovery code? Finally, do you just assume that a user types a correct piece of data into an HTML form, or do you write tests and error-handling code for that situation?

Clearly, there is a spectrum, and we often make our decisions about our "system axioms" based on our beliefs of certainty about correctness. Most of us are highly uncertain that every user always enters the right answer in a form, so we always write tests in that situation. But most of us are fairly sure that a database can perform an insert just fine, so writing tests for that operation would seem like a waste of time, unless we're testing the database itself.

If our decisions about what to take for granted in a system are based on such degrees of belief, and if tests start where "axioms" end, then the degree to which testing tells us about a system's behavior in the larger operating context of that system is also dependent on those beliefs.

The Jini code, for instance, assumed that multicast is enabled on all network hosts. The Artima code took a specific configuration for granted, and assumed that that configuration would be the one supplied at system startup. We didn't test for that; we just assumed it would always be so. The tests passed, but the system still failed when that condition was not satisfied in a different operating environment.

In addition to beliefs, we also have to contend with market pressures when choosing system "axioms." You may know that a remote method call can fail a hundred different ways on the network, but you also know that shipping a product today, as opposed to tomorrow, can lead to a market share gain. So you decide not to test for all those possible network failures, and to take the "correctness" of the network for granted. You hope you get lucky.

Past Behavior

We could improve our degrees of belief about system correctness if we analyzed past system behavior. One way to do this is to rely on experience. But another way is to follow what a better search engine, such as Google, does: The more we use the system, the better the system gets, because it learns from past data to improve its results.

We could instrument code in such a way as to capture failure conditions (e.g., by logging exceptions). We could then tell that, for example, one out of every N remote calls on a proxy interface results in failure, given the typical operating environment of that code. Note that that's real-world data, not just assumptions. Then we could assign the inverse of that measure - the degree of probability that the call succeeds - to that method call. We could then correlate that information to how often a call is used, and produce a matrix of the results.
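To make this concrete, here is a minimal sketch of what such instrumentation might look like; the class and method names are hypothetical, not taken from any actual Artima or Jini code:

import java.util.HashMap;
import java.util.Map;

public class CallStatistics {

    // Per-operation counters: [0] = total calls, [1] = failures.
    private static final Map<String, int[]> COUNTS = new HashMap<String, int[]>();

    // Record one invocation of the named operation and whether it failed.
    public static synchronized void record(String operation, boolean failed) {
        int[] counts = COUNTS.get(operation);
        if (counts == null) {
            counts = new int[2];
            COUNTS.put(operation, counts);
        }
        counts[0]++;
        if (failed) {
            counts[1]++;
        }
    }

    // Observed probability that the named operation succeeds, based on the
    // failures actually logged in this environment.
    public static synchronized double successProbability(String operation) {
        int[] counts = COUNTS.get(operation);
        if (counts == null || counts[0] == 0) {
            return 1.0; // no data yet
        }
        return 1.0 - ((double) counts[1] / counts[0]);
    }
}

A remote proxy call would then be wrapped in a try/catch that calls CallStatistics.record() before rethrowing, and the collected success probabilities could be dumped periodically to build the probability matrix described next.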

That probability matrix would be a more accurate indicator of the code's actual reliability - or "quality" - than just having a set of tests that happen to all run fine on a developer's machine. Such information would help developers pinpoint what "system axioms" are valid, and what assumptions prove incorrect in the real world.

With that information, we would not need to strive for complete code coverage, only coverage that leads to a desired quality level. That would, in turn, make us all more productive.

I think it may even be possible to find emergent patterns in code with a probability matrix of that code shared on the Web. Coverage and testing tools could tap into that database to make better decisions about where to apply unit tests, and about how indicative existing unit tests are about actual code behavior.

That said, I'm curious how others deal with ensuring proper configuration, and how others account for configuration options in tests. What are some of the ways to minimize configuration so as to reduce the chances of something going wrong? Then again, isn't reducing configuration also reducing flexibility and "agility"?

In general, do you agree with my conclusion that complete test coverage is not desirable, or even attainable? How do you choose what to test and what not to test? How do you pick your "system axioms"?


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 1:47 AM
Posted by: Vincent O'Sullivan    Posts: 724 / Nickname: vincent / Registered: Nov, 2002
> ...another way to do
> this is to follow what a better search engine, such as
> Google, does: The more we use
> the system, the better the system gets because it learns
> form past data to improve
> its results.

In all your observations about the dangers of mistaking unit testing for system testing, the snippet above caught my eye. I'm intrigued to know how, in the absence of a feedback path, Google knows that I've found what I'm looking for and thus knows how to use that information to improve future searches.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 4:46 AM
Posted by: Maarten Hazewinkel    Posts: 32 / Nickname: terkans / Registered: Jan, 2005
> I'm intrigued to know how, in the absence of a
> feedback path, Google knows that I've found what I'm
> looking for and thus knows how to use that information to
> improve future searches.

Without any inside info, I can think of several heuristics that mainly work when you have a large audience.

- If you go to the next result page, what you were looking for wasn't on the page you were looking at: possibly adjust ranking.

- If you do a new search with different terms, you didn't find it either.

- If you no longer hit Google after a specific result page, you either found what you were looking for, or gave up. If you went through a number of 'next' pages, the latter is more likely.

It's very fuzzy info, but statistical analysis can go a long way with the number of hits Google gets.

Maarten


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 9:34 AM
Posted by: Frank Sommers    Posts: 2642 / Nickname: fsommers / Registered: Jan, 2002
> In all your observations about the dangers of mistaking
> unit testing for system testing, the snippet above caught
> my eye. I'm intrigued to know how, in the absence of a
> feedback path, Google knows that I've found what I'm
> looking for and thus knows how to use that information to
> improve future searches.

There are several ways, and PageRank is one of them. This is research that was conducted at Stanford, and the results are published at various places. A quick overview is here:

http://www.google.com/technology/

They do have feedback, though: They know what people are searching for and, in the case of ads, they know what people click on given a search.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 2:12 AM
Posted by: Tasos Zervos    Posts: 17 / Nickname: tzervos / Registered: May, 2003

> That said, I'm curious how others deal with ensuring
> proper configuration, and how others account for
> configuration options in tests. What are some of the ways
> to minimize configuration so as to reduce the chances of
> something going wrong? Then, again, isn't reducing
> configuration is also reducing flexibility and "agility?"


I'm not sure if total flexibility/"agility" is desirable.
In most of the systems I have worked with, configuration was kept in CSVs AND XML AND property files AND DB registry tables AND interface constants AND/OR class static constants AND maybe JNDI, etc.!

Unless there is a configuration framework in place before you start a project, it is very likely that the "let's-ship-it squad" is going to create most of the above ways to store configuration data.

Rod Johnson's excellent "J2EE Development without EJB" describes the best solution I've seen so far. Using a lightweight (IoC, as he puts it) container like Spring and designing to interfaces rather than concrete classes, you can non-invasively configure your objects at runtime and still allow great flexibility during testing.

I need to stress that it is not just a case of dropping such a framework/container into the mix and getting your problems solved.
Designing for testing/maintenance should also be high on your list of priorities.
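To illustrate the designing-to-interfaces point, a minimal sketch (all names hypothetical, Spring wiring omitted):

// The service depends only on an interface, so a container such as Spring
// (or a plain unit test) decides which implementation to inject.
interface MessageStore {
    void save(String message);
}

class JdbcMessageStore implements MessageStore {
    public void save(String message) {
        // real JDBC code would go here
    }
}

class InMemoryMessageStore implements MessageStore {
    private final java.util.List<String> messages = new java.util.ArrayList<String>();

    public void save(String message) {
        messages.add(message);
    }
}

class MessageService {
    private final MessageStore store;

    MessageService(MessageStore store) { // constructor injection
        this.store = store;
    }

    void post(String message) {
        store.save(message);
    }
}

In production the container wires MessageService to JdbcMessageStore through its configuration; a unit test can simply construct new MessageService(new InMemoryMessageStore()) and never touch a database.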


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 6:18 AM
Posted by: Maarten Hazewinkel    Posts: 32 / Nickname: terkans / Registered: Jan, 2005
One approach that seems to be getting some attention in the agile/scripting language world (Ruby, Python, etc.), is to use the language itself to specify the configuration. So they just fold the configuration file into the source tree.

Now this approach is not feasible in a widely distributed Java application that runs on users' desktops, but for a server-side application you usually have a JDK (not just a JRE) available. That makes it possible to compile the Java config file either at startup from the application, or as a separate step in the configuration process.
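As a minimal sketch, with made-up names and values, such a Java config might be nothing more than:

// Hypothetical sketch: configuration expressed as plain Java, so the compiler
// catches typos and code (and tests) can reference the values directly.
class AppConfig {
    static final String DATABASE_URL = "jdbc:postgresql://localhost/appdb";
    static final int HTTP_PORT = 8080;
    static final boolean CACHE_ENABLED = true;
}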

For the scripting language world, this is certainly a worthwhile idea. Is it also worthwhile for Java server applications?

Maarten


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 9:37 AM
Posted by: Frank Sommers    Posts: 2642 / Nickname: fsommers / Registered: Jan, 2002
> One approach that seems to be getting some attention in
> the agile/scripting language world (Ruby, Python, etc.),
> is to use the language itself to specify the
> configuration. So they just fold the configuration file
> into the source tree.
>
> Now this approach is not feasable in a widely distributed
> Java application that runs on users desktops, but for a
> server-side application, you usually have a JDK (not just
> a JRE) available. That makes it possible to compile the
> java config file either on startup from the application,
> or as a separate step in the configuration process.
>
> For the scripting language world, this is certainly a
> worthwhile idea. Is it also worthwhile for Java server
> applications?

It is possible, and many people use Ant as a sort of configuration tool (Ant tasks, to be precise, which are little Java programs).

The problem is that I can't test the correctness of that configuration, since the configuration itself requires the environment to conform to the very requirements I'd like to test for. So, if I deploy an app without being able to test the configuration, then I can still experience the unpleasant surprise that a perfectly well-tested, working app doesn't work.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 8:01 AM
Posted by: Chris Smith    Posts: 3 / Nickname: smitty1e / Registered: Jan, 2005
Do the GNU Autotools, including the lovely ./configure, count as a real-world example of just how gnarly the problem is?


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 11:30 AM
Posted by: David Beutel    Posts: 29 / Nickname: jdb / Registered: May, 2003
1. System tests are just as important as unit tests. Automate both.

2. When you find a problem (like the one at your meeting), write a test for it: system or unit, whichever it takes. So, the next time there's a problem, anyone can run the tests in the problem environment to help troubleshoot it.

3. Minimize the axioms. For example, the axioms of most systems include the JDK and Ant, along with their configurations (e.g., /etc/ant.conf, ~/.antrc, jre/lib/ext, etc). But, I prefer to depend on just the OS and CVS, so I put the JDK and Ant into CVS and used scripts to make sure that the project is using only the configuration from CVS. External systems, like database servers, are another axiom that is good to avoid if possible. For example, you can bundle an open-source database server with the project, turning an external system into an internal one.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 12:05 PM
Posted by: Frank Sommers    Posts: 2642 / Nickname: fsommers / Registered: Jan, 2002
> But, I prefer to depend on just the OS
> and CVS, so I put the JDK and Ant into CVS and used
> scripts to make sure that the project is using only the
> configuration from CVS. External systems, like database
> servers, are another axiom that is good to avoid if
> possible. For example, you can bundle an open-source
> database server with the project, turning an external
> system into an internal one.

This is an interesting point, and this is what I have chosen to do in the past: bundle all required software with a distribution so as to rely on as few external dependencies as possible. The problem with that is that even that larger distribution must exist in a context.

As one example, software my company ships requires network access to a machine that is to act as the server. When someone installs the software, that network access may well be available. At some later point, though, a customer may install firewall software that shuts off network access to the machine. That breaks the software, which in turn lowers the customer's perception of the product's quality. Not to mention that it results in technical support calls, which cost money.

So testing for the network is not worth it for us, because there is nothing we can do automatically to solve that problem. If we could programmatically communicate with all the firewall software, operating systems, etc., out there, we could possibly solve this. But that's just not the case. I think applying more tests in this area would not produce a payoff in our case.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 1:09 PM
Posted by: David Beutel    Posts: 29 / Nickname: jdb / Registered: May, 2003
That reminds me of a system test I added once. I was testing some code that used a library that depended on an external server. My tests started failing, and it took me some time to troubleshoot the problem: the external server had become unavailable. So, I added a test to first make sure that the external server was available before running any tests that depended on it. That saved me a lot of troubleshooting time over the next year or so.
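Something along these lines, for example (the host, port, and class names are made up):

import java.net.InetSocketAddress;
import java.net.Socket;
import junit.framework.TestCase;

public class ExternalServerAvailabilityTest extends TestCase {

    // Hypothetical host and port of the external server the other tests depend on.
    private static final String HOST = "external.example.com";
    private static final int PORT = 8080;

    public void testExternalServerIsReachable() throws Exception {
        Socket socket = new Socket();
        try {
            // Fail fast with a clear message instead of letting the dependent
            // tests fail in mysterious ways when the server is down.
            socket.connect(new InetSocketAddress(HOST, PORT), 2000);
        } catch (java.io.IOException e) {
            fail("External server " + HOST + ":" + PORT
                    + " is unavailable; dependent tests will not be meaningful: " + e);
        } finally {
            socket.close();
        }
    }
}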

But the problem you describe is different. If the external server becomes unavailable during operation, despite being available during installation and testing, and if this becomes a support problem, then I guess the solution is to add some functionality to alert the user or tech support when this problem occurs. This functionality could be as basic as a meaningful Exception chain propagated up to a log file that tech support can see. But if it happens often enough, then that might justify the functionality of a user-friendly alert via whatever user interface is available.

I think the real issue you're getting at is how much to assume versus how much to handle. It's the same issue when deciding what to assert in the production code. E.g., should you assert that you have gotten all the configuration settings that you're expecting? This issue is more about error handling than testing.

Cheers,
11011011


Tests don't fix usability Posted: Feb 16, 2005 11:58 AM
Posted by: Brendan Johnston    Posts: 10 / Nickname: bjsyd70 / Registered: Jun, 2003
White-box unit tests use examples, logic, and knowledge of a unit's internals to validate that a unit works in a particular way, in isolation from as much as possible.

Black-box system tests follow a goat track through the application and try to get a sensible result.

It seems what you need are some new features to add usability.

A unit that functions perfectly with correct inputs, but does not help diagnose incorrect input, may be unusable. You need features to help people find out what they are doing wrong on their end.

A configuration checking feature that clearly identifies incorrect configuration and suggests the right remedy is very useful.

All SQL our application runs, and the results from it, are logged. That way, when a table is changed or we are not connected to the right database, this is clear from the logs.
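For instance, a thin wrapper along these lines (names hypothetical, real logging framework omitted) makes every statement and its outcome visible:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class LoggingSqlExecutor {

    private final Connection connection;

    LoggingSqlExecutor(Connection connection) {
        this.connection = connection;
    }

    // Run an update and log both the statement and its outcome, so a wrong
    // database or a changed table shows up plainly in the logs.
    int executeUpdate(String sql) throws SQLException {
        Statement statement = connection.createStatement();
        try {
            int rows = statement.executeUpdate(sql);
            System.out.println("SQL OK (" + rows + " rows): " + sql);
            return rows;
        } catch (SQLException e) {
            System.out.println("SQL FAILED: " + sql + " -- " + e.getMessage());
            throw e;
        } finally {
            statement.close();
        }
    }
}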

It is not that we do or don't unit test an insert. It's that unit testing an insert does not guarantee good behavior at runtime. Inserts can fail because of disk space, permissions, DDL changes, deadlock, network issues, and 5000 other things.

Now I am just off to unit test a select. It will not guarantee that it will work at runtime, but it is a quick way to work out if I am doing something dumb in my code.

Some things are easy to unit test and some things are not.
Putting source that is hard to test into XML, rather than Java, does not change this. But it could help you get closer to 100% coverage of your Java code. So maybe you should switch to Spring for a false boost to your confidence.

Brendan


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 12:52 PM
Posted by: Darryl Mataya    Posts: 9 / Nickname: dmataya / Registered: Nov, 2004
I really like the concept of identifying the axiomatic assumptions of your system. The problem is that while specifying all the axioms is possible (though not likely), implementing checks for all of them is unlikely. We have a financial application that predicts the future behavior of a portfolio of accounts - like a collection of consumer mortgages. The model’s behavior in the system is highly dependent on the quality of the data stored in the current book of business, along with the quality of the economic assumptions used to predict the future. To guarantee valid results, we would have to continually check all these assumptions at every step along the many paths where we calculate future events. This isn’t possible to do at PC processor speeds and still return results within the product’s specified calculation time limits.

So we attempt to check these assumptions at other times when it is more efficient to do so. But again, it is not possible for a human to locate and specify every relationship that relies on this asynchronous validation.

In practice, we discover and document places where components create outputs that violate these assumptions. We then attempt to deal with these defects in future releases by polishing the offending components. In effect, we use the “purple wire” approach advocated by Fred Brooks in “The Mythical Man-Month” to manage these flaws discovered in system testing. The most important lesson we’ve learned is that a probability of occurrence must be assessed or evaluated when these system defects or purple wires show up. We don’t formally track these probabilities, but we certainly pick numbers and rank them when planning future enhancements.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 3:28 PM
Posted by: Michael Feathers    Posts: 448 / Nickname: mfeathers / Registered: Jul, 2003
All of this depends on what you think unit tests are good for. Personally, I don't write them to prove correctness, I write them because they facilitate change and decouple my designs.

Unit tests are great change detectors. They are also a great way of getting feedback during development, but yes, they are poor at detecting configuration problems or problems in macro-systemic behavior, because those are really not unit-level things.

Sounds axiomatic, but tests are very good at testing what they test and poor at testing what they don't test. For some reason, though, unit tests are often nailed for things that are outside their scope. We could conclude that that means unit tests aren't too important, or we could conclude that there are other forms of testing that are better for those other problems.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 16, 2005 6:02 PM
Posted by: Bill Venners    Posts: 2284 / Nickname: bv / Registered: Jan, 2002
> Sounds axiomatic, but tests are very good at testing what
> they test but they are poor at testing what they don't
> test. For some reason, though, unit tests are often
> nailed for the things that are outside their scope. We
> could conclude that that means unit tests aren't too
> important, or we could conclude that there are other forms
> of testing that are better for those other problems.

I didn't interpret this as Frank trying to nail unit tests, but as asking what degree of test coverage is optimal, and how we can deal with what's left over that's not tested. Frank and I are both big believers in the value of unit tests. As I've mentioned in previous discussions, what I do to decide whether to write a particular test, and what I imagine most people do, is attempt to judge the ROI of the test. How does the expected return (a guess) compare against the required investment (an estimation)? Does that estimated ROI seem worth it in the current business context? If so, it makes sense to write the test. If not, then the business would be better off if I invest the time another way. So whether the test coverage target to aim for is 40% or 60% or 80% really depends on human judgement.

One of the messages I get from XP (everyone hears things through the filters of their own biases) is that I should really open my mind to the value of those unit tests--i.e., that I may tend to underestimate the return on investment. That the return is not just expressed in correctness, but in the confidence to make changes later.

I think the main question raised in Frank's post for me (once again filtering through my own biases) is how do we deal with configuration problems? I can think of a few ways:

1. Minimize configuration.

Sounds good, but this would seem to go against the grain of making software configurable so it is easy to tune it at deployment time without recompiling source code. That's a big trend.

2. Simplify configuration.

This sounds better to me. This says keep configuration as simple as possible given the requirements. Question requirements that call for complex configuration. When there's no configuration requirement left to get rid of, then simplify the expression of that configuration. Maybe try to put everything in one configuration file. Make sure each configuration parameter shows up in one and only one place. (Oops, that's the DRY (Don't Repeat Yourself) principle.)

3. Write tests that explore what the app or system does when the axiomatic assumptions are broken.

This would force me to think about and decide what should happen in such cases, and would likely lead to friendlier software. Writing such tests might help me find more ways to simplify configuration and minimize external assumptions. Of course, I have to apply the ROI estimation to such tests, and whether or not to write them just depends on the situation.
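As a sketch of option 3, such a test might deliberately break one of those assumptions and pin down what should happen; AppBootstrap and ConfigurationException are hypothetical names:

import java.util.Properties;
import junit.framework.TestCase;

public class MissingConfigurationTest extends TestCase {

    public void testMissingDatabaseUrlProducesHelpfulError() {
        Properties config = new Properties(); // deliberately empty
        try {
            AppBootstrap.start(config);       // hypothetical entry point
            fail("Expected startup to be refused when database.url is missing");
        } catch (ConfigurationException e) {  // hypothetical exception type
            // The failure should name the missing parameter, not just blow up.
            assertTrue(e.getMessage().indexOf("database.url") >= 0);
        }
    }
}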

Perhaps a good exercise is just to try to write down all the external assumptions an app or system makes, what could go wrong, and what should happen if it goes wrong. For scenarios that seem very far fetched, it may be OK to say that the resulting behavior is undefined (because it isn't worth it in the business context to actually define the behavior). For other scenarios, it may seem worth it to define the behavior, and then that becomes a new requirement, which should be implemented and probably tested.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 17, 2005 8:33 AM
Posted by: Merriodoc Brandybuck    Posts: 225 / Nickname: brandybuck / Registered: Mar, 2003
I think testing as much as you can is always desirable. Where I have an issue with unit tests generally is that the hard-core XP/unit test advocates always seem to say "If you had good unit tests..." every time somebody even mentions having any kind of bug, as if unit testing were some silver bullet that can find all bugs, fix all problems, and likely help me lose 30 pounds.

"I got N% coverage with my unit tests." Great. You did all your development on Red Hat 9. What if I try running that on a Debian based distribution? You said that you support Linux, why won't it run? Or substitute your favorite versions of Windows or Office or whatever. They are not a panacea and I hate that they get portrayed as such by some people.

What Frank's article touches on is that there are system wide issues that need to be accounted for and things that you just can't unit test. If I'm writing an application for any mainstream operating system these days, except maybe OS X because it hasn't been through too many versions yet, I cannot possibly test every permutation of every possible environment including security updates, kernel updates, browser updates, etc. Some things you have to take on faith. That being said, I don't think you need to question whether complete test coverage is desirable. I think we all know it is not attainable, but that should not stop you from beating on your application as much as possible during the testing phase. All other things being equal (that pesky ROI acronym came up... :-), the more you know the better off you will be.

As far as making applications configurable goes, this helps in some ways and makes things harder in others. Generally there are fewer areas in which a configurable application can go wrong, because the goal is to reduce the amount of code and logic in the code. Less code equals less chance for bugs. However, if you do not select good defaults and the majority of users have to tweak things, you can bet people will be putting in, accidentally of course, all kinds of wacky data which you will have to code defensively against. Granted, that code isn't hard to write, but it is boring, which means that it doesn't get done a lot of the time. That is its own sort of problem, although those issues are usually a lot less problematic than horrible, nasty logic errors that cause your database to go bye-bye or your machine to periodically and predictably crash.

In our projects we put a lot more weight on system and integration testing than we do on unit testing. That is where all the interesting bugs have come up, anyway. At least that's been my experience. You can unit test Component A and Widget B to death and have them all work, but then you hook Component A to Widget B, and in some cases where the date ends in 9 in a month starting with J during lunch in the Pacific time zone, the whole damn thing fails. If the program was responsibly logging its problems and told you the database was locked and it couldn't access it, and you followed this trail and saw that this is when database maintenance was happening, that's useful. No amount of unit testing is going to tell you that.

Eat well. Exercise. Die anyway. There has to be something similar somewhere about unit testing, code reviews and crashing software...


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 17, 2005 9:05 AM
Posted by: Frank Sommers    Posts: 2642 / Nickname: fsommers / Registered: Jan, 2002
>
> What Frank's article touches on is that there are system
> wide issues that need to be accounted for and things that
> you just can't unit test. If I'm writing an application
> for any mainstream operating system these days, except
> maybe OS X because it hasn't been through too many
> versions yet, I cannot possibly test every permutation of
> every possible environment including security updates,
> kernel updates, browser updates, etc. Some things you have
> to take on faith.

I think it is possible to test the interaction of components only when components advertise and implement an interface that has clear semantics. If we know the semantics of an interface, we can encode those semantics in testing code, and then run that testing code in a given environment. The problem is that many subsystems just don't have standard interfaces with well-defined semantics. For instance, even databases don't have standard, unified error codes.
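One way to encode such semantics, sketched here with a hypothetical KeyValueStore interface, is an abstract contract test that each implementation, in each environment, can be run against:

import junit.framework.TestCase;

public abstract class KeyValueStoreContractTest extends TestCase {

    // Each implementation (and each environment) supplies its own instance.
    protected abstract KeyValueStore createStore();

    public void testStoredValueCanBeRetrieved() {
        KeyValueStore store = createStore();
        store.put("greeting", "hello");
        assertEquals("hello", store.get("greeting"));
    }

    public void testMissingKeyReturnsNull() {
        assertNull(createStore().get("no-such-key"));
    }
}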

But the bigger point is whether it's feasible to test for such things. In the case of the database insert, for instance, it's almost always more effective to just assume that the operation works.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 17, 2005 6:49 AM
Posted by: Keith Ray    Posts: 658 / Nickname: keithray / Registered: May, 2003
At the BayXP meeting last night, two people told of how much statement coverage they had gotten using TDD and XP practices: 70% for one project, 97% for another project. Neither project was measuring test coverage during development.

If you're doing Java development, it seems that the Agitator product from http://www.agitar.com/ will be able to mostly-automatically create and run as many as 30,000 tests with little effort on the user's part.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 17, 2005 8:03 AM
Posted by: Malte Finsterwalder    Posts: 20 / Nickname: jufoda / Registered: Aug, 2003
It is indeed a good idea to think about what you require from your environment. And I think it is a good idea to write some tests for that. But I'm not thinking about unit tests here, since they are useless in the final environment.
I'm thinking about a checkup feature. The application could run a checkup of its environment at startup, or regularly, or at the request of the user... But I have seldom seen such a feature. I have seen a system that monitored the network and blocked the application when the network became unreachable.

I can think of a lot of things that would be sensible to routinely check at least at startup:
- Is the database reachable?
- Does it have a schema I can work with?
- Do I reach all the other systems I need?
- Are all configuration parameters set?
- Do they have meaningful values?

A lot of this stuff could be embedded in frameworks.
For example, an OR-mapping tool like Hibernate has a definition of what is required from the DB schema. It should be possible to connect to the database and check whether the schema fulfills all those requirements.
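A minimal sketch of such a startup checkup (the property names and structure are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class StartupCheckup {

    // Returns a list of human-readable problems; an empty list means the
    // checkup passed.
    static List<String> run(Properties config) {
        List<String> problems = new ArrayList<String>();

        String dbUrl = config.getProperty("database.url");
        if (dbUrl == null) {
            problems.add("Configuration parameter 'database.url' is not set.");
        } else {
            try {
                // Is the database reachable with the configured credentials?
                Connection connection = DriverManager.getConnection(dbUrl,
                        config.getProperty("database.user"),
                        config.getProperty("database.password"));
                connection.close();
            } catch (Exception e) {
                problems.add("Database is not reachable: " + e.getMessage());
            }
        }
        // Further checks (schema version, other systems, value ranges) would
        // follow the same pattern.
        return problems;
    }
}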

I'm also missing a framework to handle configurations. Everybody writes their own config file read/write code. Does anyone know of a simple framework for handling software configurations?

Greetings,
Malte


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 17, 2005 8:55 AM
Posted by: Frank Sommers    Posts: 2642 / Nickname: fsommers / Registered: Jan, 2002
> I'm thinking about a checkup-feature. The application
> could run a checkup of it's environment at startup or
> regularly or by request of the user, or... But I have
> seldom seen such a feature. I have seen a system that
> monitored the network and blocked the application when the
> network became unreachable.
>
> I can think of a lot of things that would be sensible to
> routinely check at least at startup:
> - Is the database reachable?
> - Does it have a schema I can work with?
> - Do I reach all the other systems I need?
> - Are all configuration parameters set?
> - Do they have meaningfull values?
>
> A lot of this stuff could be embedded in frameworks.
> e.g. An OR-Mapping-Tool like Hibernate for example has a
> definition of what is required from the DB schema. It
> should be possible to connect to the database and check,
> whether the schema fullfills all those requirements.
>

This is interesting, because I've been doing this kind of stuff for a product that we ship. When it starts up, it goes through a set of rules and verifies that those rules are satisfied. If a rule is not satisfied, there is some code there that tries to fix the problem, e.g., update the db schema, ask for missing user account information, etc.

This, btw, brings up the interesting question of developer testing vs. runtime testing. Runtime testing offers the possibility of automatic error correction. But, of course, only those system aspects that runtime testing can test can be corrected.

One way to do some runtime testing would be to ship your developer unit tests with the product, and periodically run those tests in the deployment environment. In commercial software, does anyone ship unit tests with an application?
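Mechanically, at least, it is easy to do; here is a sketch of running bundled JUnit tests from inside a deployed application, with hypothetical test class names:

import junit.framework.TestResult;
import junit.framework.TestSuite;

public class DeploymentSelfTest {

    public static void main(String[] args) {
        TestSuite suite = new TestSuite();
        // Hypothetical environment-oriented tests shipped with the product.
        suite.addTestSuite(DatabaseSchemaTest.class);
        suite.addTestSuite(ConfigurationTest.class);

        TestResult result = new TestResult();
        suite.run(result);

        System.out.println("Self-test: " + result.runCount() + " run, "
                + result.failureCount() + " failures, "
                + result.errorCount() + " errors.");
    }
}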


reference to Gödel Posted: Feb 18, 2005 11:26 PM
Posted by: Andrew Dalke    Posts: 291 / Nickname: dalke / Registered: Sep, 2003
> We must assume that the CPU works as advertised, that the file system behaves as intended, that the compiler produces correct code. Not only can we not test for those assumptions from within the system, we also cannot recover from situations where the axiomatic aspects of the system turn out to be invalid.

I disagree. The FDIV bug on the Pentium was discovered, as I recall, by getting different results from two algorithms that should agree. fsck does checks on the file system. I've written code to work around compiler errors.
If a disk goes flaky, it can be detected (e.g., via checksum comparisons) and perhaps worked around, as by switching to an alternate disk.

My disagreement though is more on the formalism you use. You use the language of logic and talk about testability as axioms. Computers are an approximation to a Turing machine. No machine has infinite memory and all are built on top of an analog/quantum world. The logical model doesn't include the possibility of the tape of one machine getting too large and knocking down another machine.

More relevant, your formalism does not include economics. Some things are testable ("axioms") but aren't worthwhile to test. I could implement three distinct algorithms to cross-check floating-point multiplication. I could use a file format that can recover from a sector going bad. But these cost time and money, with almost no benefit these days.

> A cardinal aspect of a test plan, then, is to determine a system's axioms, or aspects (not in the AOP sense) that cannot be proven true or false from within the system. Apart from those system axioms, all other aspects of the system can, and should, be covered in the test plan.

I would say, rather, that a test plan should bear in mind that many possibilities are testable, but that the skill is in knowing which should be tested and which can be ignored.


Re: Is Complete Test Coverage Desirable - or Even Attainable? Posted: Feb 20, 2005 8:55 PM
Posted by: Curt Sampson    Posts: 21 / Nickname: cjs / Registered: Mar, 2004
I see two things here that look like they might be misperceptions to me.

1. When you're trying to test your code, transient failures and suchlike are irrelevant. A server becoming unavailable due to a network failure is not the fault of your code, and not something you can fix by changing your code. You instead need to do systems analysis, and write a system that will not fail in this way (if that's the requirement).

What you can test, if you want, is your code's reaction to that sort of failure. If printing out a stack trace and exiting is not the right behavior, test that your code doesn't do that.

2. I think you're making an artificial distinction between configuration and code that is better, from a testing point of view, not to be made. Whether you use Java code or XML code or an entry in a properties file to decide what database server to connect to, you are still writing down something and saving it in a file to change the behavior of your program. If you feel it needs testing, test it!

Your web server example brought to mind something that came up a few months ago in a web application I'm working on. Most of the code is contained in PHP files, but the correct working of the site also depends on certain rewrite rules in the Apache configuration file. In fact, given that these deal with some stuff related to session control when cookies are not available, these rewrite lines are absolutely critical to the correct functioning of the site. So I don't treat them as "configuration," I treat them as code, and test them as code. These lines reside in just one file, and that same file is used when generating test or production httpd.conf files. That way I ensure that I'm testing and rolling out the same code. And I actually bring up a web server for my functional tests to make sure that those rewrite lines are working properly.

To answer your question about the desirability of complete test coverage, or more particularly about what to cover: let pain be your guide. If something causes you pain, and does so more than once, that's probably a signal that something wants automating.

If a config file was broken, and not in an obvious way, what can you change to lessen the chances that it will be broken next time you need to make one? Can your application check that necessary settings are set? Can you generate all or part of the config file from source checked into your repository, preferably tested source? If multicast needs to be enabled for something to work, can you write a test to fail if it's not enabled?
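For the multicast case, for instance, a test might be as simple as the following sketch; the group address and port shown are assumptions (often cited as Jini discovery defaults), so substitute whatever your system actually requires:

import java.net.InetAddress;
import java.net.MulticastSocket;
import junit.framework.TestCase;

public class MulticastAvailableTest extends TestCase {

    public void testCanJoinMulticastGroup() throws Exception {
        // Group and port are assumptions; use your system's actual values.
        InetAddress group = InetAddress.getByName("224.0.1.85");
        MulticastSocket socket = new MulticastSocket(4160);
        try {
            socket.joinGroup(group);   // fails if multicast is unavailable here
            socket.leaveGroup(group);
        } finally {
            socket.close();
        }
    }
}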

Testing is really just about getting creative to solve these sorts of problems.


