The Artima Developer Community

Weblogs Forum
Reviewable, Runnable Specifications Ensure Software Quality

51 replies on 4 pages. Most recent reply: Jul 11, 2010 8:16 PM by Andy Dent

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 10, 2010 3:31 PM
Reply to this message Reply
> > Isn't it enough to check the subroutine using a prime
> > and a non-prime?
>
> Uh, no.
>
> A trivial example:
>
> def isPrime(x):
>   return (x % 2) != 0
> 

> Test:
>
> isPrime(3) -> True
> isPrime(4) -> False

But your algorithm is wrong. How about coding the correct algorithm and placing static asserts at specific places, so that there is no ambiguity as to what the algorithm should do?

>
> > The range-based checking would be used inside the prime
> > checking function, so if all the functions inside the
> > prime checking function are correct, then the prime
> > checking function is correct as well, assuming it
> passes
> > the test for one prime and one non-prime.
>
> How does that change anything?

It would ensure that each step along the way conforms to the relevant specs.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 10, 2010 4:03 PM
Reply to this message Reply
> But your algorithm is wrong.

Yes. That was my intention.

> How about coding the correct
> algorithm and placing static asserts at specific places so
> as that there is no ambiguity as to what the algorithm
> should do?

The claim is that "runnable specs" (i.e. tests) 'ensure' correctness. The point of the example is that it shows two tests (you suggested that only two were needed) that are valid tests that pass but in no way ensure correctness. In other words, the tests fail to show that the code is correct.

If we can reliably 'code the correct algorithm', then there's no need to test. The idea behind testing is that it's used to catch errors in the code. In other words, their existence presumes the possibility that errors will be made.

Do you really need me to explain all of this? Static analysis is not a 'test' in the context of this discussion nor is "coding the correct algorithm." 'Test' refers to executable unit tests in this context.

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 3:44 AM
Reply to this message Reply
>> If we can reliably 'code the correct algorithm', then there's no need to test. The idea behind testing is that it's used to catch errors in the code. In other words, their existence presumes the possibility that errors will be made.


That's true. As Dijkstra said, testing can only reveal the presence of bugs, not their absence. But I think that's not what Mr Armstrong is talking about. He wants to ensure that the spec is more or less correct, first by writing it down and second by reviewing it. If you begin with coding first, then you are stuck in implementation details, and the danger is that you tend to program something that has nothing to do with the requirements (e.g. speculative generality). But if you write the spec first, then you have to implement only the spec (until the tests pass) and nothing more or less.

If we apply this to the prime numbers example: 2 tests are of course not enough. That is the minimum, so the confidence that the code works will also not be very high. If you have 50 prime numbers and 50 non-prime numbers as input (100 tests) and your algorithm passes them all, then the confidence will be very high, because the probability that you pass them all by chance is very low.
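A minimal sketch of the 100-case harness described above, assuming a trial-division reference oracle (the oracle and all names here are illustrative, not from the thread):

```python
# Sketch: 50 primes and 50 non-primes as a 100-case test table,
# using trial division as the reference oracle (an assumption).
def reference_is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(2, 1000) if reference_is_prime(n)][:50]
composites = [n for n in range(1, 1000) if not reference_is_prime(n)][:50]

def passes_all_100(candidate):
    """True iff the candidate agrees with the oracle on all 100 cases."""
    return (all(candidate(p) for p in primes)
            and not any(candidate(c) for c in composites))
```

The deliberately buggy isPrime from upthread fails this table immediately, since 2 is prime but even.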

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 4:48 AM
Reply to this message Reply
Now let's calculate this prime example :-)
Suppose we choose 50 non-prime numbers in the range [1...1000]. What is the probability that all of them are even? (Because if one of them is odd, then the test from James Watson would fail [it would output that it is a prime], so we want to assume the worst case: none of our even numbers should show the bug!)

So in the range [1...1000] there are 832 non-prime numbers; 500 of them are even and 332 of them are odd. This is the Java program for calculating the probability p that we choose only even ones (and thus cannot show the bug):

double p = 1.0;    // probability that all 50 picks are even
double c1 = 500.0; // even non-primes still available
double c2 = 332.0; // odd non-primes

for (int z = 50; z > 0; z--)
{
    p *= c1 / (c1 + c2); // draw one even number, without replacement
    c1 -= 1.0;
}
System.out.println(p);

And this is the output: 3.117881358928328E-12
That means the probability that we will expose the bug is 99.99999999968822%.
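The loop above computes a hypergeometric probability, so it can be cross-checked in closed form, C(500,50)/C(832,50) (a sketch using Python's math.comb; the 500/332 split is taken from the post as stated):

```python
from math import comb

# P(all 50 picks are even) = C(500, 50) / C(832, 50),
# the closed form of the draw-without-replacement loop above.
p = comb(500, 50) / comb(832, 50)
print(p)  # agrees with the Java output, ~3.1179e-12
```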

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:16 AM
Reply to this message Reply
> > But your algorithm is wrong.
>
> Yes. That was my intention.

I don't think that the intention is to prove if an algorithm is correct or not. This is not feasible for all the known reasons.

I think the intention is to prove that an algorithm already known to work is coded without errors.

>
> > How about coding the correct
> > algorithm and placing static asserts at specific places
> so
> > as that there is no ambiguity as to what the algorithm
> > should do?
>
> The claim is that "runnable specs" (i.e. tests) 'ensure'
> correctness. The point of the example is that it shows
> two tests (you suggested that only two were needed) that
> are valid tests that pass but in no way ensure
> correctness. In other words, the tests fail to show that
> the code is correct.
>
> If we can reliably 'code the correct algorithm', then
> there's no need to test. The idea behind testing is that
> it's used to catch errors in the code. In other words,
> their existence presumes the possibility that errors will
> be made.
>
> Do you really need me to explain all of this? Static
> analysis is not a 'test' in the context of this discussion
> nor is "coding the correct algorithm." 'Test' refers to
> executable unit tests in this context.

No, it's OK, I understand. I was just ranting in the context of correctness.

I think that the reliability of code can be improved, though. My gut feeling is that whatever a human can reason about code, a machine can also do, so I am trying to codify my thought abstractions on this.

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:18 AM
Reply to this message Reply
> Now let's calculate this prime example :-)
> Suppose we choose 50 non-prime numbers in the range
> [1...1000]. What is the probability that all of them are
> even? (Because if one of them is odd, then the test from
> James Watson would fail [it would output that it is a
> prime], so we want to assume the worst case: none of our
> even numbers should show the bug!)

If we know an algorithm is correct, isn't it sufficient to prove that the algorithm's implementation is correct?

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:41 AM
Reply to this message Reply
>> If we know an algorithm is correct, isn't it sufficient to prove that the algorithm's implementation is correct?

Of course. But I assumed that you are implementing a new algorithm. If the algorithm is given I would try to find an implementation by googling :)

Michael Goldman

Posts: 9
Nickname: keppla
Registered: Jul, 2009

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:45 AM
Reply to this message Reply
> If we know an algorithm is correct,
> isn't it sufficient to prove that the
> algorithm's implementation is correct?

And how would you prove that this proof is correct?

Imho, tests are not proofs, and shouldn't try to be. Their virtue is not that they are complete, but that they are simple.

It's more like an empirical approach based on falsification: you cannot say that your algorithm for primes is correct if it passes for 1-23, but you can say it is NOT correct if it doesn't.
And if the programmer does not explicitly try to fool the tests, how many errors can you imagine that would pass these tests?

In my experience, most errors are not very subtle. Often they result in big exceptions, and even a test that doesn't check return values would find them.

So tests don't make your program perfect, but they definitely improve it.
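The falsification idea can be sketched in Python (the trial-division reference and all names are illustrative):

```python
def reference_is_prime(n):
    # Trial-division reference, assumed correct for the check below.
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_prime_buggy(x):
    # The deliberately wrong version from upthread: "odd means prime".
    return (x % 2) != 0

# Falsification: a single counterexample in 1..23 is enough to reject.
counterexamples = [n for n in range(1, 24)
                   if is_prime_buggy(n) != reference_is_prime(n)]
print(counterexamples)  # [1, 2, 9, 15, 21]
```

Passing 1..23 would not have proved the buggy version correct, but one disagreement is enough to reject it.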

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:52 AM
Reply to this message Reply
> > But your (prime number) algorithm is wrong.
>
> Yes. That was my intention.
>
Ah, yes. The perfect argument. Take the assertion to the ultimate extremity, construct an unassailable case against that, and claim that the entire argument is therefore null and void.

But engineering is about matters of degree. A key element in managing the complexity is to categorize the cases, and then enumerate those--not the individual data points.

Even with numbers, there are natural breaking points. For example: all positive integers less than 2**31, zero, negative integers, 2**31, and 2**31+1. For the numeric algorithms used in the vast majority of applications, that kind of breakdown is sufficient.

The use of a prime number algorithm was particularly adroit, in that regard. Since it is not possible to enumerate in advance all possible primes, it is not even theoretically possible to prove the algorithm is correct. But as one luminary is quoted as saying, "We can never prove an algorithm is correct. The best we can do is prove that we couldn't make it fail."

Note, too, that statically-generated tests can never prove that an algorithm is robust! Following the test-first methodology, I can make it work for "1" by returning the expected value. Then I can make it work for "2" by adding another return statement! But of course, the resulting "algorithm" fails for 3 and any number I didn't code for. Far from robust. Still, my tests succeed!
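The hard-coded "algorithm" described here might look like this (a deliberately degenerate sketch; the inputs chosen are invented):

```python
# Caricature of test-first taken to the extreme: each test is "passed"
# by hard-coding its expected value, with no algorithm behind it.
def is_prime(n):
    if n == 1:
        return False  # added only to satisfy the test for 1
    if n == 2:
        return True   # added only to satisfy the test for 2
    raise NotImplementedError("no case coded for this input")

assert is_prime(1) is False  # passes
assert is_prime(2) is True   # passes
# is_prime(3) would raise: the "algorithm" fails for anything not coded.
```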

One idea, then, would be to categorize the range of inputs for which the software is expected to work, and have the test harness select a random value in that range. That kind of testing would distinguish "vaporware" (hard-coded values) from code that is backed by an actual algorithm.
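Such a randomized harness could be sketched like this (the reference oracle, seeding, and names are my own assumptions, not from the post):

```python
import random

# Sketch of the randomized "anti-vaporware" check: draw inputs at
# random from the declared range instead of using fixed data points.
def reference_is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def random_counterexample(candidate, lo=1, hi=1000, trials=100, seed=0):
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        n = rng.randint(lo, hi)
        if candidate(n) != reference_is_prime(n):
            return n  # hard-coded "vaporware" is caught almost at once
    return None
```

A table of hard-coded return values survives fixed tests but not this; an honest implementation returns None.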

In practice, though, that's overkill. We code algorithms, not hard values, because we know that otherwise, it will fail the minute users work with it. The tests are only there as a "check", not as "proof".

But, and this was the assertion in the article, if I can examine the list of tests you've constructed, I will pretty rapidly be able to determine the robustness I can expect from the algorithm you implemented. If the tests for paths only use forward slashes, for example, I'll expect it to break on systems that use backward slashes in their paths.

In short, I've categorized the kinds of cases the algorithm might possibly handle: forward and backward slashes, absolute and relative paths, dot and dot-dot segments, etc. Were I looking for "proof", it would then be necessary to enumerate and test every possible combination. But for a "quality check", I'm only looking for a representative sample.

A representative sample is sufficient, because if your algorithm handles a decent number of "interesting" cases, I have a high level of confidence in it. And that "level of confidence" is another way of saying "quality".
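As a concrete illustration of a representative sample for path handling (using the stdlib posixpath module; the selection of cases is mine, not from the article):

```python
import posixpath

# One representative case per category -- relative path, absolute path,
# dot segment, dot-dot segment -- not every permutation: a quality
# check, not a proof.
cases = {
    "a/b/c": "a/b/c",   # relative path, forward slashes
    "/a/./b": "/a/b",   # absolute path with a dot segment
    "a/../b": "b",      # dot-dot segment
}
for raw, expected in cases.items():
    assert posixpath.normpath(raw) == expected
print("representative sample passed")
```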

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 7:58 AM
Reply to this message Reply
> There's been an interesting discussion on the Agile Alliance list in
> linked-in recently about how usability was missed in the
> framing of the original Agile Manifesto.
>
Thanks for alerting me to that. I'm rabid about usability. I just joined that group, but couldn't find the discussion. Is there a way you could point me to it?
'preciate it.

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 8:12 AM
Reply to this message Reply
> That's true. As Dijkstra said: Testing can only reveal the
> presence of bug, not their absence.
>
He was the man. No doubt about it. Thanks for quoting my hero.

> [Armstrong] wants to ensure that the spec is more or less
> correct by first writing it down and second by reviewing it.
>
Correctamundo!

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:03 AM
Reply to this message Reply
> > > But your (prime number) algorithm is wrong.
> >
> > Yes. That was my intention.
>
> Ah, yes. The perfect argument. Take the assertion to the
> ultimate extremity, construct an unassailable case against
> that, and claim that the entire argument is therefore null
> and void.

I understand where this is coming from but I would prefer if you would take me at my word when I say that tests are great. I am not in any shape or form suggesting that the process you are suggesting should not be done or that it doesn't produce value. I've found that similar approaches greatly improve my productivity i.e. the benefits far outweigh the costs.

What I am suggesting is that what you are advocating is a good idea but not, by itself, good enough. Words matter. If you say 'ensures' when you don't really mean 'ensures', people won't necessarily know that. When you say that these testing practices 'guarantee' that a new implementation works the same as the old one, some people will take that at face value.

> ...
>
> The use of a prime number algorithm was particularly
> adroit, in that regard. Since it is not possible to
> enumerate in advance all possible primes, it is not even
> theoretically possible to prove the algorithm is correct.
> But as one illuminary is quoted as saying, "We can never
> prove an algorithm is correct. The best we can do is prove
> that we couldn't make it fail."

Please understand that this example was chosen not for its rhetorical properties but because it is well known what the algorithm should do.

> Note, too, that statically-generated tests can never prove
> that an algorithm is robust! Following the test-first
> methodology, I can make it work for "1" by returning the
> expected value. Then I can make it work for "2" by adding
> another return statement! But of course, the resulting
> "algorithm" fails for 3 and any number I didn't code for.
> Far from robust. Still, my tests succeed!

Side-bar: isn't using statically defined tests the 'right way' to do things? Otherwise you are testing one algorithm against another and in practice, the two algorithms are often the same. I didn't realize you were advocating dynamic testing and I'm actually not completely sure what you mean by that.

> In short, I've categorized the kinds of cases the
> algorithm might possibly handle: forward and backward
> slashes, absolute and relative paths, dot and dot-dot
> segments, etc. Were I looking for "proof", it would then
> be necessary to enumerate and test every possible
> combination. But for a "quality check", I'm only looking
> for a representative sample.
>
> A representative sample is sufficient, because if your
> algorithm handles a decent number of "interesting" cases,
> I have a high level of confidence in it. And that "level
> of confidence" is another way of saying "quality".

Again, what you are doing is great, but validating that '.' and '..' work doesn't mean that './..' will work properly. In my experience the really painful bugs are the ones that deal with corner cases, and those bugs are usually related to the more complex inputs (assuming you have reasonably competent developers). Testing for individual requirements in isolation mainly saves time. If you start testing all the different permutations of inputs, you are back in infeasible territory.

From what I understand, research shows that code reviews and prototyping are more effective than unit testing in terms of finding bugs. If you are disputing that, I think you should give some evidence or at least a rational argument for why.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:13 AM
Reply to this message Reply
> In my experience, most errors are not very subtle. Often
> they result in big exceptions, and even a test, that
> doesn't check return values would find them.

I don't disagree but not all bugs have equal costs. The subtle bugs are the ones that: 1. make it into production code. 2. are difficult to track down. 3. cause the most havoc.

For example, a formative experience in my career was spending weeks manually correcting data in a database because a subtle issue had been corrupting it for 6 months.

On the other hand, the bugs that throw exceptions on simple input cause a lot of visibility (which is bad) but are generally easy to find and fix. Almost any testing approach will identify these kinds of problems reliably.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:26 AM
Reply to this message Reply
> And this is the output: 3.117881358928328E-12
> That means the probability that we will show the bug is
> 99,99999999968822 %.

If you know what bug the code has, it's easy to craft a test that will catch it. This isn't a rhetorical statement. One of the most powerful things about runnable tests is that you can use them to prevent a bug from reappearing. This is very valuable, because a bug that reappears hurts your credibility far more than a new bug does.

The problem is that that's not the only way to incorrectly calculate primes, and in general, we don't know what bugs exist in the code. The probability that your test will detect all possible bugs is 0.00000000000000%. In fact your test doesn't even catch a significant number of the possible things that could be wrong. It's unlikely (though not impossible) that the bug I gave as an example would actually be coded.
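A regression test in that spirit might look like this (a sketch; the bug history is hypothetical, and trial division stands in for the fixed implementation):

```python
def is_prime(n):
    # Fixed trial-division implementation (assumed correct after the fix).
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def test_regression_odd_composites():
    # Pins down the historical "odd means prime" bug so it cannot
    # silently reappear: 9, 15, and 21 are odd but not prime.
    for n in (9, 15, 21):
        assert is_prime(n) is False

test_regression_odd_composites()
```

Once this test is in the suite, reintroducing the old shortcut fails the build immediately instead of resurfacing in production.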

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:26 AM
Reply to this message Reply
>> There's been an interesting discussion on the Agile
>> Alliance list in linked-in recently about how usability was missed in
>> the framing of the original Agile Manifesto.

oops, sorry, I confused one of the discussions we've had there about the manifesto (importance of face-to-face) with Jim Coplien talking about the manifesto in this interview: http://www.infoq.com/interviews/coplien-dci-architecture

Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us