The Artima Developer Community

Weblogs Forum
Reviewable, Runnable Specifications Ensure Software Quality

51 replies on 4 pages. Most recent reply: Jul 11, 2010 8:16 PM by Andy Dent

Michael Goldman

Posts: 9
Nickname: keppla
Registered: Jul, 2009

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:31 AM
> The subtle bugs are the ones that:
> 1. make it into production code.
> 2. are difficult to track down.
> 3. cause the most havoc.

I didn't mean to say that merely testing functions by asserting they don't throw exceptions is sufficient; I just wanted to say that, for many projects I've seen, even this minimal approach is an improvement.

The more subtle bugs are the ones I would hope to catch with "real" testing, which implies having code that is clean enough to be testable.
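A minimal sketch of that "no exceptions" baseline in Python; `parse_config` and `smoke_test` are hypothetical names invented for illustration, not from any particular library:

```python
# A "no exceptions" smoke test: call each function with representative
# inputs and assert only that nothing raises. parse_config is a
# hypothetical stand-in for a real function under test.

def parse_config(text):
    # invented example function: parse "key=value" lines into a dict
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def smoke_test(func, calls):
    """Return the (args, exception) pairs for which func raised."""
    failures = []
    for args in calls:
        try:
            func(*args)
        except Exception as exc:
            failures.append((args, exc))
    return failures

# Every call completes without raising, so the smoke test passes.
assert smoke_test(parse_config, [("a=1\nb=2",), ("",), ("no delimiter",)]) == []
```

It proves very little about correctness, which is exactly the point being made: it is a floor, not a ceiling.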

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 11:39 AM
> > The subtle bugs are the ones that:
> > 1. make it into production code.
> > 2. are difficult to track down.
> > 3. cause the most havoc.
>
> I didn't mean to say that merely testing functions by
> asserting they don't throw exceptions is sufficient; I
> just wanted to say that, for many projects I've seen,
> even this minimal approach is an improvement.
>
> The more subtle bugs are the ones I would hope to catch
> with "real" testing, which implies having code that is
> clean enough to be testable.

Exactly. It's a waste of time to do "real" testing on code whose bugs could easily have been caught with unit testing.

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 6:59 PM
> The problem is that that's not the only way to incorrectly
> calculate primes. And in general, we don't know what bugs
> exist in the code. The probability that your test will
> detect all possible bugs is 0.00000000000000%. In fact
> your test doesn't even catch a significant number of the
> possible things that could be wrong. It's unlikely
> (though, not completely improbable) that the bug I gave
> as an example would actually be coded.

Never mind; it was fun thinking about this prime number example. For that particular example I gave exact numbers: the procedure was to choose the test inputs randomly, and even then a very high probability of bug detection was reached. It is even better if a domain expert (in this case: a mathematician) determines the test cases. She has the knowledge or intuition to know where the critical points could be.
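The randomized-input procedure can be sketched like this; `is_prime_buggy` is an invented bug for illustration, compared against an obviously-correct reference:

```python
import random

def is_prime_ref(n):
    # obviously-correct (if slow) reference: full trial division
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_prime_buggy(n):
    # invented bug for illustration: divisor checking stops at 10,
    # so composites like 121 = 11 * 11 are reported prime
    if n < 2:
        return False
    return all(n % d for d in range(2, min(n, 10)))

# Randomly chosen inputs, compared against the reference.
random.seed(0)
counterexamples = sorted({
    n for n in (random.randrange(2, 10_000) for _ in range(200))
    if is_prime_buggy(n) != is_prime_ref(n)
})

# Roughly 10% of values in this range trigger the bug, so 200 random
# draws expose it with near certainty.
assert counterexamples
```

As the post says, a subtler bug (one that fires on, say, a dozen inputs out of ten thousand) would survive random sampling far longer, which is where expert-chosen cases come in.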

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 11, 2010 9:16 PM
> The more subtle ones are the ones i would hope to catch
> with "real" testing

Another consideration: if you are in an environment where every changed line of code is viewed in peer review, it is worth designing your tests to catch the things that are easiest to miss when reviewing, or to draw those things to the reviewer's attention.

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 12, 2010 8:33 PM
> I understand where this is coming from but I would prefer
> if you would take me at my word when I say that tests are
> great. I am not in any shape or form suggesting that the
> process you are suggesting should not be done or that it
> doesn't produce value.
>
Ah, great. We are indeed on the same page. Apologies for having missed that viewpoint as the thread deepened.

> What I am suggesting is that what you are advocating is a
> good idea but not, by itself, good enough. Words matter.
> If you say 'ensures' when you don't really mean
> 'ensures', people won't necessarily know that.
>
Fair point. I tend towards maxims. "Time is money!" That's not an assertion of an absolute equality, or an insistence that time is the only thing that matters. It is, instead, a succinct and memorable way of stating one aspect of things--one that does matter, so saying it in a memorable way is helpful, if not absolutely accurate.

> Side-bar: isn't using statically defined tests the 'right
> way' to do things?
>
Definitely. I misspoke when I said "statically-generated". Your choice of "statically-defined" is much more accurate, and unambiguous. Mind if I steal it?
:__)


> Again, what you are doing is great but validating that '.'
> and '..' work doesn't mean that './..' will work properly.
> In my experience the really painful bugs are the ones
> that deal with corner cases.
>
Right, but again, we're dealing with an engineering problem. It is tempting to consider a tool that dynamically constructs hundreds of combinations, and tests them all--but even then, we are dealing with issues that may never arise in practice, or for which there are fairly simple workarounds. So it's possible to spend far too much time on a quality "proof", when a reasonable assurance is all you really need. (I've put together really bulletproof programs that did great stuff--for which the hardware no longer exists to run them, much less the operating system. Talk about over-engineering. "Good enough" would have made a lot more sense.)

> If you start
> testing all the different permutations of inputs, you are
> back in unfeasible territory.
>
Definitely. Any attempt to make a 100% argument is doomed to failure. We simply need to do the best we can.

> From what I understand, research shows that code-reviews
> and prototyping are more effective than unit testing in
> terms of finding bugs. If you are disputing that, I think
> you should give some evidence or at least a rational
> argument for why.
>
That's an interesting argument. But I dispute that any person in the world reads code the same way the computer executes it! Those are good practices, but early design reviews were found to be a far more significant predictor of success. (If memory serves, it was the /only/ really significant determiner.) But design reviews only work for the first version; they don't protect against regressions. Spec reviews, like design reviews, ensure quality--but in a way that also prevents future regressions.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 13, 2010 4:59 AM
> > From what I understand, research shows that
> > code-reviews and prototyping are more effective than
> > unit testing in terms of finding bugs.

> Those are good practices, but early design
> reviews were found to be a far more significant predictor
> of success. (If memory serves, it was the /only/ really
> significant determiner.)

Hmmm, I'm feeling too lazy to spend time digging out references but suspect you're talking about different things.

I was under the impression that design reviews are very good predictors of project success and also reduce bug rates but that peer reviewing of code still trumped unit testing in terms of numbers of bugs found.

Note that the pair-programming of XP is effectively continual peer review.

What I'm really intrigued by is your point about design reviews not helping with regressions. I'd like to partly disagree. I think that design reviews are like a genetic analysis of your code - they can show a predisposition towards certain kinds of bugs.

During design reviews, you should be logging concerns (#1) over smelly bits that seem likely to cause future bugs. For example, multiple views needing to be updated vs caching for performance (to pick an area that seems to cause a lot of regressive bugs in one IDE I use).

Coupled with some statistical analysis of the type and location of bugs found, this list of candidate vulnerabilities can steer the writing of unit tests and the amount of attention going into testing and future code reviews.

There's also the interesting question of at what point a change becomes sufficiently complex that it requires a pre-emptive design review.

(#1)
One of my favourite process vulnerabilities to look out for is how an organisation records concerns and things to deal with in future. One simple approach is to use your issue tracker, with categorisation that keeps these easily searchable without adding too much noise to the main bug lists. The need to record such issues, and maybe design "bugs", can be a useful determinant when picking an issue tracker. I quite happily use Mantis categories for things like "Design Weakness", "Refactoring Opportunity" and so on.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 14, 2010 9:40 AM
> Fair point. I tend towards maxims. "Time is money!" That's
> not an assertion of an absolute equality, or an insistence
> that time is the only thing that matters. It is, instead,
> a succint and memorable way of stating one aspect of
> things--one that does matter, so saying it in a memorable
> way is helpful, if not absolutely accurate.

In general, I think that's fine. It's just that in this particular context, there is a real danger of perpetuating a misconception that unit-testing can guarantee correctness.

The basic idea was raised earlier in the thread (though I'm not sure how seriously). It goes like this: if you test all your units and they work exactly as specified, then the combination of all those units must be correct. I've seen this repeatedly argued from the perspective of engineering/manufacturing: if you machine your parts to very tight tolerances, you will get precise assemblies. It seems logical at first blush, but it's not actually true. It's what American car makers attempted for decades, believing it was what the Japanese were doing. In reality, the per-part precision of American cars was actually higher than that of the Japanese cars, but the opposite was true of the assembled vehicles.

If you've ever laid down a floor or put up paneling, you can see why this can be the case. No matter how carefully you plan and measure, you will never be able to cut all the pieces ahead of time and get a good result. You put the pieces in and cut them to fit as you go.

The best example I can think of from the software perspective is a failed Mars probe built by NASA (IIRC). One team built a highly reliable unit that did everything using SI units. Another team built a highly reliable unit using customary American units. Why a scientist would ever use American units is beyond me, but that's beside the point. So you had two units that each worked exactly as specified, yet when put together they caused the assembly to fail catastrophically.

> Definitely. I mispoke when I said "statically-generated".
> Your choice of "statically-defined" is much more accurate,
> and unambiguous. Mind if I steal it?
> :__)

Sure. Maybe it's something that needs a standardized definition. I'm not sure what the official term is for this.

> That's an interesting argument. But I dispute that any
> person in the world reads code the same way the computer
> executes it! Those are good practices, but early design
> reviews were found to be a far more significant predictor
> of success. (If memory serves, it was the /only/ really
> significant determin-er.) But design reviews only work for
> the first version. They don't protect against regressions.
> Spec reviews, like design reviews, ensure quality--but in
> a way that also prevents future regressions.)

I consider design-reviews and code-reviews to be two different things so I'm not totally sure if we are talking about the same thing.

Since you mentioned regressions, I've actually spent a lot more time trying to build non-trivial regression testing than I have with unit testing. It's not that far off from what the tools you mentioned do. You take your functional requirements and then create an input and an expected output document. This works well for (RESTful) web services. In practice it's a bit more complicated than that but the key is that it's pretty easy to build tests that validate that the system works as it did before for those tests. Whenever I miss a problem, I create a new test to check for it.
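That input/expected-output style of regression test can be sketched roughly like this; `handle_request`, `GOLDEN_CASES`, and the documents are invented stand-ins for a real service and its captured request/response pairs:

```python
# Golden-case regression sketch: each case pairs an input document with
# the output captured from a known-good run. handle_request and the
# documents are invented stand-ins for a real (RESTful) service.

def handle_request(doc):
    # stand-in for the service endpoint under test
    return {"status": "ok", "echo": sorted(doc["items"])}

GOLDEN_CASES = [
    ({"items": [3, 1, 2]}, {"status": "ok", "echo": [1, 2, 3]}),
    ({"items": []},        {"status": "ok", "echo": []}),
]

def run_regression(cases):
    """Return (index, got, expected) for every case that no longer matches."""
    return [
        (i, got, expected)
        for i, (request, expected) in enumerate(cases)
        if (got := handle_request(request)) != expected
    ]

# The system still behaves as it did when the cases were captured.
assert run_regression(GOLDEN_CASES) == []
```

The "whenever I miss a problem, I create a new test" step is then just appending another (input, expected) pair to the golden list.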

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 14, 2010 11:56 AM
What I found out and what could be interesting in this context is

http://jnb.ociweb.com/jnb/jnbJun2010.html

It is an introduction to JBehave, quote: One goal of BDD is to provide a ubiquitous language for specifying expected behavior that can be executed as part of an acceptance test and can be understood by developers and stakeholders alike.

With JBehave you can have a spec like this:

Scenario: Unsuccessful Login
Given the user has entered username 'foo'
When the Login button is pressed
Then an alert dialog 'Please provide a password' is displayed
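The machinery behind such a spec can be sketched in a few lines. This is a toy Given/When/Then interpreter in Python, not JBehave's actual API; the step patterns and the login logic are invented for illustration:

```python
import re

# Toy Given/When/Then interpreter showing the idea behind tools like
# JBehave: each scenario line is bound to a step function by a regex.
# NOT JBehave's actual API; patterns and login logic are invented.

STEPS = {}
state = {}

def step(pattern):
    def register(func):
        STEPS[re.compile(pattern)] = func
        return func
    return register

@step(r"Given the user has entered username '(\w+)'")
def given_username(name):
    state.update(username=name, password=None)

@step(r"When the Login button is pressed")
def when_login():
    state["alert"] = None if state.get("password") else "Please provide a password"

@step(r"Then an alert dialog '([^']+)' is displayed")
def then_alert(message):
    assert state["alert"] == message, state["alert"]

def run_scenario(text):
    for line in filter(None, map(str.strip, text.splitlines())):
        for pattern, func in STEPS.items():
            match = pattern.fullmatch(line)
            if match:
                func(*match.groups())
                break

run_scenario("""
    Given the user has entered username 'foo'
    When the Login button is pressed
    Then an alert dialog 'Please provide a password' is displayed
""")
```

In JBehave proper the step bindings live in Java classes, but the core idea is the same: the spec stays readable to stakeholders while the bindings make it executable.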

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 14, 2010 12:25 PM
> I was under the impression that design reviews are very
> good predictors of project success and also reduce
> bug rates but that peer reviewing of code still trumped
> unit testing in terms of numbers of bugs found.
>
Yes on point #1. Didn't know about point #2. Thanks for sharing it. I will say, though, that in the days before unit testing, when I was coding professionally, I had two bugs reported by users in two years. I did so much testing myself, despite the lack of tools, that there weren't many bugs to be found. With reviewable tests, I can contribute to your code sight unseen. That is, without any understanding of how it works, I can make a significant contribution simply by suggesting additional tests.

> Note that the pair-programming of XP is effectively
> continual peer review.
>
Yeah. I like that. It turns out to be a lot of fun, too. (My fear, as a solo programmer, was that I would be kept going past the point where I was alert, or that my ineptitude would be on display. But I discovered that everyone has blind spots, everyone needs breaks, and there are always things to laugh about--and the laughing is more fun when someone else is doing it with you.)

> What I'm really intrigued by is your point about design
> reviews not helping with regressions. I'd like to partly
> disagree. I think that design reviews are like a genetic
> analysis of your code - they can show a predisposition
> towards certain kinds of bugs.
>
I'll buy that. The number of people who can do an effective review is more limited, but it will certainly be valuable. I really like the idea of spec reviews, because every member of the team can participate, even if they don't have a speciality in that area--and every member of the team will have a better understanding, as a result.

> There's also the interesting issue of at what point a
> change is sufficiently complex that it requires a
> pre-emptive design review?
>
Yeah. That's a tough one.

> One of my favourite process vulnerabilities to look out
> for is how an organisation records concerns and things to
> deal with in future. One simple approach is to use your
> issue tracker and have categorisation that keeps these
> easily searchable and not adding too much noise into the
> main bug lists. The need to record such issues, and maybe
> design "bugs" can be a useful determinant when picking an
> issue tracker. I quite happily use Mantis categories for
> things like "Design Weakness", "Refactoring Opportunity"
> and so on.
>
Nice! I always have dozens of things on my todo list. "Things to improve" are always right up there. I like the idea of capturing them in organizational memory.

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 14, 2010 12:27 PM
> > Fair point. I tend towards maxims. "Time is money!"
>
> In general, I think that's fine. It's just that in this
> particular context, there is a real danger of perpetuating
> a misconception that unit-testing can guarantee
> correctness.
>
Point taken. It is not the end-all and be-all. I'll agree with that. I particularly liked the point that the original agile manifesto didn't consider usability! That's an equally important predictor of project acceptance.
:__)

Eric Armstrong

Posts: 207
Nickname: cooltools
Registered: Apr, 2003

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 14, 2010 12:30 PM
JBehave looks a lot like Cucumber. I haven't used that yet, but things like JBehave, Cucumber, RSpec, and Rake are the reason I love Ruby--the ability to create domain-specific languages to solve specific problems, in a way that leaves the power of the original, general-purpose language at your disposal. (My continued appreciation to Fowler, for introducing me to that concept, and to Matz, for making a language that makes powerful, expressive, domain-specific elegance possible!)

Vincent O'Sullivan

Posts: 724
Nickname: vincent
Registered: Nov, 2002

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 15, 2010 4:04 AM
> I think the intention is to prove that an algorithm
> already known to work is coded without errors.

No, that is fundamentally wrong. The intention of testing is to show that the Java function, as coded, works as intended. The fact that the code does or does not appear to resemble an algorithm that may or may not work is irrelevant.

Vincent O'Sullivan

Posts: 724
Nickname: vincent
Registered: Nov, 2002

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 15, 2010 4:49 AM
> If we apply this to the prime numbers example: 2
> tests are of course not enough.

How do you know when the tests are "of course" enough, particularly since most functions are rather more complex than the given prime numbers example?

Fred Finkelstein

Posts: 48
Nickname: marsilya
Registered: Jun, 2008

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 15, 2010 4:57 AM
> How do you know when the tests are "of course" enough,
> particularly since most functions are rather more complex
> than the given prime numbers example?

An expert would have a feeling for it: I think a mathematician would also test palindromic primes, Mersenne primes, Motzkin primes, Gaussian primes, Eisenstein primes, and so on. (By the way, I am not a mathematician; I took these from this link :)

http://en.wikipedia.org/wiki/Category:Classes_of_prime_numbers
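A sketch of what such an expert-chosen case list might look like; the selection (boundary values, Mersenne primes, Carmichael numbers) is illustrative, and the function under test is plain trial division:

```python
# Hand-picked cases a domain expert might choose: boundary values,
# Mersenne primes (2^p - 1), and Carmichael numbers (composites that
# fool naive Fermat-style primality checks).

def is_prime(n):
    # function under test: simple trial division
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

EXPERT_CASES = {
    -7: False, 0: False, 1: False,       # boundaries
    2: True, 3: True,                    # smallest primes
    4: False, 9: False,                  # smallest composites
    31: True, 127: True, 8191: True,     # Mersenne primes
    561: False, 1105: False,             # Carmichael numbers
}

for n, expected in EXPERT_CASES.items():
    assert is_prime(n) == expected, n
```

Unlike random sampling, each case here encodes a reason it might fail, which is exactly the intuition a domain expert brings.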

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Reviewable, Runnable Specifications Ensure Software Quality Posted: Jun 15, 2010 12:13 PM
> > If we apply this to the prime numbers example: 2
> > tests are of course not enough.
>
> How do you know when the tests are "of course" enough,
> particularly since most functions are rather more complex
> than the given prime numbers example?

To be fair, it's a lot easier to know that you don't have enough tests than to know that you do. But you are distilling my point down into a single statement, which I think is useful.

Even though you didn't ask me, my answer is that you can't know that you have enough tests unless you can test all inputs and validate the outputs. Even then, you can't be sure your tests are correct. It's a "turtles all the way down" problem.

As Eric (I think) noted, tests can't prove the code correct; they can only prove it wrong. It's much like the modern philosophy of science: you have a theory that you test, and the more tests that fail to prove the theory wrong, the more faith you have in it, but it is never proven correct.
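The "test all inputs" condition really can be met for a small, bounded domain, by checking exhaustively against an independent reference. A sketch (both implementations are invented for illustration):

```python
# Exhaustive checking over a finite domain: compare the implementation
# under test against an independent, obviously-correct reference for
# every input up to a bound. Beyond LIMIT, nothing is proven.

def is_prime_fast(n):
    # implementation under test: trial division by 2, then odd numbers
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def is_prime_slow(n):
    # reference: try every possible divisor
    return n >= 2 and all(n % d for d in range(2, n))

LIMIT = 2000
mismatches = [n for n in range(-5, LIMIT) if is_prime_fast(n) != is_prime_slow(n)]
assert mismatches == []  # the two agree on every input in the domain
```

Of course, this only moves the trust onto the reference implementation and the choice of bound, which is the "turtles all the way down" problem again.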



Copyright © 1996-2014 Artima, Inc. All Rights Reserved.