Testing…the surgeon’s approach


I played a lot of volleyball in a bygone life 🙂 and subsequently ruined my knees to the extent that I needed surgery. I got a shock when the surgeon (after a series of x-rays and checks) said to me: “Of course, we’ll only know once we’re in there”.

So here’s a body part (a knee) that’s had hundreds of thousands of years to evolve, so you’d expect that knees are pretty much the same worldwide, yet an experienced and qualified surgeon puts the “we can’t be 100% sure” caveat before chopping me open.

I wish we could apply the same process to the testing of IT software.  I remember reading about the mantra of “it’s harder to fix bugs once they’re in production” and that’s certainly true.  However, somewhere along the way, that became the justification for test cycles being incredibly long and mind-bogglingly detailed.  If my Finance software can’t balance the books, then yes, that’s a big drama.  But if the “Save” button is a shade of blue that doesn’t match the design screenshots – is it really worth holding back the production implementation? There are two problems with a mantra of “we will find every possible defect”:

1) You can find defects in software almost ad infinitum.  They just get less and less severe, and your testing cycle bottlenecks your entire IT department.

2) You create a false confidence in the testing.  “Hell, if we spent 12 months in testing, then we’ll never find any bugs once we go to Production.”

So I say – why not take the surgeon’s approach?  Sure, we’ll test the product to a reasonable level of satisfaction, but we’ll readily accept the fact that the ultimate test ground is only Production.  “We’ll only know when it’s in there”.


  1. So testing is still a huge bottleneck? Some things change, at XXXX most things stay the same 😛

    Connor – I’ve edited the company name in the interests of good blogging ethics

  2. My present employer has eliminated the testing bottleneck completely – we’ve developed in the production environment for 18 years 😉

    Yes, we have made mistakes that have cost the company money (sometimes big money), but the savings over 18 years have been many times the loss. When the boss gets an idea for something that helps the business, we just code it, and if users start calling in with “help, you broke the system”, we fix it. The business sometimes gets new functionality hours after the idea is born, and that keeps it ahead of the competition.

    Not saying this works for everybody. I wouldn’t like the surgeon to say “let’s open him and see” without even first taking x-rays. Just saying to keep an open mind and look at the whole process; sometimes the gains you get from accepting a bit of risk far outweigh the downsides.

    I think it is a matter of finding the right balance of risk versus stifling rigidity. Our balance of relatively “high risk” would not fit a financial institution, but not everybody needs to have the same low level of risk as a bank. Many would probably benefit from accepting just a slightly higher risk (without necessarily doing our go-for-broke cowboy coding method 😉).

  3. Interesting observation. I think it is an observation about creating a varying matrix of risk-reward net wins. Okay – I know exactly what I meant by that, but unless you already know what I mean it probably requires some explanation. It is very much akin to Cary Millsap’s “Thinking Clearly About Performance” explanation of useful service level statements of performance goals: http://method-r.com/papers/file/44-thinking-clearly-about-performance.

    For example: “The Track Shipment task must complete in less than .5 seconds in at least 99.9% of executions” is akin to “Testing must be sufficient to prevent a new feature from making sales impossible for more than 20 minutes.” (Presumably you could fix or back out a change.)

    But some functionality and systems are more akin to a heart pumping than your knee working well enough at low enough pain levels for you to walk about.

    If you’re introducing an improvement to a system that is like a pumping heart, or that has security and correctness concerns like the bank Kim mentioned, then you will have very rigorous testing requirements indeed. But if the heart has stopped, your system has been breached, or transactions are yielding wrong bank account results, then in that crisis your level of testing might be “it doesn’t generate an error on one simple test case.”

    All this because what I really wanted to say is: Don’t schedule your orthopedic surgeon to do your heart transplant.
