Friday, 29 August 2008

Testing and debugging computer code - three key questions to ask

Photo by alisdair

This is the first in a series of articles on how to test and debug your code.  We start with three key questions that you can ask about your code; they will guide you as to whether it has been tested and debugged enough to be worth using.

At this point in the project, you've written some code.  So, does it work?   If you've understood your objectives, planned carefully and written your code with craft and attention to detail, you can justifiably hope that your code does what it is supposed to do, but you don't know that it does.

Testing is checking explicitly that various features of your code do what they are supposed to do.  At the largest scale, this can be running test data through the whole code and looking for the (known) correct outcome or expected behaviour.  At its simplest, it can be inserting PRINT statements in your code, or stepping through the code in a debugger, so that you can see what values particular variables take at various points in your code. More complex forms of testing involve writing code that automatically checks that small pieces of your existing code do what you expect (unit testing). You can also give the whole program known inputs and check for correct outputs (functional testing; a quick, shallow version of this, which mainly checks that nothing falls over, is often called smoke testing).
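As a rough illustration of unit testing, here is a minimal sketch in Python.  The function `mean` and the two test functions are made up for this example; the point is only that each test feeds in an input for which the answer is known in advance and checks the result automatically.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Known input with a known answer: (1 + 2 + 3 + 4) / 4 = 2.5
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_of_single_value():
    # The mean of one number is that number.
    assert mean([7.0]) == 7.0


if __name__ == "__main__":
    # Run the tests by hand; a failing assert stops the script with an error.
    test_mean_of_known_values()
    test_mean_of_single_value()
    print("all tests passed")
```

Once tests like these exist, you can re-run them every time you change the code, which is much cheaper than re-reading PRINT output by eye.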

Debugging is what you do to fix the cause(s) of whatever's not working in your code.  Usually, your code will have just failed a test ("Hmm...the age of the universe is clearly not 2 days!"), tripped an assertion, thrown an exception or simply crashed.  In any case, you want to track down the causes and fix them.

Three key questions...

To keep things simple, you can think of testing as asking three key questions about your code.

  1. Does your code compile? 

  2. Does your code run to completion?

  3. Does your code do what you think it does?

Does your code compile?

The first question is straightforward and quick to answer, because all it takes is running the compiler. This will catch bugs like mistyped names and structural errors in functions and loops (such as a loop that is never closed). It will not catch errors in your logic; those are the subject of question three. Having the compiler catch bugs is a good thing, because by definition any such bug cannot get past the compiler and actually get executed.  A good habit is to compile periodically as you write your code; this helps you to find (and hence fix) bugs as you go, before they become too buried by the code you've written. Note that some languages (e.g. interpreted languages) don't get compiled, while others might compile automatically at run-time (Matlab and IDL are examples of languages that do this).

If your compiler has the option to treat warnings as errors, then switch it on.  Warnings are often useful indicators that things are starting to go wrong.  A good example of a warning that can indicate trouble brewing is the one you get when you cast (change the type of) a variable from a type with higher precision to one with lower precision, e.g. casting from a floating point number to an integer. If you do this by accident and ignore the warnings, then you might have just created a bug. Ben's favourite example of this is a university friend who spent two days trying to work out why his code was giving him odd results. He eventually worked out that he was assigning his nice high-precision Pi value to an integer. The compiler would happily do it, but from that point on everything was garbage.
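To get a feel for how much damage a silent float-to-integer conversion can do, here is a small Python sketch.  Python has no compile-time warning for this, so the conversion is written out explicitly with `int()`; the radius and variable names are invented for the example.

```python
import math

radius = 2.0  # arbitrary example value

pi_full = math.pi            # full double-precision value
pi_truncated = int(math.pi)  # converting to an integer discards the fraction, leaving 3

area_full = pi_full * radius ** 2
area_truncated = pi_truncated * radius ** 2

# How wrong is the answer once pi has been squeezed into an integer?
relative_error = abs(area_full - area_truncated) / area_full

print(f"area with full-precision pi: {area_full:.4f}")
print(f"area with truncated pi:      {area_truncated:.4f}")
print(f"relative error:              {relative_error:.1%}")
```

An error of a few percent is exactly the sort of thing that produces "odd results" rather than an obvious crash, which is why it can take days to find.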

Does your code run to completion?

Question two is (we hope) obvious.  You need to know whether or not your code actually runs once it's compiled.  If it doesn't, then you need to identify why and fix it (i.e. do some debugging).  If it does run, then congratulations: it could still be producing garbage, but it does run all the way through. This level of testing can be extended to asking whether the program fails as expected when you give it bad data (you are checking your data, right?).
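Checking that code fails as expected can itself be automated.  The sketch below is one possible way to do it in Python; `parse_age` and `fails_as_expected` are hypothetical names invented for this example, not part of any library.

```python
def parse_age(text):
    """Convert a text field to an age in years, rejecting bad data."""
    age = float(text)  # raises ValueError if the text is not a number
    if not age >= 0:   # rejects negative values, and NaN as well
        raise ValueError(f"age must be non-negative, got {age}")
    return age


def fails_as_expected(func, bad_input, expected_exception):
    """Return True if func(bad_input) raises expected_exception."""
    try:
        func(bad_input)
    except expected_exception:
        return True
    return False  # the bad input was silently accepted


# Good data should be accepted...
assert parse_age("13.8") == 13.8

# ...and bad data should be rejected loudly, not passed downstream.
for bad in ("-2", "not a number", "nan"):
    assert fails_as_expected(parse_age, bad, ValueError), bad

print("bad data is rejected as expected")
```

If a bad input ever slips through, the loop's assert fails and names the offending value, which is far better than discovering it later in your results.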

Does your code do what you think it does?

Question three is a bit more subtle.  Just because your code seems to work (i.e. it runs to completion without crashing, generates output about as you'd expect and generally seems well-behaved) doesn't mean that it is doing what you actually need it to do. This is where the program must be tested against its requirements. This level of testing can be quite time-consuming but is vital to make sure that your code really does what you think it does.

For example:
We want some code to calculate the y value of a point on a straight line, given a gradient, intercept and x value.  Both of the following chunks of code will give reasonable-looking answers:

y = m * x + c

y = m * x + FLOOR(c)

And yet the second one is wrong (because it is rounding down 'c').  However, both will run to completion and both will give numerical answers that might look reasonable at first glance.
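A quantitative test separates the two versions immediately.  Here is a sketch in Python, with the two formulas wrapped in functions whose names (`line_correct`, `line_buggy`) are invented for this example.  The test values are chosen by hand so that the right answer is known: m = 2, x = 3 and c = 0.5 must give y = 6.5.

```python
import math


def line_correct(m, x, c):
    """y = m * x + c"""
    return m * x + c


def line_buggy(m, x, c):
    """Same formula, but with the intercept accidentally rounded down."""
    return m * x + math.floor(c)


# Test case with a known answer, worked out by hand.
m, x, c = 2.0, 3.0, 0.5
expected = 6.5

for func in (line_correct, line_buggy):
    y = func(m, x, c)
    status = "PASS" if math.isclose(y, expected) else "FAIL"
    print(f"{func.__name__}: y = {y} ({status})")
```

Note that the choice of test data matters: with a whole-number intercept such as c = 1.0, both versions return the same value and the bug stays hidden.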

What this boils down to is that you must test quantitatively the output of your software, in as many ways as you reasonably can.  If your code is processing data, you need some test data where you know what the answer should be (either simulated data or real data where you've established the "right" answer in another way).  If it's simulating something, try reproducing the correct results for specific cases, perhaps where the correct answer can be determined analytically.  Ultimately, there is no substitute here for expert knowledge of the project so that you understand what's important about the results and how it would be sensible to test them.  Always remember that what you're doing here will allow you to trust your results when you do science for real.
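As one example of testing against an analytic result, the Python sketch below checks a simple numerical integrator against an integral whose exact value is known: the integral of sin(x) from 0 to pi is exactly 2.  The `trapezoid` function, the number of steps and the tolerance are all choices made for this illustration; in a real project the tolerance should come from your knowledge of the method's expected accuracy.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))  # end points carry half weight
    for i in range(1, n):
        total += f(a + i * h)    # interior points carry full weight
    return total * h


exact = 2.0  # known analytically: integral of sin(x) from 0 to pi
approx = trapezoid(math.sin, 0.0, math.pi, 1000)
error = abs(approx - exact)

# The tolerance is an assumption for this example, not a universal constant.
assert error < 1e-5, f"integrator disagrees with analytic result by {error}"
print(f"approx = {approx:.8f}, error = {error:.2e}")
```

A test like this tells you not just that the code runs, but that it gets a known case right to a stated accuracy.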

Testing and debugging are vital parts of any software project.  Without them, you don't know that your code actually works, even if you think it probably does.  Don't assume it, test it!

1 comment:

  1. Hi,

    nice post, and I agree 100% with what you said. I would just like to add that, if possible, one should begin by running standard tests from the field of application of the code for which correct results are known (e.g. shock tube tests in hydrodynamics). Then one could design more elaborate tests if necessary.