Monday, 22 December 2008

Seven ways to test your number-crunching code

[caption id="" align="alignleft" width="240" caption="Photo by alisdair"]Photo by alisdair[/caption]

Testing your code is important.  Pretty much vital, in fact.  This is because if you don't test your code, you don't know if it really works.  So an integral part of any software project is to come up with a good set of tests for your code.

Scientific code often involves numerical analysis of some kind.  Whether you're estimating an integral, performing a statistical analysis on your (noisy) data, or using a Monte Carlo technique to simulate something, your code is producing output numbers and you are going to need tests that tell you whether those numbers are correct.  This testing can be quite challenging, because you'll be faced with questions like "how do I know my results aren't out by 10%?" or "how will I know if one of my equations is missing a factor of two?".  To that end, here are seven approaches that can be useful in performing these tricky but essential tests.

Putting in print statements
An oldie, but a goodie.  There will (hopefully) be places in your code where you know what a certain value should be, or at least what sort of value it should take (should it be of order one or of order a million?).  If you put in an appropriate PRINT statement at that point, you can see whether the actual value is reasonable or not.  And if it isn't, additional PRINT statements give you a way of tracking down the line on which your value first becomes unreasonable.  This can sometimes lead to pages and pages of output that whizz by before you can read them.  In this case, see if your language-of-choice has a PAUSE or STOP statement: inserting one of these after each print statement will let you see each output in turn.

Oh, but try to remember to remove them afterwards.  Cluttering your code and generating on-screen garbage is less good.
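As a minimal sketch of the idea (in Python, using a made-up series calculation purely for illustration), the checkpoint prints might look like this:

```python
import math

def estimate_pi(n_terms):
    # Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
        if k % 2000 == 0:
            # Temporary sanity check: every partial sum must lie between 0 and 1
            print(f"term {k}: partial sum = {total:.6f}")
            # input()  # uncomment to pause after each print (the PAUSE trick)
    return 4 * total

pi_est = estimate_pi(10000)
print(f"estimate = {pi_est:.6f}")  # should land close to pi = 3.141593
```

If a partial sum ever strayed outside [0, 1], the prints would show exactly where things first went wrong - and, as above, the prints (and any pauses) should come out again once you're satisfied.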

Watching the results as they come in...
One really intuitive way to display your results is to do so graphically.  An even better way is to do this as your results are generated, updating them if they change.  This shows you in real-time what your code's doing and you can often get a great 'feel' for what looks right and wrong.  It's also a nice way to see what changes when you change the code.

For example:  if your code is simulating the behaviour of a physical system, then generate periodic images of what the system currently looks like.  Although this slows your code down, the effect is an animation of the simulation, which should allow you to see if things are going badly wrong.

Another example:  when running any kind of Monte Carlo analysis (eg. MCMC statistical methods), generate plots of your output quantities and watch them evolve from their starting state towards a (hopefully) sensible end point.
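A full graphical display would normally use a plotting library, but the idea can be sketched without one: here is a dependency-free Python toy (a Monte Carlo estimate of pi, chosen purely for illustration) that reports its running estimate as the samples accumulate, so you can watch it settle:

```python
import random

def mc_pi(n_samples, report_every=20000, seed=0):
    rng = random.Random(seed)
    inside = 0
    for i in range(1, n_samples + 1):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
        if i % report_every == 0:
            # Watch the running estimate evolve towards a sensible end point
            print(f"{i:>7d} samples: pi estimate = {4 * inside / i:.5f}")
    return 4 * inside / n_samples

estimate = mc_pi(100000)
```

In real code the periodic report would be a plot refresh rather than a print, but the principle - expose intermediate state while the run is still going - is the same.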

Putting in known solutions

A great test is to give your code inputs for which you know what the correct answer should be, for example from an analytic mathematical solution.  When such cases exist, it means you can test your code very precisely and can potentially guard against all sorts of bugs.  The downside here is that there are often no such solutions available; indeed, you may well be building numerical code precisely because analytic solutions are not available!  It's also important to be careful in case the known solutions you do have represent very special cases for your code's task.  You need to take care that your code won't fail under more general conditions.
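As a concrete sketch (in Python, with a hypothetical trapezoid-rule integrator standing in for your code), comparing against analytic answers might look like this - note the second check, which guards against the integrator only working on the symmetric special case:

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Known analytic solution: integral of sin(x) from 0 to pi is exactly 2
result = trapezoid(math.sin, 0.0, math.pi, 1000)
assert abs(result - 2.0) < 1e-5

# Guard against special cases: also test a non-symmetric integrand
# (integral of x^2 from 0 to 3 is exactly 9)
assert abs(trapezoid(lambda x: x * x, 0.0, 3.0, 1000) - 9.0) < 1e-4
```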

Putting in toy data
By "toy data" we mean simulated data that are unrealistically simple, but that are useful for spotting really glaring errors.  There are a whole class of bugs that this type of test can catch, such as missing factors in your equations or if your code is computing grossly the wrong thing at any point.  The advantage of toy data is that you can (hopefully) construct it in such a way that the output you expect should be clearly defined.

For example, if your code is designed to cluster some time-series, a good toy data-set would contain two completely different time-series (say a sinusoid and a parabola), with 10 copies of each.  So, you have 20 time-series comprised of 2 sets of 10 identical series.  If your clustering algorithm can't find these, you can be pretty sure that something's wrong!
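The clustering toy data above can be sketched in a few lines of Python.  The nearest-seed assignment at the bottom is just a trivial stand-in - in practice your own clustering algorithm would go there - but the point is the assertion: on data this clean, any sane method must recover the 10 + 10 split exactly.

```python
import math

# Toy data: 10 identical sinusoids and 10 identical parabolas
t = [i / 50.0 for i in range(100)]
sinusoids = [[math.sin(2 * math.pi * x) for x in t] for _ in range(10)]
parabolas = [[x * x for x in t] for _ in range(10)]
series = sinusoids + parabolas

def dist(a, b):
    # Squared Euclidean distance between two time-series
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Stand-in for your clustering code: assign each series to the nearer of two seeds
seed_a, seed_b = series[0], series[-1]
labels = [0 if dist(s, seed_a) <= dist(s, seed_b) else 1 for s in series]

# The expected output is clearly defined: 10 of one cluster, then 10 of the other
assert labels == [0] * 10 + [1] * 10
```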

Putting in synthetic data

Even better than toy data is to generate full-scale synthetic data with which to test your code.  The distinction we're making here is that synthetic data are an attempt to be genuinely similar in character to the real data (or at least, what you expect the real data to be like).  This can be a large project - we know of at least a couple of cases where someone's PhD was essentially to build the data simulator for a space telescope.  Other examples are less grand, but the aim remains the same: make the synthetic data as realistic as possible, so that it tests your code in exactly the ways you will ultimately use it.

A quick list of things you may wish to include in synthetic data:

  • signal generated from a physically realistic model of the data

  • any noise inherent in the physical system you're measuring

  • instrumental noise (including all those nasty effects you wish weren't there)

  • if there are different ways that the data may be gathered (eg. telescope observing strategies), then try to simulate these faithfully

  • and do all of this multiple times, using different random seeds for your simulations!
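The ingredients above can be sketched as a small Python generator.  Everything here is invented for illustration (the Gaussian-pulse "signal", the noise levels, the cosmic-ray spike rate are all assumptions, not anyone's real instrument model), but the structure - model signal plus intrinsic noise plus instrumental nastiness, repeated over several seeds - is the point:

```python
import math
import random

def make_synthetic_lightcurve(n_points, seed):
    """Sketch of a synthetic data set: model signal plus two noise sources."""
    rng = random.Random(seed)
    data = []
    for i in range(n_points):
        t = i / n_points
        # Physically motivated signal model: a Gaussian pulse centred at t = 0.5
        signal = 10.0 * math.exp(-((t - 0.5) ** 2) / 0.02)
        intrinsic = rng.gauss(0.0, 0.3)     # noise inherent in the system
        instrumental = rng.gauss(0.0, 0.5)  # detector / read-out noise
        if rng.random() < 0.01:
            instrumental += 20.0            # occasional cosmic-ray spike
        data.append(signal + intrinsic + instrumental)
    return data

# ...and do it multiple times, with different random seeds
datasets = [make_synthetic_lightcurve(500, seed) for seed in range(5)]
```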

Putting in real data
The best way to see if your code can handle all the oddities of real data is to run it on...some real data.  Real data are always more messy and less convenient to work with than toy/synthetic data, so it's very important to see how your shiny, well-crafted code reacts when you feed in warts-and-all real data.

A couple of examples:
- data from an astronomical satellite might have oscillating noise characteristics, due to unexpected ringing in the read-out electronics, as well as having random spikes in the data as cosmic rays hit the detectors.
- gene expression measurements from a microarray chip have highly non-Gaussian noise characteristics, as well as spatial artifacts caused by scratches and even thumb-prints on the chip itself.

The problem with using real data to test your code is that it's very hard to know what the real answer is - generally, that's the point of your science project, which you've by definition not finished yet!  There are some ways around this, which generally involve using other sources of (trusted) knowledge to establish what the real answer is likely to be in a specific, restricted case.

For example:
- when observing astronomical sources, it may be possible to define a small sample of objects that have been well-measured before (by other telescopes)
- when clustering genes on the basis of their expression profiles, the annotations of the genes in each cluster can be examined for over-represented terms that would indicate that those genes are indeed working together.

Comparing to someone else's code
If you have access to someone else's code that performs, in at least some cases, the same task(s) as your code, then you can use it to bootstrap-test your own.  The idea here is that if the other code is well-tested, you can justifiably hope it gives the correct answer.  Therefore, it's sensible to compare the results from your code with those from the other code.  Note the phrase "if the other code is well-tested" - this is very important, and you need to convince yourself that it really is the case.
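As a sketch of this kind of comparison in Python: suppose your new code is a hand-rolled streaming variance routine (Welford's algorithm, chosen here purely as an example).  The role of "someone else's well-tested code" is played by the standard library's statistics module; in practice it might be a colleague's package, and the comparison should use a sensible tolerance rather than exact equality:

```python
import math
import random
import statistics

def my_variance(xs):
    # Your own (hypothetical) implementation: Welford's online algorithm
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

# Compare against the trusted, well-tested implementation on shared inputs
rng = random.Random(42)
xs = [rng.gauss(5.0, 2.0) for _ in range(10000)]
assert math.isclose(my_variance(xs), statistics.variance(xs), rel_tol=1e-9)
```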

Of course this does raise the question: "if their code does what you need, why not just use it and not write your own?".  This is a valid question, although there are certainly cases where it is justifiable to write your own code.  For example, you may be writing a faster version (re-coding in C++, maybe), or you may be using the method as a starting point but planning to experiment with many new algorithms, in which case it's a good idea to have your own version so that you know in detail how the code works.

In conclusion
Testing your number-crunching code can be challenging, because there are lots of ways to hide errors in numerical outputs.  Precisely because of this, it's vital to test your code enough that you can be confident in the outputs it generates.  After all, imagine presenting your results at a top international conference and being asked, "are you sure those numbers are correct?".  It would be nice to be able to definitively answer, "Yes".


  1. I would add: if someone else is going to use your code, have them test it! Let them misuse the program as badly as they can... and the bugs will come out!

    It happened to me that I wrote some routines for the analysis of a time series. Well, they worked perfectly on all my data, but when someone else in the lab tried to use them, they didn't work anymore, giving weird results. The problem was that I was assuming (never do that!) that the time series started at time 0. This was always the case with my data, but not with my colleague's...

  2. I completely agree! Other people can be very creative when it comes to breaking your code and this is a Very Good Thing, because it's a form of testing :-)