Tuesday, 4 August 2009

Building scientific tools that are actually useful

Photo by flattop341

Lots of scientists write bits of software to get things done.  Sometimes they offer to give someone else (a collaborator, student, postdoc etc) a copy of some of their code, to help that person out.  Sometimes a given piece of code is useful enough that it gets handed out multiple times, and so starts to look a lot like a publicly-available scientific software tool.

That's great, but think about what could have just happened, back in the first sentence of this post.  A scientist wrote a bit of software to get something done.  Not "a scientist developed a robust, well-tested software tool".  Maybe it was, but maybe it was a knocked-together, prototype-y little chunk of code that was only meant to be used once.  And now suddenly that prototype is in widespread use.  We hope this fills you with horror!

The problem is that a prototype can end up being distributed as if it were a finished product.  But it doesn't have to be this way; in this post, we're going to discuss what goes into producing good scientific software tools.

A lot of scientific code can reasonably be called prototype code.  As we've discussed here, this is in the nature of science and scientific discovery.  But it does mean that this same code is less suitable for repeated distribution, simply because it suffers from the lack of robustness that is intrinsic to prototypes.

Testing, testing, testing...
The way round this is testing.  And lots of it.  A lot of scientific code is less well-tested than one would want in a (prospective) software tool.  This is not only the prototype effect, but also the time pressure to get your science finished and written up.  This can be fine if the code is one-use only, because you can make cross-checks etc for that one case.  But as soon as you start to think of your code as being a tool, it needs to be tested in a much wider range of contexts to make sure it also works there.  We try to pay attention to the bugs we find at various stages of code development and it's amazing how often a test that "will definitely be fine" finds an unexpected bug.  Be careful!
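As a rough illustration of what testing "in a much wider range of contexts" might look like, here is a minimal Python sketch.  The function normalise() and its checks are hypothetical, invented for this example; the point is only that the checks go beyond the one case the original analysis needed and also cover inputs a new user might plausibly supply.

```python
import math


def normalise(weights):
    """Scale a list of non-negative weights so that they sum to 1.

    Hypothetical example function, not taken from any real tool.
    """
    if len(weights) == 0:
        raise ValueError("need at least one weight")
    if any(w < 0 or not math.isfinite(w) for w in weights):
        raise ValueError("weights must be finite and non-negative")
    total = math.fsum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [w / total for w in weights]


def check_normalise():
    """Exercise normalise() on typical input and on the awkward cases."""
    # The one case the original analysis happened to need.
    assert normalise([1.0, 1.0, 2.0]) == [0.25, 0.25, 0.5]

    # Cases a new user might well supply.
    assert normalise([5.0]) == [1.0]
    assert math.isclose(sum(normalise([1e-12, 3.0, 7e8])), 1.0)

    # Inputs that should be rejected with a clear error, not a silent NaN.
    for bad in ([], [0.0, 0.0], [1.0, -1.0], [float("nan")], [float("inf")]):
        try:
            normalise(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad!r}")
    return True


if __name__ == "__main__":
    check_normalise()
    print("all checks passed")
```

The last group of checks is the kind that "will definitely be fine" until someone else's data arrives with an empty list or a NaN in it.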

Is your code user-friendly?  Often, the answer for scientific code is a resounding 'no'.  This is because it takes time and effort to make code user-friendly and if you're the only one using the code, you're familiar enough with it that the user interface can be pretty basic and clunky, perhaps even at the level of hardwiring control variables and just recompiling each time you change them.  This is perfectly okay if you're the only user, but a new user is going to find this really difficult to use.  If you're giving your code to other scientists, you need to create something more user-friendly.  They'll thank you for it!
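One small step away from hardwired control variables is to read them from the command line instead.  The sketch below uses Python's standard argparse module; the option names (--steps, --timestep, --output) and their defaults are assumptions made up for illustration, standing in for whatever your own code currently hardwires.

```python
import argparse


def build_parser():
    """Describe the settings that would otherwise be hardwired in the source."""
    parser = argparse.ArgumentParser(
        description="Run a hypothetical simulation.")
    parser.add_argument("--steps", type=int, default=1000,
                        help="number of time steps (default: %(default)s)")
    parser.add_argument("--timestep", type=float, default=0.01,
                        help="size of each time step (default: %(default)s)")
    parser.add_argument("--output", default="results.txt",
                        help="file to write results to (default: %(default)s)")
    return parser


def parse_settings(argv=None):
    """Parse and sanity-check the settings; argv=None means use sys.argv."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject values that would make no sense, with a readable message.
    if args.steps <= 0:
        parser.error("--steps must be a positive integer")
    if args.timestep <= 0:
        parser.error("--timestep must be positive")
    return args


if __name__ == "__main__":
    settings = parse_settings()
    print(f"Running {settings.steps} steps of size {settings.timestep}, "
          f"writing to {settings.output}")
```

A new user can then run the script with --help to see what can be changed, and nobody has to edit the source to try a different time step.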

Is it worthwhile?
Yes! With the caveat that your tool is scientifically useful in the first place, it's very valuable to both you and the scientific community.  A good software tool is an example of a do-once, benefit-many-times activity.  While you only need to write it once (plus maintenance, of course), many people can benefit from it.  And with every benefiting person, you're adding a little something to that scientific field.

The value to you comes in two forms.  Firstly, there's the kudos and reputation you earn by providing a really great scientific tool.  Scientists are very happy with anything that saves them time/effort and makes their science easier to advance.  Secondly, you also get to use the tool (!): a tool that now has a nice user interface and a number of other users who are providing useful feedback, both as bug reports and as ideas for new features.

Supporting your users
If you're going to make a good scientific software tool, you need to support your users.  Provide some documentation, at the very least a "how to" guide to get them started.  Publish a reference paper describing the method and the tool (some journals now accept 'software' papers, or you can write a methodology paper if the method is novel).  And make yourself available (typically via email) for people to ask you questions.  And answer them!

You should also strongly consider releasing updated versions from time to time.  Incorporate bug-fixes and perhaps new features, particularly in response to the feedback you're getting from your users.

In conclusion...
Building a really good scientific software tool is a very valuable thing to do.  It contributes a lot to the scientific community and it earns you a lot of kudos because you've done it so well.  So get building!


  1. I completely agree with you that making a robust, user-friendly, and well-supported tool is a good thing to do (provided that the tool is useful in the first place). The big question is thus why do so few people actually make such tools?

    I think there are several reasons for this:

    1) Skill sets. The people who are able to make the most useful tools are often those with a deep understanding of the scientific topic. However, they are often not good programmers.

    2) Interests. The people who make the most useful tools are usually those who have a problem at hand that they want to solve. Fast. When they have made a hack that gets the job done they are happy. They are fundamentally interested in solving scientific problems, not in doing software development.

    3) Rewards. Making a good tool will likely earn you kudos and reputation. But will it earn you tenure?

  2. Somewhat extending reason 3 from Lars: Scientific software development is risky! The time you spend on making your software general, intuitive, robust and easy to use is most likely time you do not use for your research. This only pays off if a user-base or community develops around your code, but building a community requires further time and skills.
