Thursday, 17 December 2009

Scripting for science papers



[Image by marco annunziata]

Scientist-Programmers write a lot of scripts.  It's part-and-parcel of "trying stuff out", it's a quick way to get some number crunching done on your data, and it's very useful for generating the figures and tables you need for that paper you're writing.  In this article, I give a quick once-over of some of the things I've learned over the years about using scripts as a scientific tool.

A bit like prototypes...
Scripts share some characteristics with software prototypes.  Your aim is typically to get an answer quickly, writing code that doesn't (necessarily) need to be very reusable.  There can also be a learning element here, if you're trying to understand more about exactly how to solve a given problem.  This means that many of the same considerations apply as for a prototype.  Writing quick-and-dirty code in exchange for speed is okay here, provided you test it enough to be confident that you can trust the results it generates.
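One common way to do that testing is to feed the quick-and-dirty code some synthetic data where you already know the right answer, and check that it comes back. The sketch below assumes the script's core calculation is a least-squares slope; `estimate_slope` and `check_against_known_answer` are hypothetical names chosen for illustration, and your own analysis step would take their place.

```python
import random


def estimate_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (a stand-in analysis step)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def check_against_known_answer(true_slope=2.5, noise=0.1, n=1000, seed=42):
    """Generate data with a known slope plus a little noise, and check we recover it."""
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_slope * x + 1.0 + rng.gauss(0, noise) for x in xs]
    estimate = estimate_slope(xs, ys)
    # The tolerance is loose relative to the noise level, so this should only
    # fail if the calculation itself is wrong.
    assert abs(estimate - true_slope) < 0.01, f"slope {estimate} too far from {true_slope}"
    return estimate


if __name__ == "__main__":
    print(f"recovered slope: {check_against_known_answer():.4f}")
```

A check like this costs a few lines, and it is worth re-running whenever you change the quick-and-dirty code.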


You *will* want to re-run these at some point
Often, you'll be writing a script to run a one-off analysis.  Perhaps there are enough stages involved that it's easier to handle by writing them down as a script, for example running an MCMC clustering analysis on some genetic data, summarising the results into a single 'average' clustering partition, then using annotation databases to search for patterns of biological function.  All pretty straightforward stuff (and essentially just a set of modules, run in sequence).  Even though it's nominally a one-off task, one important lesson I've learned over the years is that you'll surprisingly often come back to a script months (or even years) later and need to use it again, either for the same task or a related one.  This can happen for a number of reasons:


  • you're returning to an old project

  • you're responding to referees' comments on a paper you wrote

  • you're working on a further development of a previous project

  • you've come across the need to do a similar set of tasks for a completely different project


Whatever the reason, you will thank yourself if you've taken the time to write some comments and keep the code fairly legible and literate.  This doesn't take much time to do as you're writing the script, but it will save you huge headaches when getting restarted after months of working on other things.
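As a minimal sketch of what "commented and legible" might look like for a multi-stage script like the one described earlier (cluster, summarise, then look for patterns), each stage here is a small named function with a docstring, the tunable parameters sit together at the top, and a single `main` runs the stages in sequence. Everything in it is hypothetical: the gene names, the toy threshold "clustering" (not MCMC) and the majority-vote summary are stand-ins for real stages. The majority vote only works because this toy rule labels clusters the same way on every run; real MCMC output would need more care, since cluster labels need not match between runs.

```python
"""Analysis for <paper title>: cluster samples, summarise, report.

Re-run with: python analysis.py
Stages run in order; each one uses only the output of the previous one.
"""
from collections import Counter

# Parameters kept together at the top, so it is obvious what to change on a re-run.
N_RUNS = 3        # hypothetical: number of repeated clustering runs to summarise
THRESHOLD = 5.0   # hypothetical: cut-off used by the toy clustering stage


def load_data():
    """Stage 1: load the raw measurements (toy in-memory data here)."""
    return {"geneA": 1.2, "geneB": 7.9, "geneC": 6.4, "geneD": 0.3}


def cluster(data, run):
    """Stage 2: assign each item to cluster 0 or 1 (a toy threshold rule)."""
    # `run` stands in for a random seed in a real stochastic method; unused here.
    return {name: int(value > THRESHOLD) for name, value in data.items()}


def summarise(partitions):
    """Stage 3: combine several runs into one partition by majority vote per item.

    Assumes cluster labels mean the same thing in every run.
    """
    names = partitions[0].keys()
    return {n: Counter(p[n] for p in partitions).most_common(1)[0][0] for n in names}


def report(partition):
    """Stage 4: print cluster memberships, e.g. for a table in the paper."""
    for label in sorted(set(partition.values())):
        members = sorted(n for n, c in partition.items() if c == label)
        print(f"cluster {label}: {', '.join(members)}")


def main():
    data = load_data()
    partitions = [cluster(data, run) for run in range(N_RUNS)]
    consensus = summarise(partitions)
    report(consensus)
    return consensus


if __name__ == "__main__":
    main()
```

Months later, the module docstring and the parameter block are usually the first things you'll want to read.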

Turning your script into proper software
Sometimes your script will turn out to be useful more than once.  It might even be useful enough that you end up using it regularly and perhaps other people start asking if they can have a copy.  This is great, because you've made something useful!  But at this point, you might want to consider turning your script into a proper piece of software.  My suggestion for this is to treat your script as a kind of prototype, meaning that you should start afresh with the planning, coding and testing for the proper software.  This is extra effort, of course, but by definition you've identified a case where it'll be effort well spent.
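What "starting afresh" looks like will vary, but one plausible first step, sketched here under the assumption that the script's core is a single calculation, is to pull that calculation into a documented function with input checks and write automated tests for it alongside, here with Python's standard-library `unittest`. The function `summary_stats` and the module name used below are hypothetical.

```python
import math
import statistics
import unittest


def summary_stats(values):
    """Return (n, mean, sample standard deviation) for a sequence of numbers.

    Hypothetical core calculation pulled out of the original script. It does no
    file handling or printing, so it can be imported, reused and tested.
    """
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("need at least two values for a standard deviation")
    if any(math.isnan(v) for v in values):
        raise ValueError("input contains NaN")
    return len(values), statistics.mean(values), statistics.stdev(values)


class SummaryStatsTest(unittest.TestCase):
    """Tests written alongside the 'proper' version, not bolted on afterwards."""

    def test_known_values(self):
        n, mean, sd = summary_stats([2, 4, 4, 4, 5, 5, 7, 9])
        self.assertEqual(n, 8)
        self.assertAlmostEqual(mean, 5.0)
        # Squared deviations sum to 32, so the sample (n - 1) SD is sqrt(32 / 7).
        self.assertAlmostEqual(sd, math.sqrt(32 / 7))

    def test_rejects_too_few_values(self):
        with self.assertRaises(ValueError):
            summary_stats([1.0])

    def test_rejects_nan(self):
        with self.assertRaises(ValueError):
            summary_stats([1.0, float("nan")])
```

Saved as, say, `summarise.py`, the tests run with `python -m unittest summarise`, and other scripts (or colleagues) can simply `from summarise import summary_stats` rather than copying code around.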


In conclusion
Scripts are, of course, a very useful tool in the toolbox of any programmer.  This is perhaps especially true of the Scientist-Programmer.

1 comment:

  1. It's a lot easier said than done ;-)

    I find it a difficult balance. Sometimes I tend to spend too much time commenting on what the pieces of code are doing, and then I suddenly get an awesome new approach/idea and implement it straight away, deleting the old code without looking back (it's hard, but sometimes necessary).

    So my advice would be: make it work first (proof-of-concept) and only comment on things that are new or difficult to make sense of. When the code is working and you are satisfied, then take your time to write good comments.
    I must admit I don't always do that last part, since code is never truly done (there's always room for optimization) or I quickly start a new project; it's always easier to give advice than to follow it yourself.
