Monday, 10 November 2008

Surviving legacy code

I will survive! (Photo by hryckowian)

During Rich's PhD, he was presented with some code and told to use it as the basis for a project he was working on.  This generosity turned into a major headache for a number of reasons, not least because the (somewhat sparse) comments were written partly in English and partly in Portuguese (!).  Many people encounter legacy code during their programming career and the scientist-programmer is particularly vulnerable because of all the "challenging" code that exists in science. 

What were they thinking?!

Not an exclamation of despair (although this can happen).  One of the first problems you will encounter with any legacy code is that you don't understand how it works because you didn't go through the process of writing it.  For example, the reason we write prototypes is to learn more about how best to implement our code.  If someone else did all of this, then you don't have the benefit of this experience.

There's no quick fix for this, unfortunately.  Experience has to be earned.  However, things that will help include spending time reading through the code to understand what each part does, reading any papers/documents that describe what the code does, and talking to the original author/s.  And you will learn a great deal once you actually start working with the code.

Sparse comments

One way to better understand what code is doing is to read the comments, which becomes a real problem when there aren't any, or when they're not written in a language you understand (whether that be Portuguese, a cryptic shorthand or whatever).  If you're stuck with sparsely-commented code then you may just have to tough it out, unless you are able to persuade the original author to go through the code and add more detailed comments.  If you can persuade them to do this, you should learn a lot, especially if you're able to watch over their shoulder as they do so, which makes it well worth the effort.

No testing

Code that you didn't write, that is poorly commented/documented and has cryptic variable names is unlikely to come with a comprehensive test suite. The code you inherit probably works (you have inherited it for a reason), but as you start to learn it and change it you have to make sure it still does what you expect. Creating a set of tests for the code may seem like a waste of time, but without them you can't be sure that your changes aren't introducing bugs. As you dig deeper into the code and begin adding new features or even just cleaning it up, you'll exercise the code in ways that were probably never planned for, so it is likely that new bugs will appear. Having a good set of tests will allow you to catch them earlier and will save you a lot of time.
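One cheap way to get started is to record what the inherited code currently produces for a handful of inputs, then check that your modified version still produces the same answers. The sketch below is only an illustration of that idea: legacy_mean stands in for whatever routine you inherited, and tidy_mean, record_baseline and check_against_baseline are hypothetical names made up for this example.

```python
def legacy_mean(xs):
    """Stand-in for an inherited routine (hypothetical example)."""
    s = 0.0
    i = 0
    while i < len(xs):
        s = s + xs[i]
        i = i + 1
    return s / len(xs)


def tidy_mean(xs):
    """Our cleaned-up replacement, which must behave identically."""
    return sum(xs) / len(xs)


def record_baseline(func, cases):
    """Run the ORIGINAL code on each argument tuple and store the results."""
    return [(args, func(*args)) for args in cases]


def check_against_baseline(func, baseline, tol=1e-12):
    """Return a list of (args, expected, got) for every case that differs."""
    failures = []
    for args, expected in baseline:
        got = func(*args)
        if abs(got - expected) > tol:
            failures.append((args, expected, got))
    return failures


# Inputs chosen for illustration; real cases should come from your own data.
CASES = [([1.0, 2.0, 3.0],), ([2.5],), ([-1.0, 1.0, 4.0],)]

baseline = record_baseline(legacy_mean, CASES)
assert check_against_baseline(tidy_mean, baseline) == []
```

Note that this only tells you the behaviour hasn't changed, not that the original behaviour was right. In practice you would save the baseline to a file before touching anything, so that it really does reflect the untouched code.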

Poor coding

If you're lucky, the code you inherit might be well-written.  If it's not then you have a judgement to make.  Re-writing the code in a better form will be beneficial in the long term, but it may cost you a lot of effort because in essence you're choosing to throw away the old code entirely, having only used it as a guide in building your own.  The other extreme is to decide to just work with what you have.  This is fine, as long as you're aware that the changes you make are likely to take you longer because you have to battle against the legacy code.  These decisions will depend on how extensive the changes are that you need (or are likely to need) to make.

A third way is to make changes to selected aspects/parts of the code while leaving others unchanged.  You might consider targeting only the easy-to-effect changes that will make your life easier, while accepting the limitations of the more ingrained aspects of the code.  A good example of this would be changing unhelpful variable names to something more meaningful using find-replace in your code editor.  Using find-replace is important here because every occurrence is changed in exactly the same way, which avoids the typos that will creep in if you try this by hand.  It's also a lot quicker if you're replacing a variable a large number of times.
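If your editor's find-replace can't match whole words only, a short script can do the job. This is a minimal sketch rather than a proper refactoring tool: rename_variable is a hypothetical helper, and because it works on plain text it will also rename matches inside comments and strings, so check the result (and run your tests) afterwards.

```python
import re


def rename_variable(source, old, new):
    """Replace whole-word occurrences of `old` with `new`.

    Returns the new source text and the number of replacements made.
    """
    # \b anchors the match at word boundaries, so renaming "t"
    # leaves names such as "t0", "dt" and "tt" untouched.
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function is used as the replacement so that backslashes in
    # `new` are never treated as regex escape sequences.
    return re.subn(pattern, lambda match: new, source)


code = "t = t0 + dt\ntt = t * 2\n"
renamed, count = rename_variable(code, "t", "time")
assert renamed == "time = t0 + dt\ntt = time * 2\n"
assert count == 2
```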

Other changes you might consider include adding your own comments (as you learn more about what the code does), tidying the code up so that it's easier for you to read, and relocating chunks of large functions to their own sub-function - it's amazing how many 500-line functions one encounters in legacy code!
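As a small illustration of relocating chunks of a large function into sub-functions, here is a made-up example (none of these names come from real code; the gain and offset values are arbitrary). The "before" function filters, calibrates and averages in one lump; the "after" version does the same work in named steps, and the final check confirms that the behaviour is unchanged.

```python
def analyse_before(readings):
    # Filtering, calibration and averaging all tangled together.
    cleaned = []
    for r in readings:
        if r is not None and r >= 0:
            cleaned.append(r)
    scaled = []
    for r in cleaned:
        scaled.append(r * 2.0 + 0.5)
    return sum(scaled) / len(scaled)


def drop_invalid(readings):
    """Keep only readings that are present and non-negative."""
    return [r for r in readings if r is not None and r >= 0]


def calibrate(readings, gain=2.0, offset=0.5):
    """Apply the same linear calibration the original code hard-wired."""
    return [r * gain + offset for r in readings]


def mean(values):
    return sum(values) / len(values)


def analyse_after(readings):
    # Each step now has a name and can be tested on its own.
    return mean(calibrate(drop_invalid(readings)))


sample = [1.0, None, -3.0, 2.0]
assert analyse_before(sample) == analyse_after(sample) == 3.5
```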

A language from the mists of time...

The older the legacy code (or the longer ago its author learned to program), the more likely it is to be written in an old computer language.  This can often happen when the older members of a department have their favourite language and stick with it for years and years.  This is fine at some level - FORTRAN 77 is still a very fast language when it's written well, but it does lack a lot of features of more modern languages (Object Orientation, anyone?).  Perhaps the most significant problem that using an old language can present is a lack of support.  Hopefully there's at least one person you can ask about the language (the original author), but there may be fewer up-to-date web pages on a 40-year-old language than there are on Java or C++.  And there are unlikely to be many tools like development environments available.

Sadly, your only options here are to accept it or re-write the code in a new language (which could be a lot of work).

To re-write or not re-write?

If you're given legacy code to develop, you're going to be making at least some changes to the code (by definition).  So the question becomes: how much should I change?  This is another situation where you must make a cost-benefit analysis.  It might be that you only need to make small changes to the code in order to get the effects you require - in this case, maybe you can get away with the minimal set of changes and not worry about improving other areas of the code.  At the other extreme, you may need to change fundamental things about the structure of the code, for example to accommodate a new format of data, so you might need to consider re-writing large chunks of the original code.  And of course you may well find yourself somewhere in between these two extremes.

It is surprising how little of an existing program you have to change before starting again becomes the quicker option.  Robert Glass, in his book 'Facts and Fallacies of Software Engineering', argues that if you change more than 25% of an existing program then it will be quicker to rewrite it from scratch.  This result was discovered by NASA when they analysed their flight dynamics software, and is thought to come about because a program designed to solve a particular problem will, in general, fit that problem very closely, preventing modification to solve other, even closely related, problems.
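If you want a rough feel for where you stand relative to that 25% figure, you can compare the original file with your modified copy line by line. The sketch below uses Python's standard difflib module; fraction_changed is a hypothetical helper, and counting changed lines is our own crude proxy, not necessarily the measure used in the NASA study.

```python
import difflib


def fraction_changed(original, modified):
    """Fraction of the original's lines that were altered or deleted."""
    old = original.splitlines()
    new = modified.splitlines()
    if not old:
        return 0.0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    # Matching blocks are runs of lines that appear unchanged in both texts.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - unchanged / len(old)


before = "a = 1\nb = 2\nc = a + b\nprint(c)"
after = "a = 1\nb = 20\nc = a + b\nprint(c)"
assert fraction_changed(before, after) == 0.25
```

Treat the number as a prompt for thought rather than a verdict: renaming one variable everywhere can touch most lines of a file without changing what the program does.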

While your decision will ultimately depend on the nature of your project (and the amount of time you've got available), we suggest that it's often sensible to try to minimise the changes you make.  Make the changes that you must make, and strongly consider any "easy" changes that will really help you out, like find-replacing variable names.  But make sure you can run tests frequently, so that you know if you break something!

In conclusion

Not all legacy code is bad!  If you're lucky, any legacy code you work with will be well-written and make your life easier.  However, if this isn't the case then understanding the task you're facing, being methodical and following the above advice will make your life a lot easier.  And remember that any code you write may someday become someone else's legacy code, so write it accordingly!


  1. Nice article (as all of the others in this blog).
    I am often surprised by the fact that people use code without testing to calculate their results, and later nobody during the peer-reviewing process says anything about it.

  2. Glad you enjoyed the article :-)

    I agree entirely about testing! My experience is also that testing is ignored far too much. I think it's *really* important to have confidence that your code is doing the right thing.
