21st Century Scientist: The joys of Literate Programming


Software code is machine-readable. But sometimes it's not very human-readable.

for i=1:n
    d = function1(i)
    f = function2(i)
    a = function3(d)
    aa = function4(d)
    aaa = aa^2
    for j=1:nn
        d(j) -= a
        d(j) /= aa
        if aaa > vt, f(j) = TRUE, end
    end
    function5(d, i)
    function6(f, i)
end

(We've actually seen real code whose author used the variable names 'a', 'aa' and 'aaa'. There's no helping some people...)

This is a Bad Thing because at least one human (the programmer) must be able to read and understand the code. And the more effort they have to spend understanding it, the less they can focus on other important aspects of the code, such as structuring it well.

This leads us to the concept of literate programming: the idea that human readability should be a primary consideration when writing your code. This makes the code easier to understand and hence to work with. In practice, there are a number of steps you can take to do this, including those described below. Writing comments in your code is also valuable here, as it allows you to add detail beyond what the literate code itself conveys. In essence, all the pertinent information about what the code does, and how, is placed in one file: the one that also contains the code.

One could regard the aim of literate programming as being to produce code that a programmer can understand, maintain and/or make informed changes to, in a reasonable time, without the help of the original author. We'd extend this to include the case where the programmer is the original author returning after a break of six months. Your brain won't remember many of the subtleties of your code after that long a break.

Note that literate programming is actually a more developed programming concept than we're presenting here. Wikipedia has an interesting article on literate programming.

Comment your code
Literate programming 101 is to write comments in your code! You should be doing this anyway, so this is simply one more reason to get into this (very good) habit. In the context of literate programming, comments give additional explanation of what a particular section of the code is doing. This can be especially useful if, for example, you're implementing a mathematical or statistical method. You can write the mathematical details in the comments, so that you have a bit more freedom to write your code efficiently. And don't just use your comments to explain the obvious (i.e. add a to b), because the code itself does that (by definition of what a comment is, there will be computer code alongside it!). Rather, use comments to explain why you made a certain choice or why you are implementing something in a certain way, especially if there are limitations to what you are doing. Even if such statements are obvious to you right now, they may not be in six months' time, or to another person.
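
As a small illustration of comments that explain why rather than what, here is a sketch in Python. The function name `sample_std` and the choice of a sample (rather than population) standard deviation are our own assumptions for the example; the point is only that the comments record the reasoning and the limitation, not a restatement of the arithmetic.

```python
import math


def sample_std(values):
    """Return the sample standard deviation of a sequence of numbers."""
    n = len(values)
    if n < 2:
        # Limitation: the sample estimate is undefined for fewer than two
        # points, so we fail loudly rather than return a misleading zero.
        raise ValueError("need at least two values")

    mean = sum(values) / n

    # We divide by (n - 1) rather than n (Bessel's correction) because we
    # assume the values are a sample from a larger population; dividing by n
    # would systematically underestimate the spread.
    sum_of_squared_deviations = sum((value - mean) ** 2 for value in values)
    return math.sqrt(sum_of_squared_deviations / (n - 1))
```

A comment such as "divide by n minus one" would add nothing here; the comment that says why is the one you'll be grateful for in six months.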

Choose meaningful variable/function names
We're amazed whenever we find code that doesn't do this. Why would you not choose variable and function names that mean something? The key point here is that you will need to know what a given variable is or what a given function does. There are two ways to find out: either you have to look through your code to work it out, or the variable/function can tell you itself because it has a meaningful name. The many small savings of effort you make by picking meaningful names really add up over time. And wouldn't you rather work with code that looks like this:

//HERE, WE NORMALISE THE INPUT DATA TO REMOVE UNWANTED INSTRUMENTAL EFFECTS
for i=1:nDataItems
    currentData  = GetDataRow(i)
    currentFlags = GetDataFlags(i)
    dataMean     = Mean(currentData)
    sigma        = StandardDeviation(currentData)
    variance     = sigma^2
    //NORMALISE THE CURRENT DATA TO ZERO MEAN, UNIT VARIANCE
    for j=1:nFeaturesInData
        currentData(j) -= dataMean
        currentData(j) /= sigma
        //IF THE VARIANCE IS TOO LARGE, FLAG THIS DATUM
        if variance > varianceThreshold, currentFlags(j) = TRUE, end
    end
    //COPY THE CURRENT VALUES BACK INTO THIS OBJECT
    SetDataRow(currentData,   i)
    SetDataFlags(currentFlags, i)
end
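
The listing above is pseudocode. For readers who want something they can run, here is one possible rendering in Python. It is a sketch under a few assumptions of ours: the data arrive as a list of rows rather than through `GetDataRow`/`SetDataRow`, the flags start out as all False instead of being read from an existing object, and `StandardDeviation` is taken to mean the sample standard deviation. The function name `normalise_rows` is hypothetical.

```python
import statistics


def normalise_rows(data_rows, variance_threshold):
    """Normalise each row to zero mean and unit variance.

    Returns (normalised_rows, flags). Every entry of flags[i] is True if the
    variance of the original row i exceeded variance_threshold.
    """
    normalised_rows = []
    flags = []
    for current_data in data_rows:
        data_mean = statistics.mean(current_data)
        # Assumption: sample standard deviation (needs at least two values).
        sigma = statistics.stdev(current_data)
        if sigma == 0:
            # A constant row cannot be scaled to unit variance.
            raise ValueError("cannot normalise a row with zero spread")
        variance = sigma ** 2

        normalised_rows.append(
            [(value - data_mean) / sigma for value in current_data]
        )

        # As in the pseudocode, the variance belongs to the whole row, so
        # every feature in a high-variance row receives the same flag.
        row_is_flagged = variance > variance_threshold
        flags.append([row_is_flagged] * len(current_data))
    return normalised_rows, flags
```

Calling `normalise_rows([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]], 50.0)` normalises both rows and flags only the second, whose variance of 100 exceeds the threshold.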

Do it as you go along...
While writing big, exhaustive documents is a real pain, commenting your code and providing short README documents (a plain text file will do) is generally quick and easy if you do it as you go along. At this point the task is fresh in your mind, and you are working from the pseudocode we've discussed previously. (Indeed, well-written pseudocode can become a significant part of the commenting.)

Self-documenting code
When the above are combined with choosing meaningful variable and function names, your code can start to become self-documenting. Indeed, there are packages (such as Javadoc and Doxygen) that can be set up to automatically extract documentation from appropriately written code. This requires a touch more discipline, but can be well worth it on multi-person projects.
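
To give a flavour of how such extraction works, here is a minimal sketch using Python's built-in docstrings and the standard `inspect` module. This is not Javadoc or Doxygen, just an analogous mechanism; the names `is_variance_too_large` and `extract_documentation` are hypothetical and chosen for this example.

```python
import inspect


def is_variance_too_large(variance, variance_threshold):
    """Return True if variance is strictly greater than variance_threshold."""
    return variance > variance_threshold


def extract_documentation(function):
    """Build a plain-text documentation entry from the function itself."""
    signature = inspect.signature(function)
    docstring = inspect.getdoc(function) or "(undocumented)"
    return f"{function.__name__}{signature}\n    {docstring}"


print(extract_documentation(is_variance_too_large))
```

The documentation entry is assembled entirely from the function's name, its parameter names and its docstring, which is why meaningful names and disciplined comments pay off twice.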

In conclusion
At one extreme, literate programming can lead to self-documented code and the automated extraction of documentation. But even if this is overkill for your project, we think everyone should pick meaningful names, pay attention to keeping the layout of their code easy to look at, and comment their code! Get into the habit of doing this as you go along and you'll find your code far easier to work with. And if you're lucky, you might convince your colleagues to do the same so that working with their code is easier as well!

5 comments:

it, it...: Dirty scripting, 27 January 2009 at 16:17
[...] http://www.programming4scientists.com/2009/01/the-joys-of-literate-programming/ just an interesting article. And other one http://www.codinghorror.com/blog/archives/001216.html [...]
links for 2009-01-29 « pabloidz, 29 January 2009 at 12:02
[...] The joys of Literate Programming Programming for Scientists (tags: programming) [...]
DeRien, 3 April 2009 at 15:19
When I was in college, I worked as a counselor to students in an introductory programming class. My favorite (worst) program looked like this (really):

two = 1;
three = 4;
four = 17;
...
seven = three + four;

Needless to say, the code was completely uninterpretable. Friends pointed out that the writer wouldn't ever run out of variable names.
Rich, 3 April 2009 at 19:28
Wow - that's... an amazing approach to coding! Thanks for sharing that example, DeRien.
Daniel, 18 August 2009 at 21:40
I think you should expand the section on the tools to extract documentation. Using such tools is helpful even if you are the only programmer...

I use NaturalDocs (http://www.naturaldocs.org/) in my projects, and it works like a charm, even with bash scripts :-)

Since NaturalDocs does not require a strict syntax to be followed, it comes naturally (nomen est omen).

Here is an example:

# Function: abs
#
# Calculate the absolute value of a number
#
# Parameters:
# num - a real number

Thanks a lot for the good texts - I can really relate to much of it!

Cheers,
Daniel

Monday, 26 January 2009
