Monday 20 October 2008

The basics of...pseudocode

[Image: photo by WorldIslandInfo.com]

Pseudocode is the code you would write if a computer were as smart as a person. It captures the logical rigour of a programming language without the messy syntax, while keeping the flexibility of human language. Pseudocode is therefore a cross between full-blown code and a purely human-focused description of what the code will do. It is a way of laying out the structure of your program, algorithm or any other complex piece of code so that you can check that your logic is clear before investing a lot of time in writing the code.

Human-readable "code"
In the design/planning stage we're writing down, in the human language of our choice, what our code is going to do. When we come to actually write the code we're writing down, in a computer programming language, what the code will do. What would be really useful is a halfway house between human-readable and machine-readable, to make it easier to get from one to the other. This is pseudocode.

The idea of pseudocode is that we write down step-by-step what our code will do, in the style of a programming language, but we write it in a human-readable form.  Ideally, pseudocode is written in the file where the code will eventually reside and more or less becomes the comments of your code as the code is written.

Example of pseudocode to lay out the structure of a program (using // as the comment marker):

//Read in the data
//run the basic data processing algorithms here
// - clip outliers
// - remove high noise regions of data
// - estimate noise level of remaining data
//
//Pass data to function implementing Bayesian signal detection technique
//
//Use results from signal detection to generate plots of how well the method has performed
// - output the plots to a PDF file


Pseudocode is a great way of trying out the layout and structure of your code before you start writing it. It doesn't need correct syntax and doesn't need to run; all it has to do is make sense to you and fulfil the requirements and objectives of your project.
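
As a rough sketch of how the outline above might turn into the comments of real code, here is a Python skeleton. The function names and the placeholder data are assumptions made for illustration, and none of the processing steps is implemented yet.

```python
# Hypothetical skeleton: the function names and placeholder bodies are
# assumptions for illustration; only the comments come from the outline.

def read_data():
    # Read in the data (placeholder: a small hard-coded list)
    return [1.0, 2.0, 100.0, 3.0]


def preprocess(data):
    # Run the basic data processing algorithms here
    # - clip outliers
    # - remove high noise regions of data
    # - estimate noise level of remaining data
    return data  # nothing implemented yet, data passes through unchanged


def detect_signal(data):
    # Pass data to function implementing Bayesian signal detection technique
    return None  # nothing implemented yet


def make_plots(results):
    # Use results from signal detection to generate plots of how well
    # the method has performed
    # - output the plots to a PDF file
    pass  # nothing implemented yet


def main():
    data = read_data()
    data = preprocess(data)
    results = detect_signal(data)
    make_plots(results)
    return data
```

The comments are still the original pseudocode; the real code would grow underneath each one as it is written.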

Building the foundations without worrying about syntax
Pseudocode is a great way to start laying out the requirements and features we gathered from the planning stage without getting caught up in the details of implementation. You'll find that once you start, it guides your mind to think about different aspects of your design and planning. Sometimes it will become obvious that there are dependencies you hadn't thought about (perhaps you aren't calculating a value by the time it's needed). Or you may spot a better way of doing something than you'd originally thought of, which is good because it means you're developing a better understanding of what it is you're trying to do. Or you may realise that you have several very similar functions that you can combine into a single, more general one, thus saving time writing and testing the code.

Pseudocode is also a great way to fill in details from the top down. The above example captures the code and data flow through the program in 8 lines without getting lost in any details. The next step would be to drill into each of the steps and create pseudocode for them.


In the above example we know that we have to 'clip outliers' before we 'remove high noise regions'. Perhaps initially you think that removing outliers is going to be as simple as removing values above and below a threshold. The pseudocode could look like this:

//function removeOutliers(lowerBound, upperBound)
//
//foreach dataPoint in dataSet
//    if dataPoint outside of range
//        remove dataPoint from dataSet
//


This is great until you realise that lowerBound and upperBound have to come from somewhere. You'll need to calculate the bounds first and then remove the data points that fall outside them. 'Clip outliers' has now become 'calculate outliers' and 'remove outliers'. You have a more detailed idea of how the program is going to reach its goals and you've only written 20 lines of 'code'.
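
A minimal Python sketch of that split might look like the following. The pseudocode doesn't say how the bounds are chosen, so the rule used here (the mean plus or minus k standard deviations) and the names calculate_bounds and remove_outliers are assumptions for illustration only. The data set is passed in explicitly just to keep the sketch self-contained.

```python
import statistics


def calculate_bounds(data_set, k=2.0):
    # Assumed rule (not from the pseudocode): accept values that lie
    # within k population standard deviations of the mean.
    mean = statistics.mean(data_set)
    spread = statistics.pstdev(data_set)
    return mean - k * spread, mean + k * spread


def remove_outliers(data_set, lower_bound, upper_bound):
    # foreach dataPoint in dataSet: keep it only if it is inside the range
    return [point for point in data_set
            if lower_bound <= point <= upper_bound]


# Made-up sample data with one obvious outlier (100)
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
lower, upper = calculate_bounds(data)
cleaned = remove_outliers(data, lower, upper)
```

Writing the two steps separately makes the dependency explicit: remove_outliers cannot run until calculate_bounds has produced its values.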


Using pseudocode to define interfaces


Pseudocode really excels at helping you work out the interfaces in your code. The interface to a class, module, method or function is the way that piece of code can be used. The 'removeOutliers' function above takes two variables as inputs and returns nothing. The problem is that it changes another, global, variable (dataSet) that the user of the function can't know about from the function's interface. If this function were a method on a DataSet class then it would be clear what it was working on, but if it is a simple function then it has a side effect that isn't clear.
A better interface, and name, would be:


removeOutliersFromDataSet(dataSet, lowerBound, upperBound)

The function's interface is clearer and it has no hidden side effects.
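
To make the contrast concrete, here is a small Python sketch of both interfaces. The snake_case names, the sample values, and the choice to return a new list rather than modify the argument are assumptions; the pseudocode doesn't specify any of them.

```python
# Version 1: works on a module-level (global) data set.
# Nothing in the function's signature tells the caller that
# data_set is about to change.
data_set = [1, 5, 50, 7]


def remove_outliers(lower_bound, upper_bound):
    global data_set
    data_set = [point for point in data_set
                if lower_bound <= point <= upper_bound]


# Version 2: everything the function touches is in its signature.
# It returns a new list and leaves its input alone (an assumed design).
def remove_outliers_from_data_set(data_set, lower_bound, upper_bound):
    return [point for point in data_set
            if lower_bound <= point <= upper_bound]
```

With the second version a caller can see from the call itself which data is being filtered, for example remove_outliers_from_data_set(readings, 0, 10) for some list called readings.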

One of the reasons why interfaces are so important is that when they are right, a part of your program can be reworked without affecting the whole. Continuing with our removeOutliers example, if the data set has become too big to be handled in one go you might decide to break it up into chunks and process them separately. In the first version removeOutliers would have to be changed so that it knows how to deal with multiple datasets. If the program has lots of code that uses the one global dataset then the change would have to be made in lots of places, which takes longer and is more prone to bugs. You have created a dependency between otherwise separate code by using the global variable.
The second version, removeOutliersFromDataSet, doesn't care how big the dataset is as long as it can get dataPoints from it. The change to multiple datasets would be a lot easier as most of the code wouldn't change. This is faster and much less prone to bugs.
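
As an illustration, the Python sketch below processes a data set in chunks while using the second interface unchanged. The helper names chunks and clean_in_chunks, the chunk size and the fixed bounds are all hypothetical.

```python
def remove_outliers_from_data_set(data_set, lower_bound, upper_bound):
    # Same explicit interface as before; it neither knows nor cares
    # whether it has been given the whole data set or just a piece of it.
    return [point for point in data_set
            if lower_bound <= point <= upper_bound]


def chunks(data_set, size):
    # Yield successive slices of at most `size` items.
    for start in range(0, len(data_set), size):
        yield data_set[start:start + size]


def clean_in_chunks(data_set, lower_bound, upper_bound, chunk_size):
    # Process each chunk separately and stitch the results together.
    cleaned = []
    for chunk in chunks(data_set, chunk_size):
        cleaned.extend(
            remove_outliers_from_data_set(chunk, lower_bound, upper_bound))
    return cleaned
```

Only the new clean_in_chunks function knows about chunking; the outlier removal itself is untouched.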


In conclusion
Pseudocode is a great way of going from a list of requirements and features to the beginnings of a working program. It combines the brevity and flexibility of human language with the logical rigour of a programming language. We've covered a few uses of it, from laying out the program flow to helping define interfaces and reduce or remove dependencies, but you should use it wherever you feel the need to add some clarity to how you are thinking about your code.
