Monday 17 November 2008

The basics of...Matlab

Photo by kennymatic

It's important to know the basic features of any language you're programming in.  We don't just mean the syntax or whether the language is dynamically or statically typed (although these are very important); you also need to know what the language is good for, what it's not so good for (and why), and any particular quirks it has that might otherwise catch you out.  To this end, we present the "Basics of..." series of posts about different programming languages.  We'll assume that you know the basic syntax of the language in question (there are almost always lots of good ways to learn this) and try to present the things you'll need to know right after that.  We'll start with Matlab.

Matlab is a programming language that's widely used in science.  It was originally developed to perform maths operations involving matrices, and has grown beyond this into a language in which you can develop code rapidly and that has loads of built-in library support for scientific-type analysis.  It can be a lot slower to run than compiled languages like C and FORTRAN, except when using certain built-in library functions (for example the libraries that handle matrix operations).  But for tasks that aren't CPU-intensive, for prototyping and as a toolbox for manipulating data and the like, it's excellent.  Matlab also provides a numerical computing environment, where you can type commands into the command line and manipulate variables interactively.  This is very useful both for testing out chunks of code and for manipulating data and performing quick calculations.

It's a commercial language, which means you have to pay Mathworks for a licence in order to use it.  There is an upside to this in that Mathworks provide technical support, should you encounter problems with Matlab.  Mathworks also release new versions of Matlab regularly (every six months, at the time of writing), which means periodic increases in functionality.

Matrix maths/linear algebra
Linear algebra is what Matlab was created for and it continues to be very strong for this.  It uses libraries like LAPACK and BLAS, which are standard linear algebra libraries, and provides a very nice way to access those libraries, something that's not always the case in C and FORTRAN.


For example, efficient matrix inversion, which is a complex business that involves things like Cholesky decompositions, can be performed in Matlab by doing this:

invertedMatrix = inv(myMatrix);

Nice and simple!
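As a small sketch of this in action (the matrix and vector values below are made up purely for illustration), we can invert a matrix and check the result.  It's also worth knowing that if all you actually want is to solve a linear system, Matlab's backslash operator does that without forming the inverse explicitly, which is generally the preferred route.

```matlab
% Illustrative values only: a small, well-conditioned matrix
myMatrix = [4 1 0; 1 3 1; 0 1 2];
b = [1; 2; 3];

invertedMatrix = inv(myMatrix);          % explicit inverse

% Check: the matrix times its inverse should be (numerically) the identity
residual = norm(myMatrix * invertedMatrix - eye(3));

% Solving myMatrix * x = b, with and without the explicit inverse
xViaInverse   = invertedMatrix * b;
xViaBackslash = myMatrix \ b;            % no explicit inverse formed
difference = norm(xViaInverse - xViaBackslash);
```

Both residual and difference should come out at around machine precision for a matrix like this one.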

Matlab also implements sparse matrices, which are matrices that contain mostly zeroes.  A 2D array is a bad way of handling these because of all the empty space you're storing.  Much better is to store the data in a more efficient data structure, but this then makes matrix manipulations more complicated to code.  Matlab's sparse matrix formalism provides a set of internal functions so that you can just use Matlab's built-in arithmetic, logical and indexing functions as you would with a regular 2D array.  This can be very useful!
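Here's a minimal sketch of what working with a sparse matrix might look like.  The tridiagonal matrix is a hypothetical example chosen because it's mostly zeroes, and the variable names are our own.

```matlab
% Hypothetical example: a 1000x1000 tridiagonal matrix, stored sparsely
n = 1000;
mainDiag = 2 * ones(n, 1);
offDiag  = -1 * ones(n, 1);
sparseMatrix = spdiags([offDiag mainDiag offDiag], [-1 0 1], n, n);

numNonZero = nnz(sparseMatrix);          % number of stored non-zero elements
isStoredSparse = issparse(sparseMatrix); % true for sparse storage

% Ordinary indexing and arithmetic work just as for a full 2D array
cornerValue = sparseMatrix(1, 1);
rhs = ones(n, 1);
solution = sparseMatrix \ rhs;           % linear solve on the sparse matrix
residual = norm(sparseMatrix * solution - rhs);
```

Only the 2998 non-zero elements are stored here, rather than all one million entries of the equivalent full array.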

Avoid FOR loops!
The Achilles heel of languages like Matlab is that FOR loops are slow.  This is because they don't benefit from the heavily optimised compilation that languages such as C, C++ and FORTRAN enjoy.  Happily, it is often possible to avoid this problem through the use of Matlab's vectorised functions.  These are built-in functions that perform a given operation on a whole array of inputs.  These functions contain the FOR loop, but it will be in compiled (C) code and will therefore be much, much faster than writing the loop in your Matlab code.  In many ways, this is a very elegant solution.  It's not ultimately as flexible as just being able to use FOR loops, but you should find that most of the time vectorised functions work very well.


For example, suppose you wanted to add a constant to each element of an array.

The slow way:

for i = 1:length(myArray)
    myArray(i) = myArray(i) + const;
end


The quick way:

myArray = myArray + const;

A great rule-of-thumb is therefore to avoid FOR loops if you possibly can, instead using vectorised functions or other built-in functions.
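The same idea extends to loops that contain a condition.  As a sketch (the data and threshold are arbitrary, and the variable names are ours), a loop with an IF statement inside it can often be replaced by logical indexing combined with a vectorised function such as sum():

```matlab
% Illustrative data: the values are arbitrary
myArray = (1:100000)';
threshold = 50000;

% Loop version: sum the squares of all elements above the threshold
loopTotal = 0;
for k = 1:length(myArray)
    if myArray(k) > threshold
        loopTotal = loopTotal + myArray(k)^2;
    end
end

% Vectorised version: logical indexing, element-wise power, then sum()
isAbove = myArray > threshold;
vectorTotal = sum(myArray(isAbove).^2);
```

Both versions give the same total; wrapping each in tic and toc is an easy way to compare how long they take on your own machine.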

Use the libraries
One of the great strengths of Matlab is that it has a lot of functionality built into it, in the form of libraries and toolboxes.


The core libraries are effectively just a large set of functions that you automatically have access to when writing Matlab code.  Matrix manipulations, Fourier transforms, sorting algorithms, plotting routines and any number of other operations are there to be used.  It's well worth browsing through the manual, just to give yourself a bit more flavour of what's available.  It's always nice when you stumble upon a function that does something you thought you'd have to write yourself.

Toolboxes are non-standard Matlab libraries, each built around a specific theme.  Some of these have to be purchased specifically, so make sure you get the ones you need.  They cover a wide range of different topics; for example, there are toolboxes for bioinformatics, econometrics, image processing, signal processing and parallel processing (to name but a few).  The image processing toolbox implements a number of the standard, powerful image reconstruction tools such as the Wiener filter and a regularised deconvolution filter.  The parallel processing toolbox can be very powerful when used on a cluster or a machine with multiple cores.  It provides language features like parallel FOR loops, which means you can build code that can use many cores/nodes.  If your algorithm is amenable to such parallelisation, this can be a very easy way to write parallel code.
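A parallel FOR loop is written with parfor.  The sketch below assumes that every iteration is independent of the others (the per-iteration calculation is a hypothetical stand-in for real work).  Depending on your Matlab version you may need to start a pool of workers first; without one, the loop should simply run serially.

```matlab
% Sketch only: assumes each iteration is independent of all the others
numTrials = 200;
results = zeros(1, numTrials);

parfor t = 1:numTrials
    % Hypothetical per-iteration work; each pass writes only to results(t)
    results(t) = sum(sin((1:1000) * t));
end

meanResult = mean(results);
```

The key design constraint is that no iteration reads a value written by another iteration, which is what lets the iterations be shared out between workers in any order.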

Great for prototyping

Matlab is generally a quick language in which to program, because the syntax is forgiving and there are lots of built-in functions.  This makes it great for prototyping.  While this can present an additional level of project-complication if you then have to re-code into another language for the final version, there's a lot to be said for being able to prototype rapidly.  This is especially true if the science goals of your project include developing new algorithms and/or methods (see this example).


Plotting and graphics
Another advantage of a language like Matlab, which was built in part to manipulate data, is that it provides excellent support for producing graphical output.  Plotting is often possible using a single, simple line of code.


For example, we can produce plots using commands like this:

plot(xValues, yValues);            % 2D line plot
hist(myData);                      % histogram
loglog(xValues, yValues);          % 2D log-log plot
plot3(xValues, yValues, zValues);  % 3D plot


A lot of the characteristics of the plot (line style, titles, colours etc) are also controllable via keywords, lists of which can be found in the online manual that comes with Matlab.
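As a sketch of how this looks in practice (the data and the label text are made up for illustration), the line style can be set with a short format string and property name/value pairs, while titles and axis labels each have their own function:

```matlab
% Illustrative data
xValues = linspace(0, 2*pi, 100);
yValues = sin(xValues);

figure('Visible', 'off');   % drop the arguments to show the figure on screen

% 'r--' asks for a red dashed line; 'LineWidth' is a property name/value pair
lineHandle = plot(xValues, yValues, 'r--', 'LineWidth', 2);

title('A sine wave');
xlabel('x (radians)');
ylabel('sin(x)');
legend('sin(x)');
grid on;
```

Keeping the handle returned by plot() means you can inspect or change properties later using get() and set().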

MEX files
Matlab provides a solution if you absolutely must use a FOR or WHILE loop, but your code is too slow: the MEX file.  MEX files are simply a way of linking C, C++ or FORTRAN code into your Matlab program, giving a way to make a FOR/WHILE loop bottleneck much faster.  Interfacing two languages like this can be tricky (although as full disclosure, we've never tried using a MEX file ourselves), but if there's one critical part of your code that needs improving in this way, it's nice to have the option.

Support for object orientation
Matlab supports object oriented (OO) programming, which is very useful if you have experience in writing OO code.  In our opinion, the OO implementation feels more like an add-on to the language than a core principle (which of course is true).  But nevertheless, we write our Matlab code to be OO as standard and find it to be perfectly functional in this regard.


Edit: As of release 2008a, Matlab also has a new implementation of OO.  We've not tried it yet, but it's certainly worth checking out.

Quirks of the language
Any language has quirks and Matlab is no different.  None of these are bad per se, and they're certainly not problems.  Rather, they're things of which the programmer needs to be aware.  Here's a quick list.


  • array indexing starts at 1, not 0. This can catch you out if you're used to starting at zero, so be careful!

  • zeros() and ones() functions are a touch confusing. These functions create 2D arrays and are confusing because if you type zeros(15), you get a 15*15 2D array (ie. 225 elements in total).  So, zeros(15) is the same as zeros(15,15) and not zeros(15,1).

  • arrays are indexed using (); [] is for concatenation. Not a huge problem as Matlab will just give an error if you try to write myArray[i,j], but if you're used to the [] convention then you'll probably make this mistake a few times before you get used to it.

  • end-of-line semi-colon. Not all languages require you to explicitly write an end-of-line character.  Matlab will run a line of code that doesn't end in a semi-colon; it will, however, print the result of that line to screen (which can be a pain if that result is a huge array).

  • different behaviour between vector and matrix inputs (eg. the sum() function). For a vector input, sum(myVector) returns the sum of all the elements of that vector.  For a matrix/2D array input, sum(myMatrix) will return a row vector with each value being the sum down one column of the matrix.  To sum over the whole matrix, you need sum(sum(myMatrix)) or sum(myMatrix(:))!
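The quirks in the list above can be seen in a few lines at the command line.  This is just a sketch with made-up values; note in particular that sum() on a matrix works down each column, and that passing 2 as a second argument sums along each row instead.

```matlab
% Indexing starts at 1, and uses ()
myVector = [10 20 30];
firstElement = myVector(1);         % 10; myVector(0) would be an error

% zeros(n) gives an n-by-n array, not a vector
sizeSquare = size(zeros(15));       % [15 15]
sizeColumn = size(zeros(15, 1));    % [15 1]

% [] is for concatenation
joined = [myVector, 40];            % [10 20 30 40]

% sum() behaves differently for vector and matrix inputs
myMatrix = [1 2 3; 4 5 6];
vectorSum  = sum(myVector);         % 60
columnSums = sum(myMatrix);         % [5 7 9], one value per column
rowSums    = sum(myMatrix, 2);      % [6; 15], one value per row
totalSum   = sum(myMatrix(:));      % 21, same as sum(sum(myMatrix))
```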



In conclusion

Matlab is a great language for getting things done.  It's quick to code in, provides lots of built-in functionality and gives you lots of help with tasks like data mining and analysis.  It will generally be slower to execute than languages like C and FORTRAN, and it's commercial (so you need to buy a licence), but if neither of these is a problem for your project then Matlab may well be a good choice.

8 comments:

  1. Personally, I try to convince people NOT to use Matlab. There are opensource equivalents that are as (or even more) powerful like R or Scilab with which you can achieve exactly the same.

    These packages have plenty of support from the online community and are updated regularly. So, unless you absolutely need Matlab (because you can't be bothered to change your preexisting code) think twice before spending money on a licence.

  2. If you're addicted to Matlab but don't want to spend the money, there's always GNU Octave (www.gnu.org/software/octave/), which has almost identical syntax.

  3. Thanks for your comments! You both make some very valid points.

    I've not used GNU Octave, although I know some people who swear by it. I do use R quite a bit and find it very good. We'll be doing a "basics of...R" post at some point in the near future.

    Thanks again for your input!

  4. I have used octave and liked it.
    It lacks some of the proprietary libraries from matlab (maple?), but I prefer to use an open source program if it is possible.

  5. Though MATLAB has a reputation for slowness, I find it speedy enough for use with re-coding in other languages. It'd be interesting to see an examination of execution times for tasks typically performed in MATLAB, implemented in MATLAB and other languages.

  6. we are taking a class called IP,
    not Internet Programing ,but Image Processing.
    and i got a nice new info about this language,
    since our study is based on it,
    thx a lot,

    Best Regards,
    Nazeeh.

  7. Thanks for a great discussion of MATLAB, along with other popular programming languages for scientists. It's nice to hear where you find it useful, and which things you'd like to see improved.

    While for loops are slower in MATLAB than in C, we hope that you've noticed them getting faster over the years as we've continued to enhance the JIT Compiler and Accelerator that silently compile much of your code down to machine code.

    I'd also like to mention that MATLAB also offers implicit multithreading for many operations and functions, allowing your code to leverage multiple cores. We first introduced multithreading in R2007a, though it was not turned on by default until R2008a. We are multithreading a handful of additional functions each release, based on our studies of where we can have the best impact on code performance.

    Finally, I'm curious which incarnation of the OO programming you guys have used. I completely agree that the system we introduced back in the mid-90s never felt well integrated into the language. We introduced a completely revised system in R2008a that caught MATLAB up to other OO languages, with (hopefully!) a much more integrated feel.

    Thanks again, and happy blogging!

  8. Hi Scott,

    Thanks for the input; some very useful points! To respond in order:

    - I admit I've never explicitly speed-trialled the FOR loops (sorry!), but I'm very glad to hear that they're becoming faster. Do you publish bench-marking data?

    - Great news about the multi-threading! This is a feature (eg. in IDL) that I *really* like. I'm very happy to hear that more and more Matlab functions are using it.

    - we've used the old version of the Matlab OO. I've got a book about the new version sat on my desk as I type! :-)
