21st Century Scientist: The basics of...IDL

IDL is the Interactive Data Language (not to be confused with another computing IDL, Interface Description Language), which, as the name suggests, is very useful for analysing and processing data. It's particularly popular in certain areas of science because it's quick to code in, has lots of support for data manipulation and has syntax that should be familiar to anyone brought up on FORTRAN (as many scientists have been). It's pretty similar in a lot of ways to Matlab and, to a slightly lesser extent, R. All three are optimised for getting code running quickly and all give you a lot of built-in functionality. While they cover much the same ground (and do so effectively), IDL is perhaps a little more oriented towards data manipulation, whereas Matlab is more about matrix algebra and R focuses on statistical modelling. Having used all three, we can offer the considered opinion that they're all worth using.

Data monkeying
Sometimes, you just want to read in some data and have a play with it. If so, IDL is your man...er, language. IDL gives you lots of options in three vital areas for exploring your data. Firstly, it provides a wide range of functions with which to read in different types of data. Image data? It can handle most of the standard formats. FITS files (a format used by astronomers, which includes some fairly hairy sky co-ordinate system stuff) are catered for. Excel file? Export it as comma-separated values (CSV) and IDL can read that in. And so on. This is very convenient, as it means you can just read in your data without ever having to think about the specifics of the file format.
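
As a rough sketch of what reading a file looks like in practice, here is a CSV being pulled in with the built-in READ_CSV function. The file name measurements.csv is made up for illustration, and we assume the file has a single header row.

```idl
; Minimal sketch: read a CSV file and inspect what came back.
; 'measurements.csv' is a hypothetical file with one header row.
data = READ_CSV('measurements.csv', HEADER=column_names, COUNT=n_rows)

PRINT, 'Columns:   ', column_names
PRINT, 'Rows read: ', n_rows

; READ_CSV returns a structure with one tag per column.
; data.(0) picks out the first column by position, whatever it is called.
first_column = data.(0)
HELP, first_column
```

Accessing columns by position avoids relying on the automatically generated tag names.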

The second area in which IDL is strong is its implementations of standard data processing methods. Curve-fitting, histogram generation, Fourier transforms, image processing algorithms and the like are all provided, which means that you can do a whole range of 'standard' manipulations to your data without having to pause to code them up.
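
To give a flavour of this, the sketch below builds a histogram and a power spectrum using the built-in HISTOGRAM and FFT functions. The data are synthetic (random numbers and a noisy sine wave of our own invention), standing in for whatever you have actually measured.

```idl
; Synthetic data standing in for real measurements (illustration only).
seed = 42L
values = RANDOMN(seed, 10000)           ; normally distributed samples

; Histogram with a bin width of 0.25. LOCATIONS returns the starting
; value of each bin.
counts = HISTOGRAM(values, BINSIZE=0.25, LOCATIONS=bin_starts)

; Fourier transform of a noisy sine wave with 50 cycles across the array.
n = 1024
t = FINDGEN(n) / n                      ; runs from 0 to just under 1
signal = SIN(2 * !PI * 50 * t) + 0.5 * RANDOMN(seed, n)
power = ABS(FFT(signal))^2              ; power spectrum (forward transform)

; Find the strongest bin in the positive-frequency half, skipping bin 0.
peak_power = MAX(power[1:n/2], peak_index)
PRINT, 'Peak at frequency bin: ', peak_index + 1
```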

The third area is data visualisation. IDL is great for this, as it provides a lot of ways to generate pretty (and informative) pictures from your data. And even if the exact options you want aren't available, there's a lot of support built-in for creating your own plotting code.
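
A minimal plotting sketch, using the long-standing PLOT and OPLOT procedures (the curves are arbitrary examples of our own choosing):

```idl
; 200 points spanning 0 to 2*pi.
x = FINDGEN(200) / 199 * 2 * !PI

; Draw the axes and the first curve, then overlay a second, dashed curve.
PLOT, x, SIN(x), TITLE='Sine and cosine', $
      XTITLE='x (radians)', YTITLE='Amplitude'
OPLOT, x, COS(x), LINESTYLE=2
```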

As an illustration of how useful this all is, we've seen (indeed, helped build) whole data reduction pipelines in IDL and the above functionality is a great time-saver.

Avoid FOR loops
Much as in Matlab and R, FOR loops in IDL are slow. This is because they don't benefit from the heavily optimised compilation that languages such as C, C++ and FORTRAN enjoy. Happily, it is often possible to avoid this problem through the use of vectorised functions, which are built into IDL and perform a given operation on a whole array of inputs. These functions still contain the FOR loop, but it is in compiled (C) code and will therefore be much faster. You can lose some flexibility as a result, for example if you want to access sub-arrays of data that live in different objects (so that you still have to loop over the objects), but vectorised functions are often a good solution to the inherent drawback of slow FOR loops.
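
Here is a small sketch comparing the two approaches on a sum of squares. The array size is arbitrary and the timings will vary from machine to machine, so treat it as an illustration and not a benchmark.

```idl
n = 1000000L
x = FINDGEN(n)

; Explicit loop: every iteration is interpreted, so this is slow.
t0 = SYSTIME(/SECONDS)
sum_loop = 0.0D
FOR i = 0L, n - 1 DO sum_loop = sum_loop + x[i]^2
t_loop = SYSTIME(/SECONDS) - t0

; Vectorised: the loop happens inside the compiled TOTAL function.
t0 = SYSTIME(/SECONDS)
sum_vec = TOTAL(x^2, /DOUBLE)
t_vec = SYSTIME(/SECONDS) - t0

PRINT, 'Loop: ', t_loop, ' s   Vectorised: ', t_vec, ' s'

; WHERE replaces a loop-plus-IF when selecting elements.
big = WHERE(x GT 0.9 * n, n_big)
IF n_big GT 0 THEN PRINT, 'Mean of selected values: ', MEAN(x[big])
```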

Multi-threading
We really love this aspect of IDL (and wish that languages like Matlab and R would adopt it). For certain in-built functions, IDL provides multi-threaded versions of the code. This means that they can take advantage of multiple CPU cores on the same machine. Stop and think about this for a minute. This means that if you have a multi-core machine (and nowadays, who doesn't?), IDL will automatically parallelise sections of your code. Without you doing anything. How cool is that?

Edit: Matlab does now contain some multi-threading functions. Woot!
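
If you're curious about what IDL's thread pool is doing, the !CPU system variable and the CPU procedure let you inspect and adjust it. A minimal sketch (the choice of two threads is arbitrary):

```idl
; The !CPU system variable describes the thread pool.
PRINT, 'CPUs detected:               ', !CPU.HW_NCPU
PRINT, 'Threads IDL will use:        ', !CPU.TPOOL_NTHREADS
PRINT, 'Min. elements for threading: ', !CPU.TPOOL_MIN_ELTS

; Restrict IDL to two threads, for example on a shared machine ...
CPU, TPOOL_NTHREADS=2

; ... and later go back to using every detected CPU.
CPU, TPOOL_NTHREADS=!CPU.HW_NCPU
```

Note that small arrays are not threaded at all: the minimum-elements setting exists because, for short calculations, the overhead of splitting the work outweighs the gain.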

Use the libraries
One of the great strengths of IDL is that it has a lot of functionality built into it, in the form of libraries and toolboxes. The core libraries are effectively just a large set of functions that you automatically have access to when writing IDL code. Matrix manipulations, Fourier transforms, sorting algorithms, plotting routines and any number of other operations are there to be used. It's well worth browsing through the manual, just to give yourself a bit more flavour of what's available.
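
As one small example of the core library, here is sorting and de-duplicating an array with SORT and UNIQ (the input values are arbitrary):

```idl
x = [3, 1, 2, 3, 1]

; SORT returns the indices that would sort the array, not the sorted values.
order = SORT(x)
sorted = x[order]                       ; values 1, 1, 2, 3, 3

; UNIQ expects sorted input; passing the sort order as the second
; argument lets it work on the unsorted array.
unique_values = x[UNIQ(x, order)]       ; values 1, 2, 3

PRINT, sorted
PRINT, unique_values
```

Getting indices back from SORT is handy in practice, because the same ordering can then be applied to several related arrays.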

Great for prototyping
IDL is generally a quick language in which to program, because the syntax is forgiving and there are lots of built-in functions. This makes it great for prototyping. While this can present an additional level of project-complication if you then have to re-code into another language for the final version, there's a lot to be said for being able to prototype rapidly. This is especially true if the science goals of your project include developing new algorithms and/or methods (see this example).

Coyote's guide to IDL
A well-supported language is nice because it means there's help out there if you need it. We can't talk about IDL without giving a special mention to Coyote's guide to IDL programming. "Coyote" is the alter ego of IDL consultant and trainer David Fanning, and Coyote's guide is an awesome extra resource for the IDL programmer who wants practical advice beyond what they can find in the manual. We recall, for example, what a life-saver (metaphorically) its creative uses of the 2D histogram function for data compression were.

Support for Object Orientation
As you might hope, IDL has a version of object orientation built into it. Works pretty well.
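
For the curious, a class in IDL is a named structure plus a set of methods. The sketch below defines a toy class; the class name counter and its methods are invented purely for illustration.

```idl
; Save as counter__define.pro. The methods come first so that they are
; compiled along with the class definition at the end of the file.

FUNCTION counter::Init
  self.count = 0L
  RETURN, 1                     ; 1 signals successful initialisation
END

PRO counter::Increment
  self.count = self.count + 1
END

FUNCTION counter::Value
  RETURN, self.count
END

PRO counter__define
  ; Named structure that defines the class and its instance data.
  void = {counter, count: 0L}
END

; Usage, typed at the IDL prompt:
;   c = OBJ_NEW('counter')
;   c->Increment
;   PRINT, c->Value()
;   OBJ_DESTROY, c
```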

Quirks of the language
Any language has quirks and IDL is no different. None of these are bad per se, and they're certainly not problems. Rather, they're things of which the programmer needs to be aware. Here's a quick list.

Array indexing starts at 0. Not a quirk per se, but different to both Matlab and R (which start at one), so worth mentioning.

Arrays can be indexed using () or []. You can choose either, although we recommend using the square brackets, as the round brackets are also used in function calls.
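
Both quirks are easy to see at the IDL prompt. The sketch below also uses COMPILE_OPT STRICTARR, which makes IDL reject round-bracket indexing inside the routine where it appears; the routine name strict_demo is our own invention.

```idl
a = INDGEN(5) * 10              ; elements 0, 10, 20, 30, 40

PRINT, a[0]                     ; first element (0)
PRINT, a[4]                     ; last element (40)
PRINT, a(0)                     ; round brackets index the same element

; Inside a routine, STRICTARR enforces square brackets for arrays, so
; that round brackets can only ever mean a function call.
PRO strict_demo
  COMPILE_OPT STRICTARR
  a = INDGEN(5) * 10
  PRINT, a[2]                   ; third element (20)
END
```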

In conclusion...
If you want a language that's quick to code in and is good for data manipulation and visualisation, IDL is a good choice. It's a commercial language, so you'll have to buy a licence, but you do get good functionality for your money.

3 comments:

Fabrice, 23 February 2009 at 20:09
Interesting post!! And how do you compare IDL with Mathematica?
Heather, 20 March 2009 at 12:22
On multi-threading in R ... it's not in-built, but there are various options for implicit parallelization. See "State-of-the-art in Parallel Computing with R"
http://epub.ub.uni-muenchen.de/8991/1/parallelR_techRep.pdf
for some useful pointers, in addition to the recently released multicore package on CRAN.
Rich, 20 March 2009 at 12:37
Thanks, Heather - very useful!

(I've also cross-posted the link onto our article on the basics of R: http://www.programming4scientists.com/2008/12/the-basics-ofr/ )

Monday, 23 February 2009