21st Century Scientist: Choosing a programming language

This is a continuation of Part 1. Read that first!

In this post we're going to consider the question, "Is a given programming language appropriate for the project?" You will need to judge whether a language suits the problem you're trying to solve and, in particular, whether it suits it better than another language. The appropriateness of a language depends upon a wide range of factors, such as the quality and scope of its libraries, the maturity of its development tools, the ease of porting code between platforms (and the quality of the libraries on those platforms) and the basic strengths and weaknesses of the language itself, such as memory management and speed.

These are not easy questions to answer. Programming languages can be compared on many different criteria and each one tends to excel in different areas. For instance, languages such as Perl and Python are very good at manipulating strings (i.e. words and sentences), whereas FORTRAN is typically much faster than either of them at numerical calculations. If your problem is one of manipulating text then Perl might be a good choice; for mathematical work FORTRAN would seem like a good candidate. However, this generalization is muddied by the quality and quantity of the libraries (see below) written for a language, as they provide capabilities that the language doesn't support directly. For example, the core C and C++ languages offer little support for text handling, but excellent libraries exist to fill the gap.

It is worth noting that different stages of development can be done in different languages. For instance, if you are developing software that uses a lot of mathematics on large data-sets then it would make sense to first prototype in MATLAB (or something similar), using a smaller test data-set. Your prototype will be relatively quick to develop, and the small size of the test data-set means it should run fairly fast. You could then code the production version in C++ for memory management and performance reasons. This does mean you have to be proficient in two languages and need to remember that different languages have different features (e.g. C++ has pointers, which MATLAB does not), but as long as you're careful this can work well.

Memory management

One of the most important features of a language is how it handles memory. Many languages, for example Java and C#, provide automated memory handling (garbage collection), which can save a lot of developer time by deciding automatically when a piece of memory is no longer needed, rather than the developer having to write code to release it. Other languages, such as C and C++, require the developer to handle memory allocation explicitly; it is very easy in these languages to lose track of memory, causing all sorts of hard-to-solve bugs which both slow down development and increase maintenance costs. At first glance garbage collection may seem like the obvious choice, as it removes several sources of error from a program. However, the garbage collector must run from time to time to sort through the memory and decide what is no longer in use. This takes time, and the developer may not be in full control of when it happens. Manual memory allocation provides the skilled developer with a high level of control over how the program uses memory, and this may be required to solve a problem at all or to make the program run in a sensible amount of time.
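To make the idea of a garbage collector concrete, here is a minimal sketch in Python, which is also a garbage-collected language. The `Node` class and the `make_cycle_and_collect` function are hypothetical names invented for this illustration, and the behaviour described in the comments assumes the standard CPython interpreter. It builds two objects that refer to each other, drops them, and then asks the collector to run explicitly instead of waiting for it to run at a time of its own choosing.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at another Node (hypothetical example class)."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle_and_collect():
    """Create two objects that reference each other, drop them, and
    report whether the garbage collector reclaimed them."""
    a = Node("a")
    b = Node("b")
    a.partner = b  # a -> b
    b.partner = a  # b -> a, forming a reference cycle

    # A weak reference lets us observe the object without keeping it alive.
    watcher = weakref.ref(a)

    # Drop our own references. The two objects still refer to each other,
    # so it is up to the collector to notice that nothing else uses them.
    del a, b

    # Ask the collector to run now, rather than whenever it chooses to.
    gc.collect()

    # The weak reference returns None once the object has been reclaimed.
    return watcher() is None


if __name__ == "__main__":
    print("memory reclaimed:", make_cycle_and_collect())
```

Notice that the program never frees anything itself. That is the convenience garbage collection buys you; the price is that the work done inside `gc.collect()` normally happens at moments you did not pick.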

Performance

Languages differ in their performance, independent of the choice of algorithm, with compiled languages (C/C++ etc.) generally being faster than interpreted languages (e.g. PHP). This is because the interpreter has to run alongside your program, and this consumes CPU cycles, memory etc. Byte code languages (e.g. Java) fall in between compiled and interpreted languages, in that the code you write is 'compiled' into an intermediate form, the byte code, which is then executed by a virtual machine that translates the byte code into computer-specific machine code. While it is generally true that the performance of byte code languages lies between that of compiled and interpreted languages, it isn't always so. Modern virtual machines can perform 'Just In Time' (JIT) compilation, which turns the byte code into machine code while the program runs and can bring performance close to that of compiled code.
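A rough way to see interpreter overhead for yourself is sketched below in Python. It adds up the same numbers twice: once in a loop that the interpreter executes step by step, and once with the built-in `sum`, which in the standard CPython interpreter is implemented in compiled C. The function names `python_loop_sum` and `compare` are made up for this example, and the timings you get will depend on your machine and Python version, so treat this as an illustration rather than a benchmark.

```python
import timeit


def python_loop_sum(values):
    """Add numbers one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


def compare(n=100_000, repeats=20):
    """Time the interpreted loop against the built-in sum on the same data.

    Returns (loop_seconds, builtin_seconds) for `repeats` runs of each.
    """
    values = list(range(n))
    loop_time = timeit.timeit(lambda: python_loop_sum(values), number=repeats)
    builtin_time = timeit.timeit(lambda: sum(values), number=repeats)
    return loop_time, builtin_time


if __name__ == "__main__":
    loop_time, builtin_time = compare()
    print(f"interpreted loop: {loop_time:.4f} s")
    print(f"built-in sum:     {builtin_time:.4f} s")
```

Both versions use the same algorithm and give the same answer; any difference in the timings comes from how the work is executed, not from what work is done.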

Performance is obviously a concern, but it is worth noting that unless you are working with enormous datasets or performing lots and lots and lots (this means billions or trillions) of calculations then the time-cost of development and maintenance will probably dominate. If your program can run in under a second even when written in a "slow" language, then it's probably fast enough! Another important point is that performance depends much more on the correct choice of algorithms than on wringing out the last drops of performance from a language, so do not sacrifice ease-of-development and maintenance on the altar of performance.
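As a small illustration of the point about algorithms, the Python sketch below finds the items that two lists have in common in two different ways. The function names and the data sizes are invented for this example. The first version scans the whole second list for every item in the first; the second builds a set once, so that each lookup takes roughly constant time. It assumes the items are hashable (numbers and strings are).

```python
import timeit


def common_items_slow(xs, ys):
    """For every x, scan the whole of ys: roughly len(xs) * len(ys) comparisons."""
    return [x for x in xs if x in ys]


def common_items_fast(xs, ys):
    """Build a set once, then each membership test takes roughly constant time."""
    lookup = set(ys)
    return [x for x in xs if x in lookup]


if __name__ == "__main__":
    xs = list(range(0, 6000, 2))  # even numbers
    ys = list(range(0, 6000, 3))  # multiples of three

    # Both approaches must agree on the answer.
    assert common_items_slow(xs, ys) == common_items_fast(xs, ys)

    slow = timeit.timeit(lambda: common_items_slow(xs, ys), number=3)
    fast = timeit.timeit(lambda: common_items_fast(xs, ys), number=3)
    print(f"list scan:  {slow:.4f} s")
    print(f"set lookup: {fast:.4f} s")
```

The gap between the two grows as the lists get longer, which is why a better algorithm in a "slow" language can beat a poor algorithm in a "fast" one.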

Libraries

The presence, or absence, of libraries should play a big part in deciding whether to choose a language. Good, high-quality libraries speed development because you don't have to reinvent the wheel. If your software needs to read or write XML files then a language with a good XML library will allow you to concentrate on the real scientific problem, rather than spending time writing, and debugging, code to read and write XML. A word of warning: just because a library exists doesn't mean that it is good. Always try out and/or research a library before relying on it; libraries can have bugs too, so you want to find reliable ones!
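To give a feel for how much work a library saves, here is a short Python sketch that reads and writes XML using `xml.etree.ElementTree` from Python's standard library. The XML layout (an `experiment` containing `sample` elements with a `temperature`) and the function names are invented for this illustration; your own files will look different.

```python
import xml.etree.ElementTree as ET

# Made-up example data; the element and attribute names are assumptions.
SAMPLE_XML = """
<experiment name="demo">
  <sample id="s1"><temperature units="K">293.5</temperature></sample>
  <sample id="s2"><temperature units="K">301.2</temperature></sample>
</experiment>
"""


def read_temperatures(xml_text):
    """Parse the XML text and return a dict mapping sample id to temperature."""
    root = ET.fromstring(xml_text.strip())
    return {
        sample.get("id"): float(sample.findtext("temperature"))
        for sample in root.iter("sample")
    }


def write_temperatures(temperatures):
    """Build an XML document from a dict of temperatures and return it as text."""
    root = ET.Element("experiment")
    for sample_id, value in temperatures.items():
        sample = ET.SubElement(root, "sample", id=sample_id)
        temp = ET.SubElement(sample, "temperature", units="K")
        temp.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    temps = read_temperatures(SAMPLE_XML)
    print(temps)
    print(write_temperatures(temps))
```

The parsing, escaping and error handling all live inside the library, so the code you write is almost entirely about your data.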

Tools

The quality of the tools supporting a language will also have a big impact on development time. More mature, widely supported languages tend to have much better tools than newer or niche languages. Tools cover a wide range of support programs such as Integrated Development Environments (IDEs), debuggers, version control and deployment tools. A good IDE speeds development by providing a single program within which you write, test and debug your code. Being able to move rapidly, and easily, through the write, test, debug cycle makes the developer much more efficient as it helps them stay focused on the problem. The longer it takes to make and test changes, the bigger the risk that the developer will lose focus and become distracted by, say, checking their email.

Conclusion

All of the above will depend on what the project is trying to achieve and in what context (how much time do you have? how fast are your computers? etc). By considering these issues and picking a language that's sensible for the project, you can make your life during that project a lot easier.

4 comments:

reader, 23 August 2008 at 16:26
Good post!
You've talked much about concepts, next would you give some concrete examples?

Choosing a programming language 1 | Programming for Scientists, 25 August 2008 at 07:57
[...] Choosing a programming language - Part 2 [...]

Ben, 25 August 2008 at 17:06
@reader. Thanks for the comment. We're putting together a post detailing the choices we both made on recent projects. Hopefully that will give you a better idea what we are talking about.
Edit: Examples of how we chose languages are posted here

List of programming languages used by programmer scientists | Programming for Scientists, 11 September 2008 at 18:26
[...] is a resource post to go with the previous posts on picking a programming language (Part 1 and Part 2). It’s a big list of programming languages that you might consider using for your software [...]

Thursday, 21 August 2008

Choosing a programming language - II
