Tuesday 17 November 2009

"Should I switch to Python?"



Logo owned by the Python Software Foundation

Rich has recently been considering switching to the Python programming language.  Currently, Matlab is the language of choice in his department for rapid development and prototyping of code.  It's very good at this, but Mathworks (the company that produces Matlab) have been tinkering with the licensing terms, leading to hassles where none should exist.  This is very frustrating, and it prompts the thought that it might be nice to use a free language where this will no longer be an issue.

But of course things are not quite that straightforward.  Matlab is used for good reason - it's very good at what it does.  So is it worth the effort to stop using Matlab and instead learn to use Python?  In this article we discuss some of the things that'll need to be considered.

Why Python?
The first question is why, out of all the programming languages that exist, we should be considering Python.  The bulk of the reasoning is actually contained in the specifics of the sections below, but the starting point is that Python has a good reputation for being nice to work with, it's already used in some areas of science (suggesting it might be a sensible language to consider), and it has a wide community of users (including some big ones such as Google), so there should be good community support.  So, this looks superficially promising.  What about the specifics?

It's free...
First up, Python is free.  So no licence problems and no need to find the money to pay for it.  This does mean that there isn't a company whose raison d'etre is to build new functionality for Python, but there is an active community helping to develop it, so that's probably not too much of a problem.


What do I need it for?
This is a key question when deciding whether to learn a new language.  If you're anything like us, you're attracted to languages because you can do cool things with them, but you should be careful that they are the right cool things for your needs.  In this case, Rich needs a language for building prototype implementations of statistical modelling tools.  So, it needs to be fast to code in, object orientation would be desirable and lots of scientific library support is vital.  Flat-out processing speed is a nice bonus, but is less essential as Rich is happy to recode in C++ (or use a bigger computer) if he needs to.


Library support
For scientific programming, having the right libraries is vital.  We need to generate plots, process data, invert matrices, perform Fast Fourier Transforms and all sorts of specialist things like that.  All of these things can be found in libraries for various programming languages, so it's sensible to make sure you have access to these.  Python scores well on this count because of packages such as SciPy, BioPython, NumPy and matplotlib.
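To give a flavour of what this looks like in practice, here is a minimal sketch of two of the tasks mentioned above (inverting a matrix and taking a Fast Fourier Transform) using NumPy.  The matrix and the 5 Hz test signal are made up purely for illustration, and this is just one way of doing it rather than a recommended recipe.

```python
import numpy as np

# A small, well-conditioned 2x2 matrix (illustrative values only).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Invert the matrix and check that A @ A_inv is numerically the identity.
A_inv = np.linalg.inv(A)
identity_ok = bool(np.allclose(A @ A_inv, np.eye(2)))

# One second of a 5 Hz sine wave sampled at 64 Hz (a made-up test signal).
sample_rate = 64
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

# FFT of a real-valued signal, plus the frequency (in Hz) of each output bin.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# The bin with the largest magnitude should correspond to the sine's frequency.
peak_freq = float(freqs[np.argmax(np.abs(spectrum))])

print("inverse checks out:", identity_ok)
print("dominant frequency (Hz):", peak_freq)
```

Matlab users will notice that the calls are named much as they would expect, with the main difference being the explicit `np.` prefix on everything.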


Usability
This is always tricky to assess without using the language, but the received wisdom on the Web, backed up by the opinions of some of our colleagues, is that Python is extremely user-friendly.  Indeed, this is part of the stated design philosophy of Python (see here).


Speed
For prototyping scientific code, computational speed is a bonus rather than a necessity.  At this stage, user time (for programming) is far more valuable than CPU time, so an interpreted language like Python is acceptable.  Comparative benchmarking between languages is notoriously hard (and task specific), but the impression we've got is that Python and Matlab are probably of roughly the same speed, and a couple of orders of magnitude slower than fully compiled languages like C++.  However, in both cases people are working to make Matlab/Python implementations that are faster, so we probably won't be losing out significantly by switching from Matlab to Python.
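If you want to get a rough feel for interpreter overhead on your own machine, Python's standard `timeit` module makes it easy.  The sketch below times an explicit Python loop against the built-in `sum`, which is implemented in C in the standard CPython interpreter.  The function name and the data size are our own choices for illustration, and the timings will vary from machine to machine, so treat this as a way to poke at the question rather than a proper benchmark.

```python
import timeit


def python_loop_sum(values):
    """Add the numbers one at a time in interpreted Python."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Both approaches must agree before a timing comparison means anything.
loop_result = python_loop_sum(data)
builtin_result = sum(data)

# timeit.timeit calls the function `number` times and returns the total seconds.
loop_seconds = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_seconds = timeit.timeit(lambda: sum(data), number=20)

print(f"explicit loop: {loop_seconds:.4f} s")
print(f"built-in sum:  {builtin_seconds:.4f} s")
```

The same trick applies to a real prototype: time the slow part first, and only think about recoding in C++ if the numbers say it matters.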


What does everyone else use?
It's very useful if you're surrounded by experts in the language you're using.  It's also useful if your colleagues know the same languages as you, because they can pick up and use the things you write.  In the case of Rich's department, many people use Matlab but almost no-one uses Python.  This is a downside.  Of course, someone has to be first whenever a change like this is made, but it would mean that Rich would be on his own to a certain degree.


A transferable skill...
It's always prudent to be developing transferable skills, and experience with Python would certainly count as that, because it's widely used in industry and the commercial world.  Matlab is also widely used, although perhaps more in science/engineering settings and less in places like the computing industry.  It's probably true to say that both have their merits in this regard.


What about Octave?
Wouldn't it be nice if there were just a free version of Matlab?  Well, there is (sort of):  GNU Octave.  This would be another good solution to Rich's Matlab issues.  We're discounting it here mainly because of the concern that it's less well supported than Python, and also because it's less of a transferable skill.  Neither of these reasons is a killer, however, so we wouldn't try to dissuade anyone from going down the Octave route.


In conclusion
Rich has no plans to rush this decision, but it's clear that there are a lot of benefits to be derived from changing to Python.  Because you may end up using any new language for years to come, it's sensible to take time when deciding whether or not to change.  It's also worth reading some introductory materials, such as this beginners' guide to Python and one of our earlier articles, The basics of Python.

16 comments:

  1. Python might be sensible. The libraries are there, he will see an immediate speedup in non-linear algebra code as soon as he ditches MATLAB's appalling line-at-a-time interpreter, and large code bases are much easier to maintain in Python. That being said, I have to take issue with a couple of things.

    > prototype implementations of statistical modelling tools... object orientation would be desirable

    This doesn't follow at all. Object oriented programming as understood in Python and other Simula derived languages is a terrible way to organize mathematical code. Common Lisp's generic functions permit a fairly nice organization, but it is a dramatically different approach to object orientation.

    And you perpetuate the idea that a language with a live REPL, which doesn't require long compile cycles, must be interpreted and slow. I can only assume you are unaware of the existence of the many mature Common Lisp compilers which regularly perform on a par with C and outperform C++, OCaml which is similarly fast, the many good Scheme systems which compile to machine code a factor much less than ten slower than C++, and Haskell's ghc which produces code in the same range, and with the right approach sometimes dramatically faster than C. Lush, an old Lisp system with vast numerical libraries, has similar performance. Good Smalltalk implementations can be similarly fast.

  2. I dunno, Python has a lot of really annoying warts and it doesn't really buy you much. Maybe try: Haskell, Scala, OCaml, etc.

  3. "it’s clear that there are a lot of benefits to be derived from changing to Python"

    To be honest I don't think it's that clear from this post. You are mainly giving reasons for why Python might be a decent sci language and why it wouldn't be the worst thing to switch to it from using Matlab. The only benefit in your post that Python has over Matlab is that it's free. This may be a big benefit, but it is still just one (not "a lot"). (Maybe Python is a little more transferable as well, but that's still just 2)
    I don't mean to get caught up in semantics here. It's just that I would be interested in hearing about more of these benefits if they exist.

    If Rich does switch, it would be great if the process and comparison between the languages/environments could be documented here.

  4. Hi guys,

    Thanks for all your feedback! This is something I'm genuinely thinking about at the moment, so I'm glad that people have valuable insights to offer.

    Jordi - yes, perhaps I overstated slightly the number of benefits :-) One additional benefit I should also have mentioned is that I find the syntax of Matlab a little bit counter-intuitive sometimes (for example, the function zeros(4) would generate a 4-by-4 element array, not a 1D array with 4 elements).

    I have in mind that I'll write at least one update post on this subject at a later date, so watch this space!

    Frederick - thanks for your comments. I don't have any great knowledge of Lisp, so it's great to have some feedback on that!

    I disagree that OO is a "terrible way to organise mathematical code". My experience (which is mainly with statistical modelling) is that the features of OO are pretty useful, for the usual reasons. I'd be happy to accept that it may not be well suited for all areas of mathematics, and I'd certainly be happy to acknowledge that there are other approaches (such as functional programming) that have a lot of merit. But I think it's wrong to imply that there's no merit to using OO in relation to mathematical code.

  5. @Frederick:

    Matlab has had just-in-time compilation for quite some time (since v6.5), and it is even possible to partially precompile code to a bytecode format through the pcode function.

    Newer versions have both classes and modules, and should be comparable to Python when it comes to code base organization.

    IMHO, the real problem with Matlab is that there is some old legacy cruft which still enables people to "code FORTRAN" in it, and which Mathworks are reluctant to deprecate, in order to maintain backwards compatibility.

    A "cleansed" version would be preferable, and would probably feel a lot like Scheme with matrices instead of lists.

  6. I've been using Python for a while for my work in bioinformatics and theoretical population genetics (PG). I can say that Python is really good for coding proof-of-principle ideas. But for real use of the code there are unacceptable drawbacks.

    It's very easy to implement fundamental PG models in Python. But whatever optimizations and clever coding one uses, it will be too slow for MCMC and other simulations. The same can be said of certain dynamic programming techniques in bioinformatics. Good control of speed, concurrency and numerical precision is essential in those areas.

    On the other hand, a combination of Python/C/FORTRAN can be a real killer. Python can be an amazing integration language. The most important numerical tasks are already coded and optimized in excellent-quality libs in C/FORTRAN. Most of my actual work is to reformat data to fit into these libs.

    So, one doesn't need to switch to Python, only to know where to use it. By the way, there are some wrappers for using Matlab from a Python shell, and it should be easy to write one from scratch.

    Why put your resources to compete when they can just cooperate?

  7. Objects are nice, but it requires time and a lot of refactoring to get the abstractions right. Sometimes I wish python programmers would not slice an algorithm apart and paste it into 5 methods of 3 classes.

  8. Python is nice, but it is a different language than Matlab although numpy makes it just as (or even more) powerful.

    But why not take a look at Octave? It is very close to Matlab, much of the code runs entirely without modifications and it is easy to extend.

    It would be an easier first step than porting a lot of code and knowledge to numpy.

  9. @Ronald: I stand corrected. Thank you. But it would not be a Scheme in any way, shape, or form. One function per file? No macros? A bare excuse for anonymous functions? No.

    @Rich: I will admit here that object oriented programming a la Python has never seemed like a good fit for anything I've done, but I learned Lisp before I wrote any object oriented code at all. This may be a peculiarity of my brain.

  10. If you want to switch to Python for doing calculus and Matlab-like operations like the ones you describe, you would do better to look at:

    - enthought (http://www.enthought.com/products/epd.php), which is basically an enhanced version of numpy/scipy/matplotlib, from the developers of these libraries themselves. It is like a commercial (but free for academic use) version of Pylab.

    - sage (http://www.sagemath.org/), which allows you to write programs that call different syntaxes and scripts in Matlab, R, and other programming languages, and to do it with a Python syntax.

    In general, I would not recommend that you switch to Python, because it has -way- fewer libraries for what you want to do compared to Matlab. For the same reason, I wouldn't recommend that anyone switch from R to Python, and bear in mind that I am a Python freak and user myself :-(

    The Pylab libraries (numpy+scipy+matplotlib) have seen a lot of improvements lately, but don't expect too much from them, unless you are willing to contribute to these libraries.

  11. [...] Should I switch to Python – Are you a MATLAB user considering the switch? [...]

  12. Just to pipe in here quickly, I have recently made the switch from Matlab to Python (with NumPy/SciPy and Matplotlib), mainly for things like signal processing and related...

    Some of the most important advantages no one has pointed out yet are:
    Operating System Independent - Now I know Matlab has versions for Windows, *nix and OSX, but an institution will not always pay for licences for all three. At least with Python you can easily switch between systems at no additional cost. I guess this would fall under the category of free though, but the convenience that comes with the possibility to work on whatever operating system you happen to boot up your machine in is worth considering.

    Also, once you write your script it's a real program. Often people will use Matlab to prototype something up and then, once verified, they will implement it again in another language. If you implement it in Python to start with, it's already a real program which just needs to be altered. The same script can be used as a command line script, can be expanded with a GUI, or can be used as the back-end for a web application. I know Matlab has the GUIDE and a web server but they are Matlab specific. At least Python has bindings for GUI frameworks you may already use, e.g. wx, Qt etc...

    I may be biased since I am a fan of open-source. I will say this though, I haven't looked back since I first took the plunge.

    YMMV though,

    Cheers,

    Jack

  13. I recently made the switch from MATLAB to python for research. One additional benefit nobody has mentioned: Python is a great way to glue together codes from many different languages. I would argue that with python, you have _more_ numerical libraries at your disposal, since you can easily interface with great existing fortran or C libraries (most of the best codes are written first in these languages, and then ported to matlab).

    Additionally, if you eventually decide to speed up your code, you can write parts of your algorithm in a C++ module and call it from python. This gives you the speed of a C++ implementation as well as the interactivity and plotting functionality of python.

    Best
    --Peter
