Sunday 24 August 2008

Source Control

Photo by rpongsaj

Almost all software projects consist of multiple files, each of which will be edited many times. As the project progresses, files are added, changed and removed, and if there is no way to track and manage these changes it is easy to get into a real mess. Source control (otherwise known as Revision Control, Version Control or Source Control Management (SCM)) is a way to help manage the changes to your files over time.


Not just a Filesystem


The source files you use in a software project will be subject to a lot of change as you write and improve them. Source Control is a process that helps manage all of these files, preserving the history of changes and allowing much more complex operations than are possible with a simple file system. Source Control software takes the form of a server, which stores all the files, and a client program that individual programmers use to get files, merge changes between files and perform other such functions.
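To make the difference from a plain file system concrete, here is a minimal sketch of a store that keeps every saved version of a file rather than overwriting it. The class name `FileHistory` and its methods are made up for illustration and are not taken from any real Source Control system, which would keep this history on a server rather than in memory.

```python
# Hypothetical in-memory model of a file's history (illustration only).
class FileHistory:
    """Keeps every saved version of one file instead of overwriting it."""

    def __init__(self):
        # Oldest revision first; each entry is a (comment, content) pair.
        self._revisions = []

    def save(self, content, comment):
        """Store a new revision and return its 1-based revision number."""
        self._revisions.append((comment, content))
        return len(self._revisions)

    def get(self, revision=None):
        """Return the content of a revision (the latest one if omitted)."""
        if not self._revisions:
            raise LookupError("no revisions saved yet")
        if revision is None:
            revision = len(self._revisions)
        if not 1 <= revision <= len(self._revisions):
            raise LookupError(f"unknown revision {revision}")
        return self._revisions[revision - 1][1]

    def log(self):
        """Return (revision number, comment) pairs, oldest first."""
        return [
            (number, comment)
            for number, (comment, _content) in enumerate(self._revisions, start=1)
        ]


history = FileHistory()
history.save("print('hello')\n", "First version")
history.save("print('hello, world')\n", "Friendlier greeting")

latest = history.get()      # the newest content
original = history.get(1)   # the old version is still available
```

A plain file system would only ever give you `latest`; the point of the sketch is that `original` and the log of comments survive every later change.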

One source to rule them all


Source control is essential as soon as you have more than one programmer working on a project. When two or more people work together, they need to be sure they are working on the same code base, otherwise changes made by one programmer will eventually clash with changes made by another. Without a process that can sort out these clashes as quickly and painlessly as possible, the code will soon become unmanageable. The version stored on the server is the common code, and anybody working on it has to ensure that their changes work with the common code before ‘checking them in’. Having this common base makes it possible to co-ordinate hundreds or thousands of programmers, even in far-flung locations.

A single programmer will also benefit from source control, as it provides an easy way to back up work (if the server is on a different computer or entirely off site), preserve old approaches, backtrack if something doesn’t work or try things out using branches (see 'Branching out' below), in addition to making it easy to add collaborators.

Having your project under source control also makes it far easier to run automated processes such as continuous building and testing, which can grab the code from a central location without needing any human input. This is a real help for any kind of team-based software development.

Check out that code!


Different Source Control systems use different terminology and have different processes, but the basic concept is the same for all. The process begins with the user ‘checking out’ a file, which tells the server who is changing which files. In some systems this is a manual step that must be performed by the user; in others it is handled automatically. The user makes changes and then ‘checks in’ (or ‘submits’) the changed file. If the file has changed on the server since it was checked out (because somebody else was working on it), the two versions will have to be ‘resolved’ and the combined version checked in.
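The check out, check in and resolve cycle can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular system is implemented: the names `Server`, `Conflict`, `check_out` and `check_in` are invented for the example, and the 'resolve' step is reduced to re-applying a change on top of the latest version.

```python
# Toy model of check out / check in with conflict detection (illustration only).
class Conflict(Exception):
    """Raised when the server copy changed after the file was checked out."""


class Server:
    def __init__(self):
        # path -> (revision number, content); hypothetical in-memory storage
        self._files = {}

    def check_out(self, path):
        """Return (revision, content); a new file starts at revision 0."""
        return self._files.get(path, (0, ""))

    def check_in(self, path, base_revision, content):
        """Accept the change only if it was based on the current revision."""
        current_revision, _ = self._files.get(path, (0, ""))
        if base_revision != current_revision:
            raise Conflict(
                f"{path}: based on revision {base_revision}, "
                f"but the server is at revision {current_revision}"
            )
        new_revision = current_revision + 1
        self._files[path] = (new_revision, content)
        return new_revision


server = Server()
server.check_in("main.py", 0, "v1\n")

# Two programmers check out the same revision.
alice_rev, alice_text = server.check_out("main.py")
bob_rev, bob_text = server.check_out("main.py")

# Alice checks in first, so her change goes straight in.
server.check_in("main.py", alice_rev, alice_text + "alice's change\n")

# Bob's copy is now out of date, so he has to resolve before checking in.
try:
    server.check_in("main.py", bob_rev, bob_text + "bob's change\n")
except Conflict:
    latest_rev, latest_text = server.check_out("main.py")
    server.check_in("main.py", latest_rev, latest_text + "bob's change\n")

final_rev, final_text = server.check_out("main.py")
```

The design choice worth noticing is that the server never silently overwrites Alice's work: Bob's check in is refused until his change is combined with the newer version.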

The Source Control system's client program will include functionality that can show the differences between two versions of a file and allow you to choose which code will go into the final version. The program can often do this automatically, because your changes may not conflict with the changes from the server. This functionality is what saves so much time, as it automates, as far as possible, what would otherwise be a very long and tedious process of comparing files by hand.
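As a rough illustration of both halves of that job, the sketch below uses Python's standard `difflib` module to show the difference between two versions, and a deliberately naive function, `merge_lines`, to combine changes that do not conflict. `merge_lines` is a made-up helper that assumes all three versions have the same number of lines; real merge tools handle inserted and deleted lines as well.

```python
import difflib


def show_difference(old_lines, new_lines, old_name="base", new_name="mine"):
    """Return a unified diff between two versions of a file as one string."""
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )


def merge_lines(base, mine, theirs):
    """Naive three-way merge, line by line.

    Assumption for this sketch: all three versions have the same number of
    lines. Returns (merged_lines, conflicts), where conflicts lists the
    indexes of lines that both sides changed in different ways.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles files of equal length")
    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only the server version changed this line
            merged.append(t)
        elif t == b:      # only my version changed this line
            merged.append(m)
        else:             # both changed it differently: a person must choose
            merged.append(m)
            conflicts.append(index)
    return merged, conflicts


base = ["a = 1\n", "b = 2\n", "c = 3\n"]
mine = ["a = 10\n", "b = 2\n", "c = 3\n"]      # I changed the first line
theirs = ["a = 1\n", "b = 2\n", "c = 30\n"]    # a colleague changed the last

diff_text = show_difference(base, mine)
merged, conflicts = merge_lines(base, mine, theirs)
```

Here the two edits touch different lines, so `conflicts` comes back empty and the merged file contains both changes. Only when both sides change the same line does a person need to step in.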

Since files are generally submitted in groups, the comment on a ‘check in’ acts as a ‘meta comment’ on top of the comments in the individual files, and can explain architectural or other large-scale decisions.

Who's who of Source Control


In order to be shared, the repository needs to be on a machine that everybody can connect to. If everybody is in the same building this can be on the local network but otherwise it will have to be a globally accessible server.



Wikipedia has a comparison chart of major Source Control systems. The list is quite long but the major open source systems are CVS, Subversion (otherwise known as SVN) and Git. Also worth noting, mainly because they have been used by the authors, are Accurev and Perforce, both of which are commercial products. There are several web-based source control services that are free if you have small amounts of data, which means you don’t have to go through the bother of setting up and maintaining your own server. Some services will only host open source code, so check before signing up.

Examples include:

Who is it for?


Source control should be used by anybody who has files that change over time, or by groups of people who want to collaborate on those files. It is mostly used by programmers but could equally be used by authors, editors and scientists who are collaborating on a document. Source Control can also be used as a simple shared repository: the person who checks out a file first 'locks' it, which prevents others from changing it until it is checked back in. This way non-text files can be shared, but the useful merging features are lost and all the participants must remember to check in their files as soon as they have finished working on them. The authors know of several science groups that use CVS to collaborate on documents before publication.
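The locking style of working can be sketched as follows. This is an illustrative model only: `LockingRepository` and `LockedError` are invented names, and a real system would also record who locked what on the server and offer a way to break stale locks.

```python
# Toy model of lock-based sharing (illustration only).
class LockedError(Exception):
    """Raised when somebody else holds the lock on a file."""


class LockingRepository:
    def __init__(self):
        self._contents = {}   # file name -> bytes
        self._locks = {}      # file name -> name of the user holding the lock

    def check_out(self, name, user):
        """Lock the file for this user and return its current contents."""
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise LockedError(f"{name} is locked by {holder}")
        self._locks[name] = user
        return self._contents.get(name, b"")

    def check_in(self, name, user, data):
        """Store the new contents and release the lock."""
        if self._locks.get(name) != user:
            raise LockedError(f"{user} does not hold the lock on {name}")
        self._contents[name] = data
        del self._locks[name]


repo = LockingRepository()
repo.check_out("figure1.png", "alice")

# Bob cannot touch the file while Alice has it checked out.
try:
    repo.check_out("figure1.png", "bob")
    bob_got_file = True
except LockedError:
    bob_got_file = False

# Once Alice checks in, the lock is released and Bob gets her version.
repo.check_in("figure1.png", "alice", b"new image bytes")
bob_copy = repo.check_out("figure1.png", "bob")
```

No merging is ever needed because only one person can change a file at a time, which is exactly why this approach suits binary files and why forgetting to check in blocks everybody else.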

Branching out


Software that is being used is constantly changing as new features are added, bugs are fixed and code is refactored. While these changes are occurring the program must be kept working. Branches are a very useful way of being able to make use of the features of Source Control without affecting the code that is 'in production'.

Imagine starting work on a project: you create a few files, check them into your source control system, make changes, check those in, add some more files, etc. You and your colleagues work on the program and finally you release version 1.0 to the world. Happy with your work, you start on version 2.0. However, people find bugs in the 1.0 version. You want to help them, but the code has already changed because of the work on the 2.0 release. What do you do? You use a branch.

A branch is like taking a copy of some or all of the files at a certain point and putting them in a new repository. By creating a branch of all the files when you release 1.0, you can continue working on the original files to make the 2.0 release while still being able to make changes to the 1.0 branch, thus allowing you to release a 1.1 or 1.2 version.
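Taking the 'copy at a certain point' description literally, a branch can be sketched like this. The `Repository` class and the branch name `release-1.0` are assumptions made for the example; real systems usually store branches far more cheaply than by copying every file.

```python
# Toy model of branching as a snapshot copy (illustration only).
class Repository:
    def __init__(self):
        # branch name -> {file path: content}; "trunk" is the main line
        self.branches = {"trunk": {}}

    def commit(self, branch, path, content):
        """Record a new version of one file on the given branch."""
        self.branches[branch][path] = content

    def create_branch(self, source, new_name):
        """Start a new branch as a copy of the source branch's current files."""
        if new_name in self.branches:
            raise ValueError(f"branch {new_name} already exists")
        self.branches[new_name] = dict(self.branches[source])


repo = Repository()
repo.commit("trunk", "app.py", "version = '1.0'\n")

# Release 1.0: branch at this point, then carry on with 2.0 on the trunk.
repo.create_branch("trunk", "release-1.0")
repo.commit("trunk", "app.py", "version = '2.0-dev'\n")

# A bug fix for existing users goes onto the release branch only.
repo.commit("release-1.0", "app.py", "version = '1.1'\n")

trunk_copy = repo.branches["trunk"]["app.py"]
release_copy = repo.branches["release-1.0"]["app.py"]
```

After these steps the two branches hold different versions of `app.py`, so the 1.1 fix can ship without dragging in unfinished 2.0 work.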

In much the same way as you can merge changes between files, you can merge changes between branches (the ‘main’ branch is known as the ‘trunk’ or ‘main line’), so you can bring bug fixes from one branch into another without having to copy files.

Branches have lots of uses including trying out new ideas, working on a major change which will break lots of other code while it is in progress or allowing a single programmer to work on lots of different areas without having lots of code ‘checked out’ at the same time.

Conclusion


Source control is absolutely essential as soon as you have more than one person working on a project. Without it, it becomes almost impossible to coordinate work. Even a single programmer can derive a lot of benefit from the branching, the storage of past work and, if using an off-site server, the back-ups. Source Control doesn't just save files; it will also change how you code. As you start using Source Control you will notice that your code becomes cleaner, as you are no longer tempted to leave old code lying around 'just in case'. Now you know it is on the server and, if you ever need it (and you rarely do), it is just a few clicks away. This helps keep the code easy to understand and easier to maintain, and will result in a better project.

4 comments:

  1. I've also found version control useful for things other than code that you need to keep old versions of and track changes in. I used a subversion repository when I was writing my thesis and paranoid about losing versions.

    (p4s is a great idea for a blog - thanks guys!)

  2. I can't work without source control now, I feel so vulnerable. :)
    If more non-programmers knew about it I think it would be much more widely used. I wonder if you could write a Word extension ... hmmm... must look into that.
    Glad you like the blog so far, please keep posting comments and telling us what you'd like to see more of (or less of!)

  3. What clients do you guys recommend?

    I've been trying to use Subcommander for SVN but it drives me nuts sometimes.

  4. I use TortoiseSVN to connect to a remote repository hosted at Assembla.com. I've had a few problems with Tortoise but I think that has been due to my hard drive and not their code.
    Having a remote repository means I get off-site backup as an added extra.
