Tuesday 16 December 2008

The basics of ... C\C++




C and, increasingly, its object-oriented younger brother C++ are the go-to languages if you need complete control over your computer and the ability to transfer your code to just about any other platform. They are also the languages that are most likely to trip you up and generally make your life a living hell. This post is about what makes them so infuriating and yet so powerful.

A bit of history


C got its name because it is derived from another language called 'B', which is itself derived from a language called BCPL (often credited as the first language to use curly brackets to denote code blocks). It was written in the early 1970s by Dennis Ritchie as a systems language for use with the Unix operating system.
C++ started life as an extension of C ("C with Classes") but has grown into one of the most widely used languages in the world. It is more suitable than C for writing large-scale applications because it supports more high-level programming paradigms, mainly object orientation, that make large programs easier to write.

Pedal to the metal


C and C++ are designed to grant the programmer access to the lowest levels of the computer, and to do this they, especially C, map very closely onto how computers work. C compilers are therefore comparatively simple to write, which is why they are available on so many platforms, and they produce fast, efficient machine code. Unfortunately, because the languages are relatively simple, they rely on the programmer to do most of the work. This is great if you want to wring the most performance out of your code, but you generally end up writing a lot of it, making C\C++ less than ideal prototyping languages.

Pointing the way


One of the defining features of both C and C++ is that they provide fast, efficient, direct access to the machine's memory. They do this through a pointer data type that stores the address of the target data rather than the data itself. The advantage of this is that the pointer itself is very small and so can be passed around easily. It is also just a number that happens to refer to a location in memory, and as such it can easily be manipulated to point somewhere else. This allows the programmer to create efficient data structures such as trees and lists, because each item can store a pointer to the next one and you move around the structure by following those pointers. The disadvantage is that, since a pointer is just a number, it can just as easily point to some unexpected place in memory, say the operating system, and cause chaos. Many of the bugs that are exploited by authors of viruses etc. are caused by a pointer being made to point somewhere unexpected.
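As a rough illustration of the linked structures described above, here is a minimal sketch of a singly linked list that is built and walked purely by following pointers. The names Node, push_front, sum_list, value_at and free_list are invented for this example and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type for this sketch: a value plus the address of the next node.
struct Node {
    int value;
    Node* next;   // nullptr marks the end of the list
};

// Allocate a new node and link it in front of the current head.
Node* push_front(Node* head, int value) {
    return new Node{value, head};
}

// Walk the list by following each 'next' pointer, adding up the values.
int sum_list(const Node* head) {
    int total = 0;
    for (const Node* p = head; p != nullptr; p = p->next) {
        total += p->value;
    }
    return total;
}

// Follow 'index' links from the head and return the value found there.
// The asserts document the precondition that the list is long enough;
// without them, stepping past the end would dereference a null pointer.
int value_at(const Node* head, std::size_t index) {
    const Node* p = head;
    for (std::size_t i = 0; i < index; ++i) {
        assert(p != nullptr);
        p = p->next;
    }
    assert(p != nullptr);
    return p->value;
}

// Give every node back to the system so that nothing leaks.
void free_list(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;   // save the link before deleting the node
        delete head;
        head = next;
    }
}
```

Calling push_front three times with 3, 2 and 1 builds the list 1, 2, 3. Each node holds only one small pointer, which is what makes inserting at the front so cheap.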
C\C++ support arrays as contiguous blocks of memory, so all you are given is a pointer to the beginning and you are left on your own. C and C++ have an array access operator ([]) but it is just a way of simplifying the memory access (this is also why the first element in an array is 0: it has an offset of 0 from the pointer to the array). Arrays therefore come with all the same advantages and disadvantages as pointers.


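For instance, this program fragment will compile, often without complaint:

/* fragment: belongs inside a function and needs #include <stdio.h> */
int array[] = {1, 2, 3};
for (int i = 0; i < 4; i++)    /* one step too far: valid indices are 0 to 2 */
{
    printf("%d\n", array[i]);
}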

but the output will be something like this:


1
2
3
453809420

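The last time through the loop the index is off the end of the array, so the next piece of memory is treated as an integer and printed out, whatever it is! Strictly speaking the behaviour is undefined, so the program could just as easily crash as print a bogus value.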
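One way to stay inside the array is to compute its length once and use that as the loop bound. The sketch below also shows that array[i] is the same thing as *(array + i). The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// array[i] is shorthand for *(array + i): start at the first element
// and step forward i elements.
int element_via_pointer(const int* first, std::size_t i) {
    return *(first + i);
}

// Sum 'count' elements. The caller has to pass the length, because a bare
// pointer carries no size information of its own.
int sum_elements(const int* first, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += first[i];
    }
    return total;
}

// Hypothetical demo tying the two together.
int demo_sum() {
    int array[] = {1, 2, 3};
    // sizeof reports bytes, so divide by the size of one element.
    // This only works while 'array' is a real array and not a pointer.
    const std::size_t count = sizeof(array) / sizeof(array[0]);
    assert(count == 3);
    assert(element_via_pointer(array, 2) == array[2]);
    return sum_elements(array, count);
}
```

The sizeof trick stops working as soon as the array is passed to a function, because it arrives there as a plain pointer. That is why sum_elements takes the count as a separate parameter.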

We'll always have the memories


Because C and C++ are all about giving the programmer as much direct control over the hardware as possible, they both hand over complete control of memory as well. The programmer is completely responsible for allocating memory from the system and giving it back when they are finished with it. This is great because you get very fine control over how memory is used, but it leads to one of the most dreaded bugs in C and C++: the memory leak. A leak occurs when you allocate some memory and then lose the pointer to it that the system gave you. Once that happens the program can neither use the memory nor free it, so it is lost until the program terminates (at which point most operating systems reclaim it). If a program keeps allocating memory and losing it, the system will eventually run out of memory and the program, or even the whole computer, will crash.

This program fragment demonstrates a memory leak:

#include <stdlib.h>

void leaky()
{
    void* memory;             // void* is a special pointer type that can point to anything
    memory = malloc(100);     // malloc allocates 100 bytes and returns their address as a void*
    return;                   // 'memory' goes out of scope here and free() was never called
}


When this function exits, the local variable memory is destroyed, but since it is only a pointer, what it points to is not freed and the link to that memory is lost: the memory has leaked. That memory can now only be reclaimed by the system when the program is shut down or the computer is restarted.
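For contrast, here is a sketch of two ways to avoid the leak: pairing malloc with free by hand, and letting a std::unique_ptr release the memory automatically. The function names use_c_buffer and sum_first_n are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// C style: every successful malloc is paired with exactly one free.
// Returns true if the buffer could be allocated and used.
bool use_c_buffer(std::size_t size) {
    assert(size > 0);                   // precondition for this sketch
    void* memory = std::malloc(size);   // may return a null pointer on failure
    if (memory == nullptr) {
        return false;
    }
    std::memset(memory, 0, size);       // use the memory for something
    std::free(memory);                  // hand it back before the pointer is lost
    return true;
}

// C++ style: std::unique_ptr owns the array and calls delete[] for us
// when it goes out of scope, even if the function returns early.
int sum_first_n(std::size_t count) {
    assert(count > 0);                  // precondition for this sketch
    std::unique_ptr<int[]> numbers(new int[count]);
    for (std::size_t i = 0; i < count; ++i) {
        numbers[i] = static_cast<int>(i) + 1;   // 1, 2, 3, ...
    }
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += numbers[i];
    }
    return total;
}   // the array is released here, with no explicit delete[] in sight
```

The second version is usually the easier one to get right, because there is no code path on which the cleanup can be forgotten.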

Taking control


How do you tame all this seeming chaos? By being organised and thinking carefully about your program's structure. If you are using C++ then you can use the object orientation features of the language to protect the user (often yourself!) from harm by encapsulating raw memory access and array manipulation behind nice interfaces. For instance, you should never use a raw array (e.g. int myArray[]); instead use an object (e.g. std::vector) that manages its own memory, adds and removes items properly, can check indices for you (through its at() member function; plain [] is still unchecked) and takes care of a host of other subtle problems. (It is interesting to try writing a safe list/array object. It's a lot harder than you first think.)
Luckily, because C\C++ are very mature languages they have lots of very well tested libraries to do a lot of basic, and not so basic, things for you (in fact both C and C++ have a standard library that is part of the language specification). These include input and output (I/O), collections (lists, arrays, trees etc.), algorithms (searching, sorting etc.), strings, memory management, complex numbers and many others.
C and C++ also have a lot of third party libraries, both free and commercial, that add functionality so you don't have to reinvent the wheel. Popular mathematical/scientific libraries include TNT (successor to LAPACK++), GSL, BLAS and IT++.
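To make the std::vector and standard library advice above a little more concrete, here is a small sketch using only standard C++ facilities. The helper names read_is_rejected and contains_sorted are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading values.at(index) was rejected as out of range.
bool read_is_rejected(const std::vector<int>& values, std::size_t index) {
    try {
        (void)values.at(index);   // at() checks the index; operator[] does not
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Sort a copy with the standard library and report whether 'wanted' is present.
bool contains_sorted(std::vector<int> values, int wanted) {
    std::sort(values.begin(), values.end());
    assert(std::is_sorted(values.begin(), values.end()));   // sanity check
    return std::binary_search(values.begin(), values.end(), wanted);
}
```

With a three element vector, read_is_rejected reports index 3 as out of range instead of silently reading whatever happens to sit in memory after the last element, which is what the raw array did earlier.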

Quirks of the language


1a) When allocating memory, every malloc (C) or new (C++) must have a matching free (C) or delete (C++). Get used to thinking "I've just typed 'new', where am I going to write 'delete'?"
1b) Don't mix malloc and delete or new and free. Very bad things will happen: the behaviour is undefined (malloc and free are functions, new and delete are operators that also run constructors and destructors). The same goes for arrays: memory from new[] must be released with delete[].
2a) If you are using C++ use the STL, or something similar like Boost. Don't reinvent the wheel; you'll do it wrong.
2b) STL compiler errors are some of the most unreadable in the world. Don't panic, use Google and learn how to decode them.
3) Read Effective C++ and More Effective C++ by Scott Meyers. Wait 6-12 months, rinse and repeat.
4) C programmers hate the object oriented parts of C++ and will write code as if they don't exist. The problem is that this code will happily compile and you end up with a weird hybrid of the two that works perfectly but is confusing to both parties.
5) C++ FAQ lite is your friend. Do what it says and you'll be fine.
6) Doing 'clever' things with pointers might seem sexy but just say no.
7) Remember to add 'break' to each entry in a switch statement. Fall through is rarely what you intended, and if you do intend it, put in a comment to warn people.
8) Be careful how much you include in header files. On small projects you can throw everything into them but as the project grows the build times will get slower and slower. Practice good header file cleanliness: only include exactly what you need. And no more.
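Quirk 7 is easy to demonstrate. The sketch below puts a correct switch next to one with a forgotten 'break'. Both functions are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Correct version: every case ends in 'break', and the one deliberate
// fall through is commented.
std::string describe(int n) {
    std::string text;
    switch (n) {
    case 1:
        text = "one";
        break;
    case 2:
        text = "two";
        break;
    case 3:   // intentional: 3 and 4 share the same handling
    case 4:
        text = "a few";
        break;
    default:
        text = "many";
        break;
    }
    assert(!text.empty());   // every path above assigns something
    return text;
}

// Buggy version: the 'break' after case 1 has been forgotten.
std::string describe_buggy(int n) {
    std::string text;
    switch (n) {
    case 1:
        text = "one";
        // falls through - BUG: the result is overwritten by case 2
    case 2:
        text = "two";
        break;
    default:
        text = "many";
        break;
    }
    return text;
}
```

Here describe(1) gives "one" while describe_buggy(1) gives "two". The buggy version compiles without any problem, although some compilers can be asked to warn about it.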


Summary


C and C++ are flexible, powerful, mature, portable and well supported languages. If you respect them then you can make flexible, powerful, portable software. If you don't respect them then you'll produce buggy, messy, unusable code. You have been warned.




4 comments:

  1. [...] C++. The object-oriented version of C.  This makes is both fast and able to benefit from all the nice (from the developers’ perspective) qualities of an object-oriented language.  It needs the developer to handle things like memory allocation, which can be a headache, but the skilled programmer can use this to their advantage (See Basics of … CC++). [...]

  2. erm, you loop over 5 values, but the printout has only 4 :) Additionally what happens is totally unspecified in C. Your program might crash or it might print bogus values. Or the sky will drop on our heads..

  3. *smacks head*
    Thanks for spotting my 'deliberate' error.
