Profile, Profile, Profile
I've come to really like atomic integer operations and have begun using them a lot in Sire. Atomic integer operations are operations that occur on integers that are guaranteed to occur atomically, i.e. they are thread-safe, as no two operations on that integer may occur simultaneously. Atomic integers are thus very widely used in systems programming, and allow for reference counting in thread-safe smart pointer implementations. Atomic integer operations do however impose some overhead, as they require that access to the integer is serialised. Mutexes are the normal method of choice for serialising access to a resource, but they are very heavy-weight, and would be too slow for many of the cases where atomic integer operations are required. Instead, custom assembly code is used that is specific for each platform. Fortunately, there now exist several cross-platform libraries that abstract this platform-dependent code into an easy-to-use, cross-platform API. Of course, I am using the Qt q_atomic operations, but there are others out there as well. The Qt q_atomic operations form the basis of the QSharedData and QSharedDataPointer classes which I use extensively to provide copy-on-write implicit sharing of my copyable self-managed objects. I have now also begun using the q_atomic operations for ID numbers and version numbers, as they allow for unique version numbers to be assigned rapidly to different objects. This has allowed me to fix an obscure bug;
Molecule mol; // create a molecule (ID == 1, version == 1.0)
cout << mol.ID() << mol.version();
//prints ID == 1, version == 1.0
Molecule mol2 = mol; // create a copy of 'mol'
cout << mol.ID() << mol.version();
//prints ID == 1, version == 1.0
mol.translate( Vector(1,1,1) ); // moved, so version incremented to 1.1
//prints ID == 1, version == 1.1
mol2.translate( Vector(-1,-1,-1) ); //moved, so version incremented to 1.1
//prints ID == 1, version == 1.1
The problem above is that we now have two molecules with different coordinates, but that both have the same ID and version numbers. This is bad, as it means that the code cannot guarantee that an ID and version number *uniquely* identifies a particular molecule and state of that molecule. This is because the algorithm that incremented version numbers was very basic (just incremented an integer), and the version number could not be shared between molecules because of the expense of then requiring mutexes whenever the version was incremented. Fortunately, q_atomic operations ride to the rescue, and do allow for thread-safe sharing of version numbers between objects. This means that incrementing the version number can be *guaranteed* to return a new, unique version number.(*) Using q_atomic integer operations, in the above case, mol would have ID == 1, version == 1.1, while mol2 would have ID == 1 and version == 1.2. Two different states of the same molecule are therefore guaranteed to have unique version numbers.(**) So this is great! However, I then read something that put me on guard. This page reports the results of profiling different copy-on-write smart pointer implementations. This page therefore provides a good test revealing the real computational cost of atomic integer operations. Annoyingly their server is down at the moment, so I can't quote from it, but it showed that copy-on-write can be very slow for large numbers of operations (hundreds of milliseconds for millions of operations, if memory serves!). Worse still, this cost was still paid even for single-threaded programs for which atomic integer operations were not necessary. Obviously this seemed like bad news for me, given my reliance on copy-on-write and atomic integer operations. So, like any good programmer, I did some profiling of my own. I took my version number classes and timed how long it took to increment the minor version 100000 times, and then how long it took to increment the major version number 100000 times (the major number takes longer as it needs to create a new atomic integer for each new major version, so the time includes 100000 calls to 'new'). The results were very encouraging. On my laptop (1.4GHz Pentium M), the 100000 increments of the minor number took 3ms, while the 100000 increments of the major number took 85ms, while on my desktop (3.2GHz Pentium D), the times were 4ms and 40ms respectively. To make sure that the compiler wasn't optimising away the test, I repeated it using 1 million increments (ten times the number). The resulting times on my desktop were 37ms and 367ms, which, being ten times the time, shows that the compiler has not optimised away the test. This is great, as it means that I can now calculate that the time to perform a single q_atomic increment is approximately 40ns (or conversely, about 30 million increments can take place a second). This also gives me an estimate for how long a 'new' takes, for an integer sized object (and when it is the first 'new' after the deletion of an integer sized object...). Each 'new' takes approximately 400ns (so about 3 million can be performed per second) on my desktop. Both of these timings are insignificant compared to the time for a single energy evaluation, so I am happy with their cost. Since these costs are a fixed value, regardless of the size of the system, they do not impact on my intention for each move to take ~1 ms, so that I can reach a peak speed of 1000 moves per second (comparable to ProtoMS), but they may have some effect for very small systems, for which ProtoMS can do 10000 moves per second (each move thus taking 1000ns).
(*) Ok, guarantee is a bit of a strong word. Obviously the integers will eventually overflow and loop back to the beginning. This will occur around version 2 billion. It is conceivable that a single molecule will be subjected to 2 billion moves in a single simulation, but the chances of this causing a problem are low.... (he says.... Maybe one day I should move to 64bit atomic ints..?)
(**) Again, there does exist that slim chance that a user will simultaneously keep a copy of a molecule from the beginning of a simulation, and then from over 2 billion moves of the simulation, and thus these two will have the same ID and version numbers but will be different.