AtomIndex is not an index - and AtomID is not an ID!
I can now officially say that Sire is cross-platform - yes, it now works on a Mac. The Unix foundation of OS X meant that nothing much was needed other than me fixing some problems in my cmake build files, and fixing a few bugs that OS X picked up (mainly missing SIRE_BEGIN_HEADER or SIRE_END_HEADER lines).
I still have this annoying habit of rewriting things, though normally it is the result of recursively refining my ideas about the code (e.g. the whole metaforcefield idea that I worked on in August - which was a really nice way of representing forcefields and led to quite a clean-up of that part of the code). Well, I am at it again, but this time I am finally sorting out the mess that was the missing chunks of SireMol. SireMol contains the classes that model molecules, so it is an integral part of the code and has existed pretty much since the beginning. In the time that I have had a Molecule class, I've evolved the design many times, introducing changes such as const classes and implicit sharing, and changing the ID scheme. This left a lot of cruft, and some confusion (e.g. Atom and NewAtom classes, and Bond working with AtomIndex, which isn't actually an index - AtomID is an index - while everything else worked with CGAtomID). And then I added names and numbers to CutGroups, added properties to molecules, realised that properties should represent everything in the molecule etc.
So now that Sire is between production runs (the water sims are finished and the paper nearly accepted, while the small solute and protein sims will be starting in early October) I am taking the opportunity to rewrite SireMol. First, I've created a Properties class that holds lots of Property objects. This will be mapped into the Molecule class so that *everything* in a Molecule is a property (including coordinates, bonding etc.). Next I am changing Molecule, Residue, Atom, CutGroup, Segment, Bond, Angle, Dihedral (and anything else you can think of) into views of a Molecule. This is the culmination of my experiments with PartialMolecule and AtomSelection, where I found that code was clean and easy to debug if I worked with PartialMolecules, as these could come from a Residue, Molecule or Atom, and could encompass everything in between. It also makes it much easier to move backwards and forwards - e.g. from Molecule to Residue to Atom and back again. This avoids a lot of the problems with separate Atom, Residue and Molecule classes, where each has to have some sort of connection with the others, and there is always the risk that the user could delete a Residue, but keep the Atom (after all, what is an Atom without the Molecule it is part of?).
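To make the "everything is a view" idea concrete, here is a minimal C++ sketch. It is not the SireMol code: MoleculeData, AtomView, ResidueView, MoleculeView and exampleAtom are hypothetical names invented for illustration, and std::shared_ptr stands in for whatever sharing mechanism the real classes use. The point is only that each view holds the shared molecule data plus an index, so you can step from atom to residue to molecule, and an atom can never outlive the data it belongs to.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared storage: every view below points at one of these.
struct MoleculeData {
    std::vector<std::string> atom_names;     // one entry per atom
    std::vector<std::size_t> atom_residue;   // residue index of each atom
    std::vector<std::string> residue_names;  // one entry per residue
};

typedef std::shared_ptr<const MoleculeData> DataPtr;

class ResidueView;
class MoleculeView;

// A view of one atom: the shared data plus the atom's index.
class AtomView {
public:
    AtomView(DataPtr d, std::size_t i) : data_(std::move(d)), idx_(i) {}
    const std::string& name() const { return data_->atom_names.at(idx_); }
    ResidueView residue() const;    // step up to the containing residue
    MoleculeView molecule() const;  // step up to the whole molecule
private:
    DataPtr data_;
    std::size_t idx_;
};

// A view of one residue: the same shared data plus the residue's index.
class ResidueView {
public:
    ResidueView(DataPtr d, std::size_t i) : data_(std::move(d)), idx_(i) {}
    const std::string& name() const { return data_->residue_names.at(idx_); }
    std::vector<AtomView> atoms() const {  // step down to the atoms
        std::vector<AtomView> out;
        for (std::size_t i = 0; i < data_->atom_residue.size(); ++i)
            if (data_->atom_residue[i] == idx_) out.push_back(AtomView(data_, i));
        return out;
    }
    MoleculeView molecule() const;
private:
    DataPtr data_;
    std::size_t idx_;
};

// A view of the whole molecule.
class MoleculeView {
public:
    explicit MoleculeView(DataPtr d) : data_(std::move(d)) {}
    std::size_t nAtoms() const { return data_->atom_names.size(); }
    AtomView atom(std::size_t i) const { return AtomView(data_, i); }
    ResidueView residue(std::size_t i) const { return ResidueView(data_, i); }
private:
    DataPtr data_;
};

inline ResidueView AtomView::residue() const {
    return ResidueView(data_, data_->atom_residue.at(idx_));
}
inline MoleculeView AtomView::molecule() const { return MoleculeView(data_); }
inline MoleculeView ResidueView::molecule() const { return MoleculeView(data_); }

// Builds a toy molecule and returns a view of its second atom.
inline AtomView exampleAtom() {
    MoleculeData d;
    d.atom_names = {"O", "H1", "H2", "NA"};
    d.atom_residue = {0, 0, 0, 1};
    d.residue_names = {"WAT", "ION"};
    MoleculeView mol(std::make_shared<const MoleculeData>(d));
    assert(mol.nAtoms() == 4);
    return mol.atom(1);  // mol is destroyed here, but the shared data is not
}
```

Because the atom view returned by exampleAtom() owns a reference to the data, calling residue() or molecule() on it still works after the original MoleculeView has gone out of scope.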
The other big change (and the one that I have been working on for the last couple of days) is - at last(!) - the sorting out of the ID scheme used in the code. I have finally rationalised the way that you identify things (e.g. atoms, molecules, forcefields) and have created a SireID module that forms the base of the system. At its root, there are a few key classes:
- ID - this is the base of all ID classes
- Index - this is the base of all IDs that are an index into a list (e.g. 3rd atom in the molecule)
- Name - this is an ID that is based on a user-supplied name (e.g. atom name)
- Number - this is an ID based on a user- or program-supplied number (e.g. residue number, or program supplied molecule number - used to be called MoleculeID)
There is also the 'Identifier' class, which is a simple holder for ID, so that a generic ID object can be returned from a function (as a generic ID can be passed in by taking a const ID& argument).
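A rough sketch of how these pieces could fit together is below. It is an illustration under assumptions, not the SireID source: the clone() and toString() members, the base() accessor, and the pick() and describe() functions are all hypothetical, and in this sketch Identifier simply owns a private copy of whichever ID it was given.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Base of all ID classes (clone/toString are assumed, for illustration).
class ID {
public:
    virtual ~ID() {}
    virtual std::unique_ptr<ID> clone() const = 0;
    virtual std::string toString() const = 0;
};

// An index into a list, e.g. the 3rd atom in the molecule.
class Index : public ID {
public:
    explicit Index(int i) : value_(i) {}
    std::unique_ptr<ID> clone() const override {
        return std::unique_ptr<ID>(new Index(*this));
    }
    std::string toString() const override {
        return "Index(" + std::to_string(value_) + ")";
    }
private:
    int value_;
};

// A user-supplied name, e.g. an atom name.
class Name : public ID {
public:
    explicit Name(std::string n) : value_(std::move(n)) {}
    std::unique_ptr<ID> clone() const override {
        return std::unique_ptr<ID>(new Name(*this));
    }
    std::string toString() const override { return "Name(" + value_ + ")"; }
private:
    std::string value_;
};

// A user- or program-supplied number, e.g. a residue number.
class Number : public ID {
public:
    explicit Number(int n) : value_(n) {}
    std::unique_ptr<ID> clone() const override {
        return std::unique_ptr<ID>(new Number(*this));
    }
    std::string toString() const override {
        return "Number(" + std::to_string(value_) + ")";
    }
private:
    int value_;
};

// Simple holder so that a generic ID can be returned by value.
class Identifier {
public:
    Identifier(const ID& id) : held_(id.clone()) {}
    Identifier(const Identifier& other) : held_(other.held_->clone()) {}
    Identifier& operator=(const Identifier& other) {
        held_ = other.held_->clone();
        return *this;
    }
    const ID& base() const { return *held_; }
private:
    std::unique_ptr<ID> held_;
};

// Passing a generic ID in: take a const ID& argument.
inline std::string describe(const ID& id) { return id.toString(); }

// Returning a generic ID: wrap it in an Identifier.
inline Identifier pick(bool by_name) {
    if (by_name) return Identifier(Name("CA"));
    return Identifier(Number(42));
}

inline void demoIdentifier() {
    Identifier id = pick(true);
    assert(describe(id.base()) == "Name(CA)");
}
```

The asymmetry is the reason for the holder: a function can accept any ID through a reference to the base class, but it cannot return an abstract base by value, so it needs a concrete type that carries the real ID inside it.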
These are then derived for each type of ID, e.g. to identify atoms:
- AtomID - base class of all atom identifiers
- AtomIdx - index atoms by position in array / molecule / residue etc.
- AtomName - identify an atom by name
- AtomNum - identify an atom by a user-supplied number (e.g. PDB atom number)
Then AtomIdentifier is a generic identifier that can be returned by a function.
AtomID can also be derived into combined atom identifier classes, e.g. to identify an atom in a Molecule by saying it is the third atom in the fifth residue (so a combination of a ResID with an AtomID). I have written a template AtomComboID class that inherits from AtomID and that can combine any other ID class with one of the AtomID-derived classes above. This means that unlike before, when I had explicit hand-written CGAtomID, ResAtomID, CGNumAtomName etc. classes, now I just have a load of typedefs ;-) - this is so much easier to maintain and debug. The best thing is that using templates means that the AtomComboIDs are in most cases just a pair of 32-bit integers (so just the size of a pointer on a 64-bit system!) and have explicit inline code that is fast and easy for the compiler to optimise. Of course I've now renamed the classes, so CGAtomID has become CGAtomIdx, while AtomIndex is now ResNumAtomName. While these look like horrible names (well, they are horrible names!) the user will just see them as tuples in python. Of course now I need to go through the code and rename all of the IDs...
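The combo idea can be sketched as a small class template. Everything below is an assumption made for illustration rather than the real Sire code: the public value members, the group() and atom() accessors, and in particular the empty, non-virtual AtomID base, which is what lets the combined type stay at the size of two 32-bit integers in this sketch (a base with virtual functions would add a vtable pointer).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

// Empty, non-virtual base (an assumption, so that it adds no storage).
struct AtomID {};

// Atom identifiers.
struct AtomIdx : AtomID {
    std::int32_t value;
    explicit AtomIdx(std::int32_t v) : value(v) {}
};
struct AtomName : AtomID {
    std::string value;
    explicit AtomName(std::string v) : value(std::move(v)) {}
};

// Identifiers for the group that the atom sits in.
struct ResIdx {
    std::int32_t value;
    explicit ResIdx(std::int32_t v) : value(v) {}
};
struct ResNum {
    std::int32_t value;
    explicit ResNum(std::int32_t v) : value(v) {}
};
struct CGIdx {
    std::int32_t value;
    explicit CGIdx(std::int32_t v) : value(v) {}
};

// Combines any group identifier with any atom identifier.
template <class Group, class Atom>
class AtomComboID : public AtomID {
    static_assert(std::is_base_of<AtomID, Atom>::value,
                  "the second part must be an atom identifier");
public:
    AtomComboID(Group g, Atom a) : group_(std::move(g)), atom_(std::move(a)) {}
    const Group& group() const { return group_; }
    const Atom& atom() const { return atom_; }
    bool operator==(const AtomComboID& other) const {
        return group_.value == other.group_.value &&
               atom_.value == other.atom_.value;
    }
private:
    Group group_;
    Atom atom_;
};

// The hand-written classes become one-line typedefs.
typedef AtomComboID<CGIdx, AtomIdx> CGAtomIdx;
typedef AtomComboID<ResIdx, AtomIdx> ResAtomIdx;
typedef AtomComboID<ResNum, AtomName> ResNumAtomName;

// Index-only combos are two 32-bit integers (checked on mainstream compilers).
static_assert(sizeof(CGAtomIdx) == 2 * sizeof(std::int32_t),
              "expected a pair of 32-bit integers");

inline void demoCombo() {
    CGAtomIdx third_in_fifth(CGIdx(4), AtomIdx(2));  // zero-based indices
    assert(third_in_fifth.group().value == 4);
    assert(third_in_fifth.atom().value == 2);
}
```

A combination that includes a name, such as ResNumAtomName here, carries a string and so is larger, which fits the "in most cases" caveat above.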
(and finish rewriting MoleculeInfo and ResidueInfo, again so that they are both views of MoleculeInfoData, rather than being separate classes, and also rewriting EditMol as MoleculeEditor, so that it fits in with the rest of the molecule manipulator classes, e.g. mol.move().translate([1,2,3]) - I want to do mol.edit().add(atom).addBond(atom,("C",ResID(1))) - so yes there's tonnes to code - there always is)