Computers Don't Like Ambiguity!

Interesting day today... I saw a couple of projects that had some relationship to Sire. I can't say too much on a public forum, but they both showed me that I am on the right track with what I am doing.

There are several problems with writing a molecular simulation code, and they all really come down to problems of ambiguity. Computers don't like ambiguity, but molecular simulation, and especially molecular mechanical simulation is highly ambiguous. Take for example the problem I am currently wrestling with; for the QM/MM forcefield I am using periodic boundaries. Normally with periodic boundaries you would only calculate the interactions with the closest replica. However, with QM/MM, you have to include *all* of the interactions between the QM atoms and the sparkles (the point charges that polarise the QM region). This means that I have to include *all* MM groups that are within the cutoff of the QM groups. This would be a problem if the QM region is extended, or covers multiple regions. This means that it is possible, in a periodic box, for different replicas of the same MM molecule to be interacting with the QM atoms (as the QM atoms may themselves be on opposite sides of the periodic box!). So what should we do? Should we include neither of the replicas, only one, or both? Is it right to include multiple copies of the same molecule? Here lies the ambiguity. It is not easy to just combine things together and hope that inheritance or cool code will solve these problems. Ambiguity will always creep into aggregate systems, and lots of thought is needed to work out to resolve that ambiguity. In this case, I think that both copies of the molecule do need to be included in the QM calculation (so two copies when adding the sparkles), but only one copy is needed when calculating the vdw interactions (as they should be dealt with correctly by the cutoff). Of course, I could always ban people from having extended or multiple QM regions! ;-)

Despite me trying to come up with a design that can cope with whatever I can think of, I always end up thinking of use-cases that won't work. And they are always important - e.g. now! Adding multiple replicas from the periodic boundary test is not something I had ever thought of. I am thus now putting some more thought into the volume class design to allow multiple replicas to be returned. Its a lot more complex a problem than it looks, as it will necessitate returning an unknown number of copies of a molecule that are translated for the different orientations (or even rotated - my boxes don't need to be cuboidal!). This could add extra complexity and cost to something, which in the normal case, should actually be very simple.