Molecular Design: hydrogen positioning in water cluster

In my summer internship I need to design water clusters: structure made solely of water molecules. I am using FORTRAN and Gaussian to create hopefully highly or lowly symmetrical structures.

My program is almost successful! The (fast) last step is to place the hydrogens that are now in the midpoint of two oxygen atoms correctly,  so that

  • each hydrogen is closer to an oxygen (showing covalent bond) and further away from the other oxygen (showing H bonding).
  • each oxygen is close to two and only two hydrogens (because this is water!!)

And specifically…und zwar… I want to convert something like the half-finished cluster shown on the left to the almost-final cluster shown on the right. Haha, see the difference? I call the right-hand-side structure almost-final because the bond orders are still not right.

halffinished almostfinal

For each oxygen atom, we have the below information

1.To how many atoms is it (directly or H-bonded) connected? [In my programme this variable is called connatoms]

2.Are there any newly added H connected to this oxygen atom? , i.e. In my previous programme I first add a hydrogen at the midpoint of each O-O connection, and then I have designed an elaborate subprogram to add H that are not on the O-O connection, in order to satisfy the H2O formula i.e. twice as many H as O. Therefore, for each oxygen atoms, if its original connatoms is different from its connatoms after adding H, then we know that there must be newly added H attached to it. The difference between the old and new connatoms is the number of new H added.

Thus we must open a new variable that is just used to ‘remember’ the original array of connatoms.

3.An 1*4 array recording the distance of hydrogens to this oxygen atom. ‘Distance’ may be called ‘status’, which refers to one of the following:

  1.    N, means the H is covalently bonded to this O; Naturally, all H that are newly added are considered N to its nearest oxygen.
  2.    F, means the H is hydrogen bonded to this O; Note that for two neibouring (H-bonded) oxygens, if the status of one H is N, then the status of the same H for the other O must be F.
  3.    M, means the H is still in the midpoint of the two O atoms, so its attachment to the neighbouring oxygens are not yet decided. Then for both the neighbouring oxygen the status of this H is M.
  4.     K, just an array-filler when connatoms for this O is <=4. (Means Kein Wasserstoff…)Considering all oxygens, the array size is character,dimension(4,Onum)  :: status, where Onum is the number of oxygen atoms.
  5.  An integer-type variable, which I call Moglichkeit, that is a function of connatoms and the status array.

The function is not very complicated thanks to the fact that O is a first-row element and can be at most 4 coordinated! It is given explicitly below.

CONNATOMS STATUS MOG.
2 MMKK 1
NMKK 1
NNKK 0
3 MMMK 3
MMNK 2
MMFK 1
MNFK 1
MNNK 1
NNFK 0
4 MMMM 6
MMMN 3
MMMF 3
MMNN 1
MMNF 2
MMFF 1
MNNF 1
MNFF 1
NNFF 0

For the Moglichkeit function, the order of the letters is not important, and just the number of each letter is important. For example, the Mog of NNFF is identical to that of NFNF, or FFNN.

How the program runs:

Information pouring in: The BONNDDATA (showing connectivity of oxygens), the huge vector storing all coordinates of all atoms (in my programme called HUGECONN), the old and new CONNATOMS table.

Constructing the original STATUS array

First it is to note that the array is of type CHARACTER and has dimension (4,Onum), where Onum is the number of oxygen atoms.

For each oxygen:

We use the new connatoms table to determine how many Ks there are: # of K = 4-new connatoms. If there are Ks for an oxygen, we’d put them as the last elements of the 4 slots.

We use both the new and the old connatoms table to determine how many Ns are ‘by default’ given to the oxygen atom. These refer to the newly added H, so they are by default assumed to be covalently bonded to the oxygen. # of N = new connatoms – old connatoms. We put these default Ns immediately before all Ks.

Then we fill all other left slots with M.

Constructing the STATUS —-> Moglichkeit function

We note that the Moglichkeit array is of the form INTEGER with the dimension (Onum).

First we simply count the numbers of N, F, M and K for each oxygen atom. And then we basically assign the value of Mog using the table above.

What is the difference between Mog=1 and Mog=0? In both cases the oxygen has only one possible way to form its water molecule, but Mog=1 means not every hydrogen connected to it are placed properly, i.e. one or more are still in M position, while Mog=0 means all Hs connected to the oxygen are placed properly already.

Next we are going to move the hydrogens! This is of course the most exciting part of the whole program! If you think about it, you will realise that there are only two situations in which hydrogens should be moved: a) when you are sure where an H should go, and b) when you are not sure where an H will go. Haha!!! Sounds stupid, but I think this can be useful categorisation. We will tackle them one by one.

When there are oxygens with Mog=1

This corresponds to the situation where we know where a hydrogen should go. Using Mog=1 oxygens to place hydrogens should be the primary method, until there are no Mog=1 oxygens left. What we can do is now:

  1. Scan down the MOGLICHKEIT array and find the first O with Mog=1.
  2. Retrieve its status from the STATUS array.
  3. Determine the ‘correct’ status of the oxygen, that is, the only possible status with Mog=0, i.e. all the M slots changed to F or N accordingly. For example, MMKK to NNKK, MNNF to FNNF.
  4. Compare the Mog=1 and the Mog=0 status. There should be some changes, or we say, corrections. Decide how many, and which, slots are changed, and changed to what.
  5. For the first of the corrected slot, use BONDDATA array to see the corresponding connected oxygen. Use HUGEVEC to determine the coordinates of the two oxygens and the hydrogen atom at the midpoint of the two oxygens.
  6. If the ‘correct’ status of this slot of this oxygen is ‘N’, then change the coordinates of the originally mid-point hydrogen, so that it is 1.0 Angstrom from the parent oxygen. If the correct status is ‘F’, then place the hydrogen so that it is 1.0 Angstrom from the connected oxygen.
  7. Use STATUS array to find the status of the connected oxygen, and retrieve its bonddata. Find the label of the parent oxygen in its bonddata.
  8. Change the status of the connected oxygen, which depends on how the hydrogen has been moved.
  9. Use the function above to calculate the new Mog of the connected oxygen.
  10. If there are more than one corrected slot for the parent oxygen, then repeat 5-9 until Mog=0 for the parent oxygen is achieved.
  11. Scan down the MOGLICHKEIT array again and find the first O with Mog=1. Do this until no Mog=1 is found. Now if all Mog=0, then we have done our job. If no, we proceed to the situation described in the next section.

When there are leider no oxygen with Mog=1

  1. Find the oxygen with the lowest  Mog. (Hopefully, this is 2. But doesn’t matter if it is not.)
  2. Use the STATUS array to determine its status. In the status, find the first slot with M.
  3. We will arbitrarily move the hydrogen corresponding to this slot closer to the oxygen in question (the parent oxygen). Specifically, we do something similar to the previous section.
  4. We use the BONDDATA array to see to which oxygen it is connected. We choose the connected oxygen the corresponds to the slot in question, and use HUGECONN to find its coordinates, as well as the hydrogen at the midpoint of the two oxygens.
  5. We change the coordinates of this hydrogen so that it is 1 Angstrom to the parent oxygen.
  6. And we let the new status to be such that the first M becomes N, and from there we calculate the new Mog of the parent atom. Hopefully, it is 1 now. [Here it strikes me that we may need an array to keep the original status for each oxygen atom. I am at the point not sure if it is really necessary, but keeping some potentially important data is always a good idea…]
  7. For the connected oxygen, we also use BONDDATA to find the position of the parent atom in its connections. Now we retrieve its status, and change the corresponding slot from M to F. Calculate the new Mog for the connected oxygen as well.
  8. Scan down the MOGLICHKEIT array and find the lowest Mog now. If it equals to 1, go to Step 1 in the previous section. If it equals to 0, we are done, which is impossible. If it is still greater than 1, go to Step 1  in this section

Seems we have finally found a practical algorithm! 1500+ words are devoted. But what is present here is at most pseudo-code. Now I am going to proceed with the actual programming, and will update in case any amendments is necessary!

After this is done, I will try to figure out the last step: produce a final connectivity table, which connects all oxygen with N hydrogens with a 1.0 single bond, and all oxygen with F hydrogens with a 0.5 hydrogen bond. This is rather technical, but seems okay because no involved algorithm will be necessary, at least as it now appears to me.

Danke fuer Ihre Aufmerksamkeit!!! 🙂

0 comments