I will use this space to give biweekly updates on recent results and the work units planned for
Today I will begin by summarizing some of the main results of the last few weeks.
- More computing power can significantly improve results. This is illustrated by the 1ogw case.
For one of the work unit types (NO_SIM_ANNEAL_BARCODE_30) we ran 60,000 independent jobs, for a total of
600,000 structures. If we take the ten lowest-energy structures, the median rmsd is 2.86. If we instead take
the ten lowest-energy structures from just the first 18,000 jobs, the median rmsd is 4.49. So with more sampling,
we are able to land more explorers closer to the global minimum and get more accurate results.
- Allowing additional flexibility in the chain can significantly improve results (this was the "breakthrough" I described
several months ago). In the "NO_VARY_OMEGA" runs, we went back to the pre-breakthrough, less flexible chain, and the results
were consistently worse. For example, in the 1ogw case, the median rmsd of the low energy structures increased to 4.50. For
1r69, the median rmsd of the low energy structures increased from 1.29 to 2.80.
- The computationally less expensive NO_SIM_ANNEAL methods were no worse at locating low-energy, low-rmsd structures than
the SIM_ANNEAL runs. This is good news, as we can carry out many more of the NO_SIM_ANNEAL searches and so do more
searching for the same amount of CPU time.
- As Paul Buck anticipated, most of the remaining alternative methods we tested were roughly equivalent (except for NO_VARY_OMEGA). One way of looking at this is that, given the huge space we have to search, what matters is how many independent explorers are sent out to search, not the details of the instructions each is given about where to search.
- Excitingly, for many of the proteins, the lowest-energy structures are very close to the true
structure (less than 3.0 Å rmsd). For example, in the NO_SIM_ANNEAL_BARCODE_30 runs, the rmsds of the lowest-energy structures are
These results are a significant improvement over anything that has been done before. If we are able to do this consistently for proteins in this size range, it will be a major scientific breakthrough.
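The sampling comparison in the 1ogw example above (ten lowest-energy structures from a subset of jobs versus from all jobs) can be sketched in a few lines of Python. This is only an illustration, not the project's analysis code: the function names, the (energy, rmsd) record layout, and the synthetic data are all assumptions, and the numbers it prints are made up rather than real results.

```python
import random
from statistics import median


def median_rmsd_of_lowest(structures, n_lowest=10):
    """Median rmsd of the n_lowest lowest-energy structures.

    `structures` is a list of (energy, rmsd) tuples (hypothetical layout).
    """
    lowest = sorted(structures, key=lambda s: s[0])[:n_lowest]
    return median(rmsd for _, rmsd in lowest)


def synthetic_structures(n, seed=0):
    """Made-up data in which lower energy loosely tracks lower rmsd."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        rmsd = rng.uniform(1.0, 12.0)
        energy = 2.0 * rmsd + rng.gauss(0.0, 3.0)
        out.append((energy, rmsd))
    return out


all_structures = synthetic_structures(6000)
# First 30% of the structures, mirroring 18,000 of 60,000 jobs.
subset = all_structures[:1800]

print("subset:  ", round(median_rmsd_of_lowest(subset), 2))
print("full set:", round(median_rmsd_of_lowest(all_structures), 2))
```

Because the selection is by energy alone, the rmsd to the true structure is never used to pick the structures, only to score them afterwards.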
Our next step will be to test the computationally efficient NO_SIM_ANNEAL_BARCODE_30 method on 25 new proteins we haven't yet run calculations on. You will see the new proteins on your screen saver by early next week. The "BARCODE_30" means that for every 30-residue segment of the protein, the values of the angles for one residue are picked at random at the beginning of the run. This directs different runs to explore different regions of the space, and is more or less equivalent to directing different explorers to different latitudes and longitudes.
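The barcode idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Rosetta implementation: the function name `pick_barcode` is hypothetical, the segments are assumed to be non-overlapping, and the two backbone angles (phi, psi) are assumed to be drawn uniformly, which the real method may well not do.

```python
import random


def pick_barcode(n_residues, segment_length=30, seed=None):
    """Pick one residue per segment and give it random backbone angles.

    Returns {residue_index: (phi, psi)} with 0-based indices and angles in
    degrees. Uniform sampling and non-overlapping segments are assumptions
    made for illustration only.
    """
    rng = random.Random(seed)
    barcode = {}
    for start in range(0, n_residues, segment_length):
        end = min(start + segment_length, n_residues)  # last segment may be short
        residue = rng.randrange(start, end)
        phi = rng.uniform(-180.0, 180.0)
        psi = rng.uniform(-180.0, 180.0)
        barcode[residue] = (phi, psi)
    return barcode


# Each run gets its own barcode, so different runs start in different regions.
for run in range(3):
    print(run, pick_barcode(n_residues=90, seed=run))
```

Each run would then hold its barcode angles fixed (or favor them) while searching, so that independent runs are spread across the space rather than all starting from the same region.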
You will also see more "PRODUCTION_AB_INITIO" runs in the next few weeks. In these runs we are testing the first, low-resolution part of the search. We will lower the number of trajectories per work unit to avoid the max_cpu_time problem. I think we have now largely solved this problem by going to shorter work units and doubling the max_cpu_time limit.
There will also be tests of calculations for some of the other projects described in the introduction section of the web site. We hope to get the vaccine design calculations running on BOINC in the near future. With regard to the message board posts, we aren't yet doing any work on diabetes or MS specifically, but if we can generate accurate structures of proteins involved in these diseases using the methods you are helping us to develop, it will contribute to efforts to develop therapies.
Thank you again for all of your wonderful contributions!