Energy function benchmarking

Patrick Conway · Joined: 10 Nov 10 · Posts: 1 · Credit: 76,025 · RAC: 0
Message 75637 - Posted: 20 May 2013, 17:39:30 UTC

Hi guys,

This is my first time posting here. My name is Patrick Conway, and I'm a third-year graduate student in the Baker Lab working on improving the Rosetta energy function.

You might be asking, "What is the energy function?" It's a very central aspect of Rosetta (and I'm not just saying that to justify my thesis). It is a set of mathematical equations that computes an "energy" for a given protein conformation. As the conformation changes, the energy can go up or down. Proteins nearly always adopt a single conformation because that conformation is drastically lower in energy than any other possible one.

Our gold standard for benchmarking energy function performance is to take native structures obtained from experimental X-ray crystallography and compare them to alternate conformations (decoys). The native should win out, but that doesn't always happen, which means I still have a job.

Our current default energy function has been around for many years. In the past few years there has been a lot of promising work on various aspects of it, and the developer community is now undertaking an effort to consolidate these improvements into a new default.

To make a meaningful comparison between natives and decoys for a given energy function, we have to do a full-scale search for the best decoys that Rosetta can find. This is a very expensive operation, and the Rosetta@home horsepower is an extremely valuable resource for this project. At this point, we have 420,000 competitive decoys from 84 different proteins, collected from an R@H run last year. For every change to the energy function, we run local optimization on these decoys to "update" them, then apply a decoy discrimination measure that quantifies how well the native structure stands out from the decoys.
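The native-versus-decoy comparison described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Rosetta code: it assumes the energy is a weighted sum of per-term energies (the term names and weights are made up), and it uses the native's Z-score against the decoy energies, plus the fraction of decoys scoring below the native, as stand-ins for the decoy discrimination measure, which the post does not specify. The local-optimization step that "updates" the decoys is omitted.

```python
from statistics import mean, pstdev

# Hypothetical score terms and weights for illustration only; not Rosetta's actual terms.
WEIGHTS = {"attraction": 1.0, "repulsion": 0.5, "hbond": 1.0, "solvation": 0.5}

def total_energy(terms: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-term energies; lower means more favorable."""
    return sum(weights[name] * value for name, value in terms.items())

def native_z_score(native_energy: float, decoy_energies: list[float]) -> float:
    """Standard deviations by which the native lies below the mean decoy energy.
    Larger positive values mean the native stands out more clearly."""
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have no spread")
    return (mean(decoy_energies) - native_energy) / sigma

def fraction_below_native(native_energy: float, decoy_energies: list[float]) -> float:
    """Fraction of decoys scoring lower (better) than the native; ideally 0."""
    return sum(e < native_energy for e in decoy_energies) / len(decoy_energies)

# Toy numbers, not real benchmark data.
native = total_energy({"attraction": -120.0, "repulsion": 10.0, "hbond": -30.0, "solvation": 20.0})
decoys = [-110.0, -125.0, -140.0, -100.0]

print(native)                                    # -135.0
print(round(native_z_score(native, decoys), 2))  # 1.07
print(fraction_below_native(native, decoys))     # 0.25
```

Here one decoy (-140.0) beats the native, so even a positive Z-score does not mean the native "wins"; that is why a benchmark would likely track more than one summary of the energy landscape.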
We've had tremendous success in several areas: using flexible covalent bonds instead of rigid ones, implementing a new solvation model (FACTS), optimizing hydrogen bonding and electrostatics, developing more detailed assessments of aromatic ring orbital interactions, and using an improved sidechain conformation library. However, we've been training our energy function on the entire data set, so we need to verify our results independently on a new data set. We're hoping it's not "too good to be true!"

Some of you may have noticed "idealdead_test" (DEAD = Decoy set to End All Decoy sets) jobs running lately. These jobs are generating decoys for a set of 130 proteins: relatively small monomers without ligands, with high-resolution experimental data (for our gold standard). I am generating ab initio predictions of these protein sequences with and without native fragment information, which lets me cover a wide range of structures, from near-native to far from native. Once again, the key idea is to challenge the energy function to pick out the true native by generating competitive decoys.

If there is interest, I am willing to delve further into the process of improving the energy function and the changes we are making.
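One way to picture the independent verification described above is to compare an average discrimination score on the proteins the energy function was tuned on with the same score on a separate set of proteins. The sketch below is hypothetical: the protein names, per-protein scores, and the use of a simple difference of means as a "generalization gap" are assumptions for illustration, not the lab's actual procedure.

```python
from statistics import mean

def mean_score(per_protein: dict[str, float]) -> float:
    """Average discrimination score across proteins (higher = native stands out more)."""
    if not per_protein:
        raise ValueError("no proteins in set")
    return mean(per_protein.values())

def generalization_gap(training: dict[str, float], held_out: dict[str, float]) -> float:
    """Training mean minus held-out mean.
    A large positive gap suggests the improvements were partly fit to the training proteins."""
    shared = training.keys() & held_out.keys()
    if shared:
        raise ValueError(f"held-out set overlaps training set: {sorted(shared)}")
    return mean_score(training) - mean_score(held_out)

# Hypothetical per-protein scores (e.g. native Z-scores); not real benchmark data.
training_scores = {"protein_A": 2.0, "protein_B": 1.5, "protein_C": 2.5}
held_out_scores = {"protein_X": 1.2, "protein_Y": 1.8}

gap = generalization_gap(training_scores, held_out_scores)
print(f"generalization gap: {gap:.2f}")  # generalization gap: 0.50
```

The overlap check matters because the whole point of a new data set is that none of its proteins influenced the tuning.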

Mod.Sense · Volunteer moderator · Joined: 22 Aug 06 · Posts: 4018 · Credit: 0 · RAC: 0
Message 75639 - Posted: 20 May 2013, 18:20:25 UTC

What constitutes "near" and "far" native? Would I be correct to presume that an improved energy function would reduce the average number of decoys needed to have a high degree of confidence that the native structure has been located? To what degree might one expect to reduce the required number of decoys?

Rosetta Moderator: Mod.Sense