Using evolutionary conservation to help structure prediction

Message boards : Rosetta@home Science : Using evolutionary conservation to help structure prediction


Sarel
Joined: 11 May 06
Posts: 51
Credit: 81,712
RAC: 0
Message 30155 - Posted: 28 Oct 2006, 0:57:58 UTC

Hello,

My name is Sarel Fleishman, and I'm a new postdoc in the Baker group. My project deals with predicting the structure of large protein complexes. Chu has recently described the motivations and the approach for predicting structures of complexes, which you can follow at:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2395

By looking at the sequences of proteins that have been identified in various organisms, one can identify amino-acid residues that are evolutionarily more conserved than others. As a simple example, consider two proteins from human and from yeast that carry out the same biological process. If we find that a given amino acid position is the same in the yeast and human sequences, we would consider it to be evolutionarily conserved. The various genome projects are generating very large numbers of such homologous sequences across different species, which can then be used to derive statistically robust estimates of the evolutionary conservation of amino-acid positions.
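
To make this concrete, here is a rough sketch (in Python, with made-up inputs; this is not the actual Rosetta or Baker-lab code) of one simple way to estimate per-position conservation from a set of aligned homologous sequences: the fraction of sequences that share the most common amino acid at each alignment column.

# Illustration only: per-position conservation from an alignment of homologues.
from collections import Counter

def conservation_scores(aligned_seqs):
    """aligned_seqs: equal-length strings, one per homologue ('-' = gap)."""
    scores = []
    for i in range(len(aligned_seqs[0])):
        column = [seq[i] for seq in aligned_seqs if seq[i] != '-']
        if not column:
            scores.append(0.0)
            continue
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))  # 1.0 = fully conserved
    return scores

# Toy example: most positions are identical across species, two vary.
human = "MKVLAGY"
yeast = "MKILSGY"
fly   = "MKVLTGY"
print(conservation_scores([human, yeast, fly]))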

This type of evolutionary conservation has been shown to correlate with whether an amino-acid site is buried in the protein core or exposed to water. Buried positions tend to be evolutionarily more conserved because changing the identity of a residue in the protein core is likely to disrupt the protein's stability and render it nonfunctional, whereas changing a position that is exposed to water is unlikely to harm the protein's function. Evolutionary conservation is therefore potentially useful for predicting the structures of individual proteins and protein complexes.

With the help of the BOINC users, I'm testing the hypothesis that amino-acid conservation could help Rosetta pick out conformations that are near the correct structure. The idea is to prefer conformations that place evolutionarily conserved amino acid residues in the core of the protein. The huge computing power of BOINC is essential for testing this hypothesis: it produces a very large number of conformations, allowing us to pick up even slight improvements in the predicted structures. Hopefully, following these improvements, it will be possible to identify the optimal way of incorporating evolutionary conservation into Rosetta.
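
To illustrate the idea, here is a small sketch (hypothetical names and numbers only, not the term we are actually adding to Rosetta): given per-residue conservation values and a measure of how buried each residue is in a candidate conformation, one could reward conformations that bury the conserved residues.

# Illustration only: favour conformations that bury conserved residues.
def conservation_environment_score(conservation, burial, weight=1.0):
    """
    conservation: per-residue conservation values in [0, 1]
    burial:       per-residue burial values in [0, 1] for one conformation
                  (e.g., a normalized count of neighbouring residues)
    Lower is better, following the convention that Rosetta minimizes energy.
    """
    return -weight * sum(c * b for c, b in zip(conservation, burial))

# Toy comparison: conformation A buries the conserved residues, B exposes them.
conservation = [1.0, 0.3, 0.9, 0.2]
burial_A     = [0.8, 0.1, 0.9, 0.2]
burial_B     = [0.1, 0.9, 0.2, 0.8]
print(conservation_environment_score(conservation, burial_A))  # lower (favoured)
print(conservation_environment_score(conservation, burial_B))  # higher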

If you'd like to follow these runs, their names typically carry the suffix _ENVFILE.
ID: 30155 · Rating: 1
Michael G.R.
Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 30162 - Posted: 28 Oct 2006, 3:29:38 UTC

Hi Sarel,

A good part of this went over my head, but what I understood was very interesting. Thanks for sharing the science with us. Glad to be helping the experiment with my idle CPU cycles.
ID: 30162 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30169 - Posted: 28 Oct 2006, 8:26:40 UTC - in response to Message 30155.  
Last modified: 28 Oct 2006, 8:36:27 UTC

Hi Sarel,

Yes, first let me join in the thanks for sharing the science.

May I also suggest you put the first paragraph of your post into a profile, and include in the profile a link back to this thread. Then when people see your name on these boards in future, they will be able to find out what you want us to know about you.

That is not my main reason for responding, though. You say:

... I'm testing the hypothesis that amino-acid conservation could help Rosetta pick out conformations that are near the correct structure. The idea is to prefer conformations that place evolutionarily conserved amino acid residues in the core of the protein. The huge computing power of BOINC is essential for testing this hypothesis. ...

This reminds me of a worry I have, not just about your approach but about other approaches too.

Rosetta cannot check everything, so it goes for the "best bet" at various stages in the process in order to cut the search time. This seems to me a bit like the way a human chess master will not spend time looking at moves that lose the Queen in exchange for a pawn, or will prioritise moves that give more influence over the centre of the board even without yet having a clear idea of how that influence will be used.

This usually works, but it has a flaw. The grandmaster will spot that a Queen can be sacrificed to win the game a few moves later, while the mere master may miss this outcome. This is how a "gambit" works: the eventual loser overlooks the downside of an apparently advantageous move.

Evolutionary conservation is another such heuristic, so I am being slightly unfair by putting this point to you alone: it applies equally to all the other heuristics being applied by your colleagues at Bakerlab. I hope your colleagues will feel free to answer as well as, or instead of, you.

My worry is that these heuristics get built into future programs, and that most of the time they work fine, so people come to trust them. Then along comes an important divergence from the heuristic (in your case, a divergence from evolutionary conservation that might be very exciting science) and the program will miss it.

It is worse than a Monte Carlo approach, which may merely be unlucky enough not to stumble across the right answer. What I am saying is that if someone uses a program based on this heuristic at a later date, and is applying the answers to matters of evolutionary conservation, then the program has been set up to ignore the very thing they are looking for.

There is an all-too-common pattern in human thinking: we see what we expect to see. Scientists work hard to exclude such patterns from their thinking, often unsuccessfully (e.g., Einstein worked for several years to take out of his theory the prediction that the universe is expanding; he never for one moment thought it could be possible, and he later described this as his greatest mistake).

We are building this same fallibility into our thinking machines, and the danger for the future is that we may forget the blind spots we built in. If it is hard to see my own blind spots, it is even harder to look for them in a machine that I trust.

Aside: Of course, there is another side to this coin. Why do we see what we expect to see? Because often enough (in our evolutionary past) it has worked out OK and saved time. In a predator/prey evolutionary race, the edge given by faster thinking may outweigh the disadvantage of the few times our ancestors got it wrong. We, just like Rosetta, take shortcuts in our thinking not because they are always right, but because they give the best odds.

But best odds are not certainty. At present, lay people expect a certainty from computer results that they do not expect from a human expert. So it does worry me that we are building fallibility into programs whose results many will treat as "the objective answer".

River~~
ID: 30169 · Rating: 0
Sarel
Joined: 11 May 06
Posts: 51
Credit: 81,712
RAC: 0
Message 30267 - Posted: 30 Oct 2006, 6:11:42 UTC

Thanks for your interest in this post! As I'm still new to the message boards, I hope the original post has been clear, and I would be delighted to elaborate if necessary. I've also followed River's suggestion and added something to my profile, though it's still under construction.

River, your remarks are absolutely correct, and are some of our biggest concerns when working out the details of how to incorporate this new score term into Rosetta. Briefly (and I hope that I'm not misrepresenting), River's point is that adding a term based on what has been 'observed' to be correct for true protein structures could be dangerous, because it would bias future predictions towards our current prejudices. Significantly, those cases that counter our prejudices often turn out to be the most fascinating.

There is no clear-cut way to eliminate the problem that River mentioned, but there are ways to minimize it. One is to test any modification to the score on as many disparate cases as possible (i.e., proteins with different sequences and local structures), and this is indeed what we are doing with the crucial help of the BOINC platform. If many very different proteins behave well with the new scheme, then there is a good chance that it is generally applicable. More realistically, I expect we will find that some cases work well with this scheme whereas others do not, and hopefully we will be able to improve the scheme, or at least outline when it is likely to fail.

Another element that can help in minimizing such bias is the way the score is designed. If we used evolutionary conservation to eliminate decoys that do not meet a predefined criterion, we would risk missing structures that excel in other respects, such as structural stability. Our approach is therefore to use conservation in much the same way as the other terms in the score function; that is, as a contribution to the overall score, not as a filter. This way, if a given decoy structure appears incorrect from the standpoint of evolutionary conservation but nevertheless seems to have all the other structural features in place, it is still very likely to go on to the next stages of testing in Rosetta.
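
For readers who like to see the distinction spelled out, here is a toy sketch (hypothetical names and numbers, not the actual Rosetta score function) of the difference between using conservation as a hard filter and folding it into the total score as a weighted term. In this sketch a larger conservation_penalty means a worse fit to the conservation data, and lower total scores are better.

# Illustration only: hard filter versus weighted score term.
def total_score_with_filter(decoy, threshold):
    # Hard filter: a decoy that looks wrong by conservation alone is discarded,
    # no matter how good its other structural features are.
    if decoy["conservation_penalty"] > threshold:
        return None
    return sum(decoy["other_terms"])

def total_score_weighted(decoy, weight=0.5):
    # Weighted term: conservation contributes alongside the other terms, so a
    # decoy that scores poorly on conservation but well elsewhere can still
    # rank well and go on to the next stages of testing.
    return sum(decoy["other_terms"]) + weight * decoy["conservation_penalty"]

decoy = {"other_terms": [-120.0, 15.0], "conservation_penalty": 8.0}
print(total_score_with_filter(decoy, threshold=5.0))  # None: rejected outright
print(total_score_weighted(decoy))                    # -101.0: still in the running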

Ultimately, however, I think these scoring schemes are an aid to human reasoning and, because of these problems, should not replace human examination at the end of the prediction process. If a prediction appears to have been overly biased by any single criterion, it should be revisited.
ID: 30267 · Rating: 0
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 62
Message 30279 - Posted: 30 Oct 2006, 9:48:40 UTC

This reminds me of an exchange I had with David Baker last year, in which I suggested that non-biologists look at the data. My point was along the same lines as the one made, somewhat more eruditely, by River. It started in this thread but wandered about for weeks across several topics.

Without the "baggage" of foreknowledge, we may be able to see the wood for the trees.

Welcome to the team Sarel.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 30279 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 30293 - Posted: 30 Oct 2006, 15:42:40 UTC
Last modified: 30 Oct 2006, 15:45:33 UTC

I'm not a biophysics expert by any means, so bear with me, but this seems as good a place to ask as any.

I note that it seems to be comparatively "easy" to determine the amino-acid sequence of a protein with existing technology. Is there any way to create a set of 10 (or 100) short proteins that fairly readily bind, rather indiscriminately, to other proteins, and then determine whether that binding has occurred?

My thought is this: if you see that 3 of your test set bind to the protein whose structure you are attempting to predict, then that should help you bias your predictions towards conformations that would cause those 3 to bind and the other 7 not to. It is rather similar to the hydrophobic/hydrophilic attribute, isn't it? Essentially it would help you determine which amino acids will be on the "outside" of the fold and which are hidden within.

Basically, I'm wondering if it might be fairly easy to find or design short strands, and use them to do further physical study in a lab prior to virtual study on a computer.

It would be like determining some docking experimentally, and then using that information to help find the protein structure.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 30293 · Rating: 0
Sarel
Joined: 11 May 06
Posts: 51
Credit: 81,712
RAC: 0
Message 30354 - Posted: 31 Oct 2006, 5:43:31 UTC

There are some methods in molecular biology that allow one to do something very similar to what you're suggesting, and these methods are actually very useful. The first that comes to mind is the phage-display library, in which short segments of protein are amplified if they bind to a target protein. You can read more about this and related methods on the following page: http://www.answers.com/topic/phage-display

You are right that data from such libraries could potentially be used to help in structure prediction, but this is still quite far beyond today's computational capabilities. Such short fragments are likely to have very little intrinsic structure, so the problem of docking them onto a given protein is in effect very close to the problem of ab initio structure prediction, with all of its degrees of freedom. To bias the results of an ab initio structure prediction using such fragments, you would therefore have to fold your target protein and the fragments simultaneously, and then try to dock them together.
ID: 30354 · Rating: 0
