Stories from CASP8

Message boards : Rosetta@home Science : Stories from CASP8


James Thompson

Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 55710 - Posted: 12 Sep 2008, 5:18:00 UTC
Last modified: 13 Sep 2008, 6:54:45 UTC

The purpose of this thread is to talk about what we tried during CASP8 and what we learned. I'll try to update this once a week with successful predictions, stories, and lessons that we learned from CASP8. I'll start this week with an overview of one of those lessons.

One of the most successful approaches we tried is based on exploiting knowledge of known protein structures in modeling. There are on the order of 4 million protein sequences, and we know the structure of about 50,000 proteins in the Protein Data Bank. Some of the sequences that we looked at during CASP8 were very similar to proteins of known structure, and these known structures can be very useful as starting points for refinement. Using information from known structures to build a model of a protein with unknown structure is called comparative modeling, and it's a very interesting problem in the field.

Included in the minirosetta v1.34 application are some updates to our protocols for comparative modeling. These all try to steal features of one (or more!) known structures to predict the structure of a protein with unknown structure. These include:

- assembling a protein from an extended chain in the presence of constraints that exist between atoms, and also other geometric features of the protein.
- starting with a conformation derived from a protein of known structure and attempting to remodel it, both with and without constraints.
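
To make the constraint idea concrete, here is a minimal Python sketch of how distance constraints might be derived from a known structure. It is an illustration only, not the minirosetta code: the function name, the one-point-per-residue (C-alpha) representation, the 10 angstrom cutoff, and the sequence-separation filter are all assumptions chosen for the example.

```python
import math


def derive_distance_constraints(template_ca, alignment, max_dist=10.0, min_seq_sep=3):
    """Turn template geometry into distance constraints on the query.

    template_ca: dict mapping template residue number -> (x, y, z) of its C-alpha.
    alignment:   dict mapping query residue number -> template residue number.
    Returns a list of (query_i, query_j, target_distance) tuples.
    All parameter names and defaults are hypothetical.
    """
    constraints = []
    aligned = sorted(alignment)
    for a, qi in enumerate(aligned):
        for qj in aligned[a + 1:]:
            # Skip near neighbours in sequence; chain geometry already fixes them.
            if qj - qi < min_seq_sep:
                continue
            d = math.dist(template_ca[alignment[qi]], template_ca[alignment[qj]])
            # Keep only contacts that are reasonably close in the template.
            if d <= max_dist:
                constraints.append((qi, qj, d))
    return constraints
```

Query residues that do not align to anything in the known structure simply get no constraints in this sketch, so they are left free to move.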

I'll try and make sure that workunits for these tasks have descriptive names so that everyone can better understand how your computers are helping us. Thanks for crunching!

Also, what do people want to see? Do you like seeing superpositions of successful predictions? Do you want to hear more about the open problems in structure prediction and what we're pursuing? We also have some other anecdotes and stories from CASP8 if people want to hear that as well.
James Thompson

Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 55731 - Posted: 13 Sep 2008, 7:18:30 UTC
Last modified: 13 Sep 2008, 7:25:15 UTC

Here are some quick responses to questions people have asked me about CASP8:

Is the comparative modeling what "rebuild" is doing in the game? If the structure is unknown, how do you come up with the constraints? And how do you know (or at least gain some confidence that) your chosen constraints are not omitting the native structure?

Comparative modeling is an approach to protein structure prediction that uses known structures in the Protein Data Bank that are likely to be similar to the protein we're trying to model. The rebuild tool in Fold.It is part of our comparative modeling protocol, but it's not the entire protocol, and it's not the only approach that people have! We also try small random perturbations to the structure followed by a gradient-based energy minimization, which is the same minimization done by the "wiggle" tool in Fold.It.
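
For anyone curious what "perturb, then minimize" looks like in code, here is a toy Python sketch. Everything in it is an assumption made for illustration: the energy function is a simple double-well polynomial rather than Rosetta's energy function, the minimizer is plain fixed-step gradient descent, and the rule of keeping a trial only when it lowers the energy is a simplification I chose for the example.

```python
import random


def energy(x):
    # Toy stand-in for an energy function: one double well per coordinate,
    # tilted so that the well near -1 is lower than the well near +1.
    return sum((xi ** 2 - 1.0) ** 2 + 0.3 * xi for xi in x)


def gradient(x):
    # Analytic derivative of the toy energy above.
    return [4.0 * xi * (xi ** 2 - 1.0) + 0.3 for xi in x]


def minimize(x, step=0.01, iters=500):
    # Plain fixed-step gradient descent into the nearest local minimum.
    x = list(x)
    for _ in range(iters):
        x = [xi - step * gi for xi, gi in zip(x, gradient(x))]
    return x


def perturb_and_minimize(x0, rounds=50, sigma=0.5, seed=0):
    # Repeatedly kick the current best structure, re-minimize, keep improvements.
    rng = random.Random(seed)
    best = minimize(x0)
    best_e = energy(best)
    for _ in range(rounds):
        trial = minimize([xi + rng.gauss(0.0, sigma) for xi in best])
        trial_e = energy(trial)
        if trial_e < best_e:
            best, best_e = trial, trial_e
    return best, best_e
```

Calling `perturb_and_minimize([1.0, 1.0, 1.0])` starts in the higher well; the random kicks give each coordinate a chance to hop into the lower one, which minimization alone cannot do.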

In comparative modeling, we only need to know one structure of a similar sequence to our sequence in order to derive constraints, so we don't need to know the answer beforehand. However, if there are no available similar structures (similar structures are called templates), then we use our abinitio approaches to structure prediction, which are unconstrained and rely only on fragment assembly and Rosetta's energy function.

Your last question is a very piercing one - a comparative model built from known structures always has some features right and some features wrong. Knowing which things to move and how to move them is the most important question in refining comparative models! My project in the lab right now is to use statistics on the template structures and their relationship to the query to pick which constraints to turn on and off. The abstract idea is that based on the shape of the template structures and their relationship to the query, we believe some things more strongly than others. For example, even two proteins with very similar sequences will have minor differences in their structures. These differences are more likely to be in the periphery of the protein than in the core, as we know from experience that the core is usually conserved among similar proteins. Another example is that some pieces of the sequence you're trying to predict don't even line up with anything in the template structure - this means that those pieces and the pieces close to them are going to be less conserved! Formalizing these intuitively obvious statements about the relationships between proteins is my current project, and it's a very interesting one. At the very least, it's interesting to me. :)
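
As a rough illustration of that intuition (and only that, since the actual statistics are still being worked out), here is a small Python sketch that scores each residue by how buried it is and how far it sits from an unaligned stretch, then switches a constraint on only when both of its residues score well. The function names, the burial values, the window size, and the threshold are all hypothetical.

```python
def residue_confidence(n_residues, aligned, burial, gap_window=3):
    """Score each query residue from 0.0 (don't trust) to 1.0 (trust).

    aligned: set of query residue indices that line up with the template.
    burial:  dict of residue index -> value in [0, 1], where 1 means core.
    Both inputs and the scoring rule are assumptions for this example.
    """
    unaligned = [j for j in range(n_residues) if j not in aligned]
    conf = []
    for i in range(n_residues):
        if i not in aligned:
            conf.append(0.0)  # nothing in the template to believe here
            continue
        # Residues close to an unaligned piece are trusted less.
        gap_dist = min((abs(i - j) for j in unaligned), default=gap_window)
        gap_factor = min(gap_dist, gap_window) / gap_window
        # Core residues are trusted more than peripheral ones.
        conf.append(burial.get(i, 0.5) * gap_factor)
    return conf


def select_constraints(constraints, conf, threshold=0.5):
    # A constraint stays on only if both of its residues are trusted.
    return [(i, j, d) for (i, j, d) in constraints
            if min(conf[i], conf[j]) >= threshold]
```

Taking the minimum of the two residue scores is a deliberately cautious choice: one unreliable end is enough to turn the constraint off.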

David has suggested that several other members of our group post in this thread as well to talk about what we did, how we did it, and tell you some hopefully amusing anecdotes about our work during CASP8. Expect those as well in the coming weeks, along with more discussions of protocols, results, and future goals for the project.

The CASP8 predictions from all participating servers are available at If you're using Windows, the *.tar.gz files there can be unzipped and untarred (i.e., accessed) by using a program like 7-Zip; Mac and Linux users have built-in utilities to deal with .tar.gz files.
At that page, there are four servers clearly belonging to the Baker group: BAKER-DP_HYBRID, BAKER-GINZU, BAKER-ROBETTA and BAKER-ROSETTADOM. Which of these used Rosetta@home resources? It would also be helpful to know those servers' ID numbers, which I imagine are needed to figure out who made what predictions in the files I linked to.

The only automated server that submitted 3D models for CASP8 was the Baker-Robetta server, and the models are named BAKER-ROBETTA_TS1.pdb through BAKER-ROBETTA_TS5.pdb, so you shouldn't need the ID numbers. You can view them in a program like RasMol or PyMOL, which I think people can find through Google. You can also find some of the native structures on this page:

An example is target T0473, where the native structure is PDB ID 2k53.
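
If you'd rather not unpack the archives by hand, a few lines of Python can list the Robetta models inside one. This is a sketch under assumptions: the function name is made up, and I'm assuming the files inside the archive carry the BAKER-ROBETTA_TS1.pdb through BAKER-ROBETTA_TS5.pdb names mentioned above, possibly inside a subdirectory.

```python
import os
import re
import tarfile

# Matches the five Robetta model names described above (assumed layout).
MODEL_PATTERN = re.compile(r"BAKER-ROBETTA_TS[1-5]\.pdb$")


def list_robetta_models(archive_path):
    """Return the sorted paths of Robetta models inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile()
            and MODEL_PATTERN.search(os.path.basename(member.name))
        )
```

The paths it returns can then be extracted and opened in whatever viewer you prefer.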

Robetta used Rosetta@Home for its abinitio predictions - these are the ones for which there is no obvious comparative modeling template available. The comparative modeling stuff was done in-house, because our comparative modeling approach for Robetta has lots of steps and doesn't easily translate to something as parallel and powerful as Rosetta@Home. Our comparative modeling benchmark is directed towards figuring out which of our many approaches to comparative modeling will go onto Robetta in the next few months. So pay attention! That model your computer is making could be important to the future of the project. :)

As always, thanks for crunching. CASP8 was a great time for all of us, and everyone in the Baker Lab really appreciates your contributions.
Mike Tyka

Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 55932 - Posted: 21 Sep 2008, 21:15:55 UTC

Thanks, James, for this post-CASP8 résumé.

Indeed, over the next 4 months we will be testing some brand new code to address the problem of homology modeling. We've been constructing a pretty large benchmark of about 20 prediction problems from the past CASP5, CASP6 and CASP7 experiments. As soon as the initial testing is done we will extend this to about 60-80 targets.

Using this massive benchmark, we hope to unify our various algorithms for this sort of prediction problem into one pipeline with minimal (preferably no) human intervention. Because so many parameters go into this program, and there is such a variety of algorithms, protein types and problem difficulties, we will need a considerable amount of processing time to figure out what works best on what sort of problem and how we can improve all the parts that contribute to this rather complicated algorithm.

The work units for this project will start with hombench_ followed by the run name, the method name and a target number (see the CASP website).

At some point in the future I'll post a more detailed description of the process itself, so stay tuned.



©2021 University of Washington