Folding of known structures

Message boards : Rosetta@home Science : Folding of known structures


Mark

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 77582 - Posted: 17 Oct 2014, 13:40:49 UTC

Hi,

I was wondering why so much folding is being done on structures with a known native solution. I can understand that it is sometimes useful as a metric to test Rosetta's folding capabilities, and to develop new, efficient folding movers to include in Rosetta, but it seems that 60-70% of the WUs I get are folding to a known solution, which seems awfully high. The only other thing I can think of is that you are trying to find the sequence for a computed binder structure. Is that it?
ID: 77582
AtHomer
Joined: 26 Jan 10
Posts: 13
Credit: 7,145,229
RAC: 0
Message 77588 - Posted: 18 Oct 2014, 19:05:55 UTC

I am curious about this as well.

By the way, how can you tell that the folding being done is of structures with a known solution?
ID: 77588
Mark

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 77592 - Posted: 19 Oct 2014, 9:46:42 UTC - in response to Message 77588.  

I am curious about this as well.

By the way, how can you tell that the folding being done is of structures with a known solution?


In the screensaver there are four boxes, and the fourth one is labelled "native": that native one is the actual solution, probably derived from X-ray crystallography. There is also an RMSD figure/graph, which shows the "difference" between the native structure and the structure the computer has just folded.
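For illustration, the RMSD (root-mean-square deviation) shown on that graph can be computed from two lists of matched atom coordinates. This is a minimal sketch, not Rosetta's actual code; it assumes the structures are already superimposed (real pipelines align them first, e.g. with the Kabsch algorithm):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) atom coordinates.

    Assumes the two structures are already superimposed on each other;
    a real tool would optimally align them before measuring.
    """
    assert len(coords_a) == len(coords_b), "need one-to-one atom correspondence"
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Toy coordinates: the model differs from the "native" only at the third atom.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model  = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(native, native))  # identical structures -> 0.0
print(rmsd(native, model))
```

An RMSD of 0 means the folded model matches the native structure exactly; larger values mean the prediction drifted further from it.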
ID: 77592
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77593 - Posted: 19 Oct 2014, 14:32:01 UTC

I am just a volunteer and not in the BakerLab, but in very general terms: if you were to develop an algorithm that is meant to predict structures, and you test it by creating a pile of possible models of things of unknown structure... then what have you learned about how to improve your algorithm? Running it against known structures is how you prove your algorithm is working well, or came up with the same correct answer, or an answer with even better precision, or one using dramatically less compute power.

I guess perhaps the perspective you are missing is that the aim of the project is not to "solve unknown structures", but to "develop generalized computer algorithms that are able to accurately solve unknown structures". (I'm not quoting anyone there, I'm just trying to denote a possible title that summarizes things)
Rosetta Moderator: Mod.Sense
ID: 77593
Mark

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 77595 - Posted: 19 Oct 2014, 21:32:04 UTC - in response to Message 77593.  

I am just a volunteer and not in the BakerLab, but in very general terms: if you were to develop an algorithm that is meant to predict structures, and you test it by creating a pile of possible models of things of unknown structure... then what have you learned about how to improve your algorithm? Running it against known structures is how you prove your algorithm is working well, or came up with the same correct answer, or an answer with even better precision, or one using dramatically less compute power.

I guess perhaps the perspective you are missing is that the aim of the project is not to "solve unknown structures", but to "develop generalized computer algorithms that are able to accurately solve unknown structures". (I'm not quoting anyone there, I'm just trying to denote a possible title that summarizes things)


Yes, I get that point, but as I said in the original post, it seems that about 70% of the time you are folding to a known solution. If you multiply that up over the number of participants, that's an awful lot of testing data. You would then be able to make lots of changes to the algorithms, as you have lots of data to go on. However, the minirosetta program gets updated infrequently, which brings me back to the original question....
ID: 77595
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 77596 - Posted: 19 Oct 2014, 22:19:37 UTC - in response to Message 77582.  
Last modified: 19 Oct 2014, 22:20:35 UTC

Hi,

I was wondering why so much folding is being done on structures with a known native solution. I can understand that it is sometimes useful as a metric to test Rosetta's folding capabilities, and to develop new, efficient folding movers to include in Rosetta, but it seems that 60-70% of the WUs I get are folding to a known solution, which seems awfully high. The only other thing I can think of is that you are trying to find the sequence for a computed binder structure. Is that it?


Here is a description of the breakdown of Rosetta work units given in June 2013.

gregorio wrote:
This is Lucas from the Baker lab...

We are primarily engaged in a few tasks here, all of which we use boinc for:
1) Making better algorithms to predict structures. Mod.Sense pointed this out. Much of the use of boinc is to test out variants of algorithms, totally new ideas, etc.
2) Improving the scoring functions in those algorithms. This gets pretty technical, but you can think of Rosetta software as a search algorithm -- it needs to look around (sampling) and it needs to evaluate what it finds (scoring). Boinc is used to test new methods of scoring, aka new ways to evaluate structures. These methods help structure prediction (1, above) and sequence design (3, below).
3) Design of new proteins for new tasks. This is the inverse of problem (1), where we know the sequence and are predicting the structure. Here we have a structure, or multiple structure ideas, in mind, and we want to design a protein that takes on that structure. The structure could be an influenza binder, or a new enzyme to treat a disease. We run Rosetta algorithms to design new sequences for a given structure, and often run that on boinc.
4) When we make a new design, how do we know that it will look the way we want it to? Well, we put it back into step (1) on boinc, to test if it is at least self-consistent. If boinc doesn't give us back the structure we are trying to make, we might be in trouble.

The majority of folks here in the lab are working on (3) and some are doing (4), and many of us use boinc as a vital tool to make design and design evaluation possible.
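The sampling-plus-scoring idea in point (2) above can be sketched as a generic Metropolis Monte Carlo loop. This is an illustrative toy only, not Rosetta's actual algorithm: the function names, the one-dimensional "energy", and the move set are all invented for the example.

```python
import math
import random

def metropolis_search(score, propose, x0, steps=1000, temperature=1.0, seed=0):
    """Generic Metropolis Monte Carlo search (lower score is better).

    'propose' is the sampling side: it generates a new candidate from the
    current one. 'score' is the scoring side: it evaluates a candidate.
    Worse moves are accepted with probability exp(-delta / temperature),
    which lets the search escape local minima.
    """
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = propose(x, rng)
        delta = score(candidate) - score(x)
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
            if score(x) < score(best):
                best = x
    return best

# Toy landscape: a quadratic "energy" minimized at x = 2.0,
# explored with small random perturbation moves.
score = lambda x: (x - 2.0) ** 2
propose = lambda x, rng: x + rng.uniform(-0.5, 0.5)
result = metropolis_search(score, propose, x0=10.0, steps=5000, temperature=0.1)
```

Improving "scoring" means replacing the `score` function with one that better distinguishes good structures from bad; improving "sampling" means replacing `propose` with moves that find good structures faster.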


Some of those tasks will obviously use native structures while others won't. What proportion of tasks falls into each category, and how many use a native structure? Only the project scientists could answer, but I suspect it fluctuates wildly over time.

For example, in the run-up to the CASP experiments I would expect a high level of testing against native structures to check that their latest algorithms are working. During the CASP experiments the structures are unknown, so there would be fewer native-structure tasks.

Added to that is the fact that scientists all around the world can submit tasks to the Robetta server and some of the Robetta work gets passed to BOINC. The proportion of native structures appearing in Robetta tasks will be completely outside the control of the Baker lab team.


I am guessing that your observation of 60-70% of tasks using native structures is based solely on the selection of tasks you receive. Have you gathered those observations over time, or is it just a rough estimate based on a small sample? With 5 million tasks in the queue at any one time there could (hypothetically) be just 1 million tasks with a native structure (20%), but you are "lucky" enough to download a higher proportion of those, which makes it seem to you like 60-70%.
ID: 77596
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77597 - Posted: 22 Oct 2014, 14:06:23 UTC

A good portion of what defines how a work unit runs is specified in parameters to the base program. So, even without an application update, numerous variations can be configured into groups of WUs for study.
Rosetta Moderator: Mod.Sense
ID: 77597
Christopher Bahl
Volunteer moderator
Project developer
Project scientist

Joined: 7 Feb 13
Posts: 9
Credit: 801,638
RAC: 0
Message 77624 - Posted: 31 Oct 2014, 19:50:40 UTC
Last modified: 31 Oct 2014, 19:56:51 UTC

Hi Mark et al.

I think I understand the confusion here: we aren't commonly using Rosetta@home to predict the structure of proteins whose structure we already know (the closest we come to this is CASP).

However, we are heavily utilizing Rosetta@home to validate and test de novo designed proteins. In these cases, we've designed a totally new structure and protein sequence that has never existed in nature before. Then we take the amino acid sequence which codes for our de novo protein and ask Rosetta@home how this sequence will fold. Finally, we compare each Rosetta@home model to our designed model and ask how well they match up, which I think is what you're seeing as a "known native solution." In reality, this isn't a "known" structure; rather, it's one we're attempting to make. Evaluation with Rosetta@home is currently the most important verification criterion we use to assess the quality of our de novo designed proteins prior to testing in the lab.

As always, many thanks for donating your CPU hours; you make de novo protein design possible!

Cheers,
-Chris
ID: 77624
Mark

Joined: 10 Nov 13
Posts: 40
Credit: 397,847
RAC: 0
Message 77626 - Posted: 2 Nov 2014, 17:43:20 UTC - in response to Message 77624.  

Hi Mark et al.

I think I understand the confusion here: we aren't commonly using Rosetta@home to predict the structure of proteins whose structure we already know (the closest we come to this is CASP).

However, we are heavily utilizing Rosetta@home to validate and test de novo designed proteins. In these cases, we've designed a totally new structure and protein sequence that has never existed in nature before. Then we take the amino acid sequence which codes for our de novo protein and ask Rosetta@home how this sequence will fold. Finally, we compare each Rosetta@home model to our designed model and ask how well they match up, which I think is what you're seeing as a "known native solution." In reality, this isn't a "known" structure; rather, it's one we're attempting to make. Evaluation with Rosetta@home is currently the most important verification criterion we use to assess the quality of our de novo designed proteins prior to testing in the lab.

As always, many thanks for donating your CPU hours; you make de novo protein design possible!

Cheers,
-Chris


Ah thx Chris, that makes sense! Yes in that case it is the "known" solution that is confusing. Thx again for clarifying

Mark
ID: 77626




©2024 University of Washington
https://www.bakerlab.org