How large is the problem?

Message boards : Rosetta@home Science : How large is the problem?

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98317 - Posted: 25 Jul 2020, 0:33:44 UTC

I have a basic question that I'm having difficulty finding the answer to. Can someone say, in layman's terms, how large a computational problem this is? (Is it even finite?) Thanks in advance!

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98323 - Posted: 25 Jul 2020, 9:46:58 UTC - in response to Message 98317.  

Rosetta@home is not a computational problem to be solved. It is a project that uses computing power to advance knowledge in the biological sciences.

The tasks we are running are asking whether a given set of building blocks, and a method of predicting how they will assemble themselves, can yield a useful protein. Given the unfathomably large number of possibilities, in the vast majority of cases the answer will be “no”. Just maybe there’ll be a result that warrants further investigation – but the broader goal is to refine the prediction methods, which will make it easier in the future to design proteins that will be effective in fighting disease.

We’ll be finished once there is nothing left to learn…

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98327 - Posted: 25 Jul 2020, 12:04:20 UTC - in response to Message 98323.  
Last modified: 25 Jul 2020, 12:07:35 UTC

Brian, thank you for the reply and the interesting article. What you sent was the first reference I've seen that actually had a number: indeed an astronomically large number, and a moving target.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 98330 - Posted: 25 Jul 2020, 15:01:37 UTC

I love Brian's answer.

To offer an analogy, you might consider how much computing power it takes to predict the weather. Each day's weather presents a new set of data. You use what you've learned about predicting weather, along with computing resources, and you generate further predictions. You rerun weather data from last year against your newly revised models and see if the models now better predict what actually happened in the historical data. Then you realize that you might be able to apply what you've learned to more generalized climate prediction, and you work to extend your models to apply to days and hours, as well as decades and centuries. You also start to notice that while your day-ahead predictions are reasonably good, you still can't accurately predict where the hurricane is heading, and you make further revisions to your model, some of which actually help improve your general 48-hour predictions... and on it goes.

Rosetta is primarily an effort to learn how to create accurate predictions. As has been said about modelling, the model will never perfectly predict reality; the question is whether the model is close enough to be useful. In the case of proteins, their structures must be based in physics, so it would seem that predictions very close to reality should be achievable. In many cases they are achievable today. However, there are always those hurricanes that come up.
Rosetta Moderator: Mod.Sense

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98341 - Posted: 26 Jul 2020, 5:20:07 UTC - in response to Message 98330.  

So if we had hypothetical "satellite imagery" of the insides of a spike protein on a COVID-19 virus, would we see it folding and refolding like a storm gyrating and growing/shrinking? So the idea isn't to analyze the geometry at any one time but to find a model that can predict the patterns that are most likely (the ones with the lowest energy states)? Do I have that right?

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,498
RAC: 6,467
Message 98345 - Posted: 26 Jul 2020, 12:05:03 UTC - in response to Message 98317.  

I have a basic question that I'm having difficulty finding the answer to. Can someone say, in layman's terms, how large a computational problem this is? (Is it even finite?) Thanks in advance!


How much stuff can you fit in your home would be a similar question... the answer depends on the size of the stuff and how well you pack it. Rosetta is trying to find better ways of doing things to solve problems; there is an end to each problem, but not to the number of problems it can work on. The end to each problem is when they find the most efficient way of doing things, and each problem has a different way because of what they are working on. For example, if they were working on Rubik's Cubes, the answer would be a finite one because of the way the cube works, but since Rosetta is doing work on chemistry it isn't as easy, especially if, like COVID-19, it mutates. So if you find the way to solve the problem for the version from 10 generations ago, it may not be the right answer for this generation's version, BUT in theory it should give you a faster way of getting to the answer for the latest generation because of all the ideas you've figured out that don't work.

For example, you want 'better' roses in your garden and you start out by just watering them every day... nope, not 'better' roses. Okay, let's add coffee grounds... nope, not 'better' roses. Let's try ashes from the fireplace... nope, not 'better' roses. How about fertilizer... EUREKA, 'better' roses. It takes time and sometimes even some luck to find the 'best' answer, but in the end it WILL be found if we don't stop trying. If you look at the Rosetta home page, it shows other projects that are on hold as we do the COVID-19 research right now. There are always questions, so NO, Rosetta will never end as long as they have the will and the funding to keep it going.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 98348 - Posted: 26 Jul 2020, 14:51:30 UTC - in response to Message 98341.  
Last modified: 26 Jul 2020, 14:52:29 UTC

So if we had hypothetical "satellite imagery" of the insides of a spike protein on a COVID-19 virus, would we see it folding and refolding like a storm gyrating and growing/shrinking? So the idea isn't to analyze the geometry at any one time but to find a model that can predict the patterns that are most likely (the ones with the lowest energy states)? Do I have that right?

Not really to that extent, I don't think. My understanding is this:
What is known is that a protein will typically fold into its lowest energy state. That is, it will collapse into the shape that releases the most energy/heat possible. Rosetta calculates the energy state for a given folded shape, and is able to score that shape by how much energy it would take to break all of the bonds that are created by folding into that shape. The more energy required to break those bonds and unfold it, the lower the energy of the shape, hence "lowest energy" being part of the scoring function.

So while Rosetta might be twisting and turning parts of the protein to find the lowest energy, in reality the protein will be relatively static once it has been created and has folded initially. Having said that, that's probably a huge oversimplification, and the protein may adjust its shape whenever interacting with other stuff, but probably not to the extent shown in the Rosetta graphics. Enzymes (which are active proteins rather than structural proteins) do have parts that move - they're like little machines. I would guess the spikes actively interact with some part of the cell they're infecting, so I would guess there's some enzyme-like movement during that process. Would be great to see a simulation of that...
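
To make the "twisting and turning to find the lowest energy" idea concrete, here is a minimal sketch of a Metropolis-style Monte Carlo search in Python. Everything in it is an illustrative assumption: `toy_energy` is a made-up score over a handful of angles and is not Rosetta's energy function, and `monte_carlo_min`, its parameters, and the preferred angle of -60 degrees are hypothetical names and values chosen only for the example. Real protocols are far more elaborate.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical score in arbitrary units; lower is better.

    This is NOT Rosetta's energy function. Each angle "prefers" -60 degrees,
    and neighboring angles prefer to be similar to each other.
    """
    local = sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)
    pair = sum(
        1.0 - math.cos(math.radians(a - b)) for a, b in zip(angles, angles[1:])
    )
    return local + 0.5 * pair


def monte_carlo_min(n_angles=10, steps=5000, temperature=0.5, seed=0):
    """Random search that keeps track of the lowest-energy shape seen."""
    rng = random.Random(seed)
    angles = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy

    for _ in range(steps):
        # Twist one randomly chosen angle by a small random amount.
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 30.0)
        new_energy = toy_energy(angles)
        delta = new_energy - energy

        # Metropolis rule: always accept downhill moves, and sometimes accept
        # uphill moves so the search can escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        else:
            angles[i] = old  # reject the move and restore the old angle

    return best_angles, best_energy


angles, energy = monte_carlo_min()
print(f"lowest toy energy found: {energy:.3f}")
```

The gyrating shapes in the R@h graphics correspond loosely to the trial moves here: most are discarded, and only the lowest-scoring shapes are of interest at the end.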

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 98350 - Posted: 26 Jul 2020, 16:28:34 UTC - in response to Message 98341.  

The R@h graphic shows gyration of the protein, but this is just showing various configurations it is examining. The protein in nature will take a specific shape while being formed and retain that shape until something else comes and interacts with it.

To extend my weather model analogy: having satellite imagery is sort of like what you get when you have initial x-ray crystallography data available for the protein you want to study. The data doesn't clearly identify which parts of the structure are where, but it gives you a fuzzy idea, sort of like looking at stars in the sky when the telescope isn't focused quite right. So R@h can use this fuzzy data as input to the model and reach a predicted structure more quickly, and generally with more accuracy, than if the x-ray data were not available.
Rosetta Moderator: Mod.Sense

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98352 - Posted: 27 Jul 2020, 1:12:20 UTC - in response to Message 98350.  

OK, so it is stable. To me that indicates there is an "answer" to the question "What is the structure of the COVID-19 protein?" that has some meaningful persistence in time (as in enough time for a treatment to be developed and applied). But given that we have no way of peeking inside the protein, we won't know for sure. We only have best guesses based on what it looks like from the x-ray crystallography and energy states. Am I good so far?

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 98356 - Posted: 27 Jul 2020, 20:17:17 UTC - in response to Message 98352.  

Yeah I think that's pretty accurate.

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98357 - Posted: 27 Jul 2020, 22:15:46 UTC - in response to Message 98345.  

Sorry, I think my question was too vague. I came to this project through the COVID-19 door, so my scope was naively limited to that. Specifically, I was referring to the challenge of discovering the COVID-19 protein structure.

The Wikipedia article on Levinthal's paradox mentions the possibility of 3^300 conformations. Is it fair to think that basically defines the universe of possibilities for the COVID-19 protein?
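
For a sense of scale, the 3^300 figure can be evaluated directly, since Python integers have arbitrary precision. This is only a sketch of the arithmetic: `conformation_count` and `order_of_magnitude` are hypothetical helper names, and the model behind them (every angle independently takes one of three states) is the simplifying assumption of the estimate, not a description of a real protein.

```python
def conformation_count(torsion_angles: int, states_per_angle: int = 3) -> int:
    """Conformations if each angle independently picks one of a few states."""
    return states_per_angle ** torsion_angles


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, i.e. its digit count minus one."""
    return len(str(n)) - 1


count = conformation_count(300)
print(order_of_magnitude(count))                    # 143, so 3^300 is about 10^143
print(order_of_magnitude(conformation_count(600)))  # 286
```

Doubling the number of angles squares the count rather than doubling it, which is why longer chains make the number so much larger.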

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 98375 - Posted: 28 Jul 2020, 20:35:54 UTC - in response to Message 98357.  

Is it fair to think that basically defines the universe of possibilities for the COVID-19 protein?


Actually no, because COVID has a lot more than 300 amino acids. In fact, I believe each of the many "spikes" on the outer surface has more than that.

Modeling the virus is just the first step. Next you have many thousands of other things that might stick well to it. So you have the interactions between COVID and other proteins to study. And some of those proteins may be newly created in the lab (i.e. "novel"), and there is a similarly limitless universe of possible new proteins that could possibly stick to the COVID spikes, which would probably prevent it from invading cells, and thus infecting more cells and multiplying its numbers. So you start to get a sense that just the study of COVID is really really huge, and then you realize there are thousands of other things to study. This is why R@h has been studying proteins for over two decades and still isn't "finished".
Rosetta Moderator: Mod.Sense

Tim55c
Joined: 26 Apr 20
Posts: 6
Credit: 413,592
RAC: 0
Message 98377 - Posted: 29 Jul 2020, 0:05:11 UTC - in response to Message 98375.  


Modeling the virus is just the first step. Next you have many thousands of other things that might stick well to it. So you have the interactions between COVID and other proteins to study. And some of those proteins may be newly created in the lab (i.e. "novel"), and there is a similarly limitless universe of possible new proteins that could possibly stick to the COVID spikes, which would probably prevent it from invading cells, and thus infecting more cells and multiplying its numbers. So you start to get a sense that just the study of COVID is really really huge, and then you realize there are thousands of other things to study. This is why R@h has been studying proteins for over two decades and still isn't "finished".


Yes, I agree R@h is a very useful tool, and I'm hoping to work on a proposal to make it even more useful. My original question was too vague; I apologize. I was attempting to ask for some way of quantifying how large a computational problem it is to identify the structure of the "COVID-19" protein (I guess technically I should be using the term SARS-CoV-2). My background is in computer science/mathematics and not biology (that part was probably self-evident :) ).

Just came across this article that discusses the glycosylation of the protein that masks its identity from our immune system. (I guess nature uses sugar coating to deliver bad news too.)

https://news.uga.edu/searching-the-covid-19-spike-protein-for-a-potential-vaccine/

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,644,940
RAC: 271
Message 98391 - Posted: 31 Jul 2020, 6:46:26 UTC
Last modified: 31 Jul 2020, 6:47:51 UTC

Very interesting thread. As someone who works in data science, I appreciate first hand that we are still very resource-constrained in terms of computational throughput, in general as a species. Protein folding is the type of problem that could literally use every single computer and chip on the planet and still not have enough resources to be a 'perfect solution finder'. Hence, there are many shortcuts and approximations and tricks that are used to try to get a 'best guess' with the amount of compute available.
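
A rough back-of-the-envelope calculation shows why brute force is out of reach. The sketch below assumes the 3^300 figure mentioned earlier in the thread and a purely hypothetical, deliberately generous worldwide rate of 10^30 conformations scored per second. `log10_years_to_enumerate` is an illustrative helper, and working in logarithms just keeps the numbers manageable.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_years_to_enumerate(log10_conformations, log10_rate_per_second):
    """log10 of the years needed to score every conformation at a fixed rate."""
    return (
        log10_conformations
        - log10_rate_per_second
        - math.log10(SECONDS_PER_YEAR)
    )


log10_count = 300 * math.log10(3)  # 3^300, the figure quoted in the thread
log10_rate = 30.0                  # hypothetical: 10^30 conformations per second

exponent = log10_years_to_enumerate(log10_count, log10_rate)
print(f"about 10^{exponent:.1f} years")  # between 10^105 and 10^106 years
```

Even under that assumption the exhaustive search would take vastly longer than the age of the universe (on the order of 10^10 years), which is why sampling and approximation are used instead.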

Someday, hopefully, we will evolve beyond this resource scarcity, either via transformational developments in quantum computing (perhaps soon, i.e. within the next couple of decades) or, on a longer time horizon, by moving up the Kardashev scale. In the meantime, it would be nice if a larger percentage of people would take an interest in contributing to efforts like Rosetta@home to try to maximize the resources available for the advancement of human knowledge.
**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 98395 - Posted: 31 Jul 2020, 14:37:29 UTC - in response to Message 98391.  

Someday, hopefully we will evolve beyond this resource scarcity via either transformational developments in quantum computing (perhaps soon, ie. within the next couple of decades) or on a longer time horizon we will move up the Kardashev scale.

They will speed it up somehow, probably with AI techniques, or increased parallelism that allows the use of GPUs. In the meantime, I will humor them until the end of the year due to the importance of COVID-19.
After that, I can reduce the number of machines on Rosetta to a more normal level and find other interesting uses for them. This is not the only way to cure diseases.

Sid Celery
Joined: 11 Feb 08
Posts: 1993
Credit: 38,553,309
RAC: 15,833
Message 98422 - Posted: 4 Aug 2020, 23:51:59 UTC

Great thread

©2024 University of Washington
https://www.bakerlab.org