Posts by Mod.Sense

1) Message boards : Rosetta@home Science : Besides "de novo protein design" are we doing also a lot of benchmarks? (Message 98426)
Posted 9 days ago by Mod.Sense
While I have no knowledge of the ratios, you are definitely asking the right questions, and seeing the project in the right light. There are numerous groups in RosettaCommons that are working to "improve" the Rosetta algorithms. I put "improve" in quotes, because you have no way to know if your improvements work properly unless you test something using the original algorithm, and also using the new algorithm. If the result is that the new algorithm arrives at the same or a more accurate answer using fewer models (less overall compute time), then it is indeed an "improvement", at least for the type of protein in your test.

The project is constantly asking the question "How could we have arrived at that answer, better?". Where "that answer" is perhaps from x-ray crystallography, or from running the latest R@h prediction. To answer that question, you constantly have to try new approaches against entire libraries of known structures. Your new approach might be better for some unique subset of the protein library you use for comparisons. Such as those that involve zinc, or have a hairpin curve, or are symmetric, or a barrel structure. Can we improve the predictions of barrel structures without harming our predictions of proteins that have hairpin curves? Until you study the predictions of your new algorithm against the various classes of proteins, you cannot fully understand what its predictions will look like. Then you ask things like, "can we retain the better hairpin curve predictions, without harming our zinc predictions?"
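As a rough illustration of that per-class comparison, here is a minimal Python sketch. It is not how the project actually runs its benchmarks; the function names, the class labels, and the choice of a per-model error score (lower is better) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def compare_by_class(results):
    """Compare two algorithms per protein class.

    results: iterable of (protein_class, old_error, new_error) tuples,
    where the error is any "lower is better" score (hypothetical input).
    Returns {protein_class: mean(new_error) - mean(old_error)}.
    A negative value means the new algorithm did better on that class.
    """
    by_class = defaultdict(list)
    for protein_class, old_error, new_error in results:
        by_class[protein_class].append((old_error, new_error))
    return {
        cls: mean(new for _, new in pairs) - mean(old for old, _ in pairs)
        for cls, pairs in by_class.items()
    }


def regressions(deltas, tolerance=0.0):
    """Return the classes where the new algorithm got worse by more than tolerance."""
    return sorted(cls for cls, delta in deltas.items() if delta > tolerance)
```

With something like this, "better on barrels but worse on hairpins" shows up as a negative delta for one class and an entry in `regressions` for the other, which is the kind of trade-off the questions above are about.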

So, yes, R@h is really a project that is about making a better protein analysis tool, so that the tool is refined and working well when something like COVID-19 comes up. They also work with other researchers to test specific protein interactions, such as interactions with HIV, or malaria, or SARS-CoV-2.
2) Message boards : Rosetta@home Science : How large is the problem? (Message 98375)
Posted 17 days ago by Mod.Sense
Is it fair to think that basically defines the universe of possibilities for the COVID-19 protein?

Actually no, because COVID has a lot more than 300 amino acids. In fact, I believe each of the many "spikes" on the outer surface has more than that.

Modeling the virus is just the first step. Next you have many thousands of other things that might stick well to it. So you have the interactions between COVID and other proteins to study. And some of those proteins may be newly created in the lab (i.e. "novel"), and there is a similarly limitless universe of possible new proteins that could possibly stick to the COVID spikes, which would probably prevent it from invading cells, and thus infecting more cells and multiplying its numbers. So you start to get a sense that just the study of COVID is really really huge, and then you realize there are thousands of other things to study. This is why R@h has been studying proteins for over two decades and still isn't "finished".
3) Message boards : Rosetta@home Science : How large is the problem? (Message 98350)
Posted 19 days ago by Mod.Sense
The R@h graphic shows gyration of the protein, but this is just showing various configurations it is examining. The protein in nature will take a specific shape while being formed and retain that shape until something else comes and interacts with it.

To extend my weather computer model analogy: having satellite imagery is sort of what you get when you have initial x-ray crystallography data available for the protein you want to study. The data doesn't clearly identify which parts of the structure are where, but it gives you a fuzzy idea. Sort of like looking at stars in the sky when the telescope isn't focused quite right. So R@h can use this fuzzy data as input to the model and reach a predicted structure more quickly and generally with more accuracy than if the x-ray data were not available.
4) Message boards : News : Rosetta's role in fighting coronavirus (Message 98349)
Posted 19 days ago by Mod.Sense
I would suggest that the project is sending WUs out that they feel are most important, and trust that they are weighing the priorities appropriately. If there is non-COVID work that needs to be done as well, then your machine completing it means someone else's can stay focused on COVID.
5) Message boards : Rosetta@home Science : How large is the problem? (Message 98330)
Posted 20 days ago by Mod.Sense
I love Brian's answer.

To offer an analogy, you might consider how much computing power it takes to predict the weather. Each day, the weather that exists presents a new set of data. You use what you've learned about predicting weather, along with computing resources, and you generate further predictions. You rerun weather data from last year against your newly revised models and see if the models now better predict what indeed happened in the historical data. Then you realize that you might be able to apply what you've learned to more generalized climate prediction, and you work to extend your models to apply to days and hours, as well as decades and centuries. You also start to notice that while your day-ahead predictions are reasonably good, you still can't accurately predict where the hurricane is heading, and you make further revisions to your model, some of which actually help improve your general 48-hour predictions... and on it goes.

Rosetta is primarily an effort to learn how to create accurate predictions. As has been said about modelling, the model will never perfectly predict reality; the question is whether the model is close enough to be useful. In the case of proteins, their structures must be based in physics. So it would seem predictions that are very close to reality should be achievable. In many cases they are achievable today. However, there are always those hurricanes that come up.
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 98249)
Posted 25 days ago by Mod.Sense
So far as validating results, lost results etc. Each protein study fires off thousands of tasks. Some 5% or less of those results will look to be the best. If a task ran astray out in the wild, and mistakenly reports a terrible result, that's not ideal, but there should still be a similar result in those top 5%. If the task ran astray and mistakenly reports a fantastic result, that single model is rerun in the lab and confirmed. If the lab system has the same flaw, it should get the same fantastic result. But there is also human review of the results. Sometimes you can tell, just by the shape of the result, that it doesn't look like a protein found in nature.

If a protein-protein interaction were being studied, it might be more difficult to tell that something is off just by the shape. Eventually results may be sent to the "wet lab" where they produce the two proteins and see if they actually interact as predicted by the model.

If the protein structure has already been determined, the models are compared to the known structure and the degree of their similarity is measured in RMSD (root-mean-square deviation).
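For anyone curious what that similarity measure looks like, here is a minimal Python sketch of a root-mean-square deviation between two sets of atom coordinates. It assumes the two structures list the same atoms in the same order and have already been superimposed; the alignment step is left out, and nothing here is taken from the actual Rosetta code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b (assumptions for this sketch).
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two positions of the same atom.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

A value of zero means the model sits exactly on the known structure; larger values mean the atoms are, on average, further from where they should be.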

Sometimes the human review of the top 5% of the results concludes that we still have not found the best model. Perhaps there is a high variability in appearance across the top scoring models. In such cases, variations of those top 5% of the results are sent out as a new round of work. It is for the same protein, and again will do thousands of models, but these will start with some assumptions or rules that cause you to begin with something much closer to one of those previous best results, and search around that same area for a better (lower energy) result.

I made up the 5% number. 1% or less is probably more realistic. Maybe I should have said something like "...the top 10 or 20 models".

Anyway, I hope that makes it more clear why R@h does not require a wingman to rerun the same models to confirm results. When you get down to those top 10 results, they should all look pretty similar. Each arrived at that model from a different start, but, in the end, the top results should all be similar to the actual protein's structure in nature. So, they should all be very similar. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb.
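To make the "sore thumb" idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: each model is an (id, energy) pair where lower energy is better, `distance` is any caller-supplied function giving the dissimilarity between two models (RMSD would be one choice), and the threshold is arbitrary. The project's real review process is not described at this level of detail.

```python
from statistics import median


def top_models(models, n=10):
    """Return the n lowest-energy models from a list of (model_id, energy)."""
    return sorted(models, key=lambda m: m[1])[:n]


def flag_outliers(model_ids, distance, threshold):
    """Flag models that look unlike most of the other top models.

    distance: hypothetical callable (id_a, id_b) -> float dissimilarity.
    A model is flagged when its median distance to the others exceeds
    threshold, i.e. it is far from most of its peers, not just one.
    """
    if len(model_ids) < 2:
        return []
    flagged = []
    for mid in model_ids:
        others = [distance(mid, other) for other in model_ids if other != mid]
        if median(others) > threshold:
            flagged.append(mid)
    return flagged
```

Because the good results cluster together, one bad result does not drag the median for the others, but its own median distance to the cluster is large, so it is the one that gets flagged.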
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 98117)
Posted 15 Jul 2020 by Mod.Sense
The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count?

You are mistaken about the first decoy. The first decoy is a legit, full model of the protein, not a simple test of the environment.
8) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 98055)
Posted 13 Jul 2020 by Mod.Sense
Hopefully the programmers of nuclear power plant control systems can presume there is an active window with a person viewing that can respond to a prompt.
9) Message boards : Cafe Rosetta : Personal Milestones (Message 98054)
Posted 13 Jul 2020 by Mod.Sense
If it had any genuine promise... there would be scores of universities and research labs throughout the world using it. (click where it says "click to show the list of institutions").
10) Message boards : Cafe Rosetta : Personal Milestones (Message 98012)
Posted 11 Jul 2020 by Mod.Sense
Rosetta crosses one Petaflop of computational power. A Petaflop is 1,000 Teraflops. A Teraflop is one trillion floating-point operations per second.
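Spelled out as arithmetic, in a small Python sketch (the constant and function names are just for illustration):

```python
TERAFLOP = 10**12            # one trillion floating-point operations per second
PETAFLOP = 1000 * TERAFLOP   # 1,000 Teraflops, i.e. 10**15 operations per second


def to_petaflops(flops):
    """Convert a raw operations-per-second figure to Petaflops."""
    return flops / PETAFLOP
```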
11) Message boards : News : Rosetta's role in fighting coronavirus (Message 98011)
Posted 11 Jul 2020 by Mod.Sense
A group called Science To Save the World has created a great video that explains how Rosetta is used to investigate COVID-19, and to model the spike protein before laboratory techniques of structure discovery could be completed.
12) Message boards : Cafe Rosetta : Open Access and Ethics question (Message 98006)
Posted 11 Jul 2020 by Mod.Sense
Where are these public domain results?
Who will own the results of rosetta@home?
Will someone (i.e. BigPharma) make money out of my CPU time?
Solid answer needed, for who do we do this?

David Baker's Ted Talk on the big picture of their research
13) Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why? (Message 97987)
Posted 9 Jul 2020 by Mod.Sense
The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority.
14) Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why? (Message 97965)
Posted 8 Jul 2020 by Mod.Sense
I agree, you should be able to complete some models. Just not as many per unit time.

Are the tasks actually getting CPU time? Or are they just shown as "running" for their status. Normally I would tell a Windows user to go to task manager to see if they are using CPU, and to confirm with the WU properties the actual CPU time is increasing. On Linux, I guess have a look at top, and see if the tasks are getting CPU. Although you did say they ran 100% CPU. So it sounds like you've already checked this.

There is nothing about an older CPU that should impact checkpointing either. As you allude to, a checkpoint is taken at least at the end of each model, and often at various times within a model as well. What have you set for your preference on how often to request tasks to checkpoint?

It is always possible that the first model of the WU happened to go rogue. How did the next machine do with the reassigned tasks?
15) Questions and Answers : Windows : How do I exit the program properly? (Message 97950)
Posted 7 Jul 2020 by Mod.Sense
"close" of BOINC Manager just closes the GUI you use to see what is running. "exit" of BOINC Manager ends the program and all of the tasks it is running. This is true for all BOINC projects.
16) Message boards : Cafe Rosetta : Are we running out of work? (Message 97645)
Posted 26 Jun 2020 by Mod.Sense
So far as the supply of power on the grid, and any air conditioning costs, you might consider the opposite, and only run at night.

Once new work is available, it will take a couple of days for everything to get sent out and settle down.
17) Message boards : Cafe Rosetta : something broke (Message 97644)
Posted 26 Jun 2020 by Mod.Sense
Presently, there are no R@h work units available for download. More coming soon (as is always the case).
18) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 97423)
Posted 16 Jun 2020 by Mod.Sense
Yawn.... move it then.

Please heed the advice to carry on the conversation where more appropriate. All of it is off topic, and it is easier to delete than to move. Far better for all to simply do more than yawn in the first place.
19) Message boards : News : Rosetta Graphics released for Windows (Message 97365)
Posted 13 Jun 2020 by Mod.Sense
When the conformation of a given protein is known, it is very useful to see if your computational algorithms can come up with the same answer. Because if they don't, you've got more work to do.

In very general terms, any time you come up with a computer model of something that occurs in nature, you want to confirm your model by blindly applying it to a natural occurrence to see if it predicts the outcome you actually observe in nature. If your model does not predict what actually occurs, then there is room to improve the model.

For example, if you have a computer model that predicts changes to world climate, you want to enter all of the data that you have for the year 1900 and see if you accurately predict what was observed in 1910 and 1920.

Rosetta is really about developing the best computational model of how proteins work.
20) Message boards : Cafe Rosetta : Personal Milestones (Message 97345)
Posted 12 Jun 2020 by Mod.Sense
Rosetta crosses the 100 Billion credits issued mark!

©2020 University of Washington