Posts by Mod.Sense

1) Message boards : Number crunching : 3832 new hosts per day? (Message 100020)
Posted 14 Dec 2020 by Mod.Sense
Post:
If a two hour task completes 10 models, then an eight hour task is more likely to complete about 40 models, not 15.
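As a rough sketch of that arithmetic, assuming every model of a given protein takes about the same CPU time (the function name here is only illustrative, not part of BOINC or Rosetta):

```python
def estimated_models(models_done: int, hours_run: float, target_hours: float) -> float:
    """Scale an observed model count linearly to a different runtime."""
    if hours_run <= 0:
        raise ValueError("hours_run must be positive")
    return models_done * target_hours / hours_run


# 10 models in 2 hours suggests roughly 40 models in 8 hours.
print(estimated_models(10, 2, 8))  # 40.0
```

Real tasks will drift from this, since model times vary and a task only stops between models.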

Note that not all tasks can complete in two hours. With such a short runtime preference, you are more likely to see tasks running longer than the preference. When you look at credits, you really must consider the amount of actual CPU time, not the number of work units, and not just the runtime preference.

There are no "missing results". So, set your preferences in a way that works for you and your machine.

If you use Dr. Baker's analogy of exploring a planet's surface for the highest or lowest elevation on the planet, then each model is one of the explorers. They start their exploration from a random point on the planet. When a work unit has enough time to begin another model, that next model will be started at another random point on the planet, with no regard to the first model or what it found. The number of explorers you need for a good chance of finding the true highest or lowest elevation is essentially proportional to the surface area of the planet. If 10,000 explorers is adequate for Mars, you might need 100,000 for Saturn. So, when they feel they have a Saturn-sized protein for study, they might create more work units. But, as you point out, they have no way to predict exactly how many models will result. If they approach the end of work units coming back in and still only have 80,000 results, then they create more work units to obtain the 100,000 results desired.
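To make the analogy concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the one-dimensional `elevation` landscape, the downhill-only walk, and the random number of models per work unit. It is not how Rosetta actually searches; it only shows independent random starts plus topping up with more work units until the target count of results is reached.

```python
import math
import random


def elevation(x: float) -> float:
    # Toy 1-D "planet": a bumpy landscape with many local low points.
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) + 0.1 * x


def explore(rng: random.Random, steps: int = 200, step_size: float = 0.05):
    """One explorer (one model): random start, then accept downhill moves only."""
    x = rng.uniform(0.0, 10.0)
    best = elevation(x)
    for _ in range(steps):
        candidate = min(10.0, max(0.0, x + rng.uniform(-step_size, step_size)))
        height = elevation(candidate)
        if height < best:
            x, best = candidate, height
    return x, best


def run_work_unit(rng: random.Random, max_models: int = 5):
    # How many models a work unit completes is not known in advance.
    return [explore(rng) for _ in range(rng.randint(1, max_models))]


def collect(target_models: int, seed: int = 0):
    rng = random.Random(seed)
    results = []
    while len(results) < target_models:  # create more work units as needed
        results.extend(run_work_unit(rng))
    return results


results = collect(target_models=1000)
lowest = min(results, key=lambda r: r[1])
print(f"{len(results)} models, lowest elevation {lowest[1]:.3f} at x={lowest[0]:.3f}")
```

Each explorer ignores what the others found, which is why a bigger "planet" needs more of them.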

Having said that, once they see the results, they can sometimes give hints to future explorers, or essentially drop more of them near the Himalayas. So they might create a secondary batch of work units, which are designed to concentrate the focus based on what was learned on the first round.
2) Questions and Answers : Unix/Linux : Can't get started (Message 100016)
Posted 14 Dec 2020 by Mod.Sense
Post:
I believe you would have to use the BOINC command interface to attach a project.
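For reference, the command-line tool that ships with the BOINC client is `boinccmd`, and its `--project_attach` option takes the project URL and your account key. Below is a minimal Python sketch that only builds and prints that command. The project URL and the key shown are placeholders I am assuming for illustration, so check them against your own account page before running anything.

```python
import shlex


def attach_command(project_url: str, account_key: str) -> list:
    """Build the boinccmd argument list for attaching a project."""
    if not project_url or not account_key:
        raise ValueError("both the project URL and the account key are required")
    return ["boinccmd", "--project_attach", project_url, account_key]


# Placeholder values (assumptions): substitute your own URL and account key.
cmd = attach_command("https://boinc.bakerlab.org/rosetta/", "YOUR_ACCOUNT_KEY")
print(shlex.join(cmd))
```

You would run the printed command in a shell on the machine where the BOINC client is running.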

Your question will get more views if you post to the Number Crunching board, where there are also some rPi threads.
3) Questions and Answers : Unix/Linux : Overclocking on Raspberry Pi 4 (Message 99224)
Posted 3 Oct 2020 by Mod.Sense
Post:
R@h works differently than many other BOINC projects. There is a user preference for how long you would prefer the tasks to run. The setting is more of an objective than a hard rule.

You should look to compare credit per CPU hour. Credit is your best indication of how much work your machine completed.
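A small sketch of that comparison, with made-up numbers (the credit figures below are hypothetical, not taken from any real host):

```python
def credit_per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit granted divided by actual CPU time, in hours."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return credit / (cpu_seconds / 3600.0)


# Two 8-hour tasks: one before the overclock, one after (hypothetical credit).
before = credit_per_cpu_hour(credit=240.0, cpu_seconds=8 * 3600)
after = credit_per_cpu_hour(credit=300.0, cpu_seconds=8 * 3600)
print(before, after)  # 30.0 37.5
```

Use the CPU time reported for the task, not the runtime preference, since tasks can run shorter or longer than the preference.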

Basically what is happening is that your machine is now completing more models in the 8 hours than it would without the overclock.

The Number Crunching board would be the best place to post additional performance questions.
4) Message boards : Number crunching : 0 new tasks, Rosetta? (Message 98934)
Posted 9 Sep 2020 by Mod.Sense
Post:

I have "leave application in memory while suspended" unticked. Does that screw up the above?

I set it that way due to a shortage of RAM.


Yes, the statement that a task is kept in memory when it can't complete is only true if you have that setting checked.

If a task is kept "in memory", but is not running, it will get paged out to the swap file, so long as swap space is available.
5) Message boards : Rosetta@home Science : Besides "de novo protein design" are we doing also a lot of benchmarks? (Message 98426)
Posted 5 Aug 2020 by Mod.Sense
Post:
While I have no knowledge of the ratios, you are definitely asking the right questions, and seeing the project in the right light. There are numerous groups in RosettaCommons that are working to "improve" the Rosetta algorithms. I put "improve" in quotes, because you have no way to know if your improvements work properly unless you test something using the original algorithm, and also using the new algorithm. If the result is that the new algorithm arrives at the same or a more accurate answer using fewer models (less overall compute time), then it is indeed an "improvement", at least for the type of protein in your test.

The project is constantly asking the question "How could we have arrived at that answer, better?", where "that answer" is perhaps from x-ray crystallography, or from running the latest R@h prediction. To answer that question, you constantly have to try new approaches against entire libraries of known structures. Your new approach might be better for some unique subset of the protein library you use for comparisons, such as those that involve zinc, or have a hairpin curve, or are symmetric, or have a barrel structure. Can we improve the predictions of barrel structures without harming our predictions of proteins that have hairpin curves? Until you study the predictions of your new algorithm against the various classes of proteins, you cannot fully understand what its predictions will look like. Then you ask things like, "can we retain the better hairpin curve predictions, without harming our zinc predictions?"

So, yes, R@h is really a project that is about making a better protein analysis tool, so that the tool is refined and working well when something like COVID-19 comes up. They also work with other researchers to test specific protein interactions, such as interactions with HIV, or malaria, or SARS-CoV-2.
6) Message boards : Rosetta@home Science : How large is the problem? (Message 98375)
Posted 28 Jul 2020 by Mod.Sense
Post:
Is it fair to think that basically defines the universe of possibilities for the COVID-19 protein?


Actually no, because COVID has a lot more than 300 amino acids. In fact, I believe each of the many "spikes" on the outer surface has more than that.

Modeling the virus is just the first step. Next you have many thousands of other things that might stick well to it. So you have the interactions between COVID and other proteins to study. And some of those proteins may be newly created in the lab (i.e. "novel"), and there is a similarly limitless universe of possible new proteins that could possibly stick to the COVID spikes, which would probably prevent it from invading cells, and thus infecting more cells and multiplying its numbers. So you start to get a sense that just the study of COVID is really really huge, and then you realize there are thousands of other things to study. This is why R@h has been studying proteins for over two decades and still isn't "finished".
7) Message boards : Rosetta@home Science : How large is the problem? (Message 98350)
Posted 26 Jul 2020 by Mod.Sense
Post:
The R@h graphic shows gyration of the protein, but this is just showing various configurations it is examining. The protein in nature will take a specific shape while being formed and retain that shape until something else comes and interacts with it.

To extend my weather model analogy, having satellite imagery is sort of what you get when you have initial x-ray crystallography data available for the protein you want to study. The data doesn't clearly identify which parts of the structure are where, but it gives you a fuzzy idea. Sort of like looking at stars in the sky when the telescope isn't focused quite right. So R@h can use this fuzzy data as input to the model and reach a predicted structure more quickly and generally with more accuracy than if the x-ray data were not available.
8) Message boards : News : Rosetta's role in fighting coronavirus (Message 98349)
Posted 26 Jul 2020 by Mod.Sense
Post:
I would suggest that the project is sending WUs out that they feel are most important, and trust that they are weighing the priorities appropriately. If there is non-COVID work that needs to be done as well, then your machine completing it means someone else's can stay focused on COVID.
9) Message boards : Rosetta@home Science : How large is the problem? (Message 98330)
Posted 25 Jul 2020 by Mod.Sense
Post:
I love Brian's answer.

To offer an analogy, you might consider how much computing power it takes to predict the weather. Each day the weather that exists presents a new set of data. You use what you've learned about predicting weather, along with computing resources, and you generate further predictions. You rerun weather data from last year against your newly revised models and see if the models now better predict what indeed happened in the historical data. Then you realize that you might be able to apply what you've learned to more generalized climate prediction, and you work to extend your models to apply to days and hours, as well as decades and centuries. You also start to notice that while your day-ahead predictions are reasonably good, you still can't accurately predict where the hurricane is heading, and you make further revisions to your model, some of which actually help improve your general 48-hour predictions... and on it goes.

Rosetta is primarily an effort to learn how to create accurate predictions. As has been said about modelling, the model will never perfectly predict the reality; the question is whether the model is close enough to be useful. In the case of proteins, their structures must be based in physics. So it would seem predictions that are very close to reality should be achievable. In many cases they are achievable today. However, there are always those hurricanes that come up.
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 98249)
Posted 20 Jul 2020 by Mod.Sense
Post:
As far as validating results, lost results, etc.: each protein study fires off thousands of tasks. Some 5% or less of those results will look to be the best. If a task ran astray out in the wild, and mistakenly reports a terrible result, that's not ideal, but there should still be a similar result in those top 5%. If the task ran astray and mistakenly reports a fantastic result, that single model is rerun in the lab and confirmed. If the lab system has the same flaw, it should get the same fantastic result. But there is also human review of the results. Sometimes you can tell, just by the shape of the result, that it doesn't look like a protein found in nature.

If a protein-protein interaction were being studied, it might be more difficult to tell that something is off just by the shape. Eventually results may be sent to the "wet lab" where they produce the two proteins and see if they actually interact as predicted by the model.

If the protein structure has already been determined, the models are compared to the known structure and the degree of their similarity is measured in RMSD (root-mean-square deviation).
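For anyone curious what that number is, here is a minimal sketch of the calculation in Python. It assumes the two structures have the same atoms in the same order and have already been superimposed; real comparisons align the structures first, which this sketch leaves out. The coordinates below are invented for the example.

```python
import math


def rmsd(model, reference) -> float:
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if not model or len(model) != len(reference):
        raise ValueError("structures must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


known = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom is off by 1 unit
print(rmsd(predicted, known))  # 1.0
```

Lower is better: an RMSD of 0 means the model sits exactly on top of the known structure.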

Sometimes the human review of the top 5% of the results concludes that we still have not found the best model. Perhaps there is a high variability in appearance across the top scoring models. In such cases, variations of those top 5% of the results are sent out as a new round of work. It is for the same protein, and again will do thousands of models, but these will start with some assumptions or rules that cause you to begin with something much closer to one of those previous best results, and search around that same area for a better (lower energy) result.

I made up the 5% number. 1% or less is probably more realistic. Maybe I should have said something like "...the top 10 or 20 models".

Anyway, I hope that makes it more clear why R@h does not require a wingman to rerun the same models to confirm results. When you get down to those top 10 results, they should all look pretty similar. Each arrived at that model from a different start, but, in the end, the top results should all be similar to the actual protein's structure in nature, and therefore to each other. So if the 11th top result looks radically different due to some error, it will stand out like a sore thumb.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 98117)
Posted 15 Jul 2020 by Mod.Sense
Post:
The rgmjp tasks appear to complete only one decoy. The first decoy is usually only a quick check to make sure that your computer is running properly, so does this mean that the usual first decoy is skipped for these, or does it mean that more decoys are done but without adding them to the decoy count?


You are mistaken about the first decoy. The first decoy is a legit, full model of the protein, not a simple test of the environment.
12) Message boards : Number crunching : Peer certificate cannot be authenticated with given CA certificates (Message 98055)
Posted 13 Jul 2020 by Mod.Sense
Post:
Hopefully the programmers of nuclear power plant control systems can presume there is an active window with a person viewing that can respond to a prompt.
13) Message boards : Cafe Rosetta : Personal Milestones (Message 98054)
Posted 13 Jul 2020 by Mod.Sense
Post:
If it had any genuine promise... there would be scores of universities and research labs throughout the world using it. (click where it says "click to show the list of institutions").
14) Message boards : Cafe Rosetta : Personal Milestones (Message 98012)
Posted 11 Jul 2020 by Mod.Sense
Post:
Rosetta crosses one Petaflop of computational power. A Petaflop is 1,000 Teraflops. A Teraflop is one trillion floating-point operations per second.
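Spelled out as a quick conversion (the constant and function names are just for this example):

```python
TERA = 10**12  # one teraflop: a trillion floating-point operations per second
PETA = 10**15  # one petaflop


def petaflops_to_teraflops(petaflops: float) -> float:
    return petaflops * PETA / TERA


print(petaflops_to_teraflops(1))  # 1000.0
print(f"{PETA:,} operations per second")  # 1,000,000,000,000,000 operations per second
```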
15) Message boards : News : Rosetta's role in fighting coronavirus (Message 98011)
Posted 11 Jul 2020 by Mod.Sense
Post:
A group called Science To Save the World has created a great video that explains how Rosetta is used to investigate COVID-19, and to model the spike protein before laboratory techniques of structure discovery could be completed.
https://www.youtube.com/watch?v=HBq3izp5X-I
16) Message boards : Cafe Rosetta : Open Access and Ethics question (Message 98006)
Posted 11 Jul 2020 by Mod.Sense
Post:
Where are these public domain results?
Who will own the results of rosetta@home?
Will someone (i.e. BigPharma) make money out of my CPU time?
Solid answer needed, for who do we do this?

David Baker's TED Talk on the big picture of their research
17) Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why? (Message 97987)
Posted 9 Jul 2020 by Mod.Sense
Post:
The question is whether the CPU was busy actually working on R@h tasks, or whether there was something else consuming the CPU. If something else, then R@h will not be able to make progress, as it runs at the lowest priority.
18) Message boards : Number crunching : No decoys on an old Pentium 4. Just a curiosity: why? (Message 97965)
Posted 8 Jul 2020 by Mod.Sense
Post:
I agree, you should be able to complete some models. Just not as many per unit time.

Are the tasks actually getting CPU time? Or are they just shown as "running" for their status? Normally I would tell a Windows user to go to Task Manager to see if they are using CPU, and to confirm in the WU properties that the actual CPU time is increasing. On Linux, I guess have a look at top, and see if the tasks are getting CPU. Although you did say they ran 100% CPU. So it sounds like you've already checked this.

There is nothing about an older CPU that should impact checkpointing either. As you allude to, a checkpoint is taken at least at the end of each model, and often at various times within a model as well. What have you set for your preference on how often to request tasks to checkpoint?

It is always possible that the first model of the WU happened to go rogue. How did the next machine do with the reassigned tasks?
19) Questions and Answers : Windows : How do I exit the program properly? (Message 97950)
Posted 7 Jul 2020 by Mod.Sense
Post:
"close" of BOINC Manager just closes the GUI you use to see what is running. "exit" of BOINC Manager ends the program and all of the tasks it is running. This is true for all BOINC projects.
20) Message boards : Cafe Rosetta : Are we running out of work? (Message 97645)
Posted 26 Jun 2020 by Mod.Sense
Post:
So far as the supply of power on the grid, and any air conditioning costs, you might consider the opposite, and only run at night.

Once new work is available, it will take a couple of days for everything to get sent out and settle down.





©2021 University of Washington
https://www.bakerlab.org