Message boards : Rosetta@home Science : "Progress Percentage" Counts Up in Painfully Long Segments...
Author | Message |
---|---|
Michael Kiewicz Joined: 3 Apr 06 Posts: 2 Credit: 4,759,358 RAC: 0 |
Why is it that the "Progress Percentage" counter updates in large chunks??? All of the other projects' "Progress" meters follow along with the CPU time. PROBLEM IS: if crunching stops during a period between updates, the CPU time goes back to the last saved point, thereby wasting MY CPU time... What gives??? Waiting to see my progress count up... Aloha, Sad Rosetta Cruncher |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
This is probably better explained in either the FAQ or one of the stickied posts for our newer arrivals. But here goes: The client posts a number like 1.042% done. The 1.0 is the claimed percent done; the 42 is a step number. Once it actually finishes a model/decoy, it figures out how long that took (5-15 minutes for small proteins, several hours for large proteins). If you use the default 3-hour maximum run time for the work units and it took 1 hour to finish your first model, then it'll switch from a low value like 1.xxx% to 33%. It used to be worse: prior to a troubleshooting step, we didn't have the steps, and changes happened even less frequently. :) As for saved points: as soon as it reaches certain steps in the code, it's supposed to check whether it's been more than 20 minutes since the last saved point, and if so, save. Perhaps you could ask for the client to show when the last checkpoint was made, so you'll know how safe it is to shut the system down (e.g. "35 mins since last checkpoint"). If we make suggestions on ways of getting around problems, the programmers will often incorporate them, or come up with an even better solution. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,516,305 RAC: 9,550 |
I think that would be a good thing to include so a utility could perform a 'shutdown after next save point' - I remember RM included this into UDMon back in the days of UD-Think! |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
For the record, there are other projects that are worse than Rosetta in some of these respects. Some have multi-hour work units which do no checkpointing at all. Others have multi-month WUs, and so 1000+ hours of runtime, and their estimates count up much of the time as well. But I think we all agree: visible forward motion is the expectation, and so is checkpointing frequently enough that we don't lose work, but not so frequently that we lose work anyway (by spending all the time taking checkpoints). The project recently made some great enhancements to the checkpoints, as Benny described, but the huge proteins from CASP stress even the new checkpointing scheme to the point that it is still possible to lose an hour of crunching. It's kind of like following a complex set of recipes in the kitchen. You've got 6 dishes in progress at the same time. You've got every burner on the stove going, and there's laundry in the washer, a turkey in the oven, and the phone is ringing... and someone says "let's go get some fresh air" (i.e. "please take a checkpoint NOW")... you just can't do it. And if you tried, it would cost you more time in the long run. On the other hand, if the power goes out, you find a way to recover (from the last checkpoint). If I cut power to your kitchen just when the bread needs to start baking (or else the yeast will spoil it), your "recovery" of the bread is to start making more dough (i.e. you lost work). If you adopt a system of freezing dough, you can "recover" quicker. That is a good point in the process to take a "checkpoint", because you can recover simply by warming the dough up again, so recovery time isn't bad, and much of the work is preserved. Rosetta is the same way (actually, any computer processing is): there are some places where checkpointing is feasible and easy (like the dough stage) and other places where it's not... like after the dough rises.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 137 |
There was some talk about the new scheduler in this thread. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
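
As a rough illustration of the behaviour BennyRop describes in this thread, here is a minimal Python sketch of the two rules: how the displayed percentage jumps once the first model finishes, and how a time-gated checkpoint decides whether to save. Everything named here (`displayed_progress`, `should_checkpoint`, the step-as-thousandths encoding behind "1.042%", the 20-minute constant) is an assumption made for illustration, not Rosetta@home's or BOINC's actual code.

```python
# Hypothetical sketch of the progress and checkpoint rules described in the
# thread. None of these names come from the real Rosetta@home client.

CHECKPOINT_INTERVAL_S = 20 * 60  # "more than 20 mins since the last saved point"


def displayed_progress(models_done, seconds_per_model, target_runtime_s, step=0):
    """Return the percentage the client would show.

    Before the first model finishes, the client has no timing data, so it
    shows a nominal 1% plus the step number in the decimals (assumed here
    to be thousandths, e.g. step 42 -> 1.042).
    After at least one model, progress is time spent on finished models
    divided by the target run time.
    """
    if models_done == 0:
        return 1.0 + step / 1000.0
    done_s = models_done * seconds_per_model
    return min(100.0, 100.0 * done_s / target_runtime_s)


def should_checkpoint(now_s, last_checkpoint_s, interval_s=CHECKPOINT_INTERVAL_S):
    """Called only at steps where saving is feasible.

    Saves only if strictly more than `interval_s` has passed since the last
    checkpoint, so the client does not spend all its time writing state.
    """
    return now_s - last_checkpoint_s > interval_s
```

With a 3-hour target run time and a first model that took 1 hour, `displayed_progress(1, 3600, 3 * 3600)` comes out at about 33%, which matches the jump from 1.xxx% to 33% mentioned above. Because `should_checkpoint` is only consulted at checkpoint-friendly steps, a long stretch without such a step can still lose more than 20 minutes of work, which is the situation Feet1st describes for the large CASP proteins.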
©2024 University of Washington
https://www.bakerlab.org