"Progress Percentage" Counts Up in Painfullly Long Segments...

Message boards : Rosetta@home Science : "Progress Percentage" Counts Up in Painfully Long Segments...

Michael Kiewicz

Joined: 3 Apr 06
Posts: 2
Credit: 4,759,358
RAC: 0
Message 19809 - Posted: 6 Jul 2006, 0:05:53 UTC

Why does the "Progress Percentage" counter update in such large chunks?

On all of the other projects, the progress meter follows along with the CPU time.

The problem is: if crunching stops partway between updates, the CPU time goes back to the last saved point...

...thereby wasting my CPU time. What gives?

Waiting to see my Progress Count Up...

Aloha,

Sad Rosetta Cruncher
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 19811 - Posted: 6 Jul 2006, 1:15:31 UTC

This is probably better explained in either the FAQ or one of the stickied posts for our newer arrivals. But here goes:

The client posts a number like 1.042% done. The 1.0 is the claimed percent done; the 42 is a step number. Once it actually finishes a model/decoy, it works out how long that took (5-15 minutes for small proteins, several hours for large proteins). If you use the default maximum of 3 hours for the work units, and it took 1 hour to finish your first model, then the reading will jump from a low value like 1.xxx% to 33%. It used to be worse: before a troubleshooting change added the step numbers, the display changed even less frequently. :)
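To make the arithmetic above concrete, here is a minimal Python sketch. It is not the Rosetta client's code: the function names are hypothetical, and it assumes the reading is formatted exactly as described (one digit of claimed percent after the decimal point, then the step number) and that the estimate after a finished model is simply completed model time divided by the target run time.

```python
def split_progress(display: str):
    """Split a reading such as '1.042' into ('1.0', 42).

    Assumption: the first digit after the decimal point belongs to the
    claimed percent, and the remaining digits are the step number.
    """
    whole, _, frac = display.partition(".")
    claimed = f"{whole}.{frac[:1] or '0'}"
    step = int(frac[1:] or 0)
    return claimed, step


def progress_after_models(models_done: int,
                          seconds_per_model: float,
                          target_seconds: float) -> float:
    """Estimated percent done once at least one model has finished.

    Assumption: progress is completed model time over the target run time,
    capped at 100%.
    """
    return min(100.0, 100.0 * models_done * seconds_per_model / target_seconds)


if __name__ == "__main__":
    print(split_progress("1.042"))
    # One 1-hour model against the default 3-hour target.
    print(progress_after_models(1, 3600, 3 * 3600))
```

With one 1-hour model and a 3-hour target, the second function gives roughly 33%, which matches the jump described above.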

As for saved points: whenever it reaches certain steps in the code, it's supposed to check whether more than 20 minutes have passed since the last saved point and, if so, save. Perhaps you could ask for the client to show when the last checkpoint was made, so you'll know how safe it is to shut the system down (e.g. "35 mins since last checkpoint").
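The check described above can be sketched roughly as follows in Python. This is an illustration under assumptions, not the actual client logic: the `Checkpointer` class, its method names, and the injectable `clock` are all hypothetical, and only the 20-minute interval comes from the description.

```python
import time

CHECKPOINT_INTERVAL_SECONDS = 20 * 60  # the 20 minutes mentioned above


class Checkpointer:
    """Saves state only at safe steps, and only if enough time has passed."""

    def __init__(self, save, interval=CHECKPOINT_INTERVAL_SECONDS,
                 clock=time.monotonic):
        self._save = save          # hypothetical callback that writes state to disk
        self._interval = interval
        self._clock = clock        # injectable so the logic can be tested
        self._last = clock()

    def maybe_checkpoint(self) -> bool:
        """Call at each step where saving is possible; returns True if it saved."""
        now = self._clock()
        if now - self._last > self._interval:
            self._save()
            self._last = now
            return True
        return False

    def seconds_since_checkpoint(self) -> float:
        """The 'time since last checkpoint' reading suggested above."""
        return self._clock() - self._last
```

A monitoring utility could poll something like `seconds_since_checkpoint()` to decide how much work a shutdown right now would throw away.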

If we make suggestions on ways of getting around problems, the programmers will often incorporate them, or come up with an even better solution.
dcdc

Joined: 3 Nov 05
Posts: 1830
Credit: 119,199,800
RAC: 3,514
Message 19821 - Posted: 6 Jul 2006, 9:46:01 UTC

I think that would be a good thing to include, so that a utility could perform a 'shutdown after next save point'. I remember RM included this in UDMon back in the days of UD-Think!
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19828 - Posted: 6 Jul 2006, 14:43:40 UTC

For the record, there are other projects that are worse than Rosetta in some of these respects. Some have multi-hour work units which do no checkpointing at all. Others have multi-month WUs, and so 1000+ hours of runtime, and their estimates count up much of the time as well.

But I think we all agree on the expectation: visible forward motion, and checkpointing frequently enough that we don't lose much work to interruptions, but not so frequently that we lose work another way (by spending all the time taking checkpoints).
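A rough way to put numbers on that trade-off, sketched in Python. This is an illustration only, under assumptions that are not from this thread: an interruption arrives at a uniformly random moment within a checkpoint interval (so the expected loss is half an interval), every checkpoint costs a fixed number of seconds, and the function name is made up.

```python
def checkpoint_costs(interval_s: float, checkpoint_cost_s: float):
    """Return (overhead_fraction, expected_loss_s) for a checkpoint interval.

    overhead_fraction: share of wall-clock time spent writing checkpoints.
    expected_loss_s:   average work lost per interruption, assuming the
                       interruption lands uniformly within an interval.
    """
    overhead_fraction = checkpoint_cost_s / (interval_s + checkpoint_cost_s)
    expected_loss_s = interval_s / 2
    return overhead_fraction, expected_loss_s


if __name__ == "__main__":
    # Compare a few intervals for a hypothetical 30-second checkpoint.
    for minutes in (1, 20, 60):
        overhead, loss = checkpoint_costs(minutes * 60, 30)
        print(f"{minutes:>2} min interval: {overhead:.1%} overhead, "
              f"{loss:.0f} s expected loss")
```

Under these assumptions, a 20-minute interval with a 30-second checkpoint costs about 2.4% overhead and loses 600 seconds per interruption on average; shortening the interval trades one kind of loss for the other.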

The project recently made some great enhancements to the checkpoints as Benny described, but the huge proteins from CASP basically stress even the new checkpointing scheme to the point that it is still possible to lose an hour of crunching.

It's kind of like following a complex set of recipes in the kitchen. You've got 6 dishes in progress at the same time. You've got every burner on the stove going, and there's laundry in the washer, a turkey in the oven, and the phone is ringing... and someone says "let's go get some fresh air" (i.e. "please take a checkpoint NOW")... you just can't do it. And if you tried, it would cost you more time in the long run.

On the other hand, if the power goes out, you find a way to recover (from the last checkpoint). Say I cut power to your kitchen just when the bread needs to start baking, and the delay ruins the dough: your "recovery" is to start making more dough (i.e. you lost work). If you adopt a system of freezing dough, you can "recover" quicker. That is a good point in the process to take a "checkpoint", because you can recover simply by warming the dough up again, so recovery time isn't bad and much of the work is preserved.

Rosetta is the same way (as is any computer processing, really): there are some places where checkpointing is feasible and easy (like the dough stage) and other places where it's not (like after the dough rises).
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
adrianxw

Joined: 18 Sep 05
Posts: 653
Credit: 11,805,205
RAC: 909
Message 19875 - Posted: 7 Jul 2006, 9:22:21 UTC

There was some talk about the new scheduler in this thread.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

©2024 University of Washington
https://www.bakerlab.org