No checkpoint in more than 1 hour - Largescale_large_fullatom...

Osku87

Joined: 1 Nov 05
Posts: 17
Credit: 280,268
RAC: 0
Message 14231 - Posted: 21 Apr 2006, 6:03:33 UTC

And it would be way better if the user could decide the time limit.
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 14241 - Posted: 21 Apr 2006, 7:31:52 UTC - in response to Message 14231.  
Last modified: 21 Apr 2006, 7:53:23 UTC

Osku87 wrote: "And it would be way better if the user could decide the time limit."


Actually there is already such a parameter in the "General settings":

"Write to disk at most every" = xx seconds

Default is 60 (one minute). I set it to 600 (10 minutes) since one minute seems very often.

So such a parameter is already present in the BOINC framework. Most projects ignore it anyway, but one could use it. For checkpoints under 100 KB it is really not a problem to checkpoint often (not every minute, but say every 5-10 minutes).
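A minimal sketch of how an application could honor a user-chosen "write to disk at most every N seconds" value. The class and method names here are hypothetical and are not taken from BOINC or Rosetta; the only thing carried over from the post is the idea of a user-set minimum interval (600 seconds in the example).

```python
import time


class DiskWriteThrottle:
    """Allow a checkpoint write at most once every `min_interval_s` seconds.

    Hypothetical helper for illustration only; not part of any real API.
    """

    def __init__(self, min_interval_s=60.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock              # injectable so the logic can be tested
        self._last_write = clock()       # treat start-up as the last write

    def time_to_checkpoint(self):
        """True once the user-chosen interval has passed since the last write."""
        return self._clock() - self._last_write >= self.min_interval_s

    def checkpoint_completed(self):
        """Call after the checkpoint file has been written."""
        self._last_write = self._clock()
```

The application would call `time_to_checkpoint()` whenever it is able to save its state, and write the checkpoint only when the call returns `True`.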
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 14276 - Posted: 21 Apr 2006, 16:35:27 UTC

Yes, it would be better if the user could define it. I just thought 20 min would help improve throughput for folks with the 1 hr "switch between apps" setting. It might also help avoid the "1% stuck" and "5 strikes and you're out" conditions, because it will make checkpoints dynamically, as appropriate for the box and the WU it's running. It should also help folks who only have their PC on for brief periods of time. I'm glad to hear Bin feels he can implement the idea of dynamically determining whether a checkpoint is desirable at the same time.

This should make the WU run experience more consistent, regardless of the CPU speed and length of the WU's protein. And I think that will help avoid confusion, and any perception of instability.

If you're curious, there's also a very similar thread on the BOINC boards: "Preempt only at checkpoints". THIS would be the ultimate. Now, instead of "only" losing an average of 10 min per preempt, you'd lose an average of... well... ZERO! An 18% improvement over the improved R@H checkpointing!
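The "average of 10 min" figure can be reproduced with a rough calculation, assuming checkpoints land exactly 20 minutes apart and a preempt is equally likely at any moment in between. Both are simplifying assumptions (real gaps can be longer, since checkpoints only happen at certain stages), and the function name is made up for illustration.

```python
def average_loss_minutes(checkpoint_interval_min, steps=10_000):
    """Average work lost per preempt, in minutes.

    Assumes the preempt is equally likely at any moment between two
    checkpoints; samples the midpoint of `steps` equal slices of the interval.
    """
    slice_len = checkpoint_interval_min / steps
    total = sum((i + 0.5) * slice_len for i in range(steps))
    return total / steps


# With checkpoints every 20 minutes, the average loss is about half the interval.
print(round(average_loss_minutes(20), 3))
```

If preemption only ever happened right at a checkpoint, the lost work would be zero under the same assumptions.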
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Bin Qian

Joined: 13 Jul 05
Posts: 33
Credit: 36,897
RAC: 0
Message 14300 - Posted: 21 Apr 2006, 20:40:00 UTC - in response to Message 14276.  

User-controllable checkpointing would be ideal. Unfortunately, with the current machinery, checkpointing can only be done at certain stages of the modeling process. In a nutshell, the process has to reach a stage where the previous search history (which would be a huge amount of data if we were to record all of it) can be discarded, so that we only need to checkpoint a minimal amount of data for future searches. We cannot yet checkpoint at an arbitrary point in the modeling process.

I've implemented Feet1st's idea as follows: when the WU reaches a stage where checkpointing is possible, it checks how long it has been since the last checkpoint. If it's been over 20 minutes, it checkpoints.
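The rule described above can be sketched roughly as follows. This is only an illustration of the logic, not the actual Rosetta code: the stage list, the function name, and the way time is tracked are all assumptions made for the example.

```python
CHECKPOINT_INTERVAL_S = 20 * 60  # the 20-minute threshold from the post


def run_stages(stages, interval_s=CHECKPOINT_INTERVAL_S):
    """Simulate a work unit and return the elapsed times of its checkpoints.

    `stages` is a sequence of (duration_s, checkpointable) pairs, where
    `checkpointable` says whether the search history can be discarded once
    that stage ends. Hypothetical model for illustration only.
    """
    elapsed = 0.0
    last_checkpoint = 0.0
    checkpoints = []
    for duration_s, checkpointable in stages:
        elapsed += duration_s
        # Checkpoint only at a safe stage, and only if more than
        # `interval_s` has passed since the previous checkpoint.
        if checkpointable and elapsed - last_checkpoint > interval_s:
            checkpoints.append(elapsed)
            last_checkpoint = elapsed
    return checkpoints


# Four 10-minute stages (the third not checkpointable), then a 5-minute one.
stages = [(600, True), (600, True), (600, False), (600, True), (300, True)]
print(run_stages(stages))
```

In this made-up run the only checkpoint is written at 2400 seconds: the stage ending at 1200 seconds is not strictly over 20 minutes, and the one ending at 1800 seconds is not a safe point.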
casio7131

Joined: 10 Oct 05
Posts: 35
Credit: 149,748
RAC: 0
Message 14439 - Posted: 23 Apr 2006, 3:02:20 UTC

This checkpointing news sounds great.