Message boards : Number crunching : No checkpoint in more than 1 hour - Largescale_large_fullatom...
Author | Message |
---|---|
Osku87 Joined: 1 Nov 05 Posts: 17 Credit: 280,268 RAC: 0 |
And it would be way better if the user could decide the time limit. |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
"And it would be way better if user could decide the timelimit." Actually there is already such a parameter in the "General settings": "Write to disk at most every" = xx seconds. The default is 60 (one minute). I set it to 600 (10 minutes), since every minute seems very often. So such a parameter is already present in the BOINC framework. Most projects ignore it anyway, but one could use it. As for checkpoints under 100 KB, it is really not a problem to checkpoint often (not every minute, but say every 5-10 minutes). |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Yes, it would be better if the user could define it. I just thought 20min would help improve throughput for the folks with the 1hr "switch between apps" setting. It might also help avoid the "1% stuck" and "5 strikes and you're out" conditions, because it will make checkpoints dynamically, as appropriate for the box and the WU it's running. It should also help folks that only have their PC on for brief periods of time. I'm glad to hear Bin feels he can implement the idea of dynamically determining whether a checkpoint is desirable at the same time. This should make the WU run experience more consistent, regardless of the CPU speed and the length of the WU's protein. And I think that will help avoid confusion and any perception of instability. If you're curious, there's also a very similar thread on the BOINC boards: Preempt only at checkpoints. THIS would be the ultimate. Now instead of "only" losing an average of 10min per preempt... you'd lose an average of... well... ZERO! An 18% improvement over the improved R@H checkpointing! Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Bin Qian Joined: 13 Jul 05 Posts: 33 Credit: 36,897 RAC: 0 |
User-controllable checkpointing would be ideal. Unfortunately, with the current machinery, checkpointing can only be done at certain stages of the modeling process. In a nutshell, the process has to reach a stage where the previous search history (which would be a huge amount of data if we were to record all of it) can be discarded, so that we can get by with checkpointing only a minimal amount of data for the future searches. We cannot checkpoint at an arbitrary point of the modeling process yet. I've implemented Feet1st's idea: when the WU reaches a stage where checkpointing is possible, it checks how long it has been since the last checkpoint. If it's been over 20 minutes, it checkpoints. |
casio7131 Joined: 10 Oct 05 Posts: 35 Credit: 149,748 RAC: 0 |
This checkpointing news sounds great. |
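
For readers who want to see the scheme Bin Qian describes in concrete terms, here is a minimal sketch in Python. It is not Rosetta's or BOINC's actual code: the class `StageCheckpointer`, its methods, and the `simulate` helper are hypothetical names invented for illustration. The only things taken from the thread are the rule itself (checkpoint only when a safe stage is reached, and only if enough time has passed since the last checkpoint) and the 20-minute figure, which is exposed here as a parameter in the spirit of the user-defined limit discussed above.

```python
import time
from typing import Callable, List, Sequence


class StageCheckpointer:
    """Checkpoint only at safe stage boundaries, and only if enough time has passed.

    Hypothetical illustration; not the Rosetta@home implementation.
    """

    def __init__(self, min_interval_s: float = 20 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock
        # Treat the start of the run as the time of the "last checkpoint".
        self.last_checkpoint = clock()
        self.saved_stages: List[int] = []

    def at_safe_stage(self, stage: int) -> bool:
        """Call when the search history can be discarded.

        Returns True if a checkpoint was written at this stage.
        """
        now = self.clock()
        if now - self.last_checkpoint < self.min_interval_s:
            return False  # too soon since the last checkpoint; keep computing
        self.write_checkpoint(stage)
        self.last_checkpoint = now
        return True

    def write_checkpoint(self, stage: int) -> None:
        # Placeholder: a real application would serialize its minimal state here.
        self.saved_stages.append(stage)


def simulate(stage_durations_min: Sequence[float],
             min_interval_min: float = 20) -> List[int]:
    """Run stages of the given lengths on a fake clock.

    Returns the indices of the stages after which a checkpoint was written.
    """
    t = [0.0]  # fake clock, in seconds
    cp = StageCheckpointer(min_interval_s=min_interval_min * 60,
                           clock=lambda: t[0])
    for i, minutes in enumerate(stage_durations_min):
        t[0] += minutes * 60  # the stage runs to completion
        cp.at_safe_stage(i)   # a safe point is reached at the end of the stage
    return cp.saved_stages


if __name__ == "__main__":
    print(simulate([5, 5, 5, 5, 30, 2, 25]))
```

With stages ending at 5, 10, 15, 20, 50, 52 and 77 minutes, the sketch writes checkpoints after stages 3, 4 and 6. Note that the gap between checkpoints can exceed the limit when a single stage is long (the 30-minute stage here), which is the constraint Bin Qian mentions: the limit only sets a minimum spacing, because checkpoints are still possible only at safe stages.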
©2024 University of Washington
https://www.bakerlab.org