Lost 2h of work

eL_nino
Joined: 20 Jan 06
Posts: 10
Credit: 43,257
RAC: 0
Message 17696 - Posted: 5 Jun 2006, 19:43:13 UTC

OK, WTF is this? I have had Target CPU run time set to 16h for the last few days. My WU had reached 4h of crunching when I restarted my computer (because of some new software), and when Windows came back up and I started BOINC, this WU was back at 2h of progress! WTF is that?! And it is not the first time something like that has happened! Now I will set my target WU time to 1h so this shi* won't happen again; in the last 5-6 days I have lost around 10h of work because of this. It really sucks when you do the same work twice... :(
ID: 17696 · Rating: -2

Maxxou59
Joined: 5 May 06
Posts: 10
Credit: 63,715
RAC: 0
Message 17701 - Posted: 5 Jun 2006, 20:11:50 UTC

That is because the Rosetta software makes a backup between steps: if a step is not finished when you restart, that step is recalculated from the beginning ...
Maxxou59-Lille-France
Student at University of Chemistry
ID: 17701 · Rating: 0

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17705 - Posted: 5 Jun 2006, 20:39:08 UTC

Rosetta does checkpoint often: on my machine usually every 5 to 20 minutes, though it can take longer on slow computers and on certain WUs. If you restart your machine, some work is inevitably lost; that can't be prevented. Your computers are hidden, so we can't check them to see what the problem is. One idea is to time your restarts for when a WU has just finished, but that might be inconvenient.
ID: 17705 · Rating: 0

ronald
Joined: 6 Jun 06
Posts: 1
Credit: 448
RAC: 0
Message 17720 - Posted: 6 Jun 2006, 1:08:35 UTC

Question: I'm running another program called United Devices. Before I turn my computer off I always close it, because I have lost time with that program too, but since I started closing it I have not lost any computer time with it.
I'm new to your program. Since I turn my system off at night, will closing down your program first keep it from messing up?
Just wondering.
Thank you
ronald schwarz
ID: 17720 · Rating: 0

Feet1st
Joined: 30 Dec 05
Posts: 1740
Credit: 3,655,614
RAC: 0
Message 17735 - Posted: 6 Jun 2006, 3:46:43 UTC

In short, it's not messing up. It's just at a point in the exploration of the protein that can't easily be preserved. So, when you restart your computer, you pick up at the last checkpoint, where it was possible to save your place.

Think of it this way: you drop breadcrumbs as you explore a forest, and then you turn off your computer. When you enter the forest again tomorrow, you follow all the breadcrumbs and end up at the last one... you've lost everything after that and have to explore it again.

Bottom line, it gets more work done if it is not interrupted by powering off your computer. But, reality being what it is (I've been powering mine off during the day as we are now into air conditioning season), it is able to run just fine as long as you have enough runtime to drop a breadcrumb before you turn off your machine.
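
To make the breadcrumb picture concrete, here is a minimal Python sketch of checkpoint-and-resume. It is not Rosetta's actual code: the function name `crunch`, the JSON checkpoint file, and the step counts are all assumptions chosen for illustration.

```python
import json
import os


def crunch(checkpoint_path, total_steps, checkpoint_every, power_off_after=None):
    """Run a toy work unit and return (resumed_from, reached).

    Hypothetical illustration only. power_off_after simulates shutting
    the machine down after that many steps in this session.
    """
    # Follow the breadcrumbs: resume from the last saved step, if any.
    step = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            step = json.load(f)["step"]
    resumed_from = step

    steps_this_session = 0
    while step < total_steps:
        if power_off_after is not None and steps_this_session >= power_off_after:
            # Power off: anything done since the last checkpoint is gone.
            return resumed_from, step
        step += 1  # one unit of "exploration"
        steps_this_session += 1
        if step % checkpoint_every == 0:
            # Drop a breadcrumb. Write to a temp file first so that a
            # crash mid-write cannot corrupt the previous checkpoint.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step}, f)
            os.replace(tmp_path, checkpoint_path)
    return resumed_from, step
```

With `total_steps=100` and `checkpoint_every=30`, a first call with `power_off_after=75` returns `(0, 75)`, and the next call on the same file returns `(60, 100)`: the 15 steps taken after the breadcrumb at step 60 have to be explored again.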
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com
ID: 17735 · Rating: 0

NJMHoffmann
Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 17762 - Posted: 6 Jun 2006, 7:50:09 UTC - in response to Message 17705.  
Last modified: 6 Jun 2006, 7:54:50 UTC

Rosetta does checkpoint often. On my machine usually between 5 and 20 minutes but that might take longer on slow computers and on certain WUs.

E.g., some of the t296__CASP_ABINITIO_SAVE_ALL_OUT workunits here have their first checkpoint after about 90-120 min. (2200+ AMD, 2400 Intel)

Norbert

ID: 17762 · Rating: 1

eL_nino
Joined: 20 Jan 06
Posts: 10
Credit: 43,257
RAC: 0
Message 17769 - Posted: 6 Jun 2006, 10:00:08 UTC
Last modified: 6 Jun 2006, 10:00:26 UTC

I have now set my "target WU time" to 1h... So no problem now. I hate to lose 1-2h of work every time I turn off or restart my computer, it is not very nice...
ID: 17769 · Rating: 0

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 17788 - Posted: 6 Jun 2006, 14:16:01 UTC - in response to Message 17769.  

I have now set my "target WU time" to 1h... So no problem now. I hate to lose 1-2h of work every time I turn off or restart my computer, it is not very nice...


Reducing the target WU time will not reduce the average amount of work lost when the computer is turned off.

In fact, it will increase the average amount of work lost. Here's why. Assume that half the WUs crunch models very fast, and half take 4 hours per model. If the target WU time is long, then the computer will be spending half its time on each type. But with a 1 hour target time, the fast WUs will take 1 hour and the slow ones will take 4 hours (as they must complete at least 1 model no matter how short the target time). Thus, the computer will now be spending 1/5 of its time on fast, frequently checkpointing WUs and 4/5 of its time on slow WUs, which are the ones that checkpoint rarely.
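
The arithmetic above can be checked with a short Python sketch. The 50/50 mix and the 4 hours per slow model come from the post; the checkpoint gaps (10 minutes for fast WUs, 120 minutes for slow ones) and the function names are assumptions for illustration, and a shutdown is assumed to land, on average, halfway between two checkpoints.

```python
from fractions import Fraction


def time_shares(target_hours, slow_model_hours=4):
    """Return (fast_share, slow_share) of total crunching time.

    Assumes equal numbers of fast and slow WUs. A fast WU runs for the
    target time; a slow WU runs for the target time too, but never for
    less than one whole model.
    """
    fast_runtime = Fraction(target_hours)
    slow_runtime = Fraction(max(target_hours, slow_model_hours))
    total = fast_runtime + slow_runtime
    return fast_runtime / total, slow_runtime / total


def expected_loss_minutes(target_hours, fast_gap_min=10, slow_gap_min=120):
    """Average minutes lost per shutdown at a random moment.

    fast_gap_min and slow_gap_min are assumed gaps between checkpoints;
    the average loss within a gap is taken to be half the gap.
    """
    fast_share, slow_share = time_shares(target_hours)
    return (fast_share * Fraction(fast_gap_min, 2)
            + slow_share * Fraction(slow_gap_min, 2))
```

Under these assumptions, `time_shares(1)` gives 1/5 and 4/5, `time_shares(16)` gives 1/2 and 1/2, and the expected loss rises from 32.5 minutes with a 16 hour target to 49 minutes with a 1 hour target.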
ID: 17788 · Rating: 0

Feet1st
Joined: 30 Dec 05
Posts: 1740
Credit: 3,655,614
RAC: 0
Message 17808 - Posted: 6 Jun 2006, 17:04:03 UTC - in response to Message 17788.  

Reducing the target WU time will not reduce the average amount of work lost when the computer is turned off.

AMD is correct. Your runtime preference doesn't affect when checkpoints can occur. Sorry, it just doesn't work that way. The FAQ on WU runtime preference explains that you must crunch a complete model, regardless of whether or not the time to do that exceeds your 1hr preference.
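
As a rough illustration of that rule, here is a hypothetical Python sketch of how a runtime preference could interact with whole models. It is an assumption about the general shape of the logic, not Rosetta's actual code; `wu_runtime` and its parameters are made-up names, with times in minutes.

```python
def wu_runtime(target_minutes, model_minutes):
    """Return (models_completed, total_minutes) for one work unit.

    Assumed rule: always finish at least one model, then start another
    only if it is expected to fit within the target runtime.
    """
    if model_minutes <= 0:
        raise ValueError("model_minutes must be positive")
    models = 0
    elapsed = 0
    while models == 0 or elapsed + model_minutes <= target_minutes:
        elapsed += model_minutes
        models += 1
    return models, elapsed
```

So `wu_runtime(60, 240)` returns `(1, 240)`: the 1hr preference is exceeded because one whole model takes 4 hours, while `wu_runtime(60, 15)` returns `(4, 60)`.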
ID: 17808 · Rating: 0

eL_nino
Joined: 20 Jan 06
Posts: 10
Credit: 43,257
RAC: 0
Message 17825 - Posted: 6 Jun 2006, 18:23:07 UTC - in response to Message 17808.  

Reducing the target WU time will not reduce the average amount of work lost when the computer is turned off.

AMD is correct. Your runtime preference doesn't affect when checkpoints can occur. Sorry, it just doesn't work that way. The FAQ on WU runtime preference explains that you must crunch a complete model, regardless of whether or not the time to do that exceeds your 1hr preference.


YES, I know all that, but when the WU is at 1h (so far I have crunched around 20 WUs like that, and they all took from 50 to 70 minutes), even when I turn off or restart my computer, the maximum I can lose is 10-20 minutes, and not 2-3h like happened a few times.
ID: 17825 · Rating: 0

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 17832 - Posted: 6 Jun 2006, 18:55:20 UTC - in response to Message 17825.  
Last modified: 6 Jun 2006, 18:56:24 UTC

YES, I know all that, but when the WU is at 1h (so far I have crunched around 20 WUs like that, and they all took from 50 to 70 minutes), even when I turn off or restart my computer, the maximum I can lose is 10-20 minutes, and not 2-3h like happened a few times.


edit: I actually had nothing to say, sorry.

ID: 17832 · Rating: 0

Ricardo
Joined: 9 Dec 05
Posts: 26
Credit: 24,039
RAC: 0
Message 17840 - Posted: 6 Jun 2006, 19:58:36 UTC - in response to Message 17825.  

YES, I know all that, but when the WU is at 1h (so far I have crunched around 20 WUs like that, and they all took from 50 to 70 minutes), even when I turn off or restart my computer, the maximum I can lose is 10-20 minutes, and not 2-3h like happened a few times.


Hi, I am not sure, but couldn't this be solved by leaving the application in memory (swap file) while preempted?

Regards,
Ricardo (Ex Seti cruncher)

ID: 17840 · Rating: 0

Ricardo
Joined: 9 Dec 05
Posts: 26
Credit: 24,039
RAC: 0
Message 17845 - Posted: 6 Jun 2006, 20:34:25 UTC - in response to Message 17840.  

Hi, I am not sure, but couldn't this be solved by leaving the application in memory (swap file) while preempted?

Regards,
Ricardo (Ex Seti cruncher)


Forget my earlier comment, because the contents of the swap file do not survive when the computer is turned off.
ID: 17845 · Rating: 0

Feet1st
Joined: 30 Dec 05
Posts: 1740
Credit: 3,655,614
RAC: 0
Message 17954 - Posted: 7 Jun 2006, 16:34:07 UTC - in response to Message 17825.  

...when I turn off or restart my computer, the maximum I can lose is 10-20 minutes, and not 2-3h like happened a few times.


Yes, R@H has recently added additional checkpointing... and just in time for these large WUs they are getting from CASP. The objective is that when R@H is ended (either by turning off the PC, or by removing the app from memory when you switch to crunching another project), you would lose, on average, only 10 or 20 minutes.

If you would, keep some notes. If you find another case where you lose 2 hours, please note the following:
Rosetta application release (shown in work tab),
WU name,
% complete shown at the time you ended,
and which step you were on when it ended,

then report those details in the appropriate thread about problems with a given release of Rosetta. Losing 2 hours of work should be considered a problem. They may not have an immediate solution, but if a specific WU has such a problem, perhaps they can learn more about why it isn't checkpointing more often and later find ways to do more checkpointing on such WUs.
ID: 17954 · Rating: 0


©2019 University of Washington
http://www.bakerlab.org