No Checkpoints for recent jobs.

Questions and Answers : Windows : No Checkpoints for recent jobs.


James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 75654 - Posted: 22 May 2013, 21:16:57 UTC

Over the last couple of weeks I've noticed several types of jobs I haven't seen before (names beginning with "hyb" or "hybred," "cyto," etc.). These jobs are not setting checkpoints, even after crunching for up to 11 hours or so (with the checkpoint interval in my computing preferences set to no more often than every 60 seconds). The jobs starting with "rb_5_17" and other "dates" continue to checkpoint as usual.

The problem, as noted in another recent thread, is that if I must shut down or reboot my system (for Windows updates, application updates, etc.), or if I must close BOINC, I lose all the work in these "new" types of jobs that have no checkpoints.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.
vakobo

Joined: 3 Aug 08
Posts: 18
Credit: 13,636,264
RAC: 2,319
Message 75666 - Posted: 26 May 2013, 9:01:36 UTC

I now have a task, rb_05_14_38371_73012__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_81049_953_0, without checkpoints. It is at 71%, but I need to shut down my PC, and next time it will start from 0% again.
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 75673 - Posted: 26 May 2013, 20:07:54 UTC - in response to Message 75666.  

I now have a task, rb_05_14_38371_73012__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_81049_953_0, without checkpoints. It is at 71%, but I need to shut down my PC, and next time it will start from 0% again.


The only "remedy" I've found involves my old Windows 98 rig, which crunches SETI one job at a time. I've just backed up the BOINC folder, including the slot with the particular job in question. However, if you're crunching a large number of jobs, this would be a huge backup file.

Hopefully whoever wrote this particular type of job will realize there's a bug and will work on remedying the situation, since obviously not all instances are without checkpoints. Good luck!
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 75677 - Posted: 27 May 2013, 19:04:22 UTC

When a task has not checkpointed, the slots folder does not contain the data needed to preserve the uncheckpointed work. So you still lose that work, just as you would if you turned off the computer without making the backup described.

Some of the types of tasks run recently have long-running models. The application will always checkpoint at the end of a model. Some types of tasks checkpoint more frequently. When new protocols are being developed, it is unclear if the protocol will be used extensively going forward or not. It is also unclear if further refinements can be made to eliminate the long-running models or reduce their frequency.
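
To illustrate why a long-running model loses so much work on shutdown, here is a minimal sketch in Python. It is not the Rosetta application's code; the function name `work_preserved` and the assumption that a checkpoint is written only when a model finishes are illustrative simplifications of the behavior described above.

```python
def work_preserved(model_durations, interrupt_at):
    """Return the seconds of work saved by the last checkpoint.

    Assumption (hypothetical simplification): a checkpoint is written
    only at the moment a model finishes, never in the middle of one.
    model_durations: run time of each model, in seconds, in order.
    interrupt_at: elapsed seconds at which the task is stopped.
    """
    saved = 0.0
    elapsed = 0.0
    for duration in model_durations:
        if elapsed + duration > interrupt_at:
            break  # this model was still running when the task stopped
        elapsed += duration
        saved = elapsed  # a checkpoint is written at the end of the model
    return saved


# Three one-hour models, stopped after 2.5 hours: two hours are kept.
short_models = work_preserved([3600, 3600, 3600], 9000)

# One 11-hour model, stopped after 10 hours: nothing is kept.
long_model = work_preserved([39600], 36000)
```

Under these assumptions, a task made of short models loses at most one model's worth of work, while a task with a single long-running model restarts from zero.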

There isn't much you can change on your machine to improve the situation. Each time a task starts, a check is made to see whether it has previously started from this same point. It would be normal to see this once in a while, as people reboot to install updates to their machines, etc. But if it occurs 5 times on the same task, the task is cut off and marked as completed. This ensures that such tasks, which are not progressing on your machine, can report back the work they have completed and free up a space for another task, which may have properties that better match your machine, its uptime, etc.
Rosetta Moderator: Mod.Sense
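
The restart check described in the post above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual implementation: the names `MAX_RESTARTS_FROM_SAME_POINT` and `on_task_start` are hypothetical, and it assumes the count is kept per starting point and that the task is cut off on the fifth start from the same point.

```python
# Assumption: the task is cut off on the 5th start from the same point,
# based on the "5 times" figure given in the post.
MAX_RESTARTS_FROM_SAME_POINT = 5


def on_task_start(restart_counts, resume_point):
    """Record a start from resume_point and report whether to cut off.

    restart_counts: dict mapping a resume point (for example, the index
    of the last checkpointed model) to how many times the task has
    started from it. Hypothetical bookkeeping for illustration only.
    Returns True when the task should be ended and reported as completed.
    """
    restart_counts[resume_point] = restart_counts.get(resume_point, 0) + 1
    return restart_counts[resume_point] >= MAX_RESTARTS_FROM_SAME_POINT


# A task that never checkpoints always resumes from point 0.
counts = {}
decisions = [on_task_start(counts, 0) for _ in range(5)]

# A start from a new point (after a checkpoint) begins a fresh count.
after_checkpoint = on_task_start(counts, 1)
```

In this sketch an occasional reboot is harmless, since it only adds one to the count for the current starting point; only a task that repeatedly fails to get past the same point is cut off.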




©2024 University of Washington
https://www.bakerlab.org