Should I abort work unit?

Message boards : Number crunching : Should I abort work unit?


Ed and Harriet Griffith
Joined: 17 Sep 05
Posts: 39
Credit: 1,761,354
RAC: 1,192
Message 5866 - Posted: 11 Dec 2005, 15:09:23 UTC

Normally Rosetta work units last between one and six hours, but I have one that has run for over 11 hours, is still working, and still shows only 1% done. It does not look right, but I hate to abort a unit if there is science there. Any advice? Should I continue to 24 hours before aborting? I run a 1.8 GHz computer with 496 MB of RAM.

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,699,311
RAC: 1,274
Message 5867 - Posted: 11 Dec 2005, 15:21:06 UTC
Last modified: 11 Dec 2005, 15:23:32 UTC

Have you tried stopping your BOINC core client and then restarting it? That often gets a stuck Rosetta WU going again. Also, checking the "Leave in memory" box in your application setup helps.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
STE\/E

Joined: 17 Sep 05
Posts: 125
Credit: 4,053,884
RAC: 6,802
Message 5869 - Posted: 11 Dec 2005, 16:18:39 UTC
Last modified: 11 Dec 2005, 16:19:03 UTC

If I can help it, I don't let any Rosetta WU run for more than 70-80 minutes at 1%. If it gets that far, I either shut BOINC down and restart it or I just abort it.

It's a total waste of CPU time to let these WUs run hour after hour and never get past the 1% mark ... IMO
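
For anyone who wants to apply that rule of thumb consistently, here is a minimal sketch in Python. It is not part of BOINC or Rosetta. The function names, the 75-minute default (the middle of the 70-80 minute range), and the "restart first, abort only if a restart did not help" order are assumptions for illustration. You would read the CPU time and percent done off the BOINC manager yourself.

```python
# Hypothetical helper, not a BOINC API: encodes the "stuck at 1%" rule of thumb.

def looks_stuck(cpu_minutes, percent_done, limit_minutes=75.0, stuck_percent=1.0):
    """True if the task has used more than limit_minutes of CPU time
    while still reporting stuck_percent or less progress."""
    return percent_done <= stuck_percent and cpu_minutes > limit_minutes


def suggest_action(cpu_minutes, percent_done, already_restarted=False):
    """Suggest what to do with a work unit, trying a client restart before an abort."""
    if not looks_stuck(cpu_minutes, percent_done):
        return "keep running"
    # Assumption: a restart is cheaper than an abort, so try it first.
    return "abort" if already_restarted else "restart client"


# The work unit from the first post: over 11 hours (660 minutes) at 1%.
print(suggest_action(660, 1.0))
print(suggest_action(660, 1.0, already_restarted=True))
```

With those defaults, the 11-hour work unit from the first post would be flagged for a client restart, and for an abort only if it were still sitting at 1% after the restart.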
Osku87

Joined: 1 Nov 05
Posts: 17
Credit: 280,268
RAC: 0
Message 5884 - Posted: 11 Dec 2005, 20:11:51 UTC

I had the same problem, and restarting the BOINC client helped. I had run it for about 5 hours on a Sempron 2800+. After restarting the client, the CPU time reset.

Now it has run properly, reaching 70% after 1 hour and 9 minutes.
Beatminister
Joined: 23 Oct 05
Posts: 7
Credit: 206,304
RAC: 0
Message 5886 - Posted: 11 Dec 2005, 20:43:12 UTC

Alan Bridgewater

Joined: 24 Nov 05
Posts: 2
Credit: 10,205
RAC: 0
Message 5899 - Posted: 11 Dec 2005, 22:28:20 UTC

Twice now I've noticed that when I fool around with suspending and unsuspending tasks, Rosetta seems to have a problem with it. When I unsuspend the Rosetta WU, the % stays the same and the "To Completion" time increases by one second every 5 seconds. After a few hours of that, I just end up killing the work unit, which I don't like to do, but I'm not going to have my machine wasting its time and never completing the WU.

Furthermore, after I deleted the work unit and got another one, I believe it went into the same state. I had to delete the files from Rosetta and download another work unit, and then it worked perfectly, until the next time I was fooling around with suspending/unsuspending work.
Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,593,406
RAC: 2,787
Message 5901 - Posted: 11 Dec 2005, 22:56:11 UTC - in response to Message 5899.  

Suspending/Unsuspending tasks


If you have "leave applications in memory when preempted" set to "no", I'd expect what you're describing (or worse). If that setting is "yes", then this is something the developers need to take a look at.

Alan Bridgewater

Joined: 24 Nov 05
Posts: 2
Credit: 10,205
RAC: 0
Message 5906 - Posted: 12 Dec 2005, 0:13:28 UTC

Interesting. I checked my settings, and "No" was selected. My client was just in that state not long ago, but I decided to let it go for a few more hours and it somehow corrected itself and completed the WU. Anyway, I've changed the setting to "Yes", and will change that setting for all the other clients as well.

Hopefully the problem won't happen again. Thanks for the tip.




©2024 University of Washington
https://www.bakerlab.org