Some WU take a long time

Message boards : Number crunching : Some WU take a long time

MichaelHe

Joined: 2 Oct 09
Posts: 4
Credit: 380,105
RAC: 0
Message 65785 - Posted: 21 Apr 2010, 0:25:37 UTC

Most of my WUs take around 3 hours to finish, but sometimes one takes 6-10. I've currently got one that's at the 6:30 mark. It appears to be "stuck", updating only once every half minute and only in very small increments. When I check the credit granted for such WUs, it is usually very low. Is this normal?
MichaelHe

Joined: 2 Oct 09
Posts: 4
Credit: 380,105
RAC: 0
Message 65786 - Posted: 21 Apr 2010, 2:45:59 UTC

Also, is it just me or is granted credit consistently lower than claimed credit? For me, 90% of the time my granted credit is lower, and when it's higher it doesn't make up for the times when it's lower. You can check my results at https://boinc.bakerlab.org/rosetta/results.php?userid=352530
svincent

Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 65906 - Posted: 30 Apr 2010, 17:25:32 UTC

This task 334979307 ( rs_stg0_lrlx_t447__boincid_SAVE_ALL_OUT_19714_1598_0 ) eventually validated, producing a single decoy, but the graphics looked strange. I have a screenshot (one protein looks crunched up in a ball) but can't figure out how to upload it.

Jochen

Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 65911 - Posted: 30 Apr 2010, 20:33:02 UTC - in response to Message 65786.  

Also, is it just me or is granted credit consistently lower than claimed credit? For me, 90% of the time my granted credit is lower, and when it's higher it doesn't make up for the times when it's lower. You can check my results at https://boinc.bakerlab.org/rosetta/results.php?userid=352530


I have basically asked the same question somewhere in this thread: Credit always low

There is some detailed information on how credits are granted near the end of the thread.

I have the same 'issue' with my i7. My other computer, which has a Q9650, usually gets granted what it claims. I wonder if hyper-threading is causing the low granted credit. But I don't dare turn it off, since it is certainly a benefit at the end of the day.

Jochen
SFCC

Joined: 3 Sep 09
Posts: 10
Credit: 227,659
RAC: 0
Message 65913 - Posted: 30 Apr 2010, 23:29:02 UTC

I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! The CPU was running at only 13% capacity, so it appears that the WU was just sitting in some "do nothing" loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an "occasional" problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3.
svincent

Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 65918 - Posted: 1 May 2010, 15:52:39 UTC - in response to Message 65913.  

I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! The CPU was running at only 13% capacity, so it appears that the WU was just sitting in some "do nothing" loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an "occasional" problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3.


The workaround for tasks that are 'hanging' on Windows (apparently stuck and getting 0% CPU time in the Task Manager) is to quit and restart BOINC.
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,748,962
RAC: 59,103
Message 65928 - Posted: 2 May 2010, 11:08:53 UTC - in response to Message 65918.  

I just aborted a WU that had been running 40+ hours and showed 92 hours remaining! The CPU was running at only 13% capacity, so it appears that the WU was just sitting in some "do nothing" loop. I aborted another one that had run 20+ hours with 30+ hours shown as remaining. Normally WUs take 3-4 hours on this machine. I had encountered this problem occasionally in the past, but now it happens with most WUs. I was told some time ago that it is an "occasional" problem with Windoz machines, but now it appears to be quite common on my machine. I have suspended the project until someone can tell me how to fix the problem if it is on my end, or get it fixed if it is on the project's end of the pipe. I'm running BOINC version 6.10.18 on a 2.0 GHz dual-core AMD machine running Windows XP Media Center Edition with SP3.


The workaround for tasks that are 'hanging' on Windows (apparently stuck and getting 0% CPU time in the Task Manager) is to quit and restart BOINC.


Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe. Make sure the jobs aren't suspended before canceling!
Jochen

Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 65929 - Posted: 2 May 2010, 12:24:29 UTC - in response to Message 65928.  

Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe. Make sure the jobs aren't suspended before canceling!

What would be the state displayed in this case? Still 'active'?

Jochen
SFCC

Joined: 3 Sep 09
Posts: 10
Credit: 227,659
RAC: 0
Message 65932 - Posted: 2 May 2010, 21:47:10 UTC - in response to Message 65929.  

Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe. Make sure the jobs aren't suspended before canceling!


What would be the state displayed in this case? Still 'active'?

Jochen


BOINC is configured to use 100% of idle computer time. The WU is still running with indicated CPU time increasing.

Another post mentions stopping and restarting BOINC to 'cure' the problem. That works sometimes, sometimes not. The problem with that is that the machine is located 'off site' in our computer club's computer room and I administer it remotely from home. Due to time constraints, I don't log into it on a daily basis, so when it gets into this strange mode it can just sit there spinning its wheels while the display goes blank. We are trying to interest our club members in participating in BOINC projects, and when they see one of our 'display' machines hung up, that is NOT good press... So I have suspended Rosetta and am running other projects.
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,748,962
RAC: 59,103
Message 65940 - Posted: 3 May 2010, 12:07:00 UTC - in response to Message 65929.  

Newer versions of BOINC Manager suspend processing when CPU usage is 25% or greater by default, I believe. Make sure the jobs aren't suspended before canceling!

What would be the state displayed in this case? Still 'active'?

Jochen

No, they're displayed as 'Suspended- CPU usage too high'.
Jochen

Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 65942 - Posted: 3 May 2010, 13:00:10 UTC - in response to Message 65940.  

No, they're displayed as 'Suspended- CPU usage too high'.

Thanks.
I have not seen a WU with this state. But I found a couple of log entries stating 'CPU usage too high, suspending computation' and a second or two later 'Resuming computation'.

But these long-running models do have other side effects. I had two WUs yesterday that ran approx. 7 hours (I am running 3-hour WUs). When the first one was done, all other running WUs were 'suspended' and other WUs (all with a later expiration date) were started with the state 'Active, high priority'. I would have guessed that the ones to do first were the ones with the nearest expiration date. It looks like the BOINC client was trying to find WUs that could be finished in time, given the new estimated duration for a single WU. The bad thing about this is that I was running out of memory, because it kept the suspended ones in memory (yes, I know I could change this, but that would mean losing some computation time when reverting to the latest safe point).

I actually can not see the reason for this behaviour. When there is one long running model, the estimated duration is set to this value for all other WUs instantly, but the estimated duration decreases only slowly. I do not like this.
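The asymmetry described here (estimates jump up at once but come down only slowly) can be pictured as a correction factor that is raised immediately when a task overruns and lowered gradually afterwards. The following is a minimal sketch of that idea, not the BOINC client's actual code; the function name `update_correction` and the 0.1 smoothing weight are assumptions chosen purely for illustration.

```python
def update_correction(factor: float, estimated_h: float, actual_h: float,
                      weight: float = 0.1) -> float:
    """Return the new correction factor after one task finishes.

    Hypothetical rule for illustration only: jump up at once when a task
    overruns, drift down slowly when tasks finish faster than predicted.
    """
    ratio = actual_h / estimated_h
    if ratio > factor:
        # Overrun: adopt the worse ratio immediately.
        return ratio
    # Faster than predicted: move only a small step towards the new ratio.
    return (1 - weight) * factor + weight * ratio


factor = 1.0
history = []
# One 7-hour task followed by five normal 3-hour tasks (3-hour base estimate).
for actual_h in [7.0, 3.0, 3.0, 3.0, 3.0, 3.0]:
    factor = update_correction(factor, estimated_h=3.0, actual_h=actual_h)
    history.append(round(3.0 * factor, 2))

# Estimated hours shown for the remaining WUs after each completed task.
print(history)
```

Under these assumed numbers, a single 7-hour task pushes the estimate for every other WU to 7 hours straight away, and several normal 3-hour results afterwards bring it down only a little at a time.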

Jochen

Jochen

Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 65944 - Posted: 3 May 2010, 17:52:09 UTC - in response to Message 65943.  

In that version a new option is used.

Yes, I found it and set it to 'no restriction'.

Thanks

Jochen

[VENETO] M@cro

Joined: 23 Jul 09
Posts: 3
Credit: 84,790
RAC: 0
Message 66697 - Posted: 26 Jun 2010, 10:29:25 UTC

Hi folks! Is anybody crunching this Rosetta WU?

rs_stg0_lrlx_t459_casp8_SAVE_ALL_OUT_20813_1841_0

It starts as an 18 h WU (as I set in prefs), but then the time to complete slowly climbs to 36 h, so the total time is 53 h.
The other WUs I'm crunching are 18 h long!

M@cro - BOINC.Italy
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 66703 - Posted: 27 Jun 2010, 13:51:57 UTC

M@cro, please check that you are looking at the actual CPU time used by the task, and not the elapsed time. You can do this by going to the advanced view, to the tasks tab, highlight the task you mentioned, and then click the properties button over on the left.
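To make the distinction concrete, here is a small sketch of the comparison: divide the CPU time by the elapsed time to see how much of the wall-clock time the task really spent computing. The function names and the 0.5 threshold are assumptions for illustration only, and the two figures would be read by hand from the task's properties dialog.

```python
def cpu_fraction(cpu_hours: float, elapsed_hours: float) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return cpu_hours / elapsed_hours


def looks_stalled(cpu_hours: float, elapsed_hours: float,
                  threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: flag tasks that got little CPU time."""
    return cpu_fraction(cpu_hours, elapsed_hours) < threshold


# Made-up readings: 18 h of CPU time over 53 h of elapsed time.
print(looks_stalled(cpu_hours=18.0, elapsed_hours=53.0))
```

A task whose CPU time is close to its elapsed time is simply long-running; one whose CPU time lags far behind has mostly been waiting.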
Rosetta Moderator: Mod.Sense




©2024 University of Washington
https://www.bakerlab.org