Report long-running models here

Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 66235 - Posted: 19 May 2010, 21:45:33 UTC

rb_05_01_108_246_rs_stg0_lrlxcst_t000__boincid_SAVE_ALL_OUT_20219_102_1

Windows 7 64

BOINC version 6.10.56

Rosetta Mini version 2.14

Link to result

7 hours for 1 decoy

.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 66269 - Posted: 21 May 2010, 22:48:24 UTC
Last modified: 21 May 2010, 22:51:58 UTC

This one overran its `one day` time a bit :)

rhoA15May2010_1lb1_2eyi_ProteinInterfaceDesign_15May2010_20686_20_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=339740643

cpu time 101389.5

DONE :: 2 starting structures 101387 cpu seconds
This process generated 4987 decoys from 4987 attempts

Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66324 - Posted: 25 May 2010, 22:15:22 UTC

I have had problems with several of these on all of my systems. The CPU time will be 7+ hours, with the last checkpoint as much as 5 hours earlier.

Matt

int2_centerfirst2b_1fAc_1i76_ProteinInterfaceDesign_23May2010_21231_62_0


int2_centerfirst2b_1fAc_1k7j_ProteinInterfaceDesign_23May2010_21231_129_0

chris
Joined: 18 Oct 06
Posts: 6
Credit: 12,215,357
RAC: 0
Message 66337 - Posted: 27 May 2010, 5:20:28 UTC


Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66353 - Posted: 29 May 2010, 13:13:35 UTC

int2_centerfirst2b_1fAc_2a9o_ProteinInterfaceDesign_23May2010_21231_280



Found this one at 6 1/2 hours (default is 3 on this system) with CPU time at 28 min.
I suspended it, resumed it, and let it run for another 15 min, but my CPU time only went up another 2 min, so I aborted the WU.
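The figures in this post can be turned into a rough CPU-efficiency number. The sketch below is only an illustration of that arithmetic; the function name is hypothetical and nothing here comes from BOINC itself.

```python
def cpu_efficiency(cpu_minutes: float, elapsed_minutes: float) -> float:
    """Fraction of wall-clock time that the task actually spent on the CPU."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_minutes / elapsed_minutes


# Figures quoted in the post: 28 min of CPU time after 6 1/2 hours elapsed,
# then 2 more CPU minutes during a further 15 minutes of wall-clock time.
before_restart = cpu_efficiency(28, 6.5 * 60)
after_restart = cpu_efficiency(2, 15)

print(f"before suspend/resume: {before_restart:.1%}")
print(f"after suspend/resume:  {after_restart:.1%}")
```

A healthy task on a dedicated core would normally sit close to 100%, so both values suggest the task was mostly idle rather than computing slowly.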



Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66373 - Posted: 30 May 2010, 17:01:51 UTC

gunn_fragments_SAVE_ALL_OUT_-1rkiA__20675_701_0

This one had an elapsed time of 10.5 hours with CPU time stuck at 2 hours.

Suspend/resume allowed it to finish normally in 21101.7 CPU seconds, but elapsed time was almost 15 hours.
Default run time on this system is 6 hours.


Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66376 - Posted: 30 May 2010, 18:05:15 UTC

Anyone else ever notice that these "long running" tasks all seem to have a very stale "CPU time at last checkpoint" ??

Sid Celery
Joined: 11 Feb 08
Posts: 1987
Credit: 38,509,582
RAC: 14,752
Message 66387 - Posted: 31 May 2010, 20:16:29 UTC

I can confirm it. 1715 decoys after 6 hours, then no further checkpoints after 11 hours of processing, is an example I saw today.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,815,019
RAC: 764
Message 66410 - Posted: 1 Jun 2010, 20:34:10 UTC

int_simpleTwo_1f0s_1z9l_ProteinInterfaceDesign_21May2010_21289_95_0

When I checked after just under ten hours of CPU time, it had been about six hours since the last checkpoint. Opening the graphics window, I could see it was working on model 941, and the step number continued to increase as I watched. Unfortunately, when I checked back later the watchdog had killed the workunit and it was reported back with only 940 models completed. That last one was a doozy. I hope it was able to provide some interesting leads.


Snags
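The "stale checkpoint" pattern reported in the last few posts can be expressed as a simple check on two numbers that BOINC Manager shows for a task: the CPU time and the "CPU time at last checkpoint". This is a minimal sketch with hypothetical function names, and the one-hour threshold is an arbitrary assumption, not a project setting.

```python
def hours_since_checkpoint(cpu_time_s: float, checkpoint_cpu_time_s: float) -> float:
    """CPU hours of work that would be lost if the task restarted now."""
    return (cpu_time_s - checkpoint_cpu_time_s) / 3600


def is_checkpoint_stale(cpu_time_s: float, checkpoint_cpu_time_s: float,
                        max_gap_hours: float = 1.0) -> bool:
    """True when the gap since the last checkpoint exceeds the threshold."""
    return hours_since_checkpoint(cpu_time_s, checkpoint_cpu_time_s) > max_gap_hours


# Approximate figures from the post above: about ten hours of CPU time,
# with the last checkpoint about six hours earlier (so at roughly 4 hours).
cpu_time = 10 * 3600
last_checkpoint = 4 * 3600

print(hours_since_checkpoint(cpu_time, last_checkpoint))
print(is_checkpoint_stale(cpu_time, last_checkpoint))
```

The gap also indicates how much work is at risk if the machine is shut down, which is the concern raised later in the thread.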

billy ewell 1931
Joined: 30 Mar 07
Posts: 14
Credit: 6,558,102
RAC: 1,951
Message 66568 - Posted: 13 Jun 2010, 17:50:37 UTC

313189623: This work unit was aborted at the 10.5-hour elapsed point, with about 44+% progress and a time to completion of about 9.5 hours and increasing. Windows Vista 32-bit / Intel Q9400 @ 2.66 GHz. Acct. # 160868. Computers uncovered.

Moritz Winter
Joined: 18 Nov 09
Posts: 1
Credit: 110,617
RAC: 0
Message 66571 - Posted: 14 Jun 2010, 8:36:01 UTC

ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_189_0
and
ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_380_0

took around 27 h for < 2 % completion before I stopped them

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 66683 - Posted: 24 Jun 2010, 20:17:38 UTC

Are you sure that the WU is really still running? On my Windows machine, some WUs simply stop processing, even though BOINC thinks they are still running. I know they have stopped because the Windows Task Manager shows that the CPU is no longer working hard, and my CPU fan gets quiet. I wouldn't trust the BOINC Manager to accurately inform me of task status if I were you. I've gotten to the point where I leave Task Manager running all the time so that I can see immediately when a Rosetta task stops processing.

deesy

Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 66792 - Posted: 6 Jul 2010, 20:43:43 UTC

It looks like all the ProteinInterfaceDesign tasks are long-running models...

Sid Celery
Joined: 11 Feb 08
Posts: 1987
Credit: 38,509,582
RAC: 14,752
Message 66793 - Posted: 7 Jul 2010, 8:29:53 UTC - in response to Message 66792.  

It looks like all the ProteinInterfaceDesign tasks are long-running models...

Not quite. The ones that start simIF are long-running but the ones that start celldiv are fine.

Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 66794 - Posted: 7 Jul 2010, 10:37:56 UTC - in response to Message 66793.  


Not quite. The ones that start simIF are long-running but the ones that start celldiv are fine.

Ok, you're right, I had some of the celldiv as well, but they ran at night, so I didn't notice them.

I don't know what to do with these simIF-PID tasks. The problem is that they get 'stuck' after 5h 50min, which is where the last checkpoint is saved. But with the current temperatures I have to turn off my two computers at home during the day (since I can't leave the windows open when I'm at work). This would result in a major loss... I am actually considering increasing my cache and aborting those long-running tasks before they start. But I don't really like this idea.

cu Joe


Borgie
Joined: 26 Jul 10
Posts: 2
Credit: 58,918
RAC: 0
Message 67033 - Posted: 29 Jul 2010, 2:30:01 UTC
Last modified: 29 Jul 2010, 2:57:35 UTC

"fc_A_noSmallMvs_fc6x_3cbz_ProteinInterfaceDesign_20Jun2010_21458_256_0"

Task 355045827, work unit 324321723: Over, Success, Done.
25,601.52 seconds; 147.06 credit claimed; 0.68 granted.
All that time (7+ hrs.) and not even ONE whole credit?
My other tasks took less than 3 hr. and granted 40 or more credits.

XP Home SP3 AMD Athlon II X2 250
3.01 GHz 1.75 GB RAM
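To put the numbers in this post on a common scale, the sketch below converts them to credit per hour. It is plain arithmetic on the quoted figures; the helper name is hypothetical, and the comparison value uses the post's own lower bound of 40 credits in 3 hours.

```python
def credit_per_hour(credit: float, run_seconds: float) -> float:
    """Credit earned per hour of run time."""
    return credit / (run_seconds / 3600)


run_seconds = 25601.52  # run time reported for task 355045827

claimed = credit_per_hour(147.06, run_seconds)
granted = credit_per_hour(0.68, run_seconds)

# The poster's other tasks: under 3 hours and 40 or more credits,
# so 40 credits in 3 hours is a conservative point of comparison.
typical = credit_per_hour(40, 3 * 3600)

print(f"claimed: {claimed:.2f} credits/hour")
print(f"granted: {granted:.2f} credits/hour")
print(f"typical: {typical:.2f} credits/hour (lower bound)")
```

On these figures the granted rate is more than a hundred times lower than the poster's other tasks, which is why the result stands out.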

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67208 - Posted: 14 Aug 2010, 5:48:54 UTC
Last modified: 14 Aug 2010, 5:49:40 UTC

This one was longish!

8 hrs for 1 model on my x6 1055T, with a 4 hr runtime preference.

rs_stg0_lrlx_t328__run1_SAVE_ALL_OUT_19365_2958_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=327956645


BOINC:: CPU time: 28815.8s, 14400s + 14400s[2010- 8-14 14:58:31:] :: BOINC
InternalDecoyCount: 0
======================================================
DONE :: 1 starting structures 28815.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
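Several posts in this thread paste the same two summary lines from a task's stderr output. As a rough illustration, they can be parsed to get CPU hours per decoy. The regular expressions below are assumptions based only on the lines quoted in this thread, not on any documented Rosetta output format.

```python
import re

# Summary lines copied from the stderr output quoted above.
STDERR_TAIL = """\
DONE :: 1 starting structures 28815.8 cpu seconds
This process generated 1 decoys from 1 attempts
"""

DONE_RE = re.compile(r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds")
DECOY_RE = re.compile(r"This process generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts")


def summarize(stderr_text: str):
    """Return CPU hours, decoy count and hours per decoy, or None if not found."""
    done = DONE_RE.search(stderr_text)
    decoys = DECOY_RE.search(stderr_text)
    if done is None or decoys is None:
        return None
    cpu_hours = float(done.group(2)) / 3600
    n_decoys = int(decoys.group(1))
    return {
        "cpu_hours": cpu_hours,
        "decoys": n_decoys,
        "attempts": int(decoys.group(2)),
        # Avoid dividing by zero when a task produced no decoys at all.
        "hours_per_decoy": cpu_hours / n_decoys if n_decoys else None,
    }


summary = summarize(STDERR_TAIL)
print(summary)
```

For this task the single decoy accounts for all of the roughly 8 CPU hours, twice the 4-hour runtime mentioned in the post.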

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67249 - Posted: 17 Aug 2010, 19:32:55 UTC

They're back ...

How do you make a hex-core process like a dual core?

intSpin1_xxx_ProteinInterfaceDesign

Sample tasks:

359697007
359748173
359684781
359685309

Plus a number of others currently on the long road to go see the watchdog.



P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68184 - Posted: 24 Oct 2010, 5:01:46 UTC

This one was near enough to 7hrs on my 4hr runtime pref, on my x6 1055t.


PCS_calmodulin_v1.frag_23-147_SAVE_ALL_OUT_22378_17_0


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=341596781

======================================================
DONE :: 2 starting structures 24572.3 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

P.S. Not good credits either!



Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68202 - Posted: 26 Oct 2010, 21:02:37 UTC

I seem to be getting some "sub-optimal" results with tasks with a name in the form of:

PCS_xxxx_atensor.frag....

For example - Task ID 373876225 - running on a dedicated Phenom II core clocked at about 3.0 GHz with adequate memory netted only 39 credits for a bit over 10 hours of run time (CPU time, not wall clock time). That is "watchdog territory", as my run time is set to a modest 6 hours.

I have in the past reviewed how credits are apportioned and even think I understand it and the way things are "averaged". However, if there is a "non-cryogenic" CPU out there that can process Rosetta 8 to 10 times faster than my Phenom II does (which is what it would take to drop my calculated average that much), I want to look into buying a few ...


©2024 University of Washington
https://www.bakerlab.org