Message boards : Number crunching : Report long-running models here
Divide Overflow · Joined: 17 Sep 05 · Posts: 82 · Credit: 921,382 · RAC: 0
rb_05_01_108_246_rs_stg0_lrlxcst_t000__boincid_SAVE_ALL_OUT_20219_102_1
Windows 7 64-bit, BOINC version 6.10.56, Rosetta Mini version 2.14
7 hours for 1 decoy
.clair. · Joined: 2 Jan 07 · Posts: 274 · Credit: 26,399,595 · RAC: 0
This one overran its 'one day' run time a bit :)
rhoA15May2010_1lb1_2eyi_ProteinInterfaceDesign_15May2010_20686_20_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=339740643
CPU time: 101389.5 s
DONE :: 2 starting structures 101387 cpu seconds
This process generated 4987 decoys from 4987 attempts
Bikermatt · Joined: 12 Feb 10 · Posts: 20 · Credit: 10,552,445 · RAC: 0
I have had problems with several of these on all of my systems. The CPU time reaches 7+ hours, with the last checkpoint as much as 5 hours earlier.
Matt
int2_centerfirst2b_1fAc_1i76_ProteinInterfaceDesign_23May2010_21231_62_0
int2_centerfirst2b_1fAc_1k7j_ProteinInterfaceDesign_23May2010_21231_129_0
chris · Joined: 18 Oct 06 · Posts: 6 · Credit: 12,215,357 · RAC: 0
int2_centerfirst2b_1fAc_1fm4_ProteinInterfaceDesign_23May2010_21231_133_0
Windows 7 Ultimate x64, BOINC 6.10.43, Rosetta Mini 2.14
Bikermatt · Joined: 12 Feb 10 · Posts: 20 · Credit: 10,552,445 · RAC: 0
int2_centerfirst2b_1fAc_2a9o_ProteinInterfaceDesign_23May2010_21231_280
Found this one at 6 1/2 hours elapsed (the default run time on this system is 3 hours) with CPU time at only 28 min. I suspended and then resumed it and let it run for another 15 min, but the CPU time only went up another 2 min, so I aborted the WU.
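The check above (watching how much CPU time a task gains while wall-clock time passes) can be made a little more systematic. The Python below is a minimal sketch, assuming the elapsed and CPU times are copied by hand from BOINC Manager; the `Sample` class, the function names and the 0.5 threshold are all hypothetical, not anything BOINC or Rosetta defines. The idea is that a single-threaded task on a free core should accumulate CPU time almost as fast as elapsed time.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a task's clocks, in minutes (copied by hand from BOINC Manager)."""
    elapsed_min: float
    cpu_min: float


def cpu_ratio(earlier: Sample, later: Sample) -> float:
    """Fraction of the wall-clock interval the task actually spent on the CPU."""
    wall = later.elapsed_min - earlier.elapsed_min
    if wall <= 0:
        raise ValueError("later sample must come after the earlier one")
    return (later.cpu_min - earlier.cpu_min) / wall


def looks_stalled(earlier: Sample, later: Sample, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: far less CPU time than wall-clock time suggests a stall."""
    return cpu_ratio(earlier, later) < threshold


# Bikermatt's numbers: 6.5 h (390 min) elapsed with 28 min CPU, then 15 more
# minutes of elapsed time brought only 2 more minutes of CPU time.
before = Sample(elapsed_min=390, cpu_min=28)
after = Sample(elapsed_min=405, cpu_min=30)
print(f"CPU/elapsed ratio: {cpu_ratio(before, after):.2f}")  # CPU/elapsed ratio: 0.13
print("stalled" if looks_stalled(before, after) else "ok")   # stalled
```

A low ratio is only a hint: other programs competing for the core or CPU throttling also slow CPU-time growth, so it is worth confirming with Task Manager before aborting.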
Bikermatt · Joined: 12 Feb 10 · Posts: 20 · Credit: 10,552,445 · RAC: 0
gunn_fragments_SAVE_ALL_OUT_-1rkiA__20675_701_0
This one had an elapsed time of 10.5 hours with CPU time stuck at 2 hours. Suspend/resume allowed it to finish normally in 21101.7 CPU seconds, but elapsed time was almost 15 hours. Default run time on this system is 6 hours.
Chris Holvenstot · Joined: 2 May 10 · Posts: 220 · Credit: 9,106,918 · RAC: 0
Has anyone else noticed that these "long-running" tasks all seem to have a very stale "CPU time at last checkpoint"?
Sid Celery · Joined: 11 Feb 08 · Posts: 2117 · Credit: 41,140,182 · RAC: 15,917
I can confirm it. One example I saw today: 1715 decoys after 6 hours, then no further checkpoints through 11 hours of processing.
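The stale-checkpoint symptom can be quantified from two numbers BOINC Manager shows for a running task: its current CPU time and its "CPU time at last checkpoint". The difference is the CPU work that would be thrown away if the task were stopped now. This is a minimal Python sketch, assuming both values are copied by hand in seconds; the function names and the 2-hour limit are hypothetical.

```python
def hours_since_checkpoint(cpu_time_s: float, cpu_at_checkpoint_s: float) -> float:
    """CPU hours of work that would be lost if the task were stopped right now."""
    if cpu_at_checkpoint_s > cpu_time_s:
        raise ValueError("checkpoint CPU time cannot exceed current CPU time")
    return (cpu_time_s - cpu_at_checkpoint_s) / 3600


def checkpoint_is_stale(cpu_time_s: float, cpu_at_checkpoint_s: float,
                        limit_h: float = 2.0) -> bool:
    """Hypothetical limit: flag tasks that went more than limit_h CPU hours without a checkpoint."""
    return hours_since_checkpoint(cpu_time_s, cpu_at_checkpoint_s) > limit_h


# Reading Sid Celery's example as: last checkpoint at about 6 h, still running at about 11 h.
cpu_now = 11 * 3600
cpu_ckpt = 6 * 3600
print(hours_since_checkpoint(cpu_now, cpu_ckpt))  # 5.0
print(checkpoint_is_stale(cpu_now, cpu_ckpt))     # True
```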
Snags · Joined: 22 Feb 07 · Posts: 198 · Credit: 2,888,320 · RAC: 0
int_simpleTwo_1f0s_1z9l_ProteinInterfaceDesign_21May2010_21289_95_0
When I checked after just under ten hours of CPU time, it had been about six hours since the last checkpoint. Opening the graphics window, I could see it was working on model 941, and the step number continued to increase as I watched. Unfortunately, when I checked back later, the watchdog had killed the workunit and it was reported back with only 940 models completed. That last one was a doozy. I hope it was able to provide some interesting leads.
Snags
billy ewell 1931 · Joined: 30 Mar 07 · Posts: 14 · Credit: 6,899,522 · RAC: 0
313189623: This work unit was aborted at the 10.5-hour elapsed point, with progress at a little over 44% and the estimated time to completion at about 9.5 hours and increasing.
Windows Vista 32-bit / Intel Q9400 @ 2.66 GHz. Acct. # 160868. Computers not hidden.
Moritz Winter · Joined: 18 Nov 09 · Posts: 1 · Credit: 110,617 · RAC: 0
ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_189_0 and ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_380_0 took around 27 hours for less than 2% completion before I stopped them.
deesy58 · Joined: 20 Apr 10 · Posts: 75 · Credit: 193,831 · RAC: 0
Are you sure that the WU is really still running? On my Windows machine, some WUs simply stop processing even though BOINC thinks they are still running. I know they have stopped because Windows Task Manager shows that the CPU is no longer working hard, and my CPU fan gets quiet. I wouldn't trust BOINC Manager to report task status accurately if I were you. I've gotten to the point where I leave Task Manager running all the time so that I can see immediately when a Rosetta task stops processing.
deesy
Jochen · Joined: 6 Jun 06 · Posts: 133 · Credit: 3,847,433 · RAC: 0
It looks like all the ProteinInterfaceDesign tasks are long-running models...
Sid Celery · Joined: 11 Feb 08 · Posts: 2117 · Credit: 41,140,182 · RAC: 15,917
Jochen wrote: "It looks like all the ProteinInterfaceDesign tasks are long-running models..."
Not quite. The ones that start with simIF are long-running, but the ones that start with celldiv are fine.
Jochen · Joined: 6 Jun 06 · Posts: 133 · Credit: 3,847,433 · RAC: 0
OK, you're right. I had some of the celldiv ones as well, but they ran at night, so I didn't notice them.
I don't know what to do with these simIF-PID tasks. The problem is that they get 'stuck' after 5 h 50 min, which is where the last checkpoint is saved. But with the current temperatures I have to turn off my two computers at home during the day (since I can't leave the windows open while I'm at work), and that would mean a major loss of work... I'm actually considering increasing my cache and aborting those long-running tasks before they start, but I don't really like this idea.
cu Joe
Borgie · Joined: 26 Jul 10 · Posts: 2 · Credit: 58,918 · RAC: 0
"fc_A_noSmallMvs_fc6x_3cbz_ProteinInterfaceDesign_20Jun2010_21458_256_0"
Task 355045827, work unit 324321723: Over, Success, Done, 25,601.52 seconds, 147.06 credits claimed, 0.68 granted
All that time (7+ hours) and not even ONE whole credit? My other tasks took less than 3 hours and were granted 40 or more credits.
XP Home SP3, AMD Athlon II X2 250 @ 3.01 GHz, 1.75 GB RAM
P . P . L . · Joined: 20 Aug 06 · Posts: 581 · Credit: 4,865,274 · RAC: 0
This one was longish! 8 hours for 1 model on my X6 1055T with a 4-hour run time.
rs_stg0_lrlx_t328__run1_SAVE_ALL_OUT_19365_2958_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=327956645
BOINC:: CPU time: 28815.8s, 14400s + 14400s[2010- 8-14 14:58:31:] :: BOINC
InternalDecoyCount: 0
======================================================
DONE :: 1 starting structures 28815.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish
Chris Holvenstot · Joined: 2 May 10 · Posts: 220 · Credit: 9,106,918 · RAC: 0
They're back... How do you make a hex-core process like a dual core?
intSpin1_xxx_ProteinInterfaceDesign
Sample tasks: 359697007, 359748173, 359684781, 359685309
Plus a number of others currently on the long road to go see the watchdog.
P . P . L . · Joined: 20 Aug 06 · Posts: 581 · Credit: 4,865,274 · RAC: 0
This one took close to 7 hours with my 4-hour run time preference, on my X6 1055T.
PCS_calmodulin_v1.frag_23-147_SAVE_ALL_OUT_22378_17_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=341596781
======================================================
DONE :: 2 starting structures 24572.3 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
P.S. Not good credits either!
Chris Holvenstot · Joined: 2 May 10 · Posts: 220 · Credit: 9,106,918 · RAC: 0
I seem to be getting some "sub-optimal" results from tasks with names of the form PCS_xxxx_atensor.frag.... For example, Task ID 373876225, running on a dedicated Phenom II core clocked at about 3.0 GHz with adequate memory, netted only 39 credits for a bit over 10 hours of run time (CPU time, not wall-clock time). That is "watchdog territory", since my run time is set to a modest 6 hours.
I have reviewed in the past how credits are apportioned, and I even think I understand it and the way things are "averaged". However, if there is a "non-cryogenic" CPU out there that can process Rosetta 8 to 10 times faster than my Phenom II does (which is what it would take to drag my calculated average down that much), I want to look into buying a few...
©2024 University of Washington
https://www.bakerlab.org