Report long-running models here

Sid Celery
Joined: 11 Feb 08
Posts: 1600
Credit: 28,830,458
RAC: 18,571
Message 66229 - Posted: 19 May 2010, 17:41:53 UTC - in response to Message 66225.  

What gets me about this is that 1205 decoys seemed to run within my 6-hour runtime, then the last decoy had to get shut down by the watchdog after exceeding 4 hours. Was I just unlucky? The credit award was still reasonable.

Is that all? The job below ran 3224 decoys in 8 hours before the last one got shut down by the watchdog.
rhoA15May2010_1lb1_3e9v_ProteinInterfaceDesign_15May2010_20686_23_0

From tackleway's post it seems like this is a characteristic of this job-type and not all of us get good credits (mine was over-awarded). Luck of the draw...

Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 66235 - Posted: 19 May 2010, 21:45:33 UTC

rb_05_01_108_246_rs_stg0_lrlxcst_t000__boincid_SAVE_ALL_OUT_20219_102_1

Windows 7 64

BOINC version 6.10.56

Rosetta Mini version 2.14

Link to result

7 hours for 1 decoy

.clair.
Joined: 2 Jan 07
Posts: 46
Credit: 18,169,942
RAC: 4,726
Message 66269 - Posted: 21 May 2010, 22:48:24 UTC
Last modified: 21 May 2010, 22:51:58 UTC

This one overran its `one day` time a bit :)

rhoA15May2010_1lb1_2eyi_ProteinInterfaceDesign_15May2010_20686_20_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=339740643

cpu time 101389.5

DONE :: 2 starting structures 101387 cpu seconds
This process generated 4987 decoys from 4987 attempts
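
For anyone who wants to compare runs like this one, here is a minimal Python sketch that pulls the numbers out of the two summary lines above and converts them to hours and seconds per decoy. The function name `summarize` and the regular expressions are hypothetical, written only against the wording of the lines quoted in this thread, not against any official Rosetta log format.

```python
import re

def summarize(stderr_text):
    """Extract CPU seconds and decoy count from Rosetta-style summary lines.

    The patterns are assumptions matched to the two lines quoted in this
    thread; other log layouts may need different patterns.
    """
    cpu = re.search(r"DONE :: \d+ starting structures ([\d.]+) cpu seconds", stderr_text)
    dec = re.search(r"This process generated (\d+) decoys from (\d+) attempts", stderr_text)
    if cpu is None or dec is None:
        raise ValueError("summary lines not found")
    cpu_seconds = float(cpu.group(1))
    decoys = int(dec.group(1))
    return {
        "cpu_hours": cpu_seconds / 3600.0,
        "decoys": decoys,
        # Average only: it hides a single very slow final decoy.
        "seconds_per_decoy": cpu_seconds / decoys if decoys else float("inf"),
    }

log = (
    "DONE :: 2 starting structures 101387 cpu seconds\n"
    "This process generated 4987 decoys from 4987 attempts\n"
)
stats = summarize(log)
print(f"{stats['cpu_hours']:.1f} h, {stats['seconds_per_decoy']:.1f} s per decoy")
```

For this result that works out to roughly 28.2 hours of CPU time and about 20 seconds per decoy on average, so the average says nothing about how long the last decoy took.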

Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66324 - Posted: 25 May 2010, 22:15:22 UTC

I have had problems with several of these on all of my systems. The CPU time will be 7+ hours, with the last checkpoint as much as 5 hours earlier.

Matt

int2_centerfirst2b_1fAc_1i76_ProteinInterfaceDesign_23May2010_21231_62_0


int2_centerfirst2b_1fAc_1k7j_ProteinInterfaceDesign_23May2010_21231_129_0

chris
Joined: 18 Oct 06
Posts: 6
Credit: 11,452,154
RAC: 5,362
Message 66337 - Posted: 27 May 2010, 5:20:28 UTC


Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66353 - Posted: 29 May 2010, 13:13:35 UTC

int2_centerfirst2b_1fAc_2a9o_ProteinInterfaceDesign_23May2010_21231_280



Found this one at 6 1/2 hours elapsed (default is 3 on this system) with CPU time at 28 min.
I suspended it, resumed it, and let it run for another 15 min, but my CPU time only went up another 2 min, so I aborted the WU.
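
The check described here (compare how much CPU time a task gains against how much wall-clock time has passed) can be sketched in Python as below. The function names and the 0.5 cutoff are hypothetical illustrations, not BOINC settings; the readings are the ones from this post, in minutes.

```python
def cpu_share(elapsed_before, cpu_before, elapsed_after, cpu_after):
    """Fraction of wall-clock time spent on the CPU between two readings.

    All four values must use the same unit (minutes here).
    """
    wall = elapsed_after - elapsed_before
    if wall <= 0:
        raise ValueError("second reading must be later than the first")
    return (cpu_after - cpu_before) / wall

def looks_stalled(share, threshold=0.5):
    # threshold is an arbitrary illustrative cutoff, not a BOINC setting
    return share < threshold

# 6 1/2 hours elapsed = 390 min with 28 min of CPU time,
# then 15 more minutes elapsed for only 2 more minutes of CPU time.
share = cpu_share(390, 28, 405, 30)
print(round(share, 2), looks_stalled(share))
```

A healthy task on a dedicated core should show a share close to 1; a share this far below that suggests the task is mostly waiting rather than computing.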


Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 66373 - Posted: 30 May 2010, 17:01:51 UTC

gunn_fragments_SAVE_ALL_OUT_-1rkiA__20675_701_0

This one had an elapsed time of 10.5 hours with CPU time stuck at 2 hours.

Suspend/resume allowed it to finish normally in 21101.7 CPU seconds, but elapsed time was almost 15 hours.
Default run time on this system is 6 hours.

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 66376 - Posted: 30 May 2010, 18:05:15 UTC

Anyone else ever notice that these "long running" tasks all seem to have a very stale "CPU time at last checkpoint"?

transient
Joined: 30 Sep 06
Posts: 376
Credit: 10,836,395
RAC: 2,365
Message 66380 - Posted: 31 May 2010, 5:00:25 UTC

Rosetta will checkpoint when finishing a model, before starting the next. That means a checkpoint could be "old" by the definition of a "long running model". So I don't know if it is significant.
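
A toy Python model of that behaviour, assuming (as described above) that a checkpoint is written only when a model finishes and at no other time. The function `checkpoint_age` and the run durations are hypothetical; they are not taken from Rosetta itself.

```python
from itertools import accumulate

def checkpoint_age(model_durations, cpu_time_now):
    """CPU seconds since the last checkpoint, assuming one checkpoint
    is written each time a model finishes and at no other time."""
    finish_times = list(accumulate(model_durations))
    done = [t for t in finish_times if t <= cpu_time_now]
    last_checkpoint = done[-1] if done else 0
    return cpu_time_now - last_checkpoint

# Hypothetical run: 100 quick models of 60 s each, then one model that runs for hours.
durations = [60] * 100 + [6 * 3600]
fresh = checkpoint_age(durations, 3000)             # in the middle of the quick models
stale = checkpoint_age(durations, 6000 + 4 * 3600)  # 4 hours into the slow model
print(fresh, stale)
```

Under that assumption the checkpoint age is simply the time already spent in the current model, so a stale checkpoint and a long-running model are the same observation seen from two sides.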

Sid Celery
Joined: 11 Feb 08
Posts: 1600
Credit: 28,830,458
RAC: 18,571
Message 66387 - Posted: 31 May 2010, 20:16:29 UTC

I can confirm it. 1715 decoys after 6 hours, then no further checkpoints after 11 hours of processing, is an example I saw today.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,398,132
RAC: 1,095
Message 66410 - Posted: 1 Jun 2010, 20:34:10 UTC

int_simpleTwo_1f0s_1z9l_ProteinInterfaceDesign_21May2010_21289_95_0

When I checked after just under ten hours of CPU time, it had been about six hours since the last checkpoint. Opening the graphics window, I could see it was working on model 941, and the step number continued to increase as I watched. Unfortunately, when I checked back later the watchdog had killed the workunit and it was reported back with only 940 models completed. That last one was a doozy. I hope it was able to provide some interesting leads.


Snags

billy ewell 1931
Joined: 30 Mar 07
Posts: 10
Credit: 5,609,853
RAC: 1,755
Message 66568 - Posted: 13 Jun 2010, 17:50:37 UTC

313189623: This work unit was aborted at the 10.5-hour elapsed point, with about 44+% progress and a time to completion of about 9.5 hours and increasing. Windows Vista 32-bit / Intel Q9400 @ 2.66 GHz. Acct. # 160868. Computers uncovered.

Moritz Winter
Joined: 18 Nov 09
Posts: 1
Credit: 110,617
RAC: 0
Message 66571 - Posted: 14 Jun 2010, 8:36:01 UTC

ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_189_0
and
ab_05_31_T0564_1_83_pfam_h003__SAVE_ALL_OUT.INGORE_THE_REST_03_05_21325_380_0

took around 27 h for < 2 % completion before I stopped them

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 66683 - Posted: 24 Jun 2010, 20:17:38 UTC

Are you sure that the WU is really still running? On my Windows machine, some WUs simply stop processing, even though BOINC thinks they are still running. I know they have stopped because the Windows Task Manager shows that the CPU is no longer working hard, and my CPU fan gets quiet. I wouldn't trust the BOINC Manager to accurately inform me of task status if I were you. I've gotten to the point where I leave Task Manager running all the time so that I can see immediately when a Rosetta task stops processing.

deesy

Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 66792 - Posted: 6 Jul 2010, 20:43:43 UTC

It looks like all the ProteinInterfaceDesign tasks are long-running models...

Sid Celery
Joined: 11 Feb 08
Posts: 1600
Credit: 28,830,458
RAC: 18,571
Message 66793 - Posted: 7 Jul 2010, 8:29:53 UTC - in response to Message 66792.  

It looks like all the ProteinInterfaceDesign tasks are long-running models...

Not quite. The ones that start with simIF are long-running, but the ones that start with celldiv are fine.

Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 66794 - Posted: 7 Jul 2010, 10:37:56 UTC - in response to Message 66793.  


Not quite. The ones that start with simIF are long-running, but the ones that start with celldiv are fine.

Ok, you're right, I had some of the celldiv as well, but they ran at night, so I didn't notice them.

I don't know what to do with these simIF-PID tasks. The problem is that they get 'stuck' after 5h 50min, which is where the last checkpoint is saved. But with the current temperatures I have to turn off my two computers at home during the day (since I can't leave the windows open when I'm at work). This would result in a major loss... I am actually considering increasing my cache and aborting those long-running tasks before they start. But I don't really like this idea.

cu Joe

Borgie
Joined: 26 Jul 10
Posts: 2
Credit: 58,918
RAC: 0
Message 67033 - Posted: 29 Jul 2010, 2:30:01 UTC
Last modified: 29 Jul 2010, 2:57:35 UTC

"fc_A_noSmallMvs_fc6x_3cbz_ProteinInterfaceDesign_20Jun2010_21458_256_0"

Task 355045827, work unit 324321723: Over, Success, Done
CPU time 25,601.52 seconds; 147.06 credit claimed; 0.68 credit granted
All that time (7+ hrs.) and not even ONE whole credit?
My other tasks took less than 3 hr. and were granted 40 or more credits.

XP Home SP3 AMD Athlon II X2 250
3.01 GHz 1.75 GB RAM

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67208 - Posted: 14 Aug 2010, 5:48:54 UTC
Last modified: 14 Aug 2010, 5:49:40 UTC

This one was longish!

8 hrs for 1 model on my X6 1055T, with a 4 hr run time.

rs_stg0_lrlx_t328__run1_SAVE_ALL_OUT_19365_2958_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=327956645


BOINC:: CPU time: 28815.8s, 14400s + 14400s[2010- 8-14 14:58:31:] :: BOINC
InternalDecoyCount: 0
======================================================
DONE :: 1 starting structures 28815.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
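
The `14400s + 14400s` in the log line above looks like the 4-hour run-time preference plus a further 4 hours before the watchdog steps in, which would match the 4-hour overrun mentioned earlier in the thread. That reading is an assumption, not something confirmed from the Rosetta source. Under that assumption, a minimal Python sketch of the check (with hypothetical function names) is:

```python
def watchdog_limit(preferred_runtime_s, grace_s=4 * 3600):
    """CPU-time ceiling under the assumed rule: run-time preference plus a fixed grace period."""
    return preferred_runtime_s + grace_s

def exceeded(cpu_time_s, preferred_runtime_s, grace_s=4 * 3600):
    """True if the reported CPU time is past the assumed ceiling."""
    return cpu_time_s > watchdog_limit(preferred_runtime_s, grace_s)

# Values from the log above: 4-hour preference (14400 s), 28815.8 s of CPU time.
limit = watchdog_limit(14400)
print(limit, exceeded(28815.8, 14400))
```

The same rule would give 86400 + 14400 = 100800 s for a one-day preference, which the 101389.5 s result reported earlier in the thread also slightly exceeds.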

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67249 - Posted: 17 Aug 2010, 19:32:55 UTC

They're back ...

How do you make a hex-core process like a dual core?

intSpin1_xxx_ProteinInterfaceDesign

Sample tasks:

359697007
359748173
359684781
359685309

Plus a number of others currently on the long road to go see the watchdog.


©2021 University of Washington
https://www.bakerlab.org