Report stuck & aborted WU here please

Message boards : Number crunching : Report stuck & aborted WU here please

Nicolas VC
Joined: 10 Nov 05
Posts: 1
Credit: 1,619,083
RAC: 0
Message 11515 - Posted: 1 Mar 2006, 11:48:35 UTC

I have been running BOINC 5.2.13 since Dec 2005.
Windows XP detects my PC as a multiprocessor system.
My "switch between applications" interval has been set to 60 minutes from the beginning.

But for the past four days, all my results except one have been client errors.

hostid=59738

Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
12247891 | 9795546 | 1 Mar 2006 7:39:42 UTC | 15 Mar 2006 7:39:42 UTC | In Progress | Unknown | New | --- | --- | ---
12228641 | 9791198 | 28 Feb 2006 22:13:42 UTC | 1 Mar 2006 10:44:55 UTC | Over | Client error | Computing | 28,740.80 | 50.91 | ---
12176920 | 9700235 | 27 Feb 2006 16:01:48 UTC | 27 Feb 2006 18:02:38 UTC | Over | Client error | Computing | 3,552.36 | 6.29 | ---
12131713 | 9716891 | 28 Feb 2006 14:31:53 UTC | 28 Feb 2006 22:13:42 UTC | Over | Client error | Computing | 11,330.28 | 20.07 | ---
12121508 | 9706950 | 27 Feb 2006 18:07:44 UTC | 28 Feb 2006 14:31:53 UTC | Over | Client error | Done | 28,363.94 | 50.24 | ---
12092713 | 9694375 | 25 Feb 2006 18:19:48 UTC | 27 Feb 2006 11:51:23 UTC | Over | Client error | Done | 29,023.41 | 51.41 | ---
12069367 | 9671614 | 25 Feb 2006 4:22:05 UTC | 27 Feb 2006 11:51:23 UTC | Over | Client error | Done | 28,695.72 | 50.83 | ---
12007623 | 9624382 | 26 Feb 2006 8:25:02 UTC | 27 Feb 2006 16:01:48 UTC | Over | Client error | Computing | 6,715.63 | 11.90 | ---
11879720 | 9572581 | 22 Feb 2006 10:02:46 UTC | 25 Feb 2006 4:22:05 UTC | Over | Success | Done | 25,620.59 | 45.55 | 45.55
11879602 | 9572468 | 22 Feb 2006 10:01:08 UTC | 8 Mar 2006 10:01:08 UTC | In Progress | Unknown | New | --- | --- | ---
11873799 | 9566928 | 22 Feb 2006 6:00:47 UTC | 22 Feb 2006 9:58:35 UTC | Over | Client error | Computing | 5,748.20 | 10.22 | ---
11840411 | 9535527 | 21 Feb 2006 1:30:19 UTC | 21 Feb 2006 4:38:20 UTC | Over | Client error | Computing | 3,483.77 | 6.19 | ---
11689498 | 9467768 | 18 Feb 2006 18:50:50 UTC | 19 Feb 2006 9:52:24 UTC | Over | Client error | Computing | 9,606.52 | 16.99 | ---
11655933 | 9449069 | 18 Feb 2006 7:35:25 UTC | 18 Feb 2006 16:54:01 UTC | Over | Client error | Computing | 10,126.22 | 17.91 | ---
11606229 | 9417205 | 17 Feb 2006 3:51:27 UTC | 17 Feb 2006 18:25:28 UTC | Over | Client error | Computing | 11,394.44 | 20.16 | ---
11606180 | 9417159 | 17 Feb 2006 3:51:27 UTC | 17 Feb 2006 22:08:54 UTC | Over | Client error | Computing | 9,874.17 | 17.47 | ---
11537668 | 9363268 | 16 Feb 2006 4:57:07 UTC | 17 Feb 2006 3:51:27 UTC | Over | Client error | Computing | 14,039.75 | 24.83 | ---
11537657 | 9363257 | 16 Feb 2006 4:57:07 UTC | 16 Feb 2006 22:55:01 UTC | Over | Client error | Computing | 15,775.41 | 27.90 | ---
11454642 | 9263916 | 15 Feb 2006 8:54:16 UTC | 15 Feb 2006 14:19:01 UTC | Over | Client error | Done | 6,308.38 | 11.16 | ---
11454641 | 9285436 | 15 Feb 2006 8:54:16 UTC | 15 Feb 2006 14:19:01 UTC | Over | Client error | Done | 4,996.91 | 8.84 | ---
11229047 | 5611704 | 14 Feb 2006 7:24:17 UTC | 15 Feb 2006 8:54:16 UTC | Over | Client error | Computing | 25,129.64 | 44.45 | ---
7867088 | 2157283 | 23 Jan 2006 7:45:08 UTC | 23 Jan 2006 12:08:57 UTC | Over | Client error | Computing | 6,295.66 | 11.23 | ---
7411296 | 5919940 | 24 Jan 2006 18:34:04 UTC | 25 Jan 2006 4:06:00 UTC | Over | Client error | Computing | 18,074.89 | 32.23 | ---

As far as I can see, other computers complete the same WUs successfully.

If there isn't a workaround, I am thinking about suspending the Rosetta project until the client is stabilized, maybe in the next version.
Nicolas Velazquez
noquierocomprar@hotmail.com

Osku87
Joined: 1 Nov 05
Posts: 17
Credit: 280,268
RAC: 0
Message 11535 - Posted: 1 Mar 2006, 22:10:44 UTC - in response to Message 11515.  
Last modified: 1 Mar 2006, 22:12:59 UTC

Got a stuck WU. It always gets stuck at step 20840 (1.0%), and when I restart the client, the calculation starts all over. I tried restarting the client three times with the same effect. (Restarting usually helps when a WU is stuck at 1.0%.) Now aborting.

Result ID

Team TMR
Joined: 2 Nov 05
Posts: 21
Credit: 1,583,679
RAC: 0
Message 11557 - Posted: 2 Mar 2006, 11:13:57 UTC
Last modified: 2 Mar 2006, 11:16:18 UTC

This one WU 9696277 was stuck on 1% for 3 days! I've just aborted it.

No wonder my daily points have taken a hit.

Looking forward to getting the credit it...

casio7131
Joined: 10 Oct 05
Posts: 35
Credit: 149,748
RAC: 0
Message 11576 - Posted: 2 Mar 2006, 23:26:01 UTC
Last modified: 2 Mar 2006, 23:27:59 UTC

not too sure whether you're still concerned with these...

ABINITgv_hom021_1gvp__322_53_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=12134791
was stuck at 1% after >5 hours using boinc. [edit: i also checked the graphics to ensure that it actually was stuck.] now using the command line method, it is at 13.1% after 1h 18min.

rosetta_4.82_windows_intelx86.exe xx 1gvp _ -output_silent_gz -silent -increase_cycles 10 -new_centroid_packing -no_filters -nstruct 10 -protein_name_prefix hom021_ -frags_name_prefix hom021_ -constant_seed -jran 3884548

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11577 - Posted: 2 Mar 2006, 23:36:37 UTC

This one ran for 24+ hours with 27 left to go. I hope to get some credit out of these failing WUs.


3/2/2006 5:22:32 PM|rosetta@home|Unrecoverable error for result ABINITew_hom002_1ew4A_322_56_0 (aborted via GUI RPC)

AKH54
Joined: 8 Dec 05
Posts: 4
Credit: 1,812,208
RAC: 0
Message 11599 - Posted: 3 Mar 2006, 11:48:09 UTC

I set my target run time to 2 hours, but I have a WU that has been running for over 6 hours and is still at 1%, and the estimated completion time is increasing.

Is this a duff WU, or should I try to persevere?

Alan

This is the second time I have posted this; the first time was in the wrong place.

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11601 - Posted: 3 Mar 2006, 12:45:23 UTC

2 more gone. This is getting crazy.



3/3/2006 6:46:23 AM|rosetta@home|Unrecoverable error for result ABINITen_hom009_1enh__322_39_0 ( - exit code -1073741811 (0xc000000d))
3/3/2006 6:46:26 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_323_1748_0 ( - exit code -1073741811 (0xc000000d))
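
For anyone reading these log lines: the negative exit code is the signed 32-bit view of the hex value shown in parentheses. The Python sketch below, which is only an illustration and not part of BOINC or Rosetta, converts between the two and names the two Windows NTSTATUS values that appear in this thread. The helper name `decode_exit_code` and the small lookup table are assumptions made for this example; a code that is not in the table is simply reported without a name.

```python
# Names for the two Windows NTSTATUS values seen in this thread.
# Any other code is left unnamed rather than guessed at.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def decode_exit_code(code: int) -> str:
    """Return the unsigned 32-bit hex form of a signed exit code, plus a name if known."""
    unsigned = code & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    name = NTSTATUS_NAMES.get(unsigned, "not listed in this sketch")
    return f"0x{unsigned:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741811, -1073741819, -164):
        print(code, "->", decode_exit_code(code))
```

The name only says what kind of fault Windows reported; it does not by itself explain why the work unit failed.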

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 11610 - Posted: 3 Mar 2006, 16:31:18 UTC - in response to Message 11601.  

2 more gone. This is getting crazy.



3/3/2006 6:46:23 AM|rosetta@home|Unrecoverable error for result ABINITen_hom009_1enh__322_39_0 ( - exit code -1073741811 (0xc000000d))
3/3/2006 6:46:26 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_323_1748_0 ( - exit code -1073741811 (0xc000000d))


sorry! David is working on a general fix for this error and running lots of tests on ralph this week. there should be a solution soon ...

vfrey
Joined: 17 Sep 05
Posts: 9
Credit: 705,755
RAC: 266
Message 11618 - Posted: 3 Mar 2006, 19:03:11 UTC

a WU stuck at 1 %

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9740377

unfortunately it ran for more than 29 hours until I noticed it...

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11630 - Posted: 4 Mar 2006, 1:47:46 UTC

2 more


3/3/2006 7:50:23 PM|rosetta@home|Unrecoverable error for result ABINITpt_hom015_1ptq__322_19_0 ( - exit code -1073741811 (0xc000000d))
3/3/2006 7:50:25 PM|rosetta@home|Unrecoverable error for result ABINITpg_hom016_1pgx__322_6_0 ( - exit code -1073741811 (0xc000000d))

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11654 - Posted: 4 Mar 2006, 16:31:29 UTC

2 more


3/4/2006 7:43:47 AM|rosetta@home|Unrecoverable error for result ABINITa1_hom013_1a19A_320_84_1 ( - exit code -1073741811 (0xc000000d))
3/4/2006 7:43:48 AM|rosetta@home|Unrecoverable error for result ABINITen_hom019_1enh__322_79_1 ( - exit code -1073741811 (0xc000000d))



simpe73
Joined: 20 Feb 06
Posts: 4
Credit: 438,570
RAC: 0
Message 11659 - Posted: 4 Mar 2006, 22:46:47 UTC

Here are some. I only checked 3 of my 40 computers, so there are a lot more that will not be reported. It seems that my problems are related to pausing the WU. All computers stop crunching while the user is active.

HOST 168941

4.3.2006 12:45:42||Suspending computation and network activity - user is active
4.3.2006 12:45:42|rosetta@home|Pausing result FAST_ABINITIO_DEFAULT_2acy__306_3270_2 (removed from memory)
4.3.2006 12:45:42|rosetta@home|Pausing result ABINITsc_hom005_1scjB_322_42_1 (removed from memory)
4.3.2006 12:45:43|rosetta@home|Unrecoverable error for result FAST_ABINITIO_DEFAULT_2acy__306_3270_2 ( - exit code -1073741819 (0xc0000005))
4.3.2006 12:45:43|rosetta@home|Unrecoverable error for result ABINITsc_hom005_1scjB_322_42_1 ( - exit code -1073741819 (0xc0000005))

HOST 168943
4.3.2006 12:45:51|rosetta@home|Pausing result NEW_SOFT_CENTROID_PACKING_1mky_225_6294_2 (removed from memory)
4.3.2006 12:45:51|rosetta@home|Pausing result ABINITig_hom007_1ig5A_322_2_1 (removed from memory)
4.3.2006 12:45:53|rosetta@home|Unrecoverable error for result NEW_SOFT_CENTROID_PACKING_1mky_225_6294_2 ( - exit code -164 (0xffffff5c))

HOST 168960
4.3.2006 12:51:12||Suspending computation and network activity - user is active
4.3.2006 12:51:12|rosetta@home|Pausing result ABINITrn_hom025_1rnbA_322_79_0 (removed from memory)
4.3.2006 12:51:12|rosetta@home|Pausing result PRODUCTION_ABINITIO_INCREASECYCLES50_1cg5B_317_853_2 (removed from memory)
4.3.2006 12:51:14|rosetta@home|Unrecoverable error for result ABINITrn_hom025_1rnbA_322_79_0 ( - exit code -164 (0xffffff5c))
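
With 40 hosts, checking each log by hand for this pause-then-crash pattern gets tedious. The following is a rough Python sketch of how the check could be automated. It assumes each message sits on a single line in the `timestamp|project|text` layout seen in the logs above (wrapped lines would need to be rejoined first), and the function name `failed_after_pause` is made up for this example. It only matches the message wording quoted in this post, so other client versions may need different patterns.

```python
import re

# Message wording taken from the log excerpts in this post.
PAUSED = re.compile(r"Pausing result (\S+) \(removed from memory\)")
FAILED = re.compile(r"Unrecoverable error for result (\S+) \(.*exit code (-?\d+)")


def failed_after_pause(lines):
    """Return (result_name, exit_code) for results that errored after being paused."""
    paused = set()
    hits = []
    for line in lines:
        # Each message is "timestamp|project|text"; keep only the text part.
        text = line.split("|", 2)[-1]
        match = PAUSED.search(text)
        if match:
            paused.add(match.group(1))
            continue
        match = FAILED.search(text)
        if match and match.group(1) in paused:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        "4.3.2006 12:45:42||Suspending computation and network activity - user is active",
        "4.3.2006 12:45:42|rosetta@home|Pausing result FAST_ABINITIO_DEFAULT_2acy__306_3270_2 (removed from memory)",
        "4.3.2006 12:45:43|rosetta@home|Unrecoverable error for result FAST_ABINITIO_DEFAULT_2acy__306_3270_2 ( - exit code -1073741819 (0xc0000005))",
    ]
    print(failed_after_pause(sample))
```

A result that errors without having been paused first is deliberately ignored, since the point is only to count failures that follow a pause.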

Stephen Miller
Joined: 18 Sep 05
Posts: 13
Credit: 16,294,215
RAC: 0
Message 11661 - Posted: 5 Mar 2006, 0:06:52 UTC - in response to Message 11659.  
Last modified: 5 Mar 2006, 0:32:27 UTC

I have had two WUs stuck at 1% recently.

This one wasted 50+ hours but finished after one BOINC restart:
2/25/2006 1:50:25 AM|rosetta@home|Resuming computation for result PRODUCTION_ABINITIO_DBFLAGS_1aiu__307_294_1 using rosetta version 482

This one wasted 15 hours before I restarted BOINC. Now it is at 45:00 minutes and still at 1% and showing 8:00:00 to complete:
3/4/2006 3:27:35 PM|rosetta@home|Resuming computation for result HB_BARCODE_30_2chf__347_425_0 using rosetta version 482

I plan to reboot and restart to see if it will complete.
Update - It has now passed 1% and I expect it to finish.


Stephen M

nmelhorn
Joined: 16 Oct 05
Posts: 1
Credit: 177,616
RAC: 0
Message 11675 - Posted: 5 Mar 2006, 5:25:34 UTC

The following WU assigned to me:

ResultID 12020507 WUID 9636787
Sent 26 Feb 2006 14:21:21 UTC

still shows In Progress / Unknown / New on my Results page, though there's no record of it left on my machine. The adjacent WUs failed.

I assume I should notify here, so the WU can be quickly reassigned elsewhere.

--regards, Nate

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11679 - Posted: 5 Mar 2006, 13:12:26 UTC

Stuck at 1% for 15 hours:

ABINITvc_home007_1vcc_337_21_0
Visit my websites honoring some of America's heroes:
USS Rich DE-695
USS Bunch DE-694 / APD-79

Stu D.
Joined: 3 Mar 06
Posts: 8
Credit: 575,867
RAC: 0
Message 11680 - Posted: 5 Mar 2006, 13:36:20 UTC

3/4/2006 11:08:23 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB009_1dtj__340_108_0 (Incorrect function. (0x1) - exit code 1 (0x1))

Fardringle
Joined: 22 Feb 06
Posts: 3
Credit: 5,487,674
RAC: 2
Message 11681 - Posted: 5 Mar 2006, 14:08:57 UTC

ABINITwi_hom007_1wit__337_79_0 is stuck at 1% after 11 hours.

The system is an Athlon XP 2200+ running Windows 2000 with version 5.2.13 of the BOINC client.

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11682 - Posted: 5 Mar 2006, 14:10:41 UTC - in response to Message 11679.  
Last modified: 5 Mar 2006, 14:11:12 UTC

Stuck at 1% for 15 hours:

ABINITvc_home007_1vcc_337_21_0


Got another one stuck at 1%:

HB_BARCODE_30_1acf_347_958_0

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11683 - Posted: 5 Mar 2006, 14:43:08 UTC - in response to Message 11682.  

And one more:

ABINITvi_hom020_2vik_337_83_0

Ib Rasmussen
Joined: 27 Sep 05
Posts: 16
Credit: 211,416
RAC: 0
Message 11704 - Posted: 6 Mar 2006, 8:05:03 UTC

SSFEATURES_BARCODE_ABINITIO_1acf__334_321_0 was stuck at 1% for 57+ hours. I tried stopping and restarting BOINC, but it restarted the WU at 00:00:00, so I killed it.

/Ib

©2025 University of Washington
https://www.bakerlab.org