Miscellaneous Work Unit Errors

Lee Carre
Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 11179 - Posted: 22 Feb 2006, 4:11:54 UTC

Result 11924681 returned an "Incorrect function. (0x1) - exit code 1 (0x1)" message.

Morten Starkeby
Joined: 18 Feb 06
Posts: 10
Credit: 472,142
RAC: 0
Message 11203 - Posted: 22 Feb 2006, 6:53:52 UTC - in response to Message 10953.  


I got the following error today:

22/02/2006 07:40:42|rosetta@home|Unrecoverable error for result PRODUCTION_ABINITIO_INCREASECYCLES50_1opd__317_273_0 ( - exit code -1073741811 (0xc000000d))

https://boinc.bakerlab.org/rosetta/result.php?resultid=11852924 on computer https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=167418

Leave in memory is set to YES
Application switch time: 120 min

Client version: 5.3.19 (BBC CCE version)

Kevin
Joined: 15 Jan 06
Posts: 21
Credit: 109,496
RAC: 0
Message 11204 - Posted: 22 Feb 2006, 7:16:30 UTC - in response to Message 11179.  
Last modified: 22 Feb 2006, 7:17:26 UTC

Result 11924681 returned an "Incorrect function. (0x1) - exit code 1 (0x1)" message.


I had 5 of those 1btn_fullatom units fail tonight.

9593646
9593242
9592682
9591486
9594565

Angus
Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 11208 - Posted: 22 Feb 2006, 7:26:18 UTC

I just had an entire daily quota of those error out, as fast as they could download.
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Christian Hagen
Joined: 26 Sep 05
Posts: 5
Credit: 46,795
RAC: 0
Message 11210 - Posted: 22 Feb 2006, 7:54:34 UTC


Same here:

2006-02-22 08:37:42 [rosetta@home] Unrecoverable error for result 1btn_fullatom_dec04_3_318_15_2 (process exited with code 1 (0x1))

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 11213 - Posted: 22 Feb 2006, 10:03:17 UTC

I just cancelled this batch. There is definitely something wrong with it, and I have alerted the person in our lab who submitted it.

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11221 - Posted: 22 Feb 2006, 21:04:27 UTC

Here is a list of the errors I have gotten today.


2/22/2006 3:15:02 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:15:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:15:05 AM|rosetta@home|No schedulers responded
2/22/2006 3:15:10 AM|rosetta@home|Deferring communication with project for 55 seconds
2/22/2006 3:16:05 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:16:05 AM|rosetta@home|Reason: To report results
2/22/2006 3:16:05 AM|rosetta@home|Reporting 5 results
2/22/2006 3:16:28 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:16:30 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:16:30 AM|rosetta@home|No schedulers responded
2/22/2006 3:16:35 AM|rosetta@home|Deferring communication with project for 54 seconds
2/22/2006 3:17:30 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:17:30 AM|rosetta@home|Reason: To report results
2/22/2006 3:17:30 AM|rosetta@home|Reporting 5 results
2/22/2006 3:17:52 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:17:55 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:17:55 AM|rosetta@home|No schedulers responded
2/22/2006 3:18:00 AM|rosetta@home|Deferring communication with project for 55 seconds
2/22/2006 3:18:56 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:18:56 AM|rosetta@home|Reason: To report results
2/22/2006 3:18:56 AM|rosetta@home|Reporting 5 results
2/22/2006 3:19:18 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:19:21 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:19:21 AM|rosetta@home|No schedulers responded
2/22/2006 3:19:26 AM|rosetta@home|Deferring communication with project for 2 minutes and 22 seconds
2/22/2006 3:21:51 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:21:51 AM|rosetta@home|Reason: To report results
2/22/2006 3:21:51 AM|rosetta@home|Reporting 5 results
2/22/2006 3:22:14 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:22:16 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:22:16 AM|rosetta@home|No schedulers responded
2/22/2006 3:22:21 AM|rosetta@home|Deferring communication with project for 1 minutes and 52 seconds
2/22/2006 3:24:16 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:24:16 AM|rosetta@home|Reason: To report results
2/22/2006 3:24:16 AM|rosetta@home|Reporting 5 results
2/22/2006 3:24:38 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:24:41 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:24:41 AM|rosetta@home|No schedulers responded
2/22/2006 3:24:46 AM|rosetta@home|Deferring communication with project for 12 minutes and 25 seconds
2/22/2006 3:37:12 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 3:37:12 AM|rosetta@home|Reason: To report results
2/22/2006 3:37:12 AM|rosetta@home|Reporting 5 results
2/22/2006 3:37:34 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:37:37 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:37:37 AM|rosetta@home|No schedulers responded
2/22/2006 3:37:42 AM|rosetta@home|Deferring communication with project for 27 minutes and 2 seconds
2/22/2006 4:04:48 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 4:04:48 AM|rosetta@home|Reason: To report results
2/22/2006 4:04:48 AM|rosetta@home|Reporting 5 results
2/22/2006 4:05:10 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 4:05:13 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 4:05:13 AM|rosetta@home|No schedulers responded
2/22/2006 4:05:18 AM|rosetta@home|Deferring communication with project for 2 hours, 14 minutes, and 36 seconds
2/22/2006 5:05:21 AM|rosetta@home|Deferring communication with project for 1 hours, 14 minutes, and 34 seconds
2/22/2006 6:05:24 AM|rosetta@home|Deferring communication with project for 14 minutes and 31 seconds
2/22/2006 6:20:00 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
2/22/2006 6:20:00 AM|rosetta@home|Reason: To report results
2/22/2006 6:20:00 AM|rosetta@home|Reporting 5 results
2/22/2006 6:20:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2/22/2006 7:22:44 AM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_2ci2I_304_817_1 ( - exit code -1073741811 (0xc000000d))
2/22/2006 7:22:44 AM||request_reschedule_cpus: process exited
2/22/2006 7:22:44 AM|rosetta@home|Computation for result FAST_ABINITIO_CENTROID_PACKING_2ci2I_304_817_1 finished
2/22/2006 7:22:44 AM|rosetta@home|Starting result FAST_ABINITIO_CENTROID_PACKING_1kpeA_305_837_1 using rosetta version 482
2/22/2006 7:22:46 AM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_1ptq__304_97_2 ( - exit code -1073741811 (0xc000000d))
2/22/2006 7:22:46 AM||request_reschedule_cpus: process exited
2/22/2006 7:22:46 AM|rosetta@home|Computation for result FAST_ABINITIO_CENTROID_PACKING_1ptq__304_97_2 finished
2/22/2006 7:22:46 AM|rosetta@home|Starting result PRODUCTION_ABINITIO_QUADRUPLELONGRANGEANTIPARALLEL_2chf__311_1155_2 using rosetta version 482
2/22/2006 7:22:48 AM|rosetta@home|Deferring communication with project for 57 seconds
2/22/2006 3:02:41 PM|rosetta@home|Unrecoverable error for result FAST_ABINITIO_CENTROID_PACKING_1kpeA_305_837_1 ( - exit code -1073741811 (0xc000000d))

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 11222 - Posted: 22 Feb 2006, 22:32:42 UTC
Last modified: 22 Feb 2006, 22:33:47 UTC

From the looks of your message log, it appears the servers were down for a while. I'm not sure, though, whether that really occurred.

Here's what the Wiki says about -106:

ERR_IO (-106), system I/O: A failure to read from or write to the disk drive or, in the case of network transmissions, a failure to send and receive data.
A failure to send and receive data to and from a Project Server generally means that, during the transmission, the Project Server or a router along the way reset the TCP connection between the BOINC Client Software and the Project Server. This can happen for a number of different reasons, such as the Project Server being overloaded or packets being lost en route.

Note:
This is probably the most common error after an outage of the Project.
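
To see how often the -106 failure occurred and how the client backed off, the message log pasted earlier in the thread can be summarized with a short script. This is only a rough sketch in Python: the `timestamp|project|message` layout is inferred from the pasted lines rather than from BOINC documentation, the sample lines are copied from that log, and the helper names `parse_line` and `deferral_seconds` are made up for illustration.

```python
import re

# A few lines copied from the message log earlier in this thread.
LOG = """\
2/22/2006 3:15:02 AM||Couldn't connect to hostname [boinc.bakerlab.org]
2/22/2006 3:15:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
2/22/2006 3:15:10 AM|rosetta@home|Deferring communication with project for 55 seconds
2/22/2006 3:19:26 AM|rosetta@home|Deferring communication with project for 2 minutes and 22 seconds
2/22/2006 4:05:18 AM|rosetta@home|Deferring communication with project for 2 hours, 14 minutes, and 36 seconds
2/22/2006 6:20:05 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
"""

UNIT_SECONDS = {"hours": 3600, "minutes": 60, "seconds": 1}


def parse_line(line):
    """Split 'timestamp|project|message' into its three fields (assumed layout)."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def deferral_seconds(message):
    """Return the back-off in seconds, or None if this is not a deferral line."""
    if not message.startswith("Deferring communication"):
        return None
    parts = re.findall(r"(\d+) (hours|minutes|seconds)", message)
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in parts)


failures = 0
deferrals = []
for line in LOG.splitlines():
    _, _, message = parse_line(line)
    if message.endswith("return value of -106"):
        failures += 1
    seconds = deferral_seconds(message)
    if seconds is not None:
        deferrals.append(seconds)

print(failures, deferrals)
```

Run against the full log, the growing deferral values would show the client waiting longer between retries while the scheduler stayed unreachable.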

SallyH
Joined: 4 Nov 05
Posts: 6
Credit: 4,799,395
RAC: 0
Message 11223 - Posted: 22 Feb 2006, 22:46:12 UTC

I am having more errors than successes since 4.82. Please look at my errors for machine #166709.

Thanks...

Morten Starkeby
Joined: 18 Feb 06
Posts: 10
Credit: 472,142
RAC: 0
Message 11227 - Posted: 22 Feb 2006, 23:55:03 UTC
Last modified: 22 Feb 2006, 23:59:02 UTC

Received another client error:

Result ID: 11875565

<core_client_version>5.3.19</core_client_version>
<message> - exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 1054947
# cpu_run_time_pref: 28800

</stderr_txt>

Interboy
Joined: 28 Sep 05
Posts: 3
Credit: 737,692
RAC: 92
Message 11239 - Posted: 23 Feb 2006, 11:58:34 UTC
Last modified: 23 Feb 2006, 12:00:38 UTC

Received this error:

Result ID: https://boinc.bakerlab.org/rosetta/result.php?resultid=11838015

<core_client_version>5.2.12</core_client_version>
<message> - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1072606
# DONE :: 1 starting structures built 15 (nstruct) times
# This process generated 15 decoys from 15 attempts

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C83AEC6 write attempt to address 0x00000004


</stderr_txt>
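
For reference, the negative exit codes reported in this thread and the hex values printed next to them are the same 32-bit number, shown once as signed and once as unsigned. A minimal Python sketch of the conversion follows. The helper names `to_ntstatus_hex` and `describe` and the small lookup table are illustrative only and are not part of BOINC or Rosetta; the two names in the table are the standard Windows NTSTATUS names for these values.

```python
# Convert the signed 32-bit exit code from the log into the unsigned
# hex form printed next to it (a Windows NTSTATUS value).

# Illustrative lookup table; it covers only the two codes seen in this thread.
KNOWN_STATUS = {
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def describe(exit_code: int) -> str:
    """Format the code in both forms, with its name if it is in the table."""
    unsigned = exit_code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"{exit_code} = {to_ntstatus_hex(exit_code)} ({name})"


print(describe(-1073741811))
print(describe(-1073741819))
```

A write attempt to an address as low as 0x00000004, as in the exception above, is often a sign of a null pointer being dereferenced at a small offset, though the output alone does not confirm that.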

Hermes
Joined: 17 Sep 05
Posts: 2
Credit: 113,946
RAC: 0
Message 11243 - Posted: 23 Feb 2006, 13:08:32 UTC

WU 9531208 has already failed twice.

Beezlebub
Joined: 18 Oct 05
Posts: 40
Credit: 260,375
RAC: 0
Message 11252 - Posted: 23 Feb 2006, 15:14:31 UTC

I have had no WU errors since starting this computer, but the completion times are getting out of hand:

Result ID | Work unit ID | Sent | Received | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
11869025 | 9562409 | 21 Feb 2006 7:24:03 UTC | 22 Feb 2006 14:43:46 UTC | Over | Success | Done | 86,361.25 | 200.03 | 200.03
11839823 | 9534950 | 21 Feb 2006 10:03:07 UTC | 23 Feb 2006 12:38:51 UTC | Over | Success | Done | 86,134.84 | 199.50 | 199.50
11789045 | 9520604 | 19 Feb 2006 15:42:12 UTC | 20 Feb 2006 2:48:30 UTC | Over | Success | Done | 28,705.73 | 57.98 | 57.98
11738428 | 9498702 | 19 Feb 2006 7:07:55 UTC | 19 Feb 2006 15:42:12 UTC | Over | Success | Done | 27,955.66 | 57.77 | 57.77
11738306 | 9498580 | 19 Feb 2006 7:07:55 UTC | 19 Feb 2006 18:06:22 UTC | Over | Success | Done | 29,220.14 | 60.38 | 60.38
11716364 | 9491951 | 19 Feb 2006 1:55:07 UTC | 19 Feb 2006 7:07:55 UTC | Over | Success | Done | 11,418.19 | 23.60 | 23.60
11716363 | 9491950 | 19 Feb 2006 1:55:07 UTC | 19 Feb 2006 7:07:55 UTC | Over | Success | Done | 10,428.11 | 21.55 | 21.55
11708415 | 9484498 | 18 Feb 2006 23:32:31 UTC | 19 Feb 2006 4:19:22 UTC | Over | Success | Done | 8,232.70 | 17.01 | 17.01

When I started:

Result ID | Work unit ID | Sent | Received | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
11317355 | 9181424 | 14 Feb 2006 16:17:00 UTC | 15 Feb 2006 0:53:42 UTC | Over | Success | Done | 1,835.91 | 3.79 | 3.79
11317354 | 9181423 | 14 Feb 2006 16:17:00 UTC | 15 Feb 2006 0:53:42 UTC | Over | Success | Done | 4,243.00 | 8.77 | 8.77
11270626 | 9137943 | 14 Feb 2006 11:27:14 UTC | 14 Feb 2006 15:03:21 UTC | Over | Success | Done | 4,213.20 | 8.93 | 8.93
11270602 | 9137919 | 14 Feb 2006 11:27:14 UTC | 14 Feb 2006 15:03:21 UTC | Over | Success | Done | 3,303.31 | 7.00 | 7.00
11257355 | 9125508 | 14 Feb 2006 10:17:26 UTC | 14 Feb 2006 15:03:21 UTC | Over | Success | Done | 4,391.36 | 9.30 | 9.30
11251768 | 9120391 | 14 Feb 2006 9:31:17 UTC | 14 Feb 2006 11:27:14 UTC | Over | Success | Done | 2,127.61 | 4.51 | 4.51
11195037 | 9075044 | 14 Feb 2006 3:46:26 UTC | 14 Feb 2006 6:34:39 UTC | Over | Success | Done | 1,288.11 | 2.73 | 2.73
11195025 | 5605330 | 14 Feb 2006 3:46:26 UTC | 14 Feb 2006 15:03:21 UTC | Over | Success | Done | 8,683.08 | 18.40 | 18.40
11194984 | 9074992 | 14 Feb 2006 3:46:26 UTC | 14 Feb 2006 6:34:39 UTC | Over | Success | Done | 4,997.23 | 10.59 | 10.59

e6600 quad @ 2.5ghz
2418 floating point
5227 integer

e6750 dual @ 3.71ghz
3598 floating point
7918 integer


Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 11253 - Posted: 23 Feb 2006, 15:39:49 UTC - in response to Message 11252.  
Last modified: 23 Feb 2006, 15:41:04 UTC

I have no WU Errors since starting this computer but the completion times are getting out of hand:

11869025 9562409 21 Feb 2006 7:24:03 UTC 22 Feb 2006 14:43:46 UTC Over Success Done 86,361.25 200.03 200.03
11839823 9534950 21 Feb 2006 10:03:07 UTC 23 Feb 2006 12:38:51 UTC Over Success Done 86,134.84 199.50 199.50

Your WU completion times are very close to one day (86400 sec ;-) because you set your 'target CPU run time' in the Rosetta@home preferences to 1 day - set it to, e.g., two hours if you prefer 'short' WUs.
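
The arithmetic behind that observation can be checked in a few lines. This is just an illustrative sketch: the two CPU times are taken from the quoted results, and `fraction_of_target` is a made-up helper name, not anything from Rosetta@home or BOINC.

```python
# Compare reported CPU times (in seconds) with a one-day target run time.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

# CPU times quoted from the results list above.
cpu_times = [86361.25, 86134.84]


def fraction_of_target(cpu_seconds, target_seconds):
    """Return the CPU time as a fraction of the target run time."""
    return cpu_seconds / target_seconds


for t in cpu_times:
    pct = 100 * fraction_of_target(t, SECONDS_PER_DAY)
    print(f"{t:,.2f} s = {t / 3600:.2f} h = {pct:.2f}% of one day")
```

Both results come out at just under 24 hours, which fits a target run time of one day rather than a slowdown in the work units themselves.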

Beezlebub
Joined: 18 Oct 05
Posts: 40
Credit: 260,375
RAC: 0
Message 11257 - Posted: 23 Feb 2006, 16:01:16 UTC

Dummy me :) I wasn't paying attention to the new work times... forgot all about it. Sorry.

doc :)
Joined: 4 Oct 05
Posts: 47
Credit: 1,106,102
RAC: 0
Message 11294 - Posted: 24 Feb 2006, 5:51:27 UTC

This WU's result errored out after running for less than 3 minutes. The graphics were open in a window, and I was not doing anything else at the time of the error.

24/02/2006 06:49:12|rosetta@home|Unrecoverable error for result FAST_ABINITIO_DEFAULT_1acf__306_1770_2 ( - exit code -1073741811 (0xc000000d))

SallyH
Joined: 4 Nov 05
Posts: 6
Credit: 4,799,395
RAC: 0
Message 11308 - Posted: 24 Feb 2006, 10:56:40 UTC
Last modified: 24 Feb 2006, 11:00:23 UTC

Here are the errors that I am getting on two AMD Opteron systems. I have an X2 4600 that is not getting these errors. I have rebuilt these systems twice in the past few days and am still getting a failure rate above 90 percent. My RAC has dropped from 7700 to below 6600 since 4.82.

11966026
Name PRODUCTION_ABINITIO_RANDOMFRAG_1pgx__309_163_2
Workunit 9313855
Created 23 Feb 2006 12:50:18 UTC
Sent 23 Feb 2006 22:17:57 UTC
Received 24 Feb 2006 9:48:18 UTC
Server state Over
Outcome Client error
Client state Done
Exit status -1073741811 (0xc000000d)
Computer ID 168708
Report deadline 2 Mar 2006 22:17:57 UTC
CPU time 14284.203125
stderr out

<core_client_version>5.2.13</core_client_version>
<message> - exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 1522481
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 99 (nstruct) times
# This process generated 99 decoys from 103 attempts

</stderr_txt>

Validate state Invalid
Claimed credit 115.888749997702
Granted credit 0
application version 4.82
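
The stderr output above already says something about where in the run the failure occurred. Here is a rough Python sketch that pulls out the relevant numbers. The field patterns are inferred from the pasted text only, `parse_stderr` is a hypothetical helper, and reading `cpu_run_time_pref` as a value in seconds is an assumption based on how close it is to the reported CPU time.

```python
import re

# stderr text copied from the result above.
STDERR = """\
# random seed: 1522481
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 99 (nstruct) times
# This process generated 99 decoys from 103 attempts
"""


def parse_stderr(text):
    """Extract the run-time preference, decoys, and attempts if present."""
    fields = {}
    m = re.search(r"cpu_run_time_pref: (\d+)", text)
    if m:
        fields["cpu_run_time_pref"] = int(m.group(1))
    m = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    if m:
        fields["decoys"] = int(m.group(1))
        fields["attempts"] = int(m.group(2))
    return fields


info = parse_stderr(STDERR)
cpu_time = 14284.203125  # 'CPU time' field of the same result
used = 100 * cpu_time / info["cpu_run_time_pref"]
print(info)
print(f"CPU time used: {used:.1f}% of the run-time preference")
```

Nearly all of the preferred run time was used and the decoys were reported as done before the client error, which suggests the failure came at the very end of the run. That is a reading of the pasted output, not a confirmed diagnosis.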

tng*
Joined: 28 Oct 05
Posts: 14
Credit: 5,389,798
RAC: 0
Message 11395 - Posted: 25 Feb 2006, 19:39:22 UTC - in response to Message 11323.  
Last modified: 25 Feb 2006, 19:43:16 UTC

For people having many work Unit Errors!!

I have received an e-mail from Dr. Baker with information for any of you who are having a lot of Work Unit errors.

"Could you help us to recommend to people having problems with lots of WU to set the target run time to a smaller value like 2 hours. We think there aren't any new bugs, just with longer run times it is more likely for a WU to have problems."

So if you are having a lot of errors please reset your Time setting to 2 hours and see if that helps.


Having received half a dozen errors on 4.82 on two machines that I don't believe have had that many errors in three months, I did this. Within hours, another error:

12092296

(edited to show the correct result -- oops)

Not the same as the earlier ones, but there still seem to be problems with a two-hour setting.

The machines having problems are Dell Dimension 9100s: Pentium D 820, 1 GB RAM, XP SP2 with all critical updates, BOINC 5.2.13 (updated from 5.2.2, but that didn't seem to fix anything).

My athlon 4000+ laptop has had no problems -- maybe something with multi-CPU systems.

Osku87
Joined: 1 Nov 05
Posts: 17
Credit: 280,268
RAC: 0
Message 11429 - Posted: 26 Feb 2006, 19:18:32 UTC - in response to Message 11395.  
Last modified: 26 Feb 2006, 19:20:22 UTC

My athlon 4000+ laptop has had no problems -- maybe something with multi-CPU systems.

It's not something with multi-CPU systems. I have a Sempron 2400+ and have the same problems. Reducing the time setting to 2 hours didn't solve the problem. For example: 9642560

doc :)
Joined: 4 Oct 05
Posts: 47
Credit: 1,106,102
RAC: 0
Message 11475 - Posted: 27 Feb 2006, 20:26:26 UTC

27/02/2006 20:09:06|rosetta@home|Unrecoverable error for result ABINITe6_hom011_1e6iA_320_4_0 ( - exit code -1073741811 (0xc000000d))

I get that error with that WU type (ABINIT**_hom***...) and 4.82 when I have the graphics open. It crashed for each WU I tried that with, at about the time the first model should have finished. (Right now I am not opening the graphics in any WU that is past its first checkpoint, to avoid wasting more cycles than necessary.) If I leave it running by itself in the background without graphics, it finishes without problems.
Some examples: result, result, result

This WU, from the HBLR type of WUs, failed with the same error while I had the graphics open at a later point.



©2025 University of Washington
https://www.bakerlab.org