Report stuck & aborted WU here please

Rebel Alliance
Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 10315 - Posted: 1 Feb 2006, 14:46:36 UTC

Here's another one that refuses to upload

2/1/2006 7:29:29 AM|rosetta@home|Started upload of BARCODE_30_1shfA_291_347_0_0
2/1/2006 7:29:31 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/2aa/BARCODE_30_1shfA_291_347_0_0 1950 bytes != offset 0 bytes
2/1/2006 7:29:31 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1shfA_291_347_0_0: transient upload error
2/1/2006 7:29:31 AM|rosetta@home|Backing off 49 minutes and 26 seconds on upload of file BARCODE_30_1shfA_291_347_0_0
TSV Praha a.s.
Joined: 20 Sep 05
Posts: 1
Credit: 124,425
RAC: 0
Message 10316 - Posted: 1 Feb 2006, 17:51:36 UTC
Last modified: 1 Feb 2006, 17:52:52 UTC


MajorPart
Joined: 15 Jan 06
Posts: 2
Credit: 832
RAC: 0
Message 10339 - Posted: 2 Feb 2006, 11:05:44 UTC

02/02/2006 10:25:59 AM Unrecoverable error for result BARCODE_30_1ubi__292_2754_0 ( - exit code -1073741819 (0xc0000005))
01/02/2006 11:40:16 PM Unrecoverable error for result BARCODE_30_1ubi__292_2751_0 ( - exit code -1073741819 (0xc0000005))
02/02/2006 10:26:01 AM Unrecoverable error for result BARCODE_30_1c9oA_292_2756_0 ( - exit code -1073741819 (0xc0000005))
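
The negative exit code and the hex value in brackets are the same 32-bit number, printed once as signed and once as unsigned. A minimal Python sketch of that conversion follows; the helper name `to_unsigned_hex` is made up for illustration. On Windows, 0xC0000005 is the access violation status code, though that alone does not say what inside the application triggered it.

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Return the 32-bit unsigned hex form of a signed exit code."""
    # Masking with 0xFFFFFFFF reinterprets the signed value as unsigned.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Exit code taken from the log lines above.
    print(to_unsigned_hex(-1073741819))
```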

TCU Computer Science
Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 10367 - Posted: 2 Feb 2006, 20:35:43 UTC

Two WUs stuck at 1% and aborted on 02 Feb:

NO_SIM_ANNEAL_BARCODE_30_2reb_286_4502_0

TERMINI_2reb_294_6931_0

The first one ran for 94 hours before I noticed it.
arcturus
Joined: 22 Sep 05
Posts: 16
Credit: 525,440
RAC: 0
Message 10419 - Posted: 3 Feb 2006, 17:13:42 UTC
Last modified: 3 Feb 2006, 17:15:43 UTC

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 10432 - Posted: 3 Feb 2006, 20:49:05 UTC

WU 7513331 had been running here for 20 hours and was still on 1% when I aborted it. :-(
Chilcotin
Joined: 5 Nov 05
Posts: 15
Credit: 16,969,500
RAC: 0
Message 10447 - Posted: 4 Feb 2006, 6:27:47 UTC - in response to Message 10432.  

WU 7159329 ran for 98,942.81 seconds on my machine 56262. It was stuck at 1% and I aborted it on Feb 3. TaSKO on machine 74659 has it now (and I wish him good luck with it!)
Tigers_Dave
Joined: 9 Dec 05
Posts: 6
Credit: 109,504,935
RAC: 23,967
Message 10478 - Posted: 5 Feb 2006, 14:12:27 UTC - in response to Message 10447.  

WU 5711705 ran on computer 129018 for 179,315.17 seconds before it failed on February 5.
WU 5857025 ran on computer 105835 for 177,320.33 seconds before it failed on January 31.
WU 5650559 ran on computer 105834 for 143,150.09 seconds before it failed on January 23.
None of these three work units has been completed by another machine.
Thanks.
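
For comparison with the reports above that are given in hours, the CPU times in seconds can be converted with a quick sketch like this one. The function name `seconds_to_hours` is only illustrative, and the input strings are the three figures quoted in this post.

```python
def seconds_to_hours(text: str) -> float:
    """Convert a CPU time such as '179,315.17' (seconds) to hours."""
    # Drop the thousands separators before parsing the number.
    return float(text.replace(",", "")) / 3600.0


if __name__ == "__main__":
    for cpu_time in ("179,315.17", "177,320.33", "143,150.09"):
        print(f"{cpu_time} s = {seconds_to_hours(cpu_time):.1f} h")
```

That works out to roughly 49.8, 49.3 and 39.8 hours respectively.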
eL_nino
Joined: 20 Jan 06
Posts: 10
Credit: 45,343
RAC: 0
Message 10483 - Posted: 5 Feb 2006, 20:21:38 UTC

6.2.2006 20:09:10|rosetta@home|Unrecoverable error for result TERMINI_1dcj_294_8296_0 ( - exit code -1073741819 (0xc0000005))
6.2.2006 20:09:10||request_reschedule_cpus: process exited
6.2.2006 20:09:10|rosetta@home|Computation for result TERMINI_1dcj_294_8296_0 finished



So, after a few minutes, this error appeared.
Ricky@SETI.USA
Joined: 13 Dec 05
Posts: 20
Credit: 97,355
RAC: 0
Message 10487 - Posted: 5 Feb 2006, 22:07:04 UTC - in response to Message 10483.  

Here is a WU that aborted on me. I started to abort it myself but wanted to see how long it would go:

stderr out <core_client_version>5.3.12.tx36</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>

</stderr_txt>

I think from now on, if they run more than an hour longer than they should based on the "To Complete" time, I'll go ahead and abort them. No point wasting time when I could be helping my primary project.
"Life is like an Ice Cream cone, just when you think you got it licked, it drips all over you!"

Grutte Pier [Wa Oars]~MAB The Frisian
Joined: 6 Nov 05
Posts: 87
Credit: 497,588
RAC: 0
Message 10493 - Posted: 6 Feb 2006, 7:34:32 UTC
Last modified: 6 Feb 2006, 7:45:09 UTC

These show "Unhandled Exception" :
https://boinc.bakerlab.org/rosetta/result.php?resultid=9272934
https://boinc.bakerlab.org/rosetta/result.php?resultid=8910039
https://boinc.bakerlab.org/rosetta/result.php?resultid=7076489

and these show "pending", which is something new?
https://boinc.bakerlab.org/rosetta/result.php?resultid=7739133
https://boinc.bakerlab.org/rosetta/result.php?resultid=7738709

scsimodo
Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 10494 - Posted: 6 Feb 2006, 8:03:35 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=9652979

stuck at 1% after 10 hours, aborted it manually
Plum Ugly
Joined: 3 Nov 05
Posts: 24
Credit: 2,005,763
RAC: 0
Message 10496 - Posted: 6 Feb 2006, 9:36:45 UTC

1b72_bar1821_288_9684_1 aborted after running for over 100 hours.
halfmeg
Joined: 14 Dec 05
Posts: 7
Credit: 2,496
RAC: 0
Message 10500 - Posted: 6 Feb 2006, 13:55:48 UTC

While not specifically a stuck or aborted WU, this one has died 10 times with "Client error" and was once held up on a system from 3 to 31 January (past its due date). It was originally created on 21 Dec 2005: DEFAULT_1n0u_218_656.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3840471

stderr - Incorrect function. (0x1) - exit code 1 (0x1)



Rebel Alliance
Joined: 4 Nov 05
Posts: 50
Credit: 3,579,531
RAC: 0
Message 10539 - Posted: 7 Feb 2006, 15:13:04 UTC

More units that refuse to upload

"2/7/2006 9:09:27 AM|rosetta@home|Started upload of BARCODE_30_1c8cA_299_4324_0_0
2/7/2006 9:09:30 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/2d9/BARCODE_30_1c8cA_299_4324_0_0 1949 bytes != offset 0 bytes
2/7/2006 9:09:30 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1c8cA_299_4324_0_0: transient upload error
2/7/2006 9:09:30 AM|rosetta@home|Backing off 2 hours, 37 minutes, and 53 seconds on upload of file BARCODE_30_1c8cA_299_4324_0_0
2/7/2006 9:09:36 AM|rosetta@home|Started upload of BARCODE_30_1a68__299_8282_0_0
2/7/2006 9:09:38 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1c/BARCODE_30_1a68__299_8282_0_0 4869 bytes != offset 0 bytes
2/7/2006 9:09:38 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1a68__299_8282_0_0: transient upload error
2/7/2006 9:09:38 AM|rosetta@home|Backing off 1 minutes and 18 seconds on upload of file BARCODE_30_1a68__299_8282_0_0
2/7/2006 9:09:50 AM|rosetta@home|Started upload of BARCODE_30_1bk2__299_8469_0_0
2/7/2006 9:09:52 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/d2/BARCODE_30_1bk2__299_8469_0_0 723 bytes != offset 0 bytes
2/7/2006 9:09:52 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1bk2__299_8469_0_0: transient upload error
2/7/2006 9:09:52 AM|rosetta@home|Backing off 3 hours, 59 minutes, and 20 seconds on upload of file BARCODE_30_1bk2__299_8469_0_0
2/7/2006 9:09:57 AM|rosetta@home|Started upload of BARCODE_30_1b3aA_299_8463_0_0
2/7/2006 9:09:59 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/237/BARCODE_30_1b3aA_299_8463_0_0 723 bytes != offset 0 bytes
2/7/2006 9:09:59 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1b3aA_299_8463_0_0: transient upload error
2/7/2006 9:09:59 AM|rosetta@home|Backing off 16 minutes and 21 seconds on upload of file BARCODE_30_1b3aA_299_8463_0_0
2/7/2006 9:10:07 AM|rosetta@home|Started upload of BARCODE_30_1ubi__299_8499_0_0
2/7/2006 9:10:10 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/137/BARCODE_30_1ubi__299_8499_0_0 1949 bytes != offset 0 bytes
2/7/2006 9:10:10 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1ubi__299_8499_0_0: transient upload error
2/7/2006 9:10:10 AM|rosetta@home|Backing off 3 hours, 2 minutes, and 51 seconds on upload of file BARCODE_30_1ubi__299_8499_0_0
2/7/2006 9:10:28 AM|rosetta@home|Started upload of BARCODE_30_1bk2__299_8547_0_0
2/7/2006 9:10:30 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1ff/BARCODE_30_1bk2__299_8547_0_0 1949 bytes != offset 0 bytes
2/7/2006 9:10:30 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1bk2__299_8547_0_0: transient upload error
2/7/2006 9:10:30 AM|rosetta@home|Backing off 1 hours, 10 minutes, and 41 seconds on upload of file BARCODE_30_1bk2__299_8547_0_0
2/7/2006 9:10:57 AM|rosetta@home|Started upload of BARCODE_30_1a68__299_8282_0_0
2/7/2006 9:11:00 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1c/BARCODE_30_1a68__299_8282_0_0 4869 bytes != offset 0 bytes
2/7/2006 9:11:00 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1a68__299_8282_0_0: transient upload error
2/7/2006 9:11:00 AM|rosetta@home|Backing off 1 hours, 35 minutes, and 40 seconds on upload of file BARCODE_30_1a68__299_8282_0_0"
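
With this many repeats it can help to pull the affected file names out of the log automatically. The following is a rough Python sketch, assuming the `date|project|message` layout and the exact "Error on file upload" wording shown above; the names `UPLOAD_ERROR`, `failed_uploads` and `SAMPLE` are made up for illustration.

```python
import re

# Matches the upload error message text as it appears in the log above.
UPLOAD_ERROR = re.compile(
    r"Error on file upload: length of file (?P<path>\S+) "
    r"(?P<size>\d+) bytes != offset (?P<offset>\d+) bytes"
)


def failed_uploads(log_text: str) -> dict:
    """Map each file name that failed to upload to its reported size in bytes."""
    failures = {}
    for line in log_text.splitlines():
        # Each line is assumed to be date|project|message.
        parts = line.strip().strip('"').split("|", 2)
        if len(parts) != 3:
            continue
        match = UPLOAD_ERROR.search(parts[2])
        if match:
            # Keep only the file name, not the server-side directory.
            name = match["path"].rsplit("/", 1)[-1]
            failures[name] = int(match["size"])
    return failures


SAMPLE = """\
2/7/2006 9:09:27 AM|rosetta@home|Started upload of BARCODE_30_1c8cA_299_4324_0_0
2/7/2006 9:09:30 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/2d9/BARCODE_30_1c8cA_299_4324_0_0 1949 bytes != offset 0 bytes
2/7/2006 9:09:38 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1c/BARCODE_30_1a68__299_8282_0_0 4869 bytes != offset 0 bytes
"""

if __name__ == "__main__":
    for name, size in failed_uploads(SAMPLE).items():
        print(name, size)
```

Because the result is keyed by file name, a file that fails twice (as BARCODE_30_1a68__299_8282_0_0 does above) is listed only once.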

Tribaal
Joined: 6 Feb 06
Posts: 80
Credit: 2,754,607
RAC: 0
Message 10544 - Posted: 7 Feb 2006, 19:11:23 UTC
Last modified: 7 Feb 2006, 19:19:11 UTC

[EDIT] Sorry I read the tech news, I understand this is already known and looked into. If my machine's stats may help I'll be happy to provide the info on request. [/EDIT]

07.02.2006 20:07:56|rosetta@home|Unrecoverable error for result PRODUCTION_ABINITIO_2acy__250_1600_1 ( - exit code -1073741819 (0xc0000005))

07.02.2006 20:07:56||request_reschedule_cpus: process exited

07.02.2006 20:07:56|rosetta@home|Computation for result PRODUCTION_ABINITIO_2acy__250_1600_1 finished

07.02.2006 20:08:57|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi

[B@H] Ray
Joined: 20 Sep 05
Posts: 118
Credit: 100,251
RAC: 0
Message 10612 - Posted: 10 Feb 2006, 2:54:13 UTC
Last modified: 10 Feb 2006, 2:54:53 UTC

This unit (Result ID 7393548) was stuck at 10% for 10 hours. I aborted it when I got home and spotted that.

Ray


FreeLarry
Joined: 6 Oct 05
Posts: 1
Credit: 1,327,104
RAC: 0
Message 10620 - Posted: 10 Feb 2006, 8:09:14 UTC
Last modified: 10 Feb 2006, 8:10:35 UTC

Here's another bad unit. Aborted after 38 hrs at 1%.

https://boinc.bakerlab.org/rosetta/result.php?resultid=10130288

Result ID 10130288
Name PRODUCTION_ABINITIO_CENTROID_PACKING_1urnA_301_803_0
Workunit 8174718

Larry
[B^S] ThatGuy
Joined: 4 Jan 06
Posts: 3
Credit: 24,872
RAC: 0
Message 10650 - Posted: 11 Feb 2006, 3:05:19 UTC

I got home to find this one stuck at 1% after over 20 hours:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=8234872

PRODUCTION_ABINITO_CENTROID_PACKING_1nspA_301_1457_0


Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10668 - Posted: 11 Feb 2006, 19:18:43 UTC

In the last 24 hours I got these stuck ones, which I then aborted manually,
and other errors too ... see :-(

Date | Host | Project | ID | Message
2/10/2006 2:43:24 PM | carlos.cp3 | rosetta@home | 271 | No work from project
2/10/2006 8:09:55 PM | crobertp.cp3 | rosetta@home | 18 | Unrecoverable error for result BARCODE_30_1ubi__299_25012_0 (aborted by user)
2/10/2006 8:42:45 PM | crobertp.cp3 | rosetta@home | 43 | Unrecoverable error for result BARCODE_30_1tig__299_25504_0 (aborted by user)
2/10/2006 8:51:21 PM | crobertp.cp3 | rosetta@home | 68 | Unrecoverable error for result BARCODE_30_2ci2I_299_25648_0 (aborted by user)
2/10/2006 8:58:09 PM | crobertp.cp3 | rosetta@home | 98 | Unrecoverable error for result BARCODE_30_1cc8A_299_25801_0 (process exited with code 131 (0x83))
2/10/2006 10:37:14 PM | crobertp.cp3 | rosetta@home | 110 | Unrecoverable error for result BARCODE_30_4ubpA_299_26045_0 (aborted by user)
2/11/2006 1:00:05 AM | carlos.cp3 | rosetta@home | 371 | No work from project
2/11/2006 1:04:20 AM | carlos.cp3 | rosetta@home | 376 | No work from project
2/11/2006 1:08:28 AM | carlos.cp3 | rosetta@home | 381 | No work from project
2/11/2006 1:12:38 AM | carlos.cp3 | rosetta@home | 386 | No work from project
2/11/2006 1:42:13 AM | carlos.cp3 | rosetta@home | 452 | Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of 500
2/11/2006 1:42:13 AM | carlos.cp3 | rosetta@home | 453 | No schedulers responded
2/11/2006 2:14:10 AM | crobertp.cp3 | rosetta@home | 223 | Unrecoverable error for result BARCODE_30_1ubi__299_26452_1 (aborted by user)
2/11/2006 2:16:43 AM | crobertp.cp3 | rosetta@home | 248 | Unrecoverable error for result BARCODE_30_1c9oA_299_27305_0 (process got signal 11)
2/11/2006 2:18:26 AM | crobertp.cp3 | rosetta@home | 252 | Unrecoverable error for result BARCODE_30_1ctf__299_27362_0 (process got signal 11)
2/11/2006 2:47:18 AM | crobertp.cp3 | rosetta@home | 284 | Unrecoverable error for result BARCODE_30_5croA_299_27452_0 (aborted by user)
2/11/2006 4:06:06 AM | crobertp.cp3 | rosetta@home | 320 | Unrecoverable error for result BARCODE_30_1shfA_299_27554_0 (aborted by user)
2/11/2006 4:24:29 AM | crobertp.cp3 | rosetta@home | 329 | Unrecoverable error for result BARCODE_30_5croA_299_27741_0 (aborted by user)
2/11/2006 5:06:51 AM | crobertp.cp3 | rosetta@home | 374 | Unrecoverable error for result BARCODE_30_1elwA_299_28034_0 (aborted by user)
2/11/2006 11:19:29 AM | crobertp.cp3 | rosetta@home | 194 | Unrecoverable error for result NO_SIM_ANNEAL_BARCODE_30_1r69_240_3262_1 (process got signal 11)
2/11/2006 2:41:51 PM | crobertp.cp3 | rosetta@home | 331 | Unrecoverable error for result BARCODE_30_1opd__299_27209_1 (aborted by user)
2/11/2006 3:04:10 PM | merfapet | rosetta@home | 15 | Unrecoverable error for result BARCODE_30_256bA_299_1571_1 (aborted by user)
2/11/2006 3:40:42 PM | crobertp.cp3 | rosetta@home | 380 | Unrecoverable error for result BARCODE_30_1iibA_299_28610_0 (process got signal 24)
2/11/2006 3:53:47 PM | carlos.cp3 | rosetta@home | 691 | No work from project
2/11/2006 4:12:59 PM | carlos.cp3 | rosetta@home | 764 | Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi failed with a return value of -106
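
The reason in brackets at the end of each "Unrecoverable error" line is what separates the manual aborts from the crashes. A rough Python sketch for tallying those reasons follows; it assumes the message wording shown in the listing above, and the names `RESULT_ERROR`, `count_reasons` and `SAMPLE_LINES` are illustrative only.

```python
import re
from collections import Counter

# Captures the result name and the bracketed reason that ends the line.
RESULT_ERROR = re.compile(r"Unrecoverable error for result (\S+) \((.+)\)\s*$")


def count_reasons(lines) -> Counter:
    """Count how often each failure reason appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = RESULT_ERROR.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


SAMPLE_LINES = [
    "Unrecoverable error for result BARCODE_30_1ubi__299_25012_0 (aborted by user)",
    "Unrecoverable error for result BARCODE_30_1c9oA_299_27305_0 (process got signal 11)",
    "Unrecoverable error for result BARCODE_30_1tig__299_25504_0 (aborted by user)",
    "No work from project",
]

if __name__ == "__main__":
    for reason, count in count_reasons(SAMPLE_LINES).most_common():
        print(f"{count:3d}  {reason}")
```

Lines that are not result errors, such as "No work from project", are simply skipped.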

Thanks,
