Message boards : Number crunching : Report stuck & aborted WU here please
Author | Message |
---|---|
Rossmor35 Send message Joined: 24 Sep 05 Posts: 4 Credit: 84,870 RAC: 0 |
This WU was stuck at 20% for 5 hrs. Aborted, as neither the graphics nor the step count was moving. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=8130938 |
Honza Send message Joined: 18 Sep 05 Posts: 48 Credit: 173,517 RAC: 0 |
Mercifully killed WU after ~70 hours on Pentium D; the other resultID (also Pentium D) went fine. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=8363123 |
casio7131 Send message Joined: 10 Oct 05 Posts: 35 Credit: 149,748 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=10302640 PRODUCTION_ABINITIO_CENTROID_PACKING_2ci2I_301_2380_0 was stuck at 1% after ~30 hours. I restarted BOINC, and it's now at 20% after 21 min. Computer is a dual P3 933. |
Rebel Alliance Send message Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
2/13/2006 9:59:01 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1ac/BARCODE_30_1acf__299_23614_0_0 2182 bytes != offset 0 bytes
2/13/2006 9:59:01 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1acf__299_23614_0_0: transient upload error
2/13/2006 9:59:01 AM|rosetta@home|Backing off 3 hours, 57 minutes, and 6 seconds on upload of file BARCODE_30_1acf__299_23614_0_0
2/13/2006 9:59:07 AM|rosetta@home|Started upload of BARCODE_30_1tig__299_23625_0_0
2/13/2006 9:59:10 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/371/BARCODE_30_1tig__299_23625_0_0 1948 bytes != offset 0 bytes
2/13/2006 9:59:10 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1tig__299_23625_0_0: transient upload error
2/13/2006 9:59:10 AM|rosetta@home|Backing off 3 hours, 17 minutes, and 10 seconds on upload of file BARCODE_30_1tig__299_23625_0_0
2/13/2006 9:59:18 AM|rosetta@home|Started upload of BARCODE_30_1bm8__299_23283_2_0
2/13/2006 9:59:21 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1b4/BARCODE_30_1bm8__299_23283_2_0 722 bytes != offset 0 bytes
2/13/2006 9:59:21 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1bm8__299_23283_2_0: transient upload error
2/13/2006 9:59:21 AM|rosetta@home|Backing off 39 minutes and 49 seconds on upload of file BARCODE_30_1bm8__299_23283_2_0
2/13/2006 9:59:28 AM|rosetta@home|Started upload of BARCODE_30_1tig__299_26551_0_0
2/13/2006 9:59:31 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/15b/BARCODE_30_1tig__299_26551_0_0 488 bytes != offset 0 bytes
2/13/2006 9:59:31 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1tig__299_26551_0_0: transient upload error
2/13/2006 9:59:31 AM|rosetta@home|Backing off 2 hours, 1 minutes, and 35 seconds on upload of file BARCODE_30_1tig__299_26551_0_0
2/13/2006 9:59:38 AM|rosetta@home|Started upload of BARCODE_30_4ubpA_299_26658_0_0
2/13/2006 9:59:41 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/1ac/BARCODE_30_4ubpA_299_26658_0_0 722 bytes != offset 0 bytes
2/13/2006 9:59:41 AM|rosetta@home|Temporarily failed upload of BARCODE_30_4ubpA_299_26658_0_0: transient upload error
2/13/2006 9:59:41 AM|rosetta@home|Backing off 51 minutes and 42 seconds on upload of file BARCODE_30_4ubpA_299_26658_0_0
2/13/2006 9:59:48 AM|rosetta@home|Started upload of BARCODE_30_1iibA_299_26685_0_0
2/13/2006 9:59:50 AM|rosetta@home|Error on file upload: length of file /f/boinc/projects/rosetta/upload/4f/BARCODE_30_1iibA_299_26685_0_0 722 bytes != offset 0 bytes
2/13/2006 9:59:50 AM|rosetta@home|Temporarily failed upload of BARCODE_30_1iibA_299_26685_0_0: transient upload error
2/13/2006 9:59:50 AM|rosetta@home|Backing off 3 hours, 31 minutes, and 31 seconds on upload of file BARCODE_30_1iibA_299_26685_0_0 |
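For anyone who wants to summarize a message log like the one above, here is a minimal Python sketch that pulls the file name and retry delay out of each "Backing off" line. It assumes the line format is exactly as pasted in this post (timestamp, project, and message separated by `|`); the names `BACKOFF_RE` and `parse_backoff` are made up for this example and are not part of BOINC.

```python
import re
from datetime import timedelta

# Matches messages such as
# "Backing off 3 hours, 57 minutes, and 6 seconds on upload of file NAME"
# and "Backing off 39 minutes and 49 seconds on upload of file NAME".
BACKOFF_RE = re.compile(
    r"Backing off "
    r"(?:(?P<hours>\d+) hours?,? ?)?"
    r"(?:(?P<minutes>\d+) minutes?,? ?)?"
    r"(?:and )?"
    r"(?:(?P<seconds>\d+) seconds? )?"
    r"on upload of file (?P<name>\S+)"
)


def parse_backoff(line):
    """Return (file_name, delay) for a backoff line, or None for any other line."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    delay = timedelta(
        hours=int(match.group("hours") or 0),
        minutes=int(match.group("minutes") or 0),
        seconds=int(match.group("seconds") or 0),
    )
    return match.group("name"), delay
```

Running `parse_backoff` over every line of the log and keeping the non-`None` results gives one (file, delay) pair per failed upload, which makes it easier to see which uploads are queued for a retry and when.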
arklms Send message Joined: 17 Dec 05 Posts: 7 Credit: 177,488 RAC: 0 |
FAST_ABINITIO_DEFAULT_256bA_306_1050 1 1%, 9 hours. |
stonnee Send message Joined: 3 Dec 05 Posts: 4 Credit: 31,283 RAC: 0 |
PRODUCTION_ABINITIO_1dhn__250_1151_1, WU 5694061: I noticed it was at around 14.5 hours and 97.5%, and then it had a client error. The 3 other computers running this WU all had errors too. I don't know if it was stuck at 1%. |
Carlos_Pfitzner Send message Joined: 22 Dec 05 Posts: 71 Credit: 138,867 RAC: 0 |
Errors on my PCs for yesterday, 14 Feb 2006:
11370234 9223435 14 Feb 2006 21:34:40 UTC 14 Feb 2006 22:54:44 UTC Over Client error Downloading 0.00 0.00
11323177 9113761 14 Feb 2006 16:47:14 UTC 15 Feb 2006 0:51:59 UTC Over Client error Computing 1,218.44 2.90
11271660 9138765 14 Feb 2006 11:32:16 UTC 14 Feb 2006 11:42:49 UTC Over Client error Downloading 0.00 0.00
---
Details for error computing 11323177
Name FAST_ABINITIO_DEFAULT_1fkb__306_3546_1
Workunit 9113761
Created 14 Feb 2006 8:52:21 UTC
Sent 14 Feb 2006 16:47:14 UTC
Received 15 Feb 2006 0:51:59 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741819 (0xc0000005)
Computer ID 118809
Report deadline 21 Feb 2006 16:47:14 UTC
CPU time 1218.4375
stderr out
<core_client_version>5.3.2</core_client_version>
<message> - exit code -1073741819 (0xc0000005) </message>
<stderr_txt>
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x00739840 write attempt to address 0x06DF3010
Exiting...
No heartbeat from core client for 31 sec - exiting
***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x005005D1 read attempt to address 0x106E7154
Exiting...
</stderr_txt>
Validate state Invalid
Claimed credit 2.9009510878005
Granted credit 0
application version 4.81 |
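The exit status in this report appears in two forms, -1073741819 and 0xc0000005. They are the same 32-bit value: the first is the signed reading, the second the unsigned hex reading, which is the Windows access-violation code named in the stderr text. A minimal Python sketch of the conversion follows; the function name `exit_status_to_hex` is made up for this example.

```python
def exit_status_to_hex(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns the
    # two's-complement negative number into its unsigned equivalent.
    return f"0x{exit_status & 0xFFFFFFFF:08x}"


print(exit_status_to_hex(-1073741819))  # prints 0xc0000005
```

The same mask can be applied to other negative exit codes that show up in result pages, so they can be looked up in their hex form.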
Steve Shedroff Send message Joined: 7 Nov 05 Posts: 11 Credit: 250,657 RAC: 0 |
I have had a large number of downloads freeze and keep data from flowing, so work has stopped. Most have the "fasta" designation in their name. I just aborted about 20 downloads. Each took two or more aborts to actually kill. I was getting Error 500 and Error 505 messages from BOINC. Any idea what I may have set wrong that might be causing this? I saved a portion of the message log if anyone wants to see the communication thread. The work is on a laptop that moves from connection to connection, some with a proxy and some without. I manually change the proxy setting to fit the location. I have been running BOINC for some time now, 10,451 WU on this computer so far. This started happening this week. |
Marky-UK Send message Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
This WU 9284726 was stuck at 1% after 10 hours. |
Grutte Pier [Wa Oars]~MAB The Frisian Send message Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=10041959 ?????????????????????? |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
|
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
WU 8428792 stuck for a couple of days, under Linux, until I noticed and killed the task:
Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Grutte Pier [Wa Oars]~MAB The Frisian Send message Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
|
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
PRODUCTION_ABINITIO_DBFLAGS_BARCODE10_2vik__308_1421_0 stuck at 23.08% for over a day. Aborting it. rosetta 4.79 on Mac OS X 10.3.9 |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Another possible cause is when the CPDN controlling process hadsm3_* is killed, leaving the worker process hadsm3um_* running. The Science Application (a.k.a. "worker") can only be killed using Task Manager or by a reboot. Not running the BOINC screensaver? Hmm, then it seems likely that some part of Rosetta isn't being killed when switching, and that is causing the error. I wonder if this is part of the problems Ralph is looking to find? I don't know much about Rosetta's processes/app, sorry. tony |
Grutte Pier [Wa Oars]~MAB The Frisian Send message Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
Another possible cause is when the CPDN controlling process hadsm3_* is killed, leaving the worker process hadsm3um_* running. The Science Application (a.k.a. "worker") can only be killed using task manager or by a reboot. No switching, only running R@H 24/7. |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Another WU, 9469195, apparently crashed and got stuck under Linux (2.4.27, Debian Sarge Stable). This machine has "leave in memory"=Yes. It has been shared between 6 other BOINC projects for >1 month. Only Rosetta 4.80 has problems with getting stuck; the prior v4.2 (HPF/WCG) never had a problem. |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
A quick update on WU 9469195, mentioned in my prior message. I killed the Rosetta 4.80 task ($ kill <pid>) and BOINC re-ran the same WU, this time successfully, to completion, probably with the random seed being the only change. The stderr.txt shown in the resultid contains the output of the previous, unsuccessful and eventually hung, run attempt (with the previous random seed), which I had copied here in my previous post. Btw, should I take the time to report this stuff? Is anyone looking at this? |
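The `$ kill <pid>` step in this post sends the default SIGTERM signal to the hung task on Linux. A minimal Python sketch of the same action is shown below, for anyone scripting it; it assumes a POSIX system and that the PID of the stuck task is already known, and the function name `terminate_task` is made up for this example.

```python
import os
import signal


def terminate_task(pid: int) -> bool:
    """Send SIGTERM to pid, like a plain `kill <pid>`; return False if no such process."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        # The process had already exited, so there is nothing to terminate.
        return False
    return True
```

SIGTERM is the polite request to exit. The post above reports that after this kill BOINC re-ran the WU to completion, so nothing stronger was needed in that case.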
cloaked_chaos Send message Joined: 9 Nov 05 Posts: 14 Credit: 80,818 RAC: 0 |
This WU took 165 hours before it finally decided that it was running for too long. I would really like to receive credit for this since it is 2,175.86 credit. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=8103040 |
©2024 University of Washington
https://www.bakerlab.org