Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III
Author | Message |
---|---|
Ackiss Joined: 13 Apr 06 Posts: 1 Credit: 607,960 RAC: 0 |
4/27/2006 7:49:18 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1a68__351_24311_2 (aborted via GUI RPC) 5.01 error. Had over 85 hours cpu time and claimed to be just over 11% done. Should've been watching more closely. I wasn't aware there was a problem until I checked the news. Ackiss |
mewbysea Joined: 29 Jan 06 Posts: 17 Credit: 15,917,465 RAC: 2,183 |
Aborted this workunit ID 12454830. Two other computers failed to return this WU, or any others, so they may *still* be crunching it! See HB_BARCODE_30_1ctf_351_39751 Result ID 18270294 Stopped at 20:05 hours and 8.518% Run on an HP D530, Pentium 4 @ 2.66 GHz (stock), under WIN XP Home. |
Delk Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
I thought FACONTACTS_RECENTER* were meant to be removed from the workunit queue on your end? Why are known bad units still being sent out? This one is now back in queue for the next lucky recipient. name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803_1 WU name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803 project URL: https://boinc.bakerlab.org/rosetta/ report deadline: Thu May 11 13:42:38 2006 app version num: 501 checkpoint CPU time: 38443.460000 current CPU time: 40928.280000 fraction done: 0.016453 VM usage: 0.000000 resident set size: 0.000000 estimated CPU time remaining: 67723.265742 supports graphics: no https://boinc.bakerlab.org/rosetta/result.php?resultid=18357585 |
mhhall Joined: 28 Mar 06 Posts: 7 Credit: 10,193,127 RAC: 0 |
[snip] Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. Many of you have asked if credit will be awarded for these failed Work Units, and the answer is yes. [snip] My current work unit has been running since Monday evening and shows progress at 5.90% and CPU time of 148:47:21. Does this meet the criteria of "running well"? I don't mind letting this process continue to run... I'm just worried that it's not going to finish anyway. Mike Hall / Engineering Solutions, Inc. |
eNDo Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
I just realized there was a thread to even report this. I'd have to go through all my results just to see which and when. I'm new to Rosetta and figured this out when all my team was having the same issues with the same WUs. Checked previous results on the WUs and no one had finished. I know it was at least 8 WUs for my server host alone. Do I need to pull up all the facts for you? -edit- https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14041906 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14925740 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14560586 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13409472 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408713 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408400 Hope I listed them correctly. Again, this is one user. But I guess it's their job to report theirs. The ones with the lower CPU time that were aborted were stuck at 1% for approx. 1+ hrs, to my recollection. -edit- Sincerely, F. Bulson Endonet Inc. |
Team_Elteor_Borislavj~Intelligence Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
I lost 44 hours of CPU time; I'm aborting it now. It hung at 1.04%: HBLR_1.0_1ogw_420_6744_2 |
anbrager Joined: 21 Nov 05 Posts: 3 Credit: 535,292 RAC: 0 |
Hi, I have aborted Result ID 18179436 "HBLR_1.0_1b72_ROT_TRIALS_TRIE_449_13_1" after 10.5 hours at 5%. |
Team_Elteor_Borislavj~Intelligence Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
I lost 44 hours of CPU time; I'm aborting it now. It hung at 1.04%. Can I get credit for it? https://boinc.bakerlab.org/rosetta/result.php?resultid=18194769 |
Aglarond Joined: 29 Jan 06 Posts: 26 Credit: 446,212 RAC: 0 |
Hi, I just aborted FA_RLXpt_hom003_1ptq__361_16_3. It had been running for more than 12 hours, while my runtime preference is the default (4 hours). Also, Rhiju suggested in this post that WUs with 1ptq in the title should be aborted. Why did I receive a WU that was flagged as bad about 2 weeks ago? |
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
Going nowhere, the last 5.01 on this box, 38 minutes for 1.06% and a rising completion time: Result ID 18432562 Name AB_CASP6_t216__456_5100_0 Workunit 15217912 Created 27 Apr 2006 16:50:33 UTC Sent 27 Apr 2006 21:04:16 UTC Received 28 Apr 2006 13:20:57 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 190714 Report deadline 11 May 2006 21:04:16 UTC CPU time 2293.640625 |
Rebel Alliance Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
Result ID 18229643 Name HBLR_1.0_1hz6_ROT_TRIALS_TRIE_449_43_2 Workunit 14630563 Created 25 Apr 2006 17:03:34 UTC Sent 25 Apr 2006 17:10:07 UTC Received 28 Apr 2006 16:24:09 UTC Server state Over Outcome Client error Client state Computing Exit status -197 (0xffffff3b) Computer ID 168844 Report deadline 9 May 2006 17:10:07 UTC CPU time 101762.765625 |
K1100LTSE Joined: 28 Feb 06 Posts: 7 Credit: 192,387 RAC: 0 |
|
TCU Computer Science Joined: 7 Dec 05 Posts: 28 Credit: 12,861,977 RAC: 0 |
Four more 5.01 WUs were aborted this morning 50.1 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=18296499 HB_BARCODE_30_5croA_351_21027 51.9 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=18296492 HB_BARCODE_30_1a19A_351_28780_3 53.0 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=18296362 HBLR_1.0_1dtj_ROT_TRIALS_TRIE_449_27 89.6 hrs https://boinc.bakerlab.org/rosetta/result.php?resultid=18037119 FA_RLXfn_hom001_1fna__357_63 |
XS_DDT's_Cattle_Prods Joined: 24 Mar 06 Posts: 12 Credit: 1,180,072 RAC: 0 |
So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time? |
[DPC]Division_Brabant~OldButNotSoWise Joined: 23 Jan 06 Posts: 42 Credit: 371,797 RAC: 0 |
So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time? Checked that with my error results. I think that's indeed what happens, this one has crunched for over 5 days. Result ID 17773392 Name HBLR_1.0_2tif_420_9913_1 Workunit 13429467 Created 20 Apr 2006 21:56:28 UTC Sent 21 Apr 2006 4:39:59 UTC Received 26 Apr 2006 19:26:25 UTC Server state Over Outcome Client error Client state Computing Exit status -177 (0xffffff4f) Computer ID 147219 Report deadline 5 May 2006 4:39:59 UTC CPU time 369442.734375 stderr out <core_client_version>5.3.12.tx36</core_client_version> <message>Maximum CPU time exceeded </message> <stderr_txt> # random seed: 1574792 # cpu_run_time_pref: 7200 </stderr_txt> Validate state Invalid Claimed credit 1792.64206533905 Granted credit 300 application version 5.01 |
Hassan Joined: 7 Mar 06 Posts: 4 Credit: 750,146 RAC: 0 |
So, are all of the aborted and stuck WUs being granted 300 points, no matter the computation time? Result ID 18122796 Name HBLR_1.0_1dtj_420_3452_3 Workunit 13389346 Created 24 Apr 2006 15:15:38 UTC Sent 24 Apr 2006 20:33:06 UTC Received 27 Apr 2006 5:24:19 UTC Server state Over Outcome Client error Client state Computing Exit status -177 (0xffffff4f) Computer ID 175797 Report deadline 8 May 2006 20:33:06 UTC CPU time 130784.75 stderr out <core_client_version>5.2.13</core_client_version> <message>Maximum CPU time exceeded </message> <stderr_txt> # random seed: 1597253 # cpu_run_time_pref: 7200 # random seed: 1597253 # random seed: 1597253 </stderr_txt> Validate state Invalid Claimed credit 1165.54456988046 Granted credit 300 application version 5.01 |
XS_lv_dicedealer Joined: 3 Jan 06 Posts: 16 Credit: 1,761,309 RAC: 0 |
Here is another stuck 5.01 WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=18392691 I have since exorcised my farm of the 5.01s and the 5.06s... this one slipped by me though. Thanks for all your hard work at getting these snags worked out; the R@H team deserves a pat on the back for trying to get this fixed so quickly for us crunchers. |
belldandy from pleiades Joined: 2 Nov 05 Posts: 6 Credit: 102,731 RAC: 0 |
2 WUs that I aborted because they were taking way too much time; they didn't hang, though. https://boinc.bakerlab.org/rosetta/result.php?resultid=17827510 FACONTACTS_NOFILTERS_1r69__441_248_1 https://boinc.bakerlab.org/rosetta/result.php?resultid=17773776 HBLR_1.0_2tif_420_9927_1 Campeones everywhere! |
[DPC]Alexcj Joined: 21 Mar 06 Posts: 3 Credit: 8,374 RAC: 0 |
I also have a WU that is taking WAY too long to complete. It is progressing, though, and I would like to see it finished. It's HB_BARCODE_30_1bm8__351_34196_3, although I think it is not going to finish in time. Is it helpful for the project to have it progress as much as possible? |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Is it helpful for the project to have it progress as much as possible? Alex, in general, yes, it's helpful. In your case other work units are running 10,000 seconds and that one looks like you aborted after 443,000 seconds! 5+ days! Unless you changed your preference to be 4 days... you were more than patient with that one. I see it was crunched on release 5.01. The newer release has the "watchdog" and it should find WUs such as this and end them much sooner, thus saving you those days of wondering. So, you did the right things here. You were patient and didn't end it in a sudden panic after 2 hrs and 1 minute, you reported it here, and you're now crunching more WUs. And I believe you will find the current release has resolved problems like this as well, so it shouldn't happen again. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
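
For anyone who wants to sanity-check the figures quoted in the result dumps in this thread, here is a minimal sketch. It is not part of the original posts and is not project code. It assumes that "CPU time" is reported in seconds and that the hex value in parentheses after "Exit status" is the same number shown as a 32-bit two's-complement value. The function names `cpu_seconds_to_days` and `exit_status_hex` are made up for illustration.

```python
SECONDS_PER_DAY = 86_400  # 24 h * 60 min * 60 s


def cpu_seconds_to_days(cpu_seconds: float) -> float:
    """Convert a 'CPU time' value (assumed to be in seconds) to days."""
    return cpu_seconds / SECONDS_PER_DAY


def exit_status_hex(status: int) -> str:
    """Render a signed exit status as a 32-bit two's-complement hex string."""
    # Masking with 0xFFFFFFFF maps a negative int onto its unsigned 32-bit pattern.
    return f"0x{status & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # The 443,000 seconds mentioned in the last reply is a bit over 5 days.
    print(round(cpu_seconds_to_days(443_000), 2))  # 5.13
    # The two exit statuses that appear in the result dumps above.
    print(exit_status_hex(-197))  # 0xffffff3b
    print(exit_status_hex(-177))  # 0xffffff4f
```

The hex strings it prints match the values shown next to the exit statuses in the posted results, which is consistent with the two's-complement assumption.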
©2025 University of Washington
https://www.bakerlab.org