Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III
Author | Message |
---|---|
Hassan Joined: 7 Mar 06 Posts: 4 Credit: 750,146 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=18122796

2006-04-26 21:35:57 [rosetta@home] Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
2006-04-26 22:23:18 [rosetta@home] Aborting result HBLR_1.0_1dtj_420_3452_3: exceeded CPU time limit 130783.200508
2006-04-26 22:23:18 [rosetta@home] Unrecoverable error for result HBLR_1.0_1dtj_420_3452_3 (Maximum CPU time exceeded)
2006-04-26 22:23:19 [---] request_reschedule_cpus: process exited
2006-04-26 22:23:19 [rosetta@home] Computation for result HBLR_1.0_1dtj_420_3452_3 finished
2006-04-26 22:23:19 [rosetta@home] Starting result PROD_ABINITIO_ALPHABETABAR_1tul__447_85936_0 using rosetta version 501

It appears mine was auto-aborted: that's almost 1,200 credits claimed and 0 granted. Is that a waste, or do I still get credit?

Result ID 18122796
Name HBLR_1.0_1dtj_420_3452_3
Workunit 13389346
Created 24 Apr 2006 15:15:38 UTC
Sent 24 Apr 2006 20:33:06 UTC
Received 27 Apr 2006 5:24:19 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -177 (0xffffff4f)
Computer ID 175797
Report deadline 8 May 2006 20:33:06 UTC
CPU time 130784.75
stderr out
<core_client_version>5.2.13</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>
# random seed: 1597253
# cpu_run_time_pref: 7200
# random seed: 1597253
# random seed: 1597253
</stderr_txt>
Validate state Invalid
Claimed credit 1165.54456988046
Granted credit 0
application version 5.01 |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=18122796 That's an amazing WU. It was first sent out on April 6th; after the deadline passed without a result, it was sent out three more times, and all three failed. Then the first host returned a VALID result, with a reported runtime of only 6,800 seconds on app version 4.83. Whoppa! |
[DPC]Division_Brabant~OldButNotSoWise Joined: 23 Jan 06 Posts: 42 Credit: 371,797 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=18190274
https://boinc.bakerlab.org/rosetta/result.php?resultid=18190275
Both ran for more than 15 hours and claimed to be at 5%. I learned my lesson after a job crunched for more than 5 days and ended with a CPU time exceeded error, so I aborted these jobs. This happens more and more; very annoying. |
[DPC]FOKschaap~_mcintosh_ Joined: 4 Dec 05 Posts: 5 Credit: 118,303 RAC: 0 |
AB_CASP6_t216__458_807: stuck at 1%, aborted.
AB_CASP6_t216__456_2307: stuck at 1%, but it seems the job was successfully crunched by another user. This is strange, because my CPU is much faster than the one it was completed on: he finished in 9,834.69 sec., and mine was still at 1% after 5,698.12 sec. |
Ackiss Joined: 13 Apr 06 Posts: 1 Credit: 607,960 RAC: 0 |
4/27/2006 7:49:18 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1a68__351_24311_2 (aborted via GUI RPC)
5.01 error. It had over 85 hours of CPU time and claimed to be just over 11% done. I should have been watching more closely; I wasn't aware there was a problem until I checked the news. Ackiss |
mewbysea Joined: 29 Jan 06 Posts: 17 Credit: 15,843,832 RAC: 1,618 |
Aborted workunit ID 12454830. Two other computers failed to return this WU (or any others), so they may *still* be crunching it! See HB_BARCODE_30_1ctf_351_39751, Result ID 18270294. Stopped at 20:05 hours and 8.518%. Run on an HP D530, Pentium 4 @ 2.66 GHz (stock), under Windows XP Home. |
Delk Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
I thought FACONTACTS_RECENTER* units were meant to be removed from the workunit queue on your end? Why are known bad units still being sent out? This one is now back in the queue for the next lucky recipient.

name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803_1
WU name: FACONTACTS_RECENTER_NOFILTERS_1rnbA_448_803
project URL: https://boinc.bakerlab.org/rosetta/
report deadline: Thu May 11 13:42:38 2006
app version num: 501
checkpoint CPU time: 38443.460000
current CPU time: 40928.280000
fraction done: 0.016453
VM usage: 0.000000
resident set size: 0.000000
estimated CPU time remaining: 67723.265742
supports graphics: no

https://boinc.bakerlab.org/rosetta/result.php?resultid=18357585 |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Help has arrived. Version 5.06 has been released. This should put an end to the hangs and long run times that many of you have been reporting in this thread. Please do not abort your current 5.01 Work Units if they are running well; the science from them is still important to the project. Many of you have asked whether credit will be awarded for these failed Work Units, and the answer is yes: credit for failed Work Units is awarded on Fridays. For errors resulting from Rosetta Version 5.01, continue to report here. For errors relating to Rosetta Version 5.06, report here. For information on the new version and what it is supposed to do, see this post. For a message from Dr. Baker about the work unit runs in preparation for CASP, see his journal entry here. Moderator9 ROSETTA@home FAQ Moderator Contact |
mhhall Joined: 28 Mar 06 Posts: 7 Credit: 10,193,127 RAC: 4 |
"[snip] Please do not abort your current 5.01 Work Units if they are running well. The science from them is still important to the project. Many of you have asked if credit will be awarded for these failed Work Units, and the answer is yes. [snip]" My current work unit has been running since Monday evening and shows progress at 5.90% with a CPU time of 148:47:21. Does this meet the criteria of "running well"? I don't mind letting this process continue to run... I'm just worried that it's not going to finish anyway... Mike Hall / Engineering Solutions, Inc. |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
[snip] If it has been running longer than four (4) times your run time preference setting, then abort it. Moderator9 ROSETTA@home FAQ Moderator Contact |
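A minimal sketch of Moderator9's rule of thumb in Python, for anyone who wants to check a unit quickly. It assumes the comparison is against CPU time (the figure quoted throughout this thread) and that the run time preference is given in seconds, as the `cpu_run_time_pref: 7200` line in Hassan's stderr suggests. The helper names `parse_hms` and `should_abort` are hypothetical, not part of BOINC or Rosetta.

```python
def parse_hms(text: str) -> int:
    """Convert an 'H:MM:SS' CPU-time string, as quoted in this thread, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def should_abort(cpu_seconds: float, run_time_pref_seconds: float, factor: float = 4.0) -> bool:
    """Rule of thumb from this thread: abort once CPU time exceeds factor x the run time preference."""
    return cpu_seconds > factor * run_time_pref_seconds


if __name__ == "__main__":
    # Hassan's result: CPU time 130784.75 s; stderr shows cpu_run_time_pref: 7200 (assumed seconds).
    print(should_abort(130784.75, 7200))  # True: 130784.75 > 4 * 7200 = 28800
    # mhhall's unit: 148:47:21 of CPU time; ASSUMES the default 4-hour preference.
    print(should_abort(parse_hms("148:47:21"), 4 * 3600))  # True: 535641 > 57600
    # A unit at 3 hours with a 2-hour preference is still within the limit.
    print(should_abort(parse_hms("3:00:00"), 7200))  # False
```

mhhall's actual preference isn't stated in his post, so the second check only holds if he is on the 4-hour default that Aglarond mentions later in the thread.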
eNDo Joined: 9 Apr 06 Posts: 9 Credit: 372,288 RAC: 0 |
I just realized there was even a thread to report this. I'd have to go through all my results just to see which ones and when. I'm new to Rosetta and figured this out when my whole team was having the same issues with the same WUs. I checked previous results on those WUs, and no one had finished them. I know it was at least 8 WUs for my server host alone. Do I need to pull up all the facts for you?
-edit-
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14041906
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14925740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14560586
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14226095
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13409472
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408713
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13408400
Hope I listed them correctly. Again, this is one user, but I guess it's up to the others to report theirs. To my recollection, the aborted ones with lower CPU time were stuck at 1% for roughly an hour or more.
-edit-
Sincerely, F. Bulson
Endonet Inc. |
Team_Elteor_Borislavj~Intelligence Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
I lost 44 hours of CPU time; I'm aborting it now. It hung at 1.04%. HBLR_1.0_1ogw_420_6744_2 |
anbrager Joined: 21 Nov 05 Posts: 3 Credit: 535,292 RAC: 0 |
Hi, I have aborted Result ID 18179436 "HBLR_1.0_1b72_ROT_TRIALS_TRIE_449_13_1" after 10.5 hours at 5%. |
Team_Elteor_Borislavj~Intelligence Joined: 7 Dec 05 Posts: 14 Credit: 56,027 RAC: 0 |
I lost 44 hours of CPU time; I'm aborting it now. It hung at 1.04%. Can I get my credit for it? https://boinc.bakerlab.org/rosetta/result.php?resultid=18194769 |
Aglarond Joined: 29 Jan 06 Posts: 26 Credit: 446,212 RAC: 0 |
Hi, I just aborted FA_RLXpt_hom003_1ptq__361_16_3. It had been running for more than 12 hours, while my run time preference is the default (4 hours). Also, Rhiju suggested in this post that WUs with 1ptq in the name should be aborted. Why did I receive a WU that was reported as bad about 2 weeks ago? |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Help has arrived. Version 5.06 has been released. This should put an end to the hangs and long run times that many of you have been reporting in this thread. Please do not abort your current 5.01 Work Units if they are running well; the science from them is still important to the project. If a Work Unit has run longer than 4 times the "Time" setting in your preferences, it should be aborted. Many of you have asked whether credit will be awarded for these failed Work Units, and the answer is yes: credit for failed Work Units is awarded on Fridays. For errors resulting from Rosetta Version 5.01, continue to report here. For errors relating to Rosetta Version 5.06, report here. For information on the new version and what it is supposed to do, see this post. For a message from Dr. Baker about the work unit runs in preparation for CASP, see his journal entry here. Moderator9 ROSETTA@home FAQ Moderator Contact |
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
Going nowhere: the last 5.01 on this box, 38 minutes for 1.06% and a rising completion time:
Result ID 18432562
Name AB_CASP6_t216__456_5100_0
Workunit 15217912
Created 27 Apr 2006 16:50:33 UTC
Sent 27 Apr 2006 21:04:16 UTC
Received 28 Apr 2006 13:20:57 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 190714
Report deadline 11 May 2006 21:04:16 UTC
CPU time 2293.640625 |
Moderator9 Volunteer moderator Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
"Going nowhere, the last 5.01 on this box, 38 minutes for 1.06% and a rising completion time:" It is normal for the completion time to rise between checkpoints. Moderator9 ROSETTA@home FAQ Moderator Contact |
Rebel Alliance Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
Result ID 18229643
Name HBLR_1.0_1hz6_ROT_TRIALS_TRIE_449_43_2
Workunit 14630563
Created 25 Apr 2006 17:03:34 UTC
Sent 25 Apr 2006 17:10:07 UTC
Received 28 Apr 2006 16:24:09 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 168844
Report deadline 9 May 2006 17:10:07 UTC
CPU time 101762.765625 |
K1100LTSE Joined: 28 Feb 06 Posts: 7 Credit: 192,387 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org