Message boards : Number crunching : Report stuck & aborted 5.01 WU here please - III
Author | Message |
---|---|
borkescomputers Send message Joined: 3 Apr 06 Posts: 2 Credit: 5,910,922 RAC: 0 |
21-4-2006 8:33:35|rosetta@home|Unrecoverable error for result HBLR_1.0_1di2_420_5587_1 (aborted via GUI RPC) Why: because the job had run for about 5 hours. I looked at the GUI and saw it was running in a loop: after the atom relax stage it went back to the ab initio stage, with the same model number and the same %. https://boinc.bakerlab.org/rosetta/result.php?resultid=17754739 |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
I have 4 long runners running here. Normal run time is 2 hrs +/-. 4/21/2006 2:01:44 AM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1bgf__448_895_1 using rosetta version 501: 1.52% @ 6 hr; 4/20/2006 11:29:01 PM|rosetta@home|Starting result FACONTACTS_RECENTER_NOFILTERS_1r69__448_100_1 using rosetta version 501: 2.44% @ 9.2 hr; 4/21/2006 2:49:55 AM|rosetta@home|Starting result HBLR_1.0_1hz6_420_4593_1 using rosetta version 501: 3.68% @ 6.1 hr; 4/21/2006 2:33:19 AM|rosetta@home|Starting result HBLR_1.0_1mky_420_5056_1 using rosetta version 501: 4.11% @ 6.7 hr. I am going to work; I will check on them again later today to see what is up with them. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
pb Send message Joined: 30 Nov 05 Posts: 6 Credit: 65,632 RAC: 0 |
As the older threads were getting very long, I have started a new thread. The old thread can be found here. This thread is for reporting errors related to Version 5.01 or greater. Please provide a link to the result id in your report. I had 2 long runners, which I aborted: https://boinc.bakerlab.org/rosetta/result.php?resultid=17777186 https://boinc.bakerlab.org/rosetta/result.php?resultid=17555605 [this one was restarting over and over again, so I aborted it after a restart; it shows 129 seconds, but in fact it ran for about 10 hours] Will I get credit for the first one? PS. Here is one more: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=14105384 |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
...I had 2 long runners, which I aborted: Please see this post. Moderator9 ROSETTA@home FAQ Moderator Contact |
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
I have 4 long runners running here Norm run time is 2 Hrs +/- Well, I am back after 7 hrs and they are all still running, each up only 0.05%. At this rate it will take over 200 hrs to complete. I am aborting all of them, with about 12 to 15 hrs on each. I will try to post a link later, after the upload. If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
biodoc Send message Joined: 19 Feb 06 Posts: 14 Credit: 30,717,792 RAC: 0 |
I just aborted 2 "long runners" (17+ hours; about 2% complete): Result ID 17758781, Name FACONTACTS_RECENTER_NOFILTERS_1scjB_448_993_1; Result ID 17757908, Name FACONTACTS_RECENTER_NOFILTERS_1tit__448_560_1 |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
I'm stopping the sendout of these jobs. For some machines, there appears to be an incompatibility of 5.01 with these jobs. Please abort FACONTACTS_RECENTER jobs and HBLR1.0 jobs. I just abort 2 "long runners" (17+ hours; about 2% complete) |
K1100LTSE Send message Joined: 28 Feb 06 Posts: 7 Credit: 192,387 RAC: 0 |
I just aborted 2 "long runners": 1. FACONTACTS_RECENTER_NOFILTERS_1cei__448_30_1, 7.35 hours, https://boinc.bakerlab.org/rosetta/result.php?resultid=17753170 2. PROD_ABINITIO_7STRANDBAR_1tul__447_35477_0, 5.05 hours, https://boinc.bakerlab.org/rosetta/result.php?resultid=17757187 |
Rebel Alliance Send message Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
FACONTACTS_RECENTER_NOFILTERS_1vls__448_282_1 <core_client_version>5.3.12.tx36</core_client_version> <message>aborted by user </message> <stderr_txt> # random seed: 2488779 # random seed: 2488779 # cpu_run_time_pref: 7200 No heartbeat from core client for 31 sec - exiting # random seed: 2488779 </stderr_txt> Aborted after 12 hours on a 2600xp and was at 1.8 percent. |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Actually the second result returned a lot of data! Cool. You shouldn't get any more of the first kind of workunit -- please abort any you have in your queue, and post here if you get more. Thanks! I just abort 2 "long runners" |
[B^S] Gamma^Ray Send message Joined: 20 Apr 06 Posts: 12 Credit: 21,284 RAC: 0 |
Just aborted a "long runner" after 8:17:39 CPU time, at around 1.8% complete. FACONTACTS_RECENTER_NOFILTERS_1elwA_448_330_1 Result ID 17832302 Workunit 14535551 |
XS_DDT's_Cattle_Prods Send message Joined: 24 Mar 06 Posts: 12 Credit: 1,180,072 RAC: 0 |
OK, I'm running Ubuntu 64-bit, and I've got a HBLR_1.0_1mky_420_6443_1 that's been running for 16:56 with 14.69% done, and a FACONTACTS_RECENTER_NOFILTERS that's been running for 10:59 with 3.49% done. Should I abort them, or will they finish sometime soon? This is node # 207728 |
Delk Send message Joined: 20 Feb 06 Posts: 25 Credit: 995,624 RAC: 0 |
name: FACONTACTS_RECENTER_NOFILTERS_1gvp__448_830_1 WU name: FACONTACTS_RECENTER_NOFILTERS_1gvp__448_830 project URL: https://boinc.bakerlab.org/rosetta/ report deadline: Fri May 5 10:23:27 2006 stderr_out: app version num: 501 checkpoint CPU time: 72062.600000 current CPU time: 72824.870000 fraction done: 0.024063 VM usage: 0.000000 resident set size: 0.000000 estimated CPU time remaining: 98151.432619 https://boinc.bakerlab.org/rosetta/result.php?resultid=17756235 Manually aborted. It was meant to be an 8 hr workunit and was obviously stuck. I noticed this was a recycled unit that someone else had aborted previously. |
Laurenu2 Send message Joined: 6 Nov 05 Posts: 57 Credit: 3,818,778 RAC: 0 |
OK, here is a small list of all the long runners I aborted today. I am sure the total topped 200 hrs: https://boinc.bakerlab.org/rosetta/result.php?resultid=17751455 https://boinc.bakerlab.org/rosetta/result.php?resultid=17768645 https://boinc.bakerlab.org/rosetta/result.php?resultid=17586899 https://boinc.bakerlab.org/rosetta/result.php?resultid=16971548 https://boinc.bakerlab.org/rosetta/result.php?resultid=17752270 https://boinc.bakerlab.org/rosetta/result.php?resultid=17751466 https://boinc.bakerlab.org/rosetta/result.php?resultid=17786078 https://boinc.bakerlab.org/rosetta/result.php?resultid=17757931 https://boinc.bakerlab.org/rosetta/result.php?resultid=17774722 https://boinc.bakerlab.org/rosetta/result.php?resultid=17754712 https://boinc.bakerlab.org/rosetta/result.php?resultid=17760449 https://boinc.bakerlab.org/rosetta/result.php?resultid=17748955 https://boinc.bakerlab.org/rosetta/result.php?resultid=17763107 https://boinc.bakerlab.org/rosetta/result.php?resultid=17764880 https://boinc.bakerlab.org/rosetta/result.php?resultid=17757632 https://boinc.bakerlab.org/rosetta/result.php?resultid=17721400 https://boinc.bakerlab.org/rosetta/result.php?resultid=17764855 https://boinc.bakerlab.org/rosetta/result.php?resultid=17755473 If You Want The Best You Must forget The Rest ---------------And Join Free-DC---------------- |
Tallguy-13088 Send message Joined: 14 Dec 05 Posts: 9 Credit: 843,378 RAC: 0 |
Folks, just as an FYI at this point, I have two "long runners" on two separate machines. The first is HBLR_1.0_1djt_420_4640_1 running on a Dual Xeon P4 @ 2.8 GHz. The numbers are CPU_Time: 18.49 Hrs, Complete: 7.68% and To Completion: 20.12 Hrs. The second is HBLR_1.0_1n0u_420_9492_1 running on a 3.2 GHz P4 with CPU_Time: 19.26 Hrs, Complete: 7.28% and To Completion: 20.45. Both are version 5.01. The graphics seem to be updating and both work units are apparently making forward progress (tons of data points). Both appear to be swapping between 2+ seconds per step and the normal "fast" multiple steps per second. Obviously the green data points are the "fast steppers". I plan on running these up to about 50 hours apiece before aborting them (if they appear to require significantly more time). My goal is to give you guys as much diagnostic info as possible, which translates into run-time. |
XS_DDT's_Cattle_Prods Send message Joined: 24 Mar 06 Posts: 12 Credit: 1,180,072 RAC: 0 |
OK, I'm running Ubuntu 64-bit, and I've got a HBLR_1.0_1mky_420_6443_1 that's been running for 16:56 with 14.69% done, and a FACONTACTS_RECENTER_NOFILTERS_1who_448_991_1 that's been running for 10:59 with 3.49% done. Should I abort them, or will they finish sometime soon? OK, these are still going, and I can't check graphics in linux, so I'm aborting. FACONTACTS_RECENTER_NOFILTERS_1who_448_991_1 & HBLR_1.0_1mky_420_6443_1. HBLR ran for 24:09:13 and FACONTACTS was at 18:12:00. https://boinc.bakerlab.org/rosetta/result.php?resultid=17758649 - this is the HBLR https://boinc.bakerlab.org/rosetta/result.php?resultid=17758516 - This is the FACONTACTS |
Jimi@0wned.org.uk Send message Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
Two long-runners here, both churned from previous machines: HBLR_1.0_1hz6_420_9949 previously aborted on a PC and a Mac, 3h 40m for 4% so far, graphics normal. NO_TERM_STRAND_1ogw_423_8642 previously time limit expired, 4h 45m for 4.64%, graphics very slow. I'll let 'em run if they stay running. Edit: The Turing Halting Problem made real. Smart chap, A. Turing. |
Rebel Alliance Send message Joined: 4 Nov 05 Posts: 50 Credit: 3,579,531 RAC: 0 |
And another one on the same 2600xp: Name FACONTACTS_RECENTER_NOFILTERS_1ail__448_912_1 Workunit 14575111. Funny thing is, there are 2 running on my Opty 165 right now and they seem to be doing just fine. |
Grutte Pier [Wa Oars]~Ytsmabeer Send message Joined: 10 Nov 05 Posts: 2 Credit: 100,205 RAC: 0 |
When I come back from dinner I am going to abort a long runner: 17 hours, 14%. It would be nice to get points. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=13422021 |
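
Several posts above estimate by hand how long a stuck workunit would take to finish ("at this rate it will take over 200 Hrs"). A minimal sketch of that arithmetic, assuming progress is roughly linear so that total time is elapsed CPU time divided by fraction done. The function names and the `factor` threshold are hypothetical and are not part of BOINC or Rosetta; the 7200 second default is the `cpu_run_time_pref` value shown in one of the stderr dumps above.

```python
def projected_total_hours(cpu_seconds: float, fraction_done: float) -> float:
    """Linear extrapolation: projected total = elapsed CPU time / fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_seconds / fraction_done / 3600.0


def looks_stuck(cpu_seconds: float, fraction_done: float,
                target_seconds: float = 7200.0, factor: float = 10.0) -> bool:
    """Hypothetical rule of thumb: flag a task whose projected total run time
    exceeds `factor` times the intended run time (`target_seconds`)."""
    projected_seconds = projected_total_hours(cpu_seconds, fraction_done) * 3600.0
    return projected_seconds > factor * target_seconds


if __name__ == "__main__":
    # Figures copied from reports in this thread.
    reports = {
        "72824.87 s CPU, fraction done 0.024063": (72824.87, 0.024063),
        "17 h CPU, about 2% complete": (17 * 3600.0, 0.02),
    }
    for label, (cpu, frac) in reports.items():
        hours = projected_total_hours(cpu, frac)
        print(f"{label}: projected total {hours:.0f} h, stuck={looks_stuck(cpu, frac)}")
```

The extrapolation is only as good as the linear-progress assumption. A task looping through the same model, as described in the first post, may never finish at all, so the projection is best read as a lower bound.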
©2024 University of Washington
https://www.bakerlab.org