Message boards : Number crunching : Report stuck & aborted WU here please
Author | Message |
---|---|
TheSwampDweller Joined: 17 Jan 06 Posts: 2 Credit: 16,412 RAC: 0 |
HOMSdt_homDB011_1dtj__340_108 failed after 51 seconds on my CPU and after 30 seconds on another user's CPU. |
Jakob Paikin Joined: 4 Oct 05 Posts: 1 Credit: 3,878 RAC: 0 |
I just noticed this in my BOINC (5.2.13/Windows XP) log:
08/03/2006 10:15:28|rosetta@home|Unrecoverable error for result HOMSdt_homDB004_1dtj__352_542_1 (Forkert funktion. (0x1) - exit code 1 (0x1))
The text in parentheses is partly Danish: "Forkert funktion" means "Wrong function" or "Incorrect function". |
Zazie Joined: 1 Mar 06 Posts: 2 Credit: 159,032 RAC: 0 |
Hi, I just found out that 8 of my 19 most recently processed Rosetta WUs finished with an unrecoverable error. Each failed with one of two exit codes:
Exit code -164 (0xffffff5c): SHORTRELAX_1cg5B_333_14_0, ABINITac_hom018_2acy__337_5_0, HOMSti_homDB004_1tif__346_112_0, HBLR_1.0_1dcj_348_716_0.
Exit code -1073741819 (0xc0000005): ABINITsc_hom016_1scjB_322_87_1, HOMSdc_homDB003_1dcj__339_86_0, ABINITpt_hom003_1ptq__322_94_1, HBLR_1.0_2tif_348_5539_0.
This has ALWAYS happened while the WU was being removed from memory as BOINC switched to another project, so I suppose setting "Leave in memory when preempted" to YES should remedy it. But I must say I'm pretty upset at having wasted so much processing time, and 165 credits' worth of work, because of some internal error in Rosetta. No other BOINC project has these problems on my computer. |
OhioDude Joined: 11 Dec 05 Posts: 8 Credit: 4,056,499 RAC: 0 |
Stuck at 1% for 67 hours: HB_BARCODE_30_1scjB_347_827_0 Visit my websites honoring some of America's heroes: USS Rich DE-695 USS Bunch DE-694 / APD-79 |
Chilcotin Joined: 5 Nov 05 Posts: 15 Credit: 16,969,500 RAC: 0 |
WU 9990980: stuck at 1%, aborted.
WU 10012688: stuck at 1%, aborted. |
Nite Owl Joined: 2 Nov 05 Posts: 87 Credit: 3,019,449 RAC: 0 |
Now this one hurt...<bulging eyeballs here> I noticed that the "Percent Done" in BOINCView for WU HOMShz_homDB021_1hz6A_341_126_0 was stuck at 46.1% and the "To Complete" time was growing rather than decreasing. I went to the affected machine, tried repeatedly to view the graphics without success, and then aborted the WU. I sure hate to lose that much time (72.08 hours)! Result ID 12580084 |
ecafkid Joined: 5 Oct 05 Posts: 40 Credit: 15,177,319 RAC: 0 |
Here is another one 3/8/2006 1:23:13 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB030_1dtj__340_171_1 (Incorrect function. (0x1) - exit code 1 (0x1)) |
ecafkid Joined: 5 Oct 05 Posts: 40 Credit: 15,177,319 RAC: 0 |
That was weird: the same message got posted three times. That has never happened to me before. |
Grutte Pier [Wa Oars]~MAB The Frisian Joined: 6 Nov 05 Posts: 87 Credit: 497,588 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=12412131 I couldn't find an explanation in the Wiki, but I may have overlooked it. |
Hell-Mood Joined: 12 Feb 06 Posts: 1 Credit: 1,803 RAC: 0 |
I suspended work so I could hibernate my Windoze. After waking up my PC and hitting "always work" in the BOINC Manager (v4.45), it reported a calculation error on the Rosetta job. Maybe the log can help locate and remove that bug. By the way: it's great work, doin' research on proteins.
09.03.2006 00:44:15|rosetta@home|Starting result HBLR_1.0_1mky_332_3924_2 using rosetta version 4.82
09.03.2006 00:47:39|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
09.03.2006 00:47:39|rosetta@home|Requesting 0 seconds of work, returning 1 results
09.03.2006 00:47:40|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
09.03.2006 01:32:52||Suspending computation and network activity - user request
09.03.2006 01:32:52|rosetta@home|Pausing result HBLR_1.0_1mky_332_3924_2 (removed from memory)
09.03.2006 01:32:53|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_332_3924_2 ( - exit code -1073741819 (0xc0000005))
09.03.2006 01:32:53||request_reschedule_cpus: process exited
10.03.2006 14:27:30||Resuming computation and network activity
10.03.2006 14:27:30||request_reschedule_cpus: Resuming activities
10.03.2006 14:27:30|rosetta@home|Computation for result HBLR_1.0_1mky_332_3924_2 finished |
Cureseekers~Kristof Joined: 5 Nov 05 Posts: 80 Credit: 689,603 RAC: 0 |
After 27 seconds: Unrecoverable error for result HOMSdt_homDB027_1dtj__352_1825_1 (Incorrect function. (0x1) - exit code 1 (0x1))
On this page I saw that the job had already errored out on another computer: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10447424 (I haven't uploaded mine yet; I'll do so in a few hours.)
Member of Dutch Power Cows |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
In the last couple of days I've seen several hangs on my system. Too bad they don't abort automatically; I have to abort them manually. The bad part is that they sit there for a day or so, taking up a slot but doing nothing, until I abort them.
dag
2006-03-10 13:05:17 [rosetta@home] Unrecoverable error for result HOMSog_homDB015_1ogw__352_1003_0 (process exited with code 131 (0x83))
2006-03-10 13:05:20 [rosetta@home] Unrecoverable error for result HOMSn0_homDB004_1n0u__352_1003_0 (process exited with code 131 (0x83))
2006-03-10 19:31:16 [rosetta@home] Unrecoverable error for result HOMSdt_homDB009_1dtj__352_1783_2 (process exited with code 1 (0x1))
2006-03-12 10:35:27 [rosetta@home] Unrecoverable error for result HOMSti_homDB025_1tif__352_1208_1 (process exited with code 1 (0x1))
2006-03-12 20:55:21 [rosetta@home] Unrecoverable error for result HOMSdt_homDB003_1dtj__352_1942_2 (process exited with code 1 (0x1))
2006-03-13 02:18:27 [rosetta@home] Unrecoverable error for result HOMSdt_homDB009_1dtj__352_992_2 (process exited with code 1 (0x1))
2006-03-13 11:27:59 [rosetta@home] Unrecoverable error for result HOMSn0_homDB017_1n0u__352_1447_0 (aborted by user)
2006-03-13 15:33:04 [rosetta@home] Unrecoverable error for result FA_RLXce_hom004_1cei__360_79_0 (process got signal 11)
2006-03-13 15:33:08 [rosetta@home] Unrecoverable error for result FA_RLXbg_hom001_1bgf__359_105_0 (process got signal 11)
2006-03-13 18:18:25 [rosetta@home] Unrecoverable error for result FA_RLXb3_hom012_1b3aA_359_81_0 (aborted by user)
dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
David Baker Volunteer moderator Project administrator Project developer Project scientist Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0 |
I'm having major problems with the WUs that start with FA_RLX. They either get stuck at 1% or run until 90-95% and then error out. Are these failing much more often than other WUs on your computer? |
arklms Joined: 17 Dec 05 Posts: 7 Credit: 177,488 RAC: 0 |
SSFEATURES_BARCODE_ABINITIO_5croA_334_286_0: 9 hours at 1%. This seems to happen a lot on this P3. I can't get it to run from the command line either; the window just closes. Looks like I'll have to abort it. |
Team TMR Joined: 2 Nov 05 Posts: 21 Credit: 1,583,679 RAC: 0 |
Had 3 today that got stuck at 1% after anywhere between 3 and 17 hours (runtime set to 2 hours):
3.1 hours: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11019262
4.3 hours: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11008654
16.9 hours: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=10949277
All were aborted. |
hob. Joined: 4 Nov 05 Posts: 64 Credit: 250,683 RAC: 0 |
3/4/2006 6:51:19 AM|rosetta@home|Starting result ABINITli_hom010_1lis__322_14_1 using rosetta version 482
This job has been running for over 10½ days now. It's been at 84.89% for at least 24 hours, maybe a lot longer. It's still using CPU power, so I assume it's doing something? Runtime is listed as 256½ hours so far and still counting up. Any advice on what to do with it would be welcome.
46 years dc so far join team FaDbeens join us |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
3/4/2006 6:51:19 AM|rosetta@home|Starting result ABINITli_hom010_1lis__322_14_1 using rosetta version 482
You could try stopping BOINC, waiting a minute, and then restarting BOINC. That might get the WU going again. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I just had a WU stuck at 23%. It was using CPU but wasn't making any progress. I stopped and restarted BOINC, and it finished normally.
WU: FA_RLXbq_hom006_1bq9A_359_221_0
Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=13655767 |
©2025 University of Washington
https://www.bakerlab.org