Report stuck & aborted WU here please

Message boards : Number crunching : Report stuck & aborted WU here please

nmelhorn
Joined: 16 Oct 05
Posts: 1
Credit: 177,616
RAC: 0
Message 11675 - Posted: 5 Mar 2006, 5:25:34 UTC

The following WU assigned to me:

ResultID 12020507 WUID 9636787
Sent 26 Feb 2006 14:21:21 UTC

still shows In Progress / Unknown / New on my Results page, though there's no record of it left on my machine. The adjacent WUs failed.

I assume I should notify here, so the WU can be quickly reassigned elsewhere.

--regards, Nate

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11679 - Posted: 5 Mar 2006, 13:12:26 UTC

Stuck at 1% for 15 hours:

ABINITvc_home007_1vcc_337_21_0
Visit my websites honoring some of America's heroes:
USS Rich DE-695
USS Bunch DE-694 / APD-79

Stu D.
Joined: 3 Mar 06
Posts: 8
Credit: 575,867
RAC: 0
Message 11680 - Posted: 5 Mar 2006, 13:36:20 UTC

3/4/2006 11:08:23 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB009_1dtj__340_108_0 (Incorrect function. (0x1) - exit code 1 (0x1))

Fardringle
Joined: 22 Feb 06
Posts: 3
Credit: 5,487,674
RAC: 921
Message 11681 - Posted: 5 Mar 2006, 14:08:57 UTC

ABINITwi_hom007_1wit__337_79_0 is stuck at 1% after 11 hours.

The system is an Athlon XP 2200+ running Windows 2000 with version 5.2.13 of the BOINC client.

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11682 - Posted: 5 Mar 2006, 14:10:41 UTC - in response to Message 11679.  
Last modified: 5 Mar 2006, 14:11:12 UTC

Stuck at 1% for 15 hours:

ABINITvc_home007_1vcc_337_21_0


Got another one stuck at 1%:

HB_BARCODE_30_1acf_347_958_0

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11683 - Posted: 5 Mar 2006, 14:43:08 UTC - in response to Message 11682.  

And one more:

ABINITvi_hom020_2vik_337_83_0

Ib Rasmussen
Joined: 27 Sep 05
Posts: 16
Credit: 211,416
RAC: 0
Message 11704 - Posted: 6 Mar 2006, 8:05:03 UTC

SSFEATURES_BARCODE_ABINITIO_1acf__334_321_0 was stuck at 1% for 57+ hours. I tried stopping and restarting Boinc, but it restarted the wu at 00:00:00, so I killed it.

/Ib

Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11711 - Posted: 6 Mar 2006, 13:03:41 UTC - in response to Message 11704.  
Last modified: 6 Mar 2006, 13:06:14 UTC

SSFEATURES_BARCODE_ABINITIO_1acf__334_321_0 was stuck at 1% for 57+ hours. I tried stopping and restarting Boinc, but it restarted the wu at 00:00:00, so I killed it.

/Ib


It is normal for a stuck WU to restart at zero.

Moderator9
ROSETTA@home FAQ
Moderator Contact

TheSwampDweller
Joined: 17 Jan 06
Posts: 2
Credit: 16,412
RAC: 0
Message 11712 - Posted: 6 Mar 2006, 13:04:17 UTC

HOMSdt_homDB011_1dtj__340_108 failed after 51 seconds on my CPU and after 30 seconds on another user's CPU.

Jakob Paikin
Joined: 4 Oct 05
Posts: 1
Credit: 3,878
RAC: 0
Message 11776 - Posted: 8 Mar 2006, 9:29:12 UTC

I just noticed this in my Boinc (5.2.13/Windows XP) log:

08/03/2006 10:15:28|rosetta@home|Unrecoverable error for result HOMSdt_homDB004_1dtj__352_542_1 (Forkert funktion. (0x1) - exit code 1 (0x1))

The text in () is partially Danish and means "(Wrong function (0x1) ..." or "(Incorrect function (0x1) ..."

Zazie
Joined: 1 Mar 06
Posts: 2
Credit: 159,032
RAC: 0
Message 11777 - Posted: 8 Mar 2006, 12:29:10 UTC
Last modified: 8 Mar 2006, 12:39:26 UTC

Hi, I just found out that out of 19 recently processed Rosetta WUs, 8 finished with an Unrecoverable error. Some had exit code -164 (0xffffff5c); that was for WUs

SHORTRELAX_1cg5B_333_14_0, ABINITac_hom018_2acy__337_5_0, HOMSti_homDB004_1tif__346_112_0, HBLR_1.0_1dcj_348_716_0.

The others had exit code -1073741819 (0xc0000005), which happened for results

ABINITsc_hom016_1scjB_322_87_1, HOMSdc_homDB003_1dcj__339_86_0, ABINITpt_hom003_1ptq__322_94_1, HBLR_1.0_2tif_348_5539_0.

This has ALWAYS happened when a WU was being removed from memory so BOINC could switch to another project, so I suppose it can be remedied by setting "Leave in memory when preempted" to YES. But I must say I'm pretty upset at having wasted so much processing time, and 165 credits' worth of work, because of some kind of internal error in Rosetta. No other BOINC projects have these problems on my computer.
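
A note for readers comparing the two exit codes above: the decimal number and the hex number in parentheses are the same 32-bit value, read as signed and unsigned respectively. The sketch below shows that conversion in Python. The function name `to_unsigned32_hex` is made up for illustration and is not part of BOINC or Rosetta. As far as the value itself goes, 0xc0000005 is the Windows status code for an access violation; no meaning for -164 is given in this thread, so none is assumed here.

```python
def to_unsigned32_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code as the unsigned hex form shown in the log.

    Hypothetical helper for illustration only; not a BOINC API.
    """
    # Masking with 0xFFFFFFFF reinterprets a negative value as unsigned 32-bit.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # The two exit codes reported in the post above.
    for code in (-164, -1073741819):
        print(code, "->", to_unsigned32_hex(code))
```

Looking up the hex form is usually easier than looking up the negative decimal, since Windows status codes are documented in hex.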

OhioDude
Joined: 11 Dec 05
Posts: 8
Credit: 4,056,499
RAC: 0
Message 11779 - Posted: 8 Mar 2006, 12:40:13 UTC

Stuck at 1% for 67 hours:

HB_BARCODE_30_1scjB_347_827_0

Chilcotin
Joined: 5 Nov 05
Posts: 15
Credit: 16,969,500
RAC: 0
Message 11783 - Posted: 8 Mar 2006, 16:03:12 UTC

WU 9990980 stuck at 1% and aborted
WU 10012688 stuck at 1% and aborted

Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 11789 - Posted: 8 Mar 2006, 19:13:30 UTC
Last modified: 8 Mar 2006, 19:17:06 UTC

Now this one hurt...<bulging eyeballs here> I noticed that the "Percent Done" in BOINCView for WU HOMShz_homDB021_1hz6A_341_126_0 was stuck at 46.1% and the time "To Complete" was growing rather than decreasing. Upon going to the affected machine and after repeatedly attempting to view the graphics and being unable to, I aborted the WU. I sure hate to lose that much time (72.08 hours)!



Result ID 12580084
Name HOMShz_homDB021_1hz6A_341_126_0
Workunit 10092458
Created 5 Mar 2006 2:15:59 UTC
Sent 5 Mar 2006 16:08:13 UTC
Received 8 Mar 2006 17:49:58 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -197 (0xffffff3b)
Computer ID 53343
Report deadline 19 Mar 2006 16:08:13 UTC
CPU time 260384.32427
stderr out <core_client_version>5.2.13</core_client_version>
<message>aborted via GUI RPC
</message>
<stderr_txt>
# random seed: 3541785
# cpu_run_time_pref: 7200

</stderr_txt>


Validate state Invalid
Claimed credit 1094.68782677731
Granted credit 0
application version 4.82

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11791 - Posted: 8 Mar 2006, 20:54:42 UTC

Here is another one

3/8/2006 1:23:13 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB030_1dtj__340_171_1 (Incorrect function. (0x1) - exit code 1 (0x1))

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11792 - Posted: 8 Mar 2006, 20:54:46 UTC

Here is another one

3/8/2006 1:23:13 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB030_1dtj__340_171_1 (Incorrect function. (0x1) - exit code 1 (0x1))

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11793 - Posted: 8 Mar 2006, 20:54:49 UTC

Here is another one

3/8/2006 1:23:13 PM|rosetta@home|Unrecoverable error for result HOMSdt_homDB030_1dtj__340_171_1 (Incorrect function. (0x1) - exit code 1 (0x1))

ecafkid
Joined: 5 Oct 05
Posts: 40
Credit: 15,177,319
RAC: 0
Message 11794 - Posted: 8 Mar 2006, 20:56:06 UTC

That was weird: it posted the same message three times. I've never had that happen before.

Grutte Pier [Wa Oars]~MAB The Frisian
Joined: 6 Nov 05
Posts: 87
Credit: 497,588
RAC: 0
Message 11812 - Posted: 9 Mar 2006, 7:15:36 UTC
Last modified: 9 Mar 2006, 7:26:45 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=12412131

I couldn't find an explanation in the Wiki, but I may have overlooked it.

Hell-Mood
Joined: 12 Feb 06
Posts: 1
Credit: 1,803
RAC: 0
Message 11855 - Posted: 10 Mar 2006, 13:35:19 UTC
Last modified: 10 Mar 2006, 13:41:19 UTC

I suspended work so I could hibernate my Windoze.
After waking up my PC and hitting "always work" in the BOINC manager (v 4.45), it reported a calculation error on the Rosetta job. Maybe the log can help to locate and remove that bug.
By the way: it's a great job, doin' research on proteins.


09.03.2006 00:44:15|rosetta@home|Starting result HBLR_1.0_1mky_332_3924_2 using rosetta version 4.82
09.03.2006 00:47:39|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
09.03.2006 00:47:39|rosetta@home|Requesting 0 seconds of work, returning 1 results
09.03.2006 00:47:40|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
09.03.2006 01:32:52||Suspending computation and network activity - user request
09.03.2006 01:32:52|rosetta@home|Pausing result HBLR_1.0_1mky_332_3924_2 (removed from memory)
09.03.2006 01:32:53|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_332_3924_2 ( - exit code -1073741819 (0xc0000005))
09.03.2006 01:32:53||request_reschedule_cpus: process exited
10.03.2006 14:27:30||Resuming computation and network activity
10.03.2006 14:27:30||request_reschedule_cpus: Resuming activities
10.03.2006 14:27:30|rosetta@home|Computation for result HBLR_1.0_1mky_332_3924_2 finished
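
For anyone who wants to pull the failed results out of a longer log before reporting them here, the following is a minimal Python sketch. It assumes the `timestamp|project|message` layout seen in the excerpts in this thread; the names `ERROR_RE`, `parse_log_line` and `find_errors` are hypothetical and not part of BOINC. The pattern keys on the numeric exit code rather than on the text inside the parentheses, since that text is localized (see the Danish example earlier in the thread).

```python
import re

# Matches the "Unrecoverable error" message as it appears in this thread.
# The free text before "exit code" is skipped because it varies by locale.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(.*exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def parse_log_line(line: str):
    """Split one log line into (timestamp, project, message)."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def find_errors(lines):
    """Return (result_name, exit_code) for every unrecoverable error found."""
    errors = []
    for line in lines:
        if line.count("|") < 2:
            continue  # not in the assumed three-field layout
        _, _, message = parse_log_line(line)
        match = ERROR_RE.search(message)
        if match:
            errors.append((match.group("result"), int(match.group("code"))))
    return errors


if __name__ == "__main__":
    sample = [
        "09.03.2006 01:32:52|rosetta@home|Pausing result HBLR_1.0_1mky_332_3924_2 (removed from memory)",
        "09.03.2006 01:32:53|rosetta@home|Unrecoverable error for result HBLR_1.0_1mky_332_3924_2 ( - exit code -1073741819 (0xc0000005))",
        "09.03.2006 01:32:53||request_reschedule_cpus: process exited",
    ]
    for result, code in find_errors(sample):
        print(result, code)
```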

©2024 University of Washington
https://www.bakerlab.org