Two consistent and persistent errors

Message boards : Number crunching : Two consistent and persistent errors

Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77861 - Posted: 28 Jan 2015, 6:21:30 UTC

1/26/2015 9:01:31 AM | rosetta@home | Starting task rb_01_23_53131_98688__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242143_2568_0
1/26/2015 9:43:45 AM | rosetta@home | Aborting task rb_01_23_53132_98689__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2218_0: exceeded elapsed time limit 122964.12 (500000.00G/4.07G)

1/26/2015 3:04:53 PM | rosetta@home | Task rb_01_23_53132_98689__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2226_0 exited with zero status but no 'finished' file
1/26/2015 3:04:53 PM | rosetta@home | If this happens repeatedly you may need to reset the project.

I see other computers have been completing the work units, and I have reset the project a couple of times with no effect, including reinstalling the client. Very nearly every unit I've run on this machine in the past couple of weeks gets one of these errors. Does anyone have ideas or insights?
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77862 - Posted: 28 Jan 2015, 6:35:32 UTC

I just saw this; I'll see whether any of these steps make a difference.

http://boincfaq.mundayweb.com/index.php?language=1&view=116
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77865 - Posted: 28 Jan 2015, 15:46:52 UTC

It appears you have specified (in the Rosetta preferences, configured via the website) a runtime preference of 2 days.

At present, there seem to be some issues with tasks running that long. Setting the preference to 1 day will avoid the problem until it is fixed.
Rosetta Moderator: Mod.Sense
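The numbers in the abort message from the first post show why the 2-day setting fails. Assuming, as the parenthesized figures suggest, that the BOINC client computes the limit as the workunit's floating-point operation bound (500000.00G) divided by its speed estimate for this host (4.07G FLOPS), a task may run for only about 122,964 seconds, roughly 34 hours, before it is aborted. The Python sketch below checks that arithmetic against the runtime preferences mentioned in this thread; the function names are hypothetical helpers, not part of BOINC.

```python
# Sketch of the elapsed-time check behind the message
#   "exceeded elapsed time limit 122964.12 (500000.00G/4.07G)"
# Assumption: limit (seconds) = FLOP bound of the workunit / estimated host FLOPS.
# Function names are hypothetical helpers, not BOINC APIs.

SECONDS_PER_HOUR = 3600


def elapsed_time_limit(fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before the client aborts it."""
    return fpops_bound / flops


def would_abort(runtime_pref_hours: float, limit_seconds: float) -> bool:
    """True if a task that runs for the full target runtime exceeds the limit."""
    return runtime_pref_hours * SECONDS_PER_HOUR > limit_seconds


# Figures from the log line. 4.07G is rounded to two decimals, so this gives
# about 122850 s rather than the exact 122964.12 s the client printed.
approx_limit = elapsed_time_limit(500000.00e9, 4.07e9)
logged_limit = 122964.12

print(f"approx limit: {approx_limit:.0f} s, logged: {logged_limit / SECONDS_PER_HOUR:.1f} h")
for hours in (48, 24, 12):  # 2-day, 1-day and 12-hour runtime preferences
    verdict = "aborted" if would_abort(hours, logged_limit) else "fits"
    print(f"{hours:>2} h target -> {verdict}")
```

With a 48-hour target the task is killed around the 34-hour mark, while the 1-day and 12-hour settings used later in the thread stay under the limit, which matches the timeout errors disappearing after the change.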
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77869 - Posted: 29 Jan 2015, 0:47:41 UTC

I changed that. The other changes per the link above don't appear to have made any difference.
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 77871 - Posted: 29 Jan 2015, 7:00:12 UTC

Hi Erik.

I had a quick look at your errored tasks, and they all look to be of the rb_*_SAVE_ALL_OUT type. I have been aborting those for many months because they were always erroring on my rigs as well, and no one has fixed the problem, even though I did report it.

And I only run tasks here for 4 hours.
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77876 - Posted: 30 Jan 2015, 1:33:03 UTC

The timeout errors have been resolved by adjusting the runtime preference. The "exited with zero status" errors are still occurring. Some of these do complete, but not many.

1/29/2015 1:26:18 PM | rosetta@home | Computation for task rb_01_23_53132_98689__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_242144_2228_0 finished
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77897 - Posted: 5 Feb 2015, 19:34:18 UTC

I've removed the 2 day option until we fix this issue on the next application update. Sorry for any inconvenience. Please lower your run time preference if you've set it to 2 days.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,624,317
RAC: 7,073
Message 77900 - Posted: 6 Feb 2015, 10:03:56 UTC - in response to Message 77897.  

I've removed the 2 day option until we fix this issue on the next application update.


Uh, so you are working on a new app version?
:-)
Erik

Send message
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77928 - Posted: 13 Feb 2015, 4:19:41 UTC
Last modified: 13 Feb 2015, 4:20:55 UTC

Thanks for responding, David. I currently have the target CPU run time preference set to twelve hours. I haven't received any time-out errors since, but the second error, "exited with zero status but no 'finished' file," is returned for nearly every unit I process.

12-Feb-15 20:37:30 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file
12-Feb-15 20:37:30 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 20:49:19 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file
12-Feb-15 20:49:19 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 20:56:31 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
12-Feb-15 20:56:31 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 21:01:56 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
12-Feb-15 21:01:56 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 21:08:22 | rosetta@home | Task Ross3X3_SAVE_ALL_OUT_t149_009_242754_249_0 exited with zero status but no 'finished' file
12-Feb-15 21:08:22 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 21:08:26 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
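The log excerpt above repeats the same task names: two entries for the h002 task and three for the h003 task. In a longer log, a small script makes that pattern easier to spot. This Python sketch assumes the event-log line format shown in this post (date, time, project and message separated by `|`) and simply counts "exited with zero status" messages per task; `zero_status_exits` is a hypothetical helper, not a BOINC tool.

```python
import re
from collections import Counter

# Matches the message part of an event-log line such as
# "12-Feb-15 20:37:30 | rosetta@home | Task NAME exited with zero status but no 'finished' file"
ZERO_STATUS = re.compile(r"\| Task (\S+) exited with zero status but no 'finished' file")

# Lines copied from the log excerpt above; the "reset the project" hints do not match.
SAMPLE_LOG = """\
12-Feb-15 20:37:30 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file
12-Feb-15 20:37:30 | rosetta@home | If this happens repeatedly you may need to reset the project.
12-Feb-15 20:49:19 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h002___robetta_IGNORE_THE_REST_09_16_242762_18_0 exited with zero status but no 'finished' file
12-Feb-15 20:56:31 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
12-Feb-15 21:01:56 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
12-Feb-15 21:08:22 | rosetta@home | Task Ross3X3_SAVE_ALL_OUT_t149_009_242754_249_0 exited with zero status but no 'finished' file
12-Feb-15 21:08:26 | rosetta@home | Task rb_02_08_53120_99050_ab_stage0_h003___robetta_IGNORE_THE_REST_11_13_242763_13_0 exited with zero status but no 'finished' file
""".splitlines()


def zero_status_exits(lines):
    """Return a Counter mapping task name -> number of zero-status exits."""
    counts = Counter()
    for line in lines:
        match = ZERO_STATUS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = zero_status_exits(SAMPLE_LOG)
for task, n in counts.most_common():
    print(n, task)
```

On this excerpt the h003 task shows up three times, the h002 task twice and the Ross3X3 task once, so the six error messages come from only three tasks.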
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 77929 - Posted: 13 Feb 2015, 8:00:33 UTC - in response to Message 77928.  
Last modified: 13 Feb 2015, 8:04:00 UTC

but the second error, "exited with zero status but no 'finished' file," is returned for nearly every unit I process.


It is a common error that can affect any BOINC project, not just Rosetta. Unfortunately no specific cause has been identified as yet, so it may take a bit of effort to track down a solution that works for your system.

You may want to try some of the solutions suggested at this BOINC FAQ website.

I had the same problem until I increased my CPU usage to "Use at most 100.0 percent of CPU time" (to avoid heat problems in the summer I reduced the number of cores BOINC can use instead). As soon as that setting was changed all my exit zero errors disappeared instantly.

Hopefully your problem will also be as easy to solve, but there are some alternative suggestions on that site as well.

---

Edit: I see from the posts above that you already tried those solutions. Unfortunately if they didn't work then there is not much you can do other than play about with your BOINC and system settings in the hope of stumbling across a solution.
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77930 - Posted: 13 Feb 2015, 13:13:30 UTC

The odd thing is, Rosetta is the only project which returns those errors. I currently have the processor time set to the default of 50%. I'll let it run at 100% today and see if that makes a difference.
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77937 - Posted: 13 Feb 2015, 23:56:14 UTC

So I checked my log after getting home from work today, and it looks like everything is completing successfully now. The last unit to fail was just before I changed the preferences.

So, set the target CPU run time to twelve hours, and the max CPU time usage to 100%.

I hope the CPU usage requirement will be fixed soon. I don't want to have to run my box at 100% all day in a desert summer.
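The fix that worked here has two parts. The 12-hour target CPU run time is a Rosetta preference set on the project website, as noted earlier in the thread, so it is not covered here. The CPU-time limit is a BOINC client preference, and Murasaki's heat workaround (100% of CPU time, fewer cores) can also be applied locally. This Python sketch assumes the client reads a `global_prefs_override.xml` file in its data directory with the `cpu_usage_limit` and `max_ncpus_pct` elements; the 50% core value is only an example, and `prefs_override_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def prefs_override_xml(cpu_usage_limit: float, max_ncpus_pct: float) -> str:
    """Build the body of a BOINC global_prefs_override.xml file (hypothetical helper)."""
    root = ET.Element("global_preferences")
    # "Use at most N% of CPU time": 100 should mean tasks are not throttled.
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:.1f}"
    # "Use at most N% of the CPUs": limit heat by running on fewer cores instead.
    ET.SubElement(root, "max_ncpus_pct").text = f"{max_ncpus_pct:.1f}"
    return ET.tostring(root, encoding="unicode")


# 100% CPU time (the setting that stopped the errors here), half the cores (example value).
xml_text = prefs_override_xml(100.0, 50.0)
print(xml_text)
```

After saving the output as `global_prefs_override.xml`, the client can be told to reload it with `boinccmd --read_global_prefs_override` or by restarting BOINC; setting the same two options in the BOINC Manager's computing preferences should have the same effect.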
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77938 - Posted: 13 Feb 2015, 23:58:34 UTC

Is there a way to edit the title that I'm missing? I'd like to add [Fixed] to the title.
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77939 - Posted: 14 Feb 2015, 0:59:54 UTC

not sure about the title.

can anyone point me to workunits set for 48 hours that failed?
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77940 - Posted: 14 Feb 2015, 1:31:23 UTC

I'm pretty sure I would have had some a couple months ago, but those have cycled out of my logs by now. If no one has any current ones, I can just set my client to grab 48 hour sets.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77946 - Posted: 15 Feb 2015, 3:07:35 UTC - in response to Message 77939.  

can anyone point me to workunits set for 48 hours that failed?


Can you use old WU numbers? Or have the details you seek already been purged on those?
Rosetta Moderator: Mod.Sense
Erik
Joined: 25 Jun 09
Posts: 11
Credit: 2,904,454
RAC: 0
Message 77979 - Posted: 26 Feb 2015, 6:39:12 UTC

I don't know if this will be helpful, but yesterday when I rebooted my computer to install updates, several of the Rosetta units in process at the time failed, even though I shut down BOINC gracefully. The tasks were all from either the SAVE_ALL_OUT or IGNORE_THE_REST group. Here are a couple of examples:

24-Feb-15 22:12:24 | rosetta@home | Task TL_test_2008_0165_0994_0960_2059_00350256_0157_0891_0009_0875_0001_fold_SAVE_ALL_OUT_244879_1833_0 exited with zero status but no 'finished' file
24-Feb-15 22:12:24 | rosetta@home | Task rb_02_23_53371_99352_ab_stage0_h004___robetta_IGNORE_THE_REST_07_15_244897_85_0 exited with zero status but no 'finished' file

The next to finish was:

25-Feb-15 02:17:35 | rosetta@home | Computation for task TL_test_1478_0993_0262_0916_2046_0140_0741_0187_0164_0011_0153_0001_fold_SAVE_ALL_OUT_244870_3991_0 finished




©2024 University of Washington
https://www.bakerlab.org