Problems and Technical Issues with Rosetta@home

Author	Message
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 81494 - Posted: 18 Apr 2017, 20:25:46 UTC - in response to Message 81491. [Quote from Message 81491: It seems to me that the upload problem has been solved. At least, all my stuck WUs have been uploaded now.] Yes, same here. Glad whatever the problem was has been resolved.

amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0	Message 81495 - Posted: 18 Apr 2017, 22:42:57 UTC All fixed! At least all the backed up stuff is now cleared out. Atta boy Rosetta team, we knew you could do it! Have a great week. Cheers! /M

J. Ritchie Morrow Joined: 4 Nov 05 Posts: 5 Credit: 341,049 RAC: 0	Message 81519 - Posted: 3 May 2017, 14:42:56 UTC I keep getting the message that 'Task XX exited with zero status but no finished file. If this happens repeatedly you may need to reset the project.' I have reset the project but continue to get the error. Is this an issue on my end or the project's end? Thanks!

Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0	Message 81521 - Posted: 4 May 2017, 10:50:06 UTC - in response to Message 81519. [Quote from Message 81519: I keep getting the message that 'Task XX exited with zero status but no finished file. If this happens repeatedly you may need to reset the project.' I have reset the project but continue to get the error. Is this an issue on my end or the project's end? Thanks!] Copied and pasted from an earlier answer: On Rosetta this is usually solved by increasing the "use at most xxx% of CPU time" setting to 100. You may then want to reduce the "on multiprocessors, use at most xxx% of the processors" setting to something lower than its current value. Most people find this handles the temperature regulation concerns (that the CPU throttling was designed to address) perfectly. Another possible cause is virus scanners; most folks exclude BOINC from those scans or set the scans to run only when BOINC isn't active. An explanation and more possible causes can be found here: BOINC FAQ Service. Please know that this only becomes a fatal error when it occurs 100 times to a particular task; at that point BOINC assumes the task will never be able to finish and gives up on it, ending it as a client error. If you see this message only occasionally it is safe to ignore it. Best, Snags
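For anyone who prefers to set the two values Snags mentions from a file rather than through the preferences dialog, here is a minimal sketch. It assumes the BOINC client's local override file, global_prefs_override.xml, with the cpu_usage_limit and max_ncpus_pct elements corresponding to the two "use at most" settings. The 75% processor figure and the function name are hypothetical, chosen only for illustration; pick whatever keeps your temperatures acceptable.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(cpu_time_pct=100.0, processors_pct=75.0):
    """Return override XML for the two 'use at most' settings.

    cpu_time_pct   -> "use at most xxx% of CPU time" (100 disables throttling)
    processors_pct -> "use at most xxx% of the processors"
    The default of 75 is an illustrative assumption, not a recommendation
    from the thread.
    """
    for value in (cpu_time_pct, processors_pct):
        if not 0 < value <= 100:
            raise ValueError("percentages must be greater than 0 and at most 100")

    root = ET.Element("global_preferences")
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_time_pct:g}"
    ET.SubElement(root, "max_ncpus_pct").text = f"{processors_pct:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; saving it as global_prefs_override.xml in the BOINC
    # data directory is left to the reader.
    print(build_prefs_override())
```

The client has to be told to re-read its local preferences (or be restarted) before a change like this takes effect.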

Batschlach Joined: 7 May 17 Posts: 3 Credit: 307,527 RAC: 0	Message 81529 - Posted: 13 May 2017, 14:22:09 UTC Hey, I've received some work units which couldn't be finished due to a compute error. Interestingly, the second person calculating the same WU also resulted in a compute error: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825566938 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825557307 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825537629 (still pending) Is this common behaviour? What has happened there? Best regards

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 81530 - Posted: 13 May 2017, 21:18:03 UTC Moved the details from Batschlach. These are Android WUs. Rosetta Moderator: Mod.Sense

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 81532 - Posted: 14 May 2017, 3:13:03 UTC - in response to Message 81529. [Quote from Message 81529: Hey, I've received some work units which couldn't be finished due to a compute error. Interestingly, the second person calculating the same WU also resulted in a compute error: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825566938 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825557307 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=825537629 (still pending) Is this common behaviour? What has happened there? Best regards] This was a bad batch that a researcher accidentally sent out.

Batschlach Joined: 7 May 17 Posts: 3 Credit: 307,527 RAC: 0	Message 81533 - Posted: 14 May 2017, 10:00:11 UTC - in response to Message 81532. [Quote from Message 81532: This was a bad batch that a researcher accidentally sent out.] Oh, I see. Thanks for your answer. And thanks for moving my post into the right thread @Mod.Sense!

Skillz Joined: 24 May 17 Posts: 3 Credit: 5,914,356 RAC: 0	Message 81538 - Posted: 27 May 2017, 18:03:46 UTC Why am I having such problems getting work units? I have over 250 cores that can be crunching but I only have, at the time of this post, 59 slots filled. This is the only project I am running so those other cores are sitting idle.

svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0	Message 81539 - Posted: 27 May 2017, 20:45:41 UTC - in response to Message 81538. [Quote from Message 81538: Why am I having such problems getting work units? I have over 250 cores that can be crunching but I only have, at the time of this post, 59 slots filled. This is the only project I am running so those other cores are sitting idle.] The Server Status page is a sea of red: clearly there are problems of some sort.

mmonnin Joined: 2 Jun 16 Posts: 61 Credit: 25,403,362 RAC: 0	Message 81540 - Posted: 27 May 2017, 20:52:12 UTC The only one that needs to be up all the time is the scheduler, which I've seen up; all the ones below it have been up and down. Not everything needs to be running 100% of the time for the project to function. Set a longer queue.

xii5ku Joined: 29 Nov 16 Posts: 22 Credit: 13,889,918 RAC: 0	Message 81541 - Posted: 28 May 2017, 10:58:11 UTC - in response to Message 81538. [Quote from Message 81538: Why am I having such problems getting work units? I have over 250 cores that can be crunching but I only have, at the time of this post, 59 slots filled. This is the only project I am running so those other cores are sitting idle.] For a dual- or quad-socket machine, a "Target CPU run time" setting below 4 hours is not sustainable, IME.

Sid Celery Joined: 11 Feb 08 Posts: 2594 Credit: 47,220,881 RAC: 1	Message 81543 - Posted: 29 May 2017, 3:20:11 UTC - in response to Message 81541. [Quote from Message 81541: [Quote from Message 81538: Why am I having such problems getting work units? I have over 250 cores that can be crunching but I only have, at the time of this post, 59 slots filled. This is the only project I am running so those other cores are sitting idle.] For a dual- or quad-socket machine, a "Target CPU run time" setting below 4 hours is not sustainable, IME.] Correct. Runtime <cannot> be the minimum 1 hour - especially when you have 250 cores (which is great btw). Default run-time is 8 hours, for which you'll do 8 times the work and receive 8 times the credit per task, but only use 1/8th of the bandwidth - better for you and the project. It's also likely to reduce the occasions you have unused cores, which answers your question. BUT! You shouldn't change directly from 1hr to 8hrs, otherwise your tasks will miss deadlines. Change up to 2hrs first, and wait until your buffer stockpile is reduced and the client starts asking for more tasks. Then 3hrs - same process. Then 4hrs etc until you get to a practical level you're happy with - ideally the default 8hrs.
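The arithmetic behind that advice can be sketched roughly as follows. It assumes every core is kept busy around the clock and that each task moves a similar amount of data regardless of its target runtime; both are simplifications, and the function names are made up for this example.

```python
def tasks_per_day(cores, target_runtime_hours):
    """Tasks a host has to fetch and return per day to keep every core busy."""
    return cores * 24 / target_runtime_hours


def ramp_schedule(current_hours, goal_hours):
    """Target runtimes to step through, one hour at a time.

    After each step, wait until the existing buffer has drained and the
    client starts asking for work again before moving to the next value.
    """
    return list(range(current_hours + 1, goal_hours + 1))


if __name__ == "__main__":
    cores = 250
    for hours in (1, 4, 8):
        print(f"{hours} h target runtime: {tasks_per_day(cores, hours):.0f} tasks/day")
    print("suggested steps from 1 h:", ramp_schedule(1, 8))
```

With 250 cores, a 1 hour target means about 6000 task transfers a day, against 750 at the default 8 hours, which is where the one-eighth bandwidth figure comes from.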

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2205 Credit: 13,720,774 RAC: 6	Message 81544 - Posted: 30 May 2017, 10:11:51 UTC - in response to Message 81539. [Quote from Message 81539: The Server Status page is a sea of red: clearly there are problems of some sort.] Still.... :-(

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 81545 - Posted: 30 May 2017, 18:54:42 UTC - in response to Message 81544. [Quote from Message 81544: [Quote from Message 81539: The Server Status page is a sea of red: clearly there are problems of some sort.] Still.... :-(] Sorry for the errors in the status page. I'll take a look. Everything is running as normal so you can ignore the page for now.

Sid Celery Joined: 11 Feb 08 Posts: 2594 Credit: 47,220,881 RAC: 1	Message 81549 - Posted: 6 Jun 2017, 11:53:21 UTC 2 long-running tasks with a long time since the last checkpoint: b21_ncst_0601.282._relax_SAVE_ALL_OUT_486708_29_0 (Last checkpoint: 7:51:46, CPU Time: 11:56:21); b22_1_0603.18._relax_SAVE_ALL_OUT_486983_40_1 (Last checkpoint: 2:54:51, CPU Time: 8:23:20). Both have a default 8 hour runtime and I'm anticipating the watchdog being the only thing that stops them running. I've got 2 more b21 tasks in my queue. Should I abort them? Thinking I will.
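To put a number on "a long time since the last checkpoint", here is a small sketch that subtracts the two figures reported for each task. It assumes "Last checkpoint" is the CPU time at which the last checkpoint was written, as opposed to the time elapsed since it; the helper names are hypothetical.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def cpu_time_since_checkpoint(cpu_time, last_checkpoint):
    """CPU time that would be lost if the task restarted right now."""
    return parse_hms(cpu_time) - parse_hms(last_checkpoint)


if __name__ == "__main__":
    # (task label, CPU time, CPU time at last checkpoint) from the post above
    tasks = [
        ("b21", "11:56:21", "7:51:46"),
        ("b22", "8:23:20", "2:54:51"),
    ]
    for label, cpu_time, checkpoint in tasks:
        gap = cpu_time_since_checkpoint(cpu_time, checkpoint)
        print(f"{label}: {gap} of CPU time since the last checkpoint")
```

Under that reading, the b21 task had gone roughly four hours and the b22 task roughly five and a half hours without checkpointing.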

Sid Celery Joined: 11 Feb 08 Posts: 2594 Credit: 47,220,881 RAC: 1	Message 81550 - Posted: 6 Jun 2017, 12:16:10 UTC - in response to Message 81549. The 1st one has just completed and given full credit (and more) for the extra runtime. Maybe I should just let them complete after all. Let me see what the other one does. [Quote from Message 81549: 2 long-running tasks with a long time since the last checkpoint: b21_ncst_0601.282._relax_SAVE_ALL_OUT_486708_29_0 (Last checkpoint: 7:51:46, CPU Time: 11:56:21); b22_1_0603.18._relax_SAVE_ALL_OUT_486983_40_1 (Last checkpoint: 2:54:51, CPU Time: 8:23:20). Both have a default 8 hour runtime and I'm anticipating the watchdog being the only thing that stops them running. I've got 2 more b21 tasks in my queue. Should I abort them? Thinking I will.]

Sid Celery Joined: 11 Feb 08 Posts: 2594 Credit: 47,220,881 RAC: 1	Message 81553 - Posted: 7 Jun 2017, 1:59:55 UTC - in response to Message 81550. 2nd one not so generous, but both acknowledged the full runtime and validated properly. [Quote from Message 81550: The 1st one has just completed and given full credit (and more) for the extra runtime. Maybe I should just let them complete after all. Let me see what the other one does. [Quote from Message 81549: 2 long-running tasks with a long time since the last checkpoint: b21_ncst_0601.282._relax_SAVE_ALL_OUT_486708_29_0 (Last checkpoint: 7:51:46, CPU Time: 11:56:21); b22_1_0603.18._relax_SAVE_ALL_OUT_486983_40_1 (Last checkpoint: 2:54:51, CPU Time: 8:23:20). Both have a default 8 hour runtime and I'm anticipating the watchdog being the only thing that stops them running. I've got 2 more b21 tasks in my queue. Should I abort them? Thinking I will.]]

boinc127 Joined: 23 Jan 12 Posts: 3 Credit: 314,055 RAC: 0	Message 81555 - Posted: 7 Jun 2017, 17:39:26 UTC I've got a b22 task that is wrapping up, but it is crawling to completion at 99.399%. It's slowly creeping at 0.01% a minute or so; it's in fast relax on model 292, step 7205. I may just abort that task as well...

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2205 Credit: 13,720,774 RAC: 6	Message 81559 - Posted: 8 Jun 2017, 7:43:11 UTC 920412254 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00EECEF9 read attempt to address 0x11DE2000 Engaging BOINC Windows Runtime Debugger...