Problems and Technical Issues with Rosetta@home

sinspin Send message Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0	Message 80715 - Posted: 7 Oct 2016, 9:06:03 UTC - in response to Message 80714.
> Hello, I had an error while subscribing with a new computer but I got these errors...
> 07/10/2016 08.20.08 | rosetta@home | Started download of minirosetta_database_d0bf94b.zip
> 07/10/2016 08.20.31 | rosetta@home | Temporarily failed download of minirosetta_database_d0bf94b.zip: connect() failed
> 07/10/2016 08.20.32 | rosetta@home | Started download of minirosetta_database_d0bf94b.zip
> 07/10/2016 08.20.35 | | Project communication failed: attempting access to reference site
> 07/10/2016 08.20.36 | | Internet access OK - project servers may be temporarily down.
That can happen. It is not really a problem. BOINC will try again later.
ID: 80715 · Rating: 0 · rate: / Reply Quote

matteo Send message Joined: 20 Jul 16 Posts: 2 Credit: 283,350 RAC: 0	Message 80716 - Posted: 7 Oct 2016, 12:56:09 UTC
@sinspin No, after wiping the project from BOINC Manager and re-subscribing, it gives the same error. The other files were downloaded correctly; only minirosetta_database_d0bf94b.zip seems to have the problem. Every 5 minutes it tries to connect and receives the same response:
07/10/2016 14.46.51 | | Project communication failed: attempting access to reference site
07/10/2016 14.46.51 | rosetta@home | Temporarily failed download of minirosetta_database_d0bf94b.zip: transient HTTP error
07/10/2016 14.46.52 | rosetta@home | Started download of minirosetta_database_d0bf94b.zip
07/10/2016 14.46.53 | | Internet access OK - project servers may be temporarily down.
07/10/2016 14.51.52 | rosetta@home | Temporarily failed download of minirosetta_database_d0bf94b.zip: transient HTTP error
07/10/2016 14.51.53 | | Project communication failed: attempting access to reference site
07/10/2016 14.51.53 | rosetta@home | Started download of minirosetta_database_d0bf94b.zip
07/10/2016 14.51.54 | | Internet access OK - project servers may be temporarily down.
ID: 80716 · Rating: 0 · rate: / Reply Quote

David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80717 - Posted: 7 Oct 2016, 17:43:18 UTC - in response to Message 80716.
> [...] only minirosetta_database_d0bf94b.zip seems to have the problem [...]
Can you access it from a browser? https://boinc.bakerlab.org/rosetta/download/minirosetta_database_d0bf94b.zip
ID: 80717 · Rating: 0 · rate: / Reply Quote

Daniel Kohn Send message Joined: 30 Dec 05 Posts: 18 Credit: 2,899,939 RAC: 0	Message 80728 - Posted: 10 Oct 2016, 3:59:58 UTC I am getting many Client Errors and Validate Errors. Is it just me? ID: 80728 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 3	Message 80731 - Posted: 11 Oct 2016, 1:10:38 UTC - in response to Message 80702. Last modified: 11 Oct 2016, 1:12:13 UTC
> These errors are related to the issues we had with the database server a few weeks back. Things should stabilize as new jobs get pushed through and the old jobs get processed. If there are still more than normal validation issues after a week from now, please let me know. Thanks!
Last week I reported 14 out of 110 tasks with validate errors. This week it's 13 out of 105 tasks, with 7 coming in the last 20 reported tasks. This is on a W7 desktop computer.
ID: 80731 · Rating: 0 · rate: / Reply Quote
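For anyone wanting to compare these counts, a quick sketch of the arithmetic follows. It only uses the three figures reported in the post above; the helper name `error_rate` is made up for illustration.

```python
def error_rate(errors, tasks):
    """Fraction of reported tasks that ended in a validate error."""
    return errors / tasks

# (label, validate errors, reported tasks) as given in the post above
reports = [
    ("last week", 14, 110),
    ("this week", 13, 105),
    ("last 20 reported tasks", 7, 20),
]

for label, errors, tasks in reports:
    print(f"{label}: {errors}/{tasks} = {error_rate(errors, tasks):.1%}")
```

The weekly rates are close to each other, while the rate over the most recent 20 tasks is noticeably higher, which suggests the problem was not yet tapering off at the time of posting.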

LC Send message Joined: 10 Jun 09 Posts: 8 Credit: 1,895,973 RAC: 0	Message 80769 - Posted: 24 Oct 2016, 13:21:52 UTC I'm brand new on the forum but I've been crunching for several years. I've read this entire thread, there are some posts beyond my technical comprehension but as far as I can tell nobody has brought up the problem I'm having... I know we're all limping along as we await the new servers to be built but I'm only being given active work units, it's not allowing me to keep any 'stored'/'buffer' units. In other words my quad core cpu computer is only being given a max of 4 work units to crunch at a time, as it finishes one unit it reports it and one work unit will be sent back to replace it 1 for 1. My preferences have been the same for years - 10 days of work plus an additional 10. My sizeable buffer began to diminish during the 1st or 2nd week of October. The reason I'm posting here is because my computers don't always have access to the internet and so I'm missing out on quite a bit of work. I've waited a few weeks to see if it would fix itself but no luck. The only thing I haven't done yet is re-install BOINC but I don't think that's the issue. I'm on 7.6.22 (x64) Thank you. LC ID: 80769 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80770 - Posted: 24 Oct 2016, 14:47:50 UTC
@LC One way this might happen is if you have other BOINC projects configured and they are due to run some work, but the project is not giving you work. BOINC Manager sort of waits until the last minute to look to other projects to get work. Otherwise, R@h has been on the brink of not having enough work for everyone for several weeks. It may just be coincidence that the number of tasks you get happens to match your number of CPUs.
With limited access to an internet connection, I would suggest you have at least two BOINC projects to ensure you get work when a connection is available. With your limited memory, I'd suggest a project that does not have high memory requirements the way R@h does and has a reliable flow of work units. World Community Grid is one that comes to mind. You can adjust the resource shares between projects to favor one or the other, but when work is unavailable from one project, BOINC will request work from the other, even if its resource share would not otherwise say it is due for crunching time.
Rosetta Moderator: Mod.Sense
ID: 80770 · Rating: 0 · rate: / Reply Quote

LC Send message Joined: 10 Jun 09 Posts: 8 Credit: 1,895,973 RAC: 0	Message 80772 - Posted: 24 Oct 2016, 18:01:28 UTC - in response to Message 80770. Last modified: 24 Oct 2016, 18:02:30 UTC
> @LC One way this might happen is if you have other BOINC Projects... It may just be coincidence that you get the number of tasks that you do happen to match your number of CPUs. With your limited memory... You can adjust the resource shares between projects...
Thanks for the quick reply! To your points above... I don't have any other projects. I used to work on SETI and Rosetta together but I have long since removed SETI from my list of projects (therefore Rosetta has 100% resource share). That's why I don't believe it's any coincidence I'm getting a "1 for 1" swap with Rosetta work units. Within minutes of a core completing a work unit it gets reported and replaced with a single new one to keep the core busy again. 1 for 1. It's been like this for about 2 weeks or so. My 10 day buffer started slowly dropping 3 or 4 weeks ago. Starting soon after the server crash, it looks like Rosetta purposely allowed my 10 day buffer to go to zero and it's only giving me the absolute minimum number of work units to keep each of my 4 cores busy.
---I should mention, my posts here aren't a complaint really, I was just surprised nobody else had mentioned this behavior. If Rosetta is running low on work units to send people, I would think this is happening to everyone, so by no means am I upset, just curious.---
I've been assuming this was happening to everyone because of the server issues. I was hoping I'd get my 10 days of work units back when the servers get fixed. I have a second computer (an old Dell single core) experiencing the same '1 for 1' issue. Each computer has different Local Computing Preferences (which is what I have them both set to use) and yes, I even checked my web preferences and that's set to 10 extra days also.
My old single-core Dell finishes several work units per day, earning maybe a couple hundred points per day. My quad core averages dozens of work units per day, giving me an average of over 1100 points per day. My quad core is a 64 bit i3 with 6GB RAM on Windows 10...plenty of RAM. I know I /could/ add SETI or another project to keep my CPUs warm but Rosetta is the only one I really wanted. We'll see. I'm assuming there's nothing I can do to get buffer units for Rosetta until they replace the servers. I'm assuming this is happening to someone else besides me & my 2 computers. Thoughts, comments & ideas from anyone are welcome. Thanks for your help.
(ps - I have an Nvidia Shield K1 Android tablet. Any idea if/when I might be able to use it for Rosetta?)
LC
ID: 80772 · Rating: 0 · rate: / Reply Quote

Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80773 - Posted: 24 Oct 2016, 22:09:49 UTC @LC Have you checked which venue your machines are in? And what the preferences are for that venue? I wonder if your preferences got messed up during the database issues they had. It is almost acting like a project that the BOINC Manager thinks has a resource share of zero (a "backup" project). Suggest you look at message log to see which project is hosting your preferences. Then revise the settings for days of cache and resource share on that project to a new value so the values get reset in the server. And then have you overridden the project settings on your two host machines? Maybe tweak the values there too, just to force it to replace them. Rosetta Moderator: Mod.Sense ID: 80773 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 3	Message 80775 - Posted: 25 Oct 2016, 1:17:14 UTC - in response to Message 80772.
> Starting soon after the server crash, it looks like Rosetta purposely allowed my 10 day buffer to go to zero and it's only giving me the most absolute minimum work units to keep each of my 4 cores busy. ---I should mention, my posts here aren't a complaint really, I was just surprised nobody else had mentioned this behavior. If Rosetta is running low on work units to send people, I would think this is happening to everyone, so by no means am I upset, just curious.--- I've been assuming this was happening to everyone because of the server issues. I was hoping I'd get my 10 days of work units back when the servers get fixed. [...] I'm assuming there's nothing I can do to get buffer units for Rosetta until they replace the servers. I'm assuming this is happening to someone else besides me & my 2 computers.
No. For quite some while now, all buffers have been getting filled. Best to check the settings showing on the website here, as advised (Computing Preferences and Rosetta Preferences). Something very odd going on.
ID: 80775 · Rating: 0 · rate: / Reply Quote

LC Send message Joined: 10 Jun 09 Posts: 8 Credit: 1,895,973 RAC: 0	Message 80776 - Posted: 25 Oct 2016, 3:45:43 UTC - in response to Message 80775.
> Have you checked which venue your machines are in?
Good idea, hadn't thought of that. Just checked it now...I have no venue settings. Home, School and Work have all been left blank, leaving only the main Preference(s). I just now tweaked the main settings slightly just for the sake of doing so but neither of my computers is set to use web preferences. I have both set to use local preferences since there is a vast gap between them in CPU & RAM capabilities. I slightly tweaked the local settings on both computers a number of days ago in the hope that would work. No dice.
> I wonder if your preferences got messed up during the database issues they had.
That's exactly what I've been thinking because the timing is just too much of a coincidence...but I see no evidence of that on my end and I've re-tweaked everything already. I don't remember messing with ANY of my settings for quite some time until this issue. I brought the old Dell out of retirement just to see if it was a problem with my quad core computer.
> It is almost acting like a project that the BOINC Manager thinks has a resource share of zero (a "backup" project).
I never knew that trick, cool idea, thanks.
> Suggest you look at message log to see which project is hosting your preferences. Then revise the settings for days of cache and resource share on that project to a new value so the values get reset in the server. And then have you overridden the project settings on your two host machines? Maybe tweak the values there too, just to force it to replace them.
I've never used/read the Message Log before, great idea, thank you! I've just now checked it. I've already re-tweaked everything. I only run Rosetta and all my preferences are derived from there...but you are right...
> Something very odd going on.
24-Oct-16 23:03:11 | rosetta@home | Sending scheduler request: To fetch work.
24-Oct-16 23:03:11 | rosetta@home | Requesting new tasks for CPU and Intel GPU
24-Oct-16 23:03:13 | rosetta@home | Scheduler request completed: got 0 new tasks
24-Oct-16 23:03:13 | rosetta@home | No work sent
24-Oct-16 23:03:13 | rosetta@home | (won't finish in time) BOINC runs 86.5% of time, computation enabled 99.5% of that
24-Oct-16 23:09:22 | rosetta@home | Sending scheduler request: To fetch work.
24-Oct-16 23:09:22 | rosetta@home | Requesting new tasks for CPU and Intel GPU
24-Oct-16 23:09:24 | rosetta@home | Scheduler request completed: got 0 new tasks
24-Oct-16 23:09:24 | rosetta@home | No work sent
24-Oct-16 23:09:24 | rosetta@home | (won't finish in time) BOINC runs 86.5% of time, computation enabled 99.5% of that
Why does BOINC not think I can finish in time? (Time and date are accurate on my computers.) I routinely complete dozens of work units per day, and this '1 for 1' swap has been going on for over a week. 86.5% up-time on a quad core i3 with 6GB of RAM has been giving me well over 300 work units in my buffer until this recent issue. (Yes, my Tasks view is View ALL Tasks.) Is it time to try the "Reset Project" button? Do I need to reinstall BOINC?
Sorry for the long replies but I want to provide as much relevant detail as possible. I appreciate your time & help, thank you.
LC
ID: 80776 · Rating: 0 · rate: / Reply Quote
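If the log gets long, a small script can pick out the empty scheduler replies and the reason given in parentheses. The following is only a sketch in Python: it assumes the "timestamp | project | message" layout seen in the excerpt above (tolerating pipes that were pasted as `\|`), and the function name `summarize` is made up for illustration. It is not a BOINC tool.

```python
import re

# Sample lines adapted from the log excerpt in the post above.
LOG = """\
24-Oct-16 23:03:11 | rosetta@home | Sending scheduler request: To fetch work.
24-Oct-16 23:03:13 | rosetta@home | Scheduler request completed: got 0 new tasks
24-Oct-16 23:03:13 | rosetta@home | No work sent
24-Oct-16 23:03:13 | rosetta@home | (won't finish in time) BOINC runs 86.5% of time, computation enabled 99.5% of that
"""

def summarize(log_text):
    """Return (count of 'got 0 new tasks' replies, list of parenthesised reasons)."""
    empty_replies = 0
    reasons = []
    for line in log_text.splitlines():
        # Undo escaped pipes, then split into timestamp, project, message.
        parts = [p.strip() for p in line.replace("\\|", "|").split("|")]
        if len(parts) != 3:
            continue  # not a log line in the assumed layout
        message = parts[2]
        if "got 0 new tasks" in message:
            empty_replies += 1
        reason = re.match(r"\((.+?)\)", message)
        if reason:
            reasons.append(reason.group(1))
    return empty_replies, reasons

print(summarize(LOG))
```

The parenthesised reason is the part worth reading first: here it points at deadlines rather than at a lack of work on the server.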

[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2202 Credit: 13,720,774 RAC: 33	Message 80777 - Posted: 25 Oct 2016, 13:07:59 UTC Please, stop all "2xa0_Xcdp_" wus.... ID: 80777 · Rating: 0 · rate: / Reply Quote

Timo Send message Joined: 9 Jan 12 Posts: 185 Credit: 45,662,635 RAC: 0	Message 80778 - Posted: 25 Oct 2016, 14:54:08 UTC
@LC - Just checking the obvious here, so I don't mean to be insulting by asking such a simple question, but it's happened to me before without my even noticing, so I thought I would ask: did you by chance hit the 'Show active tasks' button at the top left of the 'Tasks' pane of BOINC? (If so, it will currently say 'Show all tasks'; try toggling it and seeing if you do indeed have more work buffered.)
Secondly, a 10 day buffer (if I read your post right, that is what you had set?) seems really really excessive. Some of the simulations being run on R@H can complete in just a few days - why be the one guy slowing down pace of iteration/experimentation by making researchers wait a whole 10 days? I usually keep a 1 day buffer, and have 'Mapping Cancer Markers' (via World Community Grid) as my 'backup' project when R@H is out of work... just a thought.
38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
ID: 80778 · Rating: 0 · rate: / Reply Quote

LC Send message Joined: 10 Jun 09 Posts: 8 Credit: 1,895,973 RAC: 0	Message 80779 - Posted: 25 Oct 2016, 15:46:15 UTC - in response to Message 80778.
> Did you by chance hit the 'Show active tasks' button...
No offense taken at all. Yes, I have checked the Active Task button and that's not it. My wifi randomly cuts out on my quad core Asus and the only way to fix it is to manually restart the whole laptop. (The wifi problem has nothing to do with anything else here.) When the signal dies overnight, within an hour or two I'm out of work units because of this lack of buffer WUs. If I had merely hidden the tasks by mistake, I'd still have a steady, uninterrupted stream of production; I wouldn't be having this actual problem.
> a 10 day buffer...seems really really excessive...why be the one guy slowing down pace of iteration/experimentation by making researchers wait a whole 10 days?... just a thought.
I never really thought of it that way but I don't think that's exactly how it works...is it? I could definitely be wrong but it's my understanding the same WUs are sent to a bunch of people. It would be risky to rely on the results of a single computer so several (dozens??) computers are used to work on the same WU so results can be compared. Also, I could be mistaken, but I don't believe anyone's buffer is 'holding up' research. When everything is working correctly on both ends & BOINC on my end 'learns' how many WUs I average, I'm only given the WUs I can handle by the deadline. Nearly all of my WUs are completed ahead of the deadline so I would assume this doesn't 'hold up' research. Maybe I'm wrong but I'm assuming everyone's happy as long as I'm meeting deadlines. Thanks for your POV though, I never thought of it that way. Any mods have an opinion on this?
> some of the simulations being run on R@H can complete in just a few days
DAYS? For one work unit?
I've never read anywhere whether the researchers prefer a long run-time-per-WU over a short run-time-per-WU, so I've always (7 years now) had my preferences set to shorter WUs. Thanks for the input Timo, I appreciate it.
> Please, stop all "2xa0_Xcdp_" wus....
@VENETO I'm not sure if your message was for me but I haven't seen any WUs with that name yet. It's not possible for me to keep constantly checking due to my present problem though.
ID: 80779 · Rating: 0 · rate: / Reply Quote

Juha Send message Joined: 28 Mar 16 Posts: 13 Credit: 705,034 RAC: 0	Message 80780 - Posted: 25 Oct 2016, 20:12:41 UTC - in response to Message 80769.
> My preferences have been the same for years - 10 days of work plus an additional 10.
The "Store at least" preference not only tells BOINC what cache size you prefer but also tells BOINC how often it can expect your computer to be online. Setting it to 10 days tells BOINC there's an Internet connection available every 10 days. Because of that, BOINC will try to finish all tasks at least 10 days before their deadline.
Right now Rosetta has tasks with 2, 5 and 7 day deadlines. Since there are fewer than 10 days available to get the work done, the BOINC client will be constantly in panic mode, trying to finish the work before the deadline. When the client is in panic mode it may decide not to try to get more work from the project in question. More work would just make a bad situation even worse.
The scheduler also checks whether you can meet the deadline before giving you work. There's fallback code that gives work regardless of deadlines if you have an idle CPU. That explains why you have any work at all.
Earlier you had a full cache from Seti because they have much longer deadlines.
The fix is simple. Just decrease your cache size to something more reasonable, like two days.
ID: 80780 · Rating: 0 · rate: / Reply Quote
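The rule described above can be put into a few lines of Python. This is a toy model of the explanation in the post only, not the actual BOINC client or scheduler logic; the function names are invented, and the value 0 is included purely for contrast with the 10-day setting.

```python
# Toy model of the "finish N days before the deadline" rule; NOT real BOINC code.

def days_left_to_compute(deadline_days, store_at_least_days):
    """Days the client believes it has if it aims to finish
    `store_at_least_days` before the deadline (hypothetical helper)."""
    return deadline_days - store_at_least_days

def looks_feasible(deadline_days, store_at_least_days):
    """True if any computing time remains before the self-imposed cut-off."""
    return days_left_to_compute(deadline_days, store_at_least_days) > 0

ROSETTA_DEADLINES = (2, 5, 7)  # days, as stated in the post above

for setting in (10, 0):
    verdicts = {d: looks_feasible(d, setting) for d in ROSETTA_DEADLINES}
    print(f"store at least {setting} days -> {verdicts}")
```

With a 10-day setting, every one of the 2, 5 and 7 day deadlines comes out infeasible in this model, which is consistent with the "won't finish in time" message in the log.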

Sid Celery Send message Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 3	Message 80781 - Posted: 26 Oct 2016, 2:46:06 UTC - in response to Message 80776. Last modified: 26 Oct 2016, 2:53:37 UTC
> Something very odd going on. [...]
> 24-Oct-16 23:03:13 | rosetta@home | (won't finish in time) BOINC runs 86.5% of time, computation enabled 99.5% of that
> [...] Why does BOINC not think I can finish in time? (Time and date are accurate on my computers.) I routinely complete dozens of work units per day, this '1 for 1' swap has been going on for over a week. 86.5% up-time on a quad core i3 with 6GBs of RAM has been giving me well over 300 work units in my buffer until this recent issue. LC
This is the nub of the issue, which I was mulling over before I saw Juha's message, where he picks out the one quote from you we all missed:
>> My preferences have been the same for years - 10 days of work plus an additional 10.
> The "Store at least" preference not only tells BOINC what cache size you prefer but also tells BOINC how often BOINC can expect your computer to be online. Setting it to 10 days tells BOINC there's an Internet connection available every 10 days. Because of that BOINC will try to finish all tasks at least 10 days before their deadline. Right now Rosetta has tasks with 2, 5 and 7 day deadlines. Since there is less than 10 days available to get the work done BOINC client will be constantly in panic mode trying to finish the work before deadline. When the client is in panic mode it may decide not to try to get more work from the project in question. More work would just make a bad situation even worse. The scheduler checks as well if you can meet the deadline before giving you work. There's fallback code that gives work regardless of deadlines if you have an idle CPU. That explains why you have any work at all. Earlier you had full cache from Seti because they have much longer deadlines. The fix is simple. Just decrease your cache size to something more reasonable, like two days.
If all Rosetta deadlines are 2, 5 or 7 days and you're asking for 10 days' worth of tasks minimum plus another 10 days, none of the tasks can meet the deadline, so you get none. Until you're out completely, when you get one per core.
In Computing preferences in BOINC, set "Store at least 0 days of work", then set "Store up to an additional 1.5 days of work" to ensure all deadlines can be met, including the default task runtime plus any variations.
Once you've confirmed this is working for you, there's a second thing: you seem to have set only a 1 hour default runtime, when the current default ought to be 8 hours. You need to tweak that up progressively over a period of time because, if you only have occasional internet access, you're trying to connect many more times than you need to. This will reduce the number of tasks you hold, but 300 tasks in your buffer on a 4-core machine with only a 1 hour runtime is way, way over the top. I hold around 40 8hr tasks (on an 8-core machine), which even I think is pretty high.
ID: 80781 · Rating: 0 · rate: / Reply Quote
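To put the two buffers mentioned above side by side, here is a rough back-of-the-envelope sketch. It assumes every task runs for exactly its target runtime and that all cores are used; the 86.5% and 99.5% figures are taken from the log line quoted above, the 1 hour runtime is the guess made in this post, and `buffer_days` is a made-up helper name.

```python
def buffer_days(tasks, runtime_hours, cores, on_fraction=1.0, active_fraction=1.0):
    """Estimate how many wall-clock days a queue of tasks represents.

    on_fraction:     share of time BOINC is running (assumed from the log)
    active_fraction: share of that time computation is enabled
    """
    cpu_hours_needed = tasks * runtime_hours
    cpu_hours_per_day = cores * 24 * on_fraction * active_fraction
    return cpu_hours_needed / cpu_hours_per_day

# 300 one-hour tasks on a 4-core machine, using the availability from the log
lc = buffer_days(300, 1, 4, on_fraction=0.865, active_fraction=0.995)
# about 40 eight-hour tasks on an 8-core machine, assumed always on
sid = buffer_days(40, 8, 8)

print(f"300 x 1h tasks on 4 cores: {lc:.1f} days of queued work")
print(f"40 x 8h tasks on 8 cores:  {sid:.1f} days of queued work")
```

Under these assumptions the first queue is roughly 3.6 days of work and the second roughly 1.7 days, so a 300-task buffer would already overrun any task with a 2 day deadline.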

robertmiles Send message Joined: 16 Jun 08 Posts: 1265 Credit: 14,424,358 RAC: 0	Message 80782 - Posted: 26 Oct 2016, 4:24:51 UTC - in response to Message 80777.
> Please, stop all "2xa0_Xcdp_" wus....
It might be more effective to give a reason why.
ID: 80782 · Rating: 0 · rate: / Reply Quote

robertmiles Send message Joined: 16 Jun 08 Posts: 1265 Credit: 14,424,358 RAC: 0	Message 80783 - Posted: 26 Oct 2016, 4:32:11 UTC - in response to Message 80779.
> [snip]
>> some of the simulations being run on R@H can complete in just a few days
> DAYS? For one work unit? I've never read anywhere if the researchers prefer a long run-time-per-WU over a short run-time-per-WU so I've always (7 years now) had my preferences set to shorter WUs. Thanks for the input Timo, I appreciate it.
I've read that they wanted longer workunits selected as a way to reduce the load on the server. I haven't seen whether they changed their minds on that, though.
ID: 80783 · Rating: 0 · rate: / Reply Quote

[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2202 Credit: 13,720,774 RAC: 33	Message 80784 - Posted: 26 Oct 2016, 6:46:02 UTC - in response to Message 80782.
>> Please, stop all "2xa0_Xcdp_" wus....
> It might be more effective to give a reason why.
Message 80768. All errors....
ID: 80784 · Rating: 0 · rate: / Reply Quote

LC Send message Joined: 10 Jun 09 Posts: 8 Credit: 1,895,973 RAC: 0	Message 80785 - Posted: 26 Oct 2016, 16:03:22 UTC @Juha & @Sid I need to take a little time to re-read and more fully understand what you're telling me. At the moment I think I understand the basics of what you're both saying but there's one glaring fact that's causing me some confusion. - All of my settings have been basically the same since I first started crunching. Starting 7 years ago, every time I brought in a new computer (12 different computers over 7 years), I have set up every single one with almost identical settings and I'm certain I requested the "10 + 10" every single time...and I've never had any problems until a few weeks ago. I've never had major problems making deadlines. I've always had a good buffer of WUs. I've been happily crunching away with Windows XP, Windows Vista, Windows 7, Windows 10 & Linux Mint. I had a 3-year hiatus in the middle but I've still put up almost 1.2 million points for Rosetta...so I don't understand how such a sudden and drastic change in my additional WU stash could be caused by my settings which have basically never changed. When something works for 1.2 million points over 7 years on a dozen different machines, with 5 different OSs, that's a pretty good case for reliability. I very much appreciate everyone's help. You've given me a good bit to re-read and better understand so let me go through it & I'll be back here once I've tried some things. Thank you very much to everyone helping out. I'm already testing some of the suggestions you've all offered so hopefully I can report back with good news soon. Thanks again! ID: 80785 · Rating: 0 · rate: / Reply Quote