Problems and Technical Issues with Rosetta@home

Author	Message
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80621 - Posted: 10 Sep 2016, 21:46:53 UTC Please report any issues with work units in this thread. Rosetta Moderator: Mod.Sense

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 80622 - Posted: 10 Sep 2016, 21:47:57 UTC Link to older technical issues thread. Rosetta Moderator: Mod.Sense

Sid Celery Joined: 11 Feb 08 Posts: 2592 Credit: 47,220,881 RAC: 1	Message 80626 - Posted: 11 Sep 2016, 3:41:27 UTC "I'll do a full boinc server upgrade when we get our hardware." Excellent. Thanks.

Diplomat Joined: 2 Aug 10 Posts: 7 Credit: 15,651,420 RAC: 0	Message 80627 - Posted: 11 Sep 2016, 17:44:29 UTC The credit granted has become significantly lower than it used to be. At the same time, the task table now gets cleaned almost instantly, so it's hard to prove the point (I have 24 hrs computing). The only result visible now (while I was writing this message it disappeared from the table too): 873957277 782555353 10 Sep 2016 13:02:18 UTC 11 Sep 2016 17:27:12 UTC Over Success Done 85,467.54 1,032.09 691.97. The others that I did manage to spot had the same CPU time, with credit claimed slightly over 1,000 and credit granted around 550-650; previously I would expect values between 850 and 1,000.
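For reference, the figures in the task row quoted in the post above can be put side by side with a few lines of arithmetic. This is a minimal Python sketch, assuming the last three numbers in that row are CPU time in seconds, claimed credit, and granted credit; that column order is inferred from the wording of the post and is not confirmed by the project. The function name `credit_summary` is hypothetical.

```python
# Values copied from the task row in the post above.
# Assumed column order: CPU time (s), claimed credit, granted credit.
cpu_seconds = 85_467.54
claimed = 1_032.09
granted = 691.97


def credit_summary(cpu_seconds: float, claimed: float, granted: float) -> dict:
    """Return CPU hours and simple credit ratios for one task."""
    cpu_hours = cpu_seconds / 3600
    return {
        "cpu_hours": cpu_hours,
        "granted_to_claimed": granted / claimed,
        "granted_per_cpu_hour": granted / cpu_hours,
    }


summary = credit_summary(cpu_seconds, claimed, granted)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption the task ran for roughly 23.7 CPU hours, which matches the 24 hours mentioned in the post, and was granted about two thirds of the credit it claimed.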

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80628 - Posted: 11 Sep 2016, 19:30:38 UTC We are trying to reduce the size of the database and also the load on the database server. So we dramatically shortened the time workunits/results remain in our database temporarily. Sorry for any inconvenience and we do expect to go back to normal time spans soon after we catch up on the large backlog. Please also do not trust the stats being displayed. I also turned off the stat updates since the db queries were putting a large load on our server slowing things down too much. I expect it to take a week or so for things to settle back to normal and we will upgrade the database server to a temporary server with much more disk space, faster disks, and double the memory which should hopefully help also. The upgrade will result in another day of downtime as the data gets transferred etc. This will happen sometime next week.

SPKA67213 Joined: 12 Jan 06 Posts: 6 Credit: 20,381,212 RAC: 0	Message 80629 - Posted: 12 Sep 2016, 0:38:58 UTC - in response to Message 80628. I haven't been able to receive new work units for the last couple of weeks. After reading some of the Q&A I'm still not sure what is broken, why, or when whatever is broken will be fixed. Love to contribute my CPU cycles, but with nothing to crunch from R@H I need to move to a different project. It would be nice if the project home page had a headline that says "yeah, we know it's broken and we're working to fix it. LINKY HERE." YMMV and all, but it just seemed a little silly that a casual contributor (4 computers, 12 CPU cores) has to dig around to find out there's likely nothing wrong on his end. See you next year when R@H might be fixed.

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80630 - Posted: 12 Sep 2016, 4:41:41 UTC - in response to Message 80629. I'm not sure why you aren't getting work units. The system seems ok now and clients should be getting jobs. My desktops are crunching and were able to get jobs recently. Can you try to detach and reattach and see if that helps?

shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0	Message 80631 - Posted: 12 Sep 2016, 7:57:02 UTC - in response to Message 80629. Are you running the Mac client? It's been more broken for a while. Especially bad in the last few days, and my Mac has just run out of work again, but has about 15 completed work units waiting to be reported. Per my other thread, I've confirmed that the Windows 10 and Ubuntu versions are running normally (after the recent rounds of flakiness), but the OS X version remains sick.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 \| Speech)

Sid Celery Joined: 11 Feb 08 Posts: 2592 Credit: 47,220,881 RAC: 1	Message 80638 - Posted: 14 Sep 2016, 10:38:41 UTC - in response to Message 80628. "Please also do not trust the stats being displayed. I also turned off the stat updates since the db queries were putting a large load on our server slowing things down too much." A pity. My awarded credits are much higher compared to claimed than they used to be. Good statement on the front page.

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80642 - Posted: 14 Sep 2016, 17:33:44 UTC - in response to Message 80638. The credit granting stats should be up-to-date. The db server is working well now and seems to be caught up. I'm going to gradually increase the work history timespan also. We'll have a period of downtime soon to upgrade the database server also.

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80643 - Posted: 14 Sep 2016, 17:34:50 UTC My mac is running well. Are others having mac client issues?

Sid Celery Joined: 11 Feb 08 Posts: 2592 Credit: 47,220,881 RAC: 1	Message 80652 - Posted: 14 Sep 2016, 21:59:17 UTC - in response to Message 80642. Last modified: 14 Sep 2016, 22:00:45 UTC All good here, thanks. I should mention, though, that all the 10_s??_PH160908 tasks would last a whole lot longer if they didn't hit the 100 decoy limit and shut down within 2 hours. My run-time average is so little I've got over 100 tasks waiting to run now. That's great for us users but seems counter-productive for the server at this time. Maybe don't increase that history timespan just yet.

sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0	Message 80653 - Posted: 14 Sep 2016, 22:42:37 UTC I have got a lot (15) of validate errors, covering almost all of my last work batch. Here are 3 of them: https://boinc.bakerlab.org/rosetta/result.php?resultid=873772221 https://boinc.bakerlab.org/rosetta/result.php?resultid=873771928 https://boinc.bakerlab.org/rosetta/result.php?resultid=873771868

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80654 - Posted: 15 Sep 2016, 8:13:29 UTC - in response to Message 80652. Yes, very observant! I'm planning to fix this 100 model hard limit issue for the next app update. I think there's a command line option also that I'll look into for new jobs in the near future.
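The effect of the 100-model hard limit discussed in this exchange can be illustrated with a simplified model: a task stops either when it reaches the user's target run time or when it has produced the maximum number of decoys, whichever comes first. The sketch below is not the Rosetta application's actual logic; the function name `effective_runtime_hours`, the 8-hour target, and the 72 seconds per decoy are hypothetical values chosen only to reproduce the "within 2 hours" behaviour reported above.

```python
def effective_runtime_hours(target_hours: float,
                            seconds_per_decoy: float,
                            decoy_cap: int = 100) -> float:
    """Simplified model: run until the target run time or the decoy cap,
    whichever is reached first. Not the real application's stopping rule."""
    if seconds_per_decoy <= 0:
        raise ValueError("seconds_per_decoy must be positive")
    hours_to_hit_cap = decoy_cap * seconds_per_decoy / 3600
    return min(target_hours, hours_to_hit_cap)


# Hypothetical numbers: an 8-hour target with fast decoys (72 s each).
# The cap is reached long before the target, so the task ends early.
print(effective_runtime_hours(target_hours=8, seconds_per_decoy=72))
```

In this model, fast decoys mean the cap rather than the run-time preference decides how long a task lasts, so a client finishes tasks sooner and requests more of them, which is the extra server load the post describes.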

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 80655 - Posted: 15 Sep 2016, 8:18:25 UTC - in response to Message 80653. Hmmm.... I'll take a look at these in detail.

shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0	Message 80656 - Posted: 15 Sep 2016, 19:33:09 UTC - in response to Message 80654. Interesting stuff about the 10-prefix units and there is also a notice on the top webpage. None of that seems to explain the special or excess problems the Mac client is suffering from. Only (superficial) thing that I've noticed is that the "transitioner" server is often down. Is that required only by the Mac client? (My Windows 10 and Ubuntu Linux machines are mostly running okay, but the Mac has another pile of unreported work units and will soon run out of work (again).) Some problem with the Preview function here? #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 \| Speech)

RosettaMac Joined: 5 Dec 13 Posts: 11 Credit: 1,639,488 RAC: 0	Message 80657 - Posted: 15 Sep 2016, 20:59:48 UTC - in response to Message 80643. My iMac hasn't received any work for several days. I've tried manual Updates and Reset Project from the BOINC Manager, but still get no new work.

Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0	Message 80660 - Posted: 16 Sep 2016, 19:36:48 UTC - in response to Message 80643. Last modified: 16 Sep 2016, 19:43:50 UTC I'm not having any mac specific issues; the only time I haven't gotten new work seems to coincide with the same problems that affected everyone else. I do have concerns about an "acourbet.10.design_S" unit that is claiming large amounts of working memory (3.08GB right now, down from 3.2GB). It's been running about 4.5 hours; the last checkpoint was about 2.5 hours ago. It's an older machine with only 4GB of memory, but as it's not doing anything else at the moment BOINC is allowed to use all of it. I'll keep an eye on it but thought a heads-up might be in order. edited to add: this is a resend after the previous cruncher's effort ended with an "out of memory error". workunit Best, Snags

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2204 Credit: 13,720,774 RAC: 8	Message 80666 - Posted: 22 Sep 2016, 12:51:39 UTC Servers are down again.... Any news about this: "An interim solution will be to temporarily upgrade our database server and thanks to our sys admins, we already have a machine ready to go that has plenty of disk space and double the memory. The upgrade will require a day of downtime which is planned to happen early this week." Are we on the temporary server?

JonPer Joined: 4 May 06 Posts: 14 Credit: 510,105 RAC: 0	Message 80667 - Posted: 22 Sep 2016, 17:34:06 UTC - in response to Message 80660. No additional work package since the 15th; it is getting cold in here...!!!