Problems and Technical Issues with Rosetta@home

Author	Message
JRP2706 Joined: 19 Mar 13 Posts: 2 Credit: 301,174 RAC: 0	Message 75445 - Posted: 24 Apr 2013, 11:09:02 UTC Just experienced the CASP9....... uploading problem. With all the issues being reported I am also suspending work here until some progress is made to resolve them or at least some formal acknowledgement is made by the Moderators/Support staff. So Long, and Thanks for all the fish, jrp2706 ID: 75445 · Rating: 0

morgan Joined: 30 Jun 06 Posts: 3 Credit: 387,964 RAC: 0	Message 75446 - Posted: 24 Apr 2013, 11:39:06 UTC - in response to Message 75445. "Just experienced the CASP9....... uploading problem." (jrp2706) CASP9 and ActCys waiting for upload here. ID: 75446 · Rating: 0

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75447 - Posted: 24 Apr 2013, 11:54:46 UTC Last modified: 24 Apr 2013, 11:55:16 UTC I have several other types that won't upload. The problems are big but as usual no news. My computers will run other projects with medical programs this week. Greetings, TJ. ID: 75447 · Rating: 0

Josh and Amanda Joined: 20 Oct 11 Posts: 1 Credit: 8,591,642 RAC: 0	Message 75449 - Posted: 24 Apr 2013, 14:21:13 UTC As a result of the recent myriad of issues with this project, compounded by little/no action by Team Baker, I will be indefinitely suspending my project time on Rosetta. When your team can respect my time and resources and monitor the program effectively I may consider allowing new tasks; as for now, project Collatz thanks you for my additional CPU cycles... ID: 75449 · Rating: 0

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75451 - Posted: 24 Apr 2013, 16:41:26 UTC Last modified: 24 Apr 2013, 16:44:30 UTC As the problems at the project persist without any but a general note that there is an awareness of 'problems' -- and not specifics, I think there is an inclination toward a pattern or sequence of responses.
1) Bring to the attention of the project (via this site) each of the several specific problems encountered.
2) Mitigate the problem at the workstation (kill off known 'bad boy' work units).
3) Verify the project has acknowledged the specific issue (not done yet).
4) Lacking acknowledgement of a problem - move to a 'no new work' posture, while awaiting project acknowledgement, resolution and explanation.
5) Should other problems surface (failed uploads, failed downloads, delayed validation) note these problems as they occur.
6) Verify the project has acknowledged the specific issues (not done yet).
7) Lacking acknowledgement of specific multiple problems, let alone a resolution of any of the problems (and ideally explanation) -- move to a 'suspend project processing' posture while awaiting project acknowledgement, resolution and explanation.
8) Allow some time to pass (days) for the project to do what it should be doing to resolve problems (assuming the data being generated is still of value to the project).
9) After the passage of time with no specific acknowledgement or time frame for resolution, consider detaching from the project (killing off possible good work units which get sent back into the queue -- perhaps with new due dates).
Currently, I submit we are in stage 8 here....
I'd note that aside from this thread, and one on computational errors, there has been nothing in the way of a response or acknowledgement of the reported issues by actual project people (some from volunteers who note the project 'is aware') and nothing on the home page to alert folks who are less proactive about a project bumping into issues. ID: 75451 · Rating: 0

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75456 - Posted: 24 Apr 2013, 21:48:32 UTC - in response to Message 75451. Update -- so the project took the server fully offline (no notice -- no surprise). The presumption was they were working to correct things. Then again, no notice of what they did. Server now reports all green. Great. Uploads don't go through. Not so great. I get the feeling that there may be an element of randomness in the troubleshooting efforts. At least the explanations are not random; they are instead nonexistent. Awaiting developments. ID: 75456 · Rating: 0

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75460 - Posted: 24 Apr 2013, 22:30:04 UTC Last modified: 24 Apr 2013, 22:31:07 UTC A long time ago I read somewhere in these fora that they work with students. I guess these students are still in the learning process of computer science, and if there is no other help then it will be a process of trial and error. I have seen this with other projects too: when a postgraduate student gets a PhD and leaves the project, the expertise is gone. Perhaps that is the case here as well. And are communication courses off the curriculum in the US...??? Docking@home is getting my CPU cycles. Greetings, TJ. ID: 75460 · Rating: 0

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75461 - Posted: 24 Apr 2013, 22:37:07 UTC Last modified: 24 Apr 2013, 22:38:08 UTC Impatient as I am, I did one more "retry now" for the files waiting to upload, and guess what, yes, they flow through the fiber and copper wires immediately. It is working again? The "pending" are still there pending... So not all is working again... Greetings, TJ. ID: 75461 · Rating: 0

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75463 - Posted: 24 Apr 2013, 22:52:22 UTC - in response to Message 75461. Last modified: 24 Apr 2013, 22:54:11 UTC Thanks for that -- uploads are going through now. That lets me mark the CASP and Cryo units as bad, stay with no new work and let other work units process. I'd note that unlike the Cryo units, the CASP units might be OK on Windows 7 systems and when they fail they do fail quickly (unlike the Cryo units). As to pendings -- looks like they just cleared as well. Now as to the Cryo and CASP workunits.... It is all supposition though as we remain in something of an informational black hole... ID: 75463 · Rating: 0

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75464 - Posted: 24 Apr 2013, 23:00:03 UTC Indeed the pendings are gone. As the Docking WUs are finished I will start Rosie again. See if I can get 1 million before the weekend. Happy crunching for the good cause. Greetings, TJ. ID: 75464 · Rating: 0

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75465 - Posted: 24 Apr 2013, 23:08:57 UTC - in response to Message 75464. So, from the update now on the home page (thanks for that), the network specific issues (which showed up as pendings and upload/download problems) were 'resolved' via a reboot. No comment about the Cryo and CASP work unit issues. I've elected for no new work, and have killed off any Cryo units. As to the CASP work units -- it seems most complete properly and from what I could see, when they fail they fail in under 5 minutes so I'll let them process. But until I see an assessment of both the Cryo and CASP work units at the project level, I think I'll hang back with no new work and simply clear queues. ID: 75465 · Rating: 0

Yifan Song Volunteer moderator Project developer Project scientist Joined: 26 May 09 Posts: 62 Credit: 7,322 RAC: 0	Message 75466 - Posted: 25 Apr 2013, 0:48:59 UTC Hi, I'm sorry for causing all the trouble with my cryo work units. The crashes are related to using electron density data. I'm updating r@h with bug fixes that should make these jobs more stable. Yifan ID: 75466 · Rating: 0

Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 75467 - Posted: 25 Apr 2013, 1:42:00 UTC All good now. Switching back to Rosetta. ID: 75467 · Rating: 0

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75468 - Posted: 25 Apr 2013, 4:00:55 UTC - in response to Message 75466. Thanks for the message -- more than anything the angst is a function of information flow. Hang around so you can catch updated reports over the next few days regarding the updated cryo work units. To confirm, the updates are in place and any cryo units that are received should be ok -- correct?? ID: 75468 · Rating: 0

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75470 - Posted: 25 Apr 2013, 9:21:26 UTC - in response to Message 75468. "To confirm, the updates are in place and any cryo units that are received should be ok -- correct??" I think not; I got one cryo and that one errored out quickly. But it could be an old one. So at first I will not directly abort any cryos. Greetings, TJ. ID: 75470 · Rating: 0

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75471 - Posted: 25 Apr 2013, 11:25:34 UTC - in response to Message 75470. I aborted all units in my cache (I only keep a single day's worth) and have moved all my PCs elsewhere. I will come back, I still have a goal to get, but not until after Rosetta figures things out. There are waaay too many other places to crunch for to waste time crunching for a project that takes DAYS to say and do anything after we, the crunchers, find problems! Dr. Yifan Song came on and said 'sorry' but people are STILL having problems! Why wasn't EVERY cryo unit pulled and sent over to Albert for re-testing? Oh I know...they want us to crunch thru and have problems with all the 'old' units just in case one or two will work!! NO, NO, NO!!! That is NOT the way to engender confidence in your worker bees!!! Send them crap hoping that maybe they either won't care or will find a gem or two in the ton of crap you send out!! AT LEAST the good Doctor came on and SAID SOMETHING, but obviously the problems STILL exist!! Units get sent to people and then get aborted by the project ALL THE TIME, usually over deadline issues, but for other reasons too! WHY wasn't EVERY cryo unit aborted and updated to the new criteria and then sent for Beta testing? Saying 'sorry' doesn't mean a darned thing IF NOTHING CHANGES!!! ID: 75471 · Rating: 0

Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0	Message 75475 - Posted: 25 Apr 2013, 19:03:01 UTC - in response to Message 75471. "Saying 'sorry' doesn't mean a darned thing IF NOTHING CHANGES!!!" I have to second this sentiment. About 35% of all work units I've returned in the last week have been aborted due to 'out of memory' errors. If this appalling record doesn't soon change, ROSETTA is history for me. ID: 75475 · Rating: 0

robertmiles Joined: 16 Jun 08 Posts: 1265 Credit: 14,424,358 RAC: 0	Message 75477 - Posted: 25 Apr 2013, 19:59:30 UTC - in response to Message 75475. Last modified: 25 Apr 2013, 20:09:00 UTC I've recently sent some comments to boinc_dev about a problem with the way BOINC keeps track of the amount of memory in use, especially under Windows Vista. For 32-bit workunits, it does not count the SYSWOW64 modules needed to run those workunits under 64-bit Windows. A few possible ways to handle this, at least partially:
1) Wait for a future version of BOINC that does count them, and offers separate memory limits for 32-bit memory space and for the entire 64-bit memory space BOINC uses for all workunits.
2) Set each of your computers to subscribe only to BOINC projects that offer only 32-bit workunits, or only 64-bit workunits, but not both on the same computer.
3) Upgrade your Windows Vista computers to Windows 7, where the SYSWOW64 modules are much smaller. I don't know how this applies to 64-bit Windows XP or 64-bit Windows 8. Windows Vista uses roughly the same amount of memory for the 32-bit workunits and for the SYSWOW64 modules needed to run them.
4) Persuade all BOINC projects to either offer a true 64-bit version of each of their applications (even if it won't run any faster than the 32-bit version), or double the estimates of required memory for all 32-bit workunits sent to 64-bit versions of BOINC. 64-bit applications don't use any SYSWOW64 modules when they run, and therefore don't need any memory space to load them.
ID: 75477 · Rating: 0
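The arithmetic behind the last suggestion in the post above (doubling the memory estimate for 32-bit workunits on 64-bit hosts) can be sketched roughly as follows. This is only an illustration and not BOINC code: the `Workunit` class, the function names, and the factor of 2.0 are all hypothetical, with the factor taken from the post's remark that on Windows Vista the SYSWOW64 modules need roughly as much memory as the 32-bit workunits themselves. The post says the overhead is smaller on Windows 7 but gives no figure, so no value is assumed for it here.

```python
from dataclasses import dataclass

# Assumption taken from the post: under 64-bit Windows Vista the SYSWOW64
# modules need roughly as much memory as the 32-bit workunit itself.
WOW64_FACTOR = 2.0


@dataclass
class Workunit:
    """Hypothetical stand-in for a workunit; not a BOINC data structure."""
    name: str
    est_memory_mb: float  # the project's own memory estimate
    is_32bit: bool


def effective_memory_mb(wu, host_is_64bit, factor=WOW64_FACTOR):
    """Scale the estimate only for 32-bit workunits running on a 64-bit host."""
    if wu.is_32bit and host_is_64bit:
        return wu.est_memory_mb * factor
    return wu.est_memory_mb


def fits(workunits, host_is_64bit, limit_mb):
    """Check whether the adjusted estimates stay within a memory limit."""
    total = sum(effective_memory_mb(wu, host_is_64bit) for wu in workunits)
    return total <= limit_mb


if __name__ == "__main__":
    queue = [Workunit("example_a", 500.0, True), Workunit("example_b", 500.0, True)]
    print(fits(queue, host_is_64bit=True, limit_mb=1500))   # adjusted total is 2000 MB
    print(fits(queue, host_is_64bit=False, limit_mb=1500))  # unadjusted total is 1000 MB
```

With an adjustment like this, two workunits that each claim 500 MB would be treated as needing 2000 MB on a 64-bit Vista host, which is the kind of gap that could plausibly show up as 'out of memory' aborts when the unadjusted figure is used.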

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75482 - Posted: 26 Apr 2013, 10:51:48 UTC - in response to Message 75477. I am using 64-bit Win7 Ultimate on all of my Rosetta machines, so that isn't really an issue for me, and I still never crunched a cryo unit successfully! ID: 75482 · Rating: 0

robertmiles Joined: 16 Jun 08 Posts: 1265 Credit: 14,424,358 RAC: 0	Message 75485 - Posted: 26 Apr 2013, 14:24:29 UTC - in response to Message 75482. It's still an issue for Win7, although less than for WinVista, as long as the Rosetta@Home server keeps sending 32-bit workunits to 64-bit versions of Windows. You might check if your motherboard is able to hold any more memory, and if so, try installing more memory. With 16 GB, my Win7 computer is at least fast at failing cryo workunits. ID: 75485 · Rating: 0