Message boards : Number crunching : Rosetta tasks not leaving memory
Author | Message |
---|---|
Jonathan Brier Joined: 1 Dec 05 Posts: 12 Credit: 2,732,333 RAC: 0 |
I am on the support team for GridRepublic, Progress Thru Processors, and Charity Engine. Over the past few months we have received several support inquiries from Windows users about Rosetta tasks not leaving memory. Since this keeps recurring, it seems to be more than a fluke. Exiting BOINC does not always remove the processes from the Task Manager/memory; restarting the computer is the only solution. Have any others seen this, or does anyone know what might cause it? I know from screenshots that Rosetta mini caused this a few times. The second issue we often see is "exit with zero status" for Rosetta tasks in the account log. Normally resetting the project manually fixes this, but we would like to reduce the need for manual intervention. Others appear to have reported this, for example in https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6110. Advice on how to diagnose the cause, or anything I can do to help track this down, would be appreciated. GridRepublic - bringing BOINC mainstream: http://www.gridrepublic.org GridRepublic Fan Page: http://www.facebook.com/GridRepublic Progress Thru Processors Facebook: http://www.facebook.com/progressthruprocessors |
Keefy Joined: 29 Mar 06 Posts: 1 Credit: 10,224,268 RAC: 0 |
I have started to notice something possibly similar. I have BOINC (I only process Rosetta) set to suspend during office hours and not to remain in memory once suspended. Sometimes this works and other times it does not. The clue may have to do with checkpointing and duration. I exited the BOINC Manager and asked it to close and remove the apps from memory; when I restarted the BOINC Manager, the tasks restarted processing from scratch, as if (from my perspective) there was no checkpoint to continue from. My tasks are set to run for 24 hours with, as I say, suspension during office hours, and the only ones I have seen properly removed from memory when suspended are those that have run for longer than roughly 8 to 9 hours. My local preferences say to checkpoint at most every 60 seconds. I have upped that to 30 minutes, but I don't know whether this is a BOINC or a Rosetta issue, or what determines the checkpoint frequency. That is assuming my observations are accurate. I haven't spent much time on this, but I'm sure I found another thread that mentioned checkpointing as a prerequisite for memory removal; I may be wrong. I have just checked the properties of a task that won't leave memory: it has run for 8 hours and has NO value for CPU time at last checkpoint. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,400 |
Quote: "I am on the support team for GridRepublic, Progress Thru Processors, and Charity Engine." I can't help with the first part, but WE THINK the second part has been figured out. I am JUST a user, nothing more, so this is a 'fix' from a user's perspective, as Rosetta has said they are happy with the way things are. The problem seems to happen mostly to people who have Nvidia GPU drivers newer than 306.97 loaded and running on their PC. It is not happening with AMD GPUs as much anymore. It does NOT matter that Rosetta does not have a GPU application; if the GPU drivers are loaded, they interfere somehow. The GPU does NOT even have to be actively crunching; the units just crash. I am using BOINC version 7.0.40 or later on all of my machines. It used to be thought that ONLY the 6.?.? versions worked, but more and more people are finding that the newer ones work too. In short, the Project has done nothing to help with the situation, which has been going on for a L-O-N-G time; it is only through trial and error that crunching for Rosetta is okay again for some. For others it is the constant 'exit with zero status' problem, about which the Project itself has repeatedly said 'it doesn't happen on the beta site, so it can't be Project related'! And they are correct, it DOESN'T happen on the beta site! But as I said, rolling back the Nvidia drivers to 306.97 seems to fix the problem. EVEN if people are not crunching with their Nvidia GPU, the drivers just being loaded seems to be a problem! Laptops that have NEVER used the GPU have seen the problem; if they unload the drivers, Rosetta runs and gets credits just fine. Reload the drivers and BOOM, the same error you reported and no credits! |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 10,243 |
I've upgraded the driver on my laptop to 310.90 (Optimus M310 GPU) on BOINC 6.12.34 and don't have a problem with Rosetta tasks, so that would suggest the problem is one of the following: (1) limited to later versions of BOINC (7+); (2) limited to machines that are crunching other projects with the GPU; or (3) intermittent. I'll try upgrading BOINC later too. Danny |
Jonathan Brier Joined: 1 Dec 05 Posts: 12 Credit: 2,732,333 RAC: 0 |
Quote: "I've upgraded the driver on my laptop to 310.90 (Optimus M310 GPU) on BOINC 6.12.34 and don't have a problem with Rosetta tasks, so that would suggest it's either:" Well, the Rosetta tasks not leaving memory are on computers running a 6.x.x version of BOINC... the Charity Engine installer is different. The NVIDIA GPU driver is an interesting speculation, and I will follow up with those who reported the problem to see whether they are all on NVIDIA. Even if the issue is intermittent, there is some underlying cause, whether in BOINC, Rosetta, or some other program or design. Any other thoughts and testing results are appreciated. The discussion of this at Charity Engine is at http://www.charityengine.com/forum/show-topic/1109, and for GridRepublic/PTP at http://www.gridrepublic.org/joomla/index.php?option=com_smf&Itemid=26&topic=338.msg1365#new. We are in email communication with the rosetta@home team. They are looking into this issue. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 10,243 |
David E K's post here might be relevant. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,169,305 RAC: 3,400 |
Quote: "I've upgraded the driver on my laptop to 310.90 (Optimus M310 GPU) on BOINC 6.12.34 and don't have a problem with Rosetta tasks, so that would suggest it's either:" I have an AMD GPU in one of my machines crunching Moo units right now under BOINC version 7.0.45, and it is getting credits just fine. I am surprised to see that you are using a driver newer than 306.97 and it is working, though! Most people are not seeing that. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,627,225 RAC: 10,243 |
Quote: "I've upgraded the driver on my laptop to 310.90 (Optimus M310 GPU) on BOINC 6.12.34 and don't have a problem with Rosetta tasks, so that would suggest it's either:" I think they are seeing the same as me if, like me, they're not running any GPU projects. I guess the next test should be for me to add a GPU project to that laptop. I'll try that later today. |
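
For anyone wanting to check an affected Windows machine for the symptom described in the first post (Rosetta processes still listed after BOINC has exited), the following is a rough sketch, not an official BOINC or Rosetta tool. It assumes the standard Windows `tasklist /FO CSV /NH` command and assumes that the leftover processes have "rosetta" somewhere in their image name; the function names and the sample process names are made up for illustration.

```python
import csv
import io
import subprocess


def find_lingering(tasklist_csv, patterns=("rosetta",)):
    """Return (image_name, pid) pairs whose image name contains any pattern.

    Expects the text produced by `tasklist /FO CSV /NH`: one quoted CSV row
    per process, with the image name first and the PID second.
    The default pattern "rosetta" is an assumption about how the tasks are named.
    """
    hits = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, pid = row[0], row[1]
        if any(p.lower() in name.lower() for p in patterns):
            hits.append((name, int(pid)))
    return hits


def current_tasklist():
    """Run tasklist (Windows only) and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

After exiting BOINC, calling `find_lingering(current_tasklist())` on the affected machine should return an empty list if the tasks left memory; any entries it returns are candidates to report, together with the BOINC version and GPU driver version discussed above.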
©2024 University of Washington
https://www.bakerlab.org