Rosetta tasks not leaving memory

Message boards : Number crunching : Rosetta tasks not leaving memory

Jonathan Brier
Joined: 1 Dec 05
Posts: 12
Credit: 2,732,077
RAC: 0
Message 74923 - Posted: 18 Jan 2013, 7:15:33 UTC

I am on the support team for GridRepublic, Progress Thru Processors, and Charity Engine.

Over the past few months we have received several support inquiries from Windows users about Rosetta tasks not leaving memory. Because it keeps recurring, it seems to be more than a fluke. Exiting BOINC does not always remove the processes from Task Manager/memory, and restarting the computer is the only fix we have found. Has anyone else seen this, or does anyone know what might cause it? Based on screenshots, I know that Rosetta mini caused this a few times.
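One way to confirm the symptom on an affected Windows machine is to list the processes that are still running after BOINC has been exited. The following is a minimal sketch, not an official diagnostic: it assumes the Rosetta worker executables contain "rosetta" in their image name, and the helper names `find_lingering` and `snapshot` are made up for illustration.

```python
import csv
import io
import subprocess

# Assumption: Rosetta worker executables contain "rosetta" in their image name.
NAME_HINT = "rosetta"


def find_lingering(tasklist_csv, hint=NAME_HINT):
    """Return (image_name, pid) pairs whose image name contains `hint`.

    `tasklist_csv` is the text printed by `tasklist /FO CSV /NH`, where the
    first column is the image name and the second is the PID.
    """
    rows = csv.reader(io.StringIO(tasklist_csv))
    return [
        (row[0], int(row[1]))
        for row in rows
        if len(row) >= 2 and hint in row[0].lower()
    ]


def snapshot():
    """Run the Windows `tasklist` command and return its CSV output (no header)."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `find_lingering(snapshot())` a minute or so after exiting BOINC would show whether any Rosetta process is still alive, and the PIDs can then be matched against what Task Manager shows.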

The second issue we often see is the "exit with zero status" error for Rosetta tasks in the account log. Manually resetting the project normally fixes this, but we would like to reduce the need for manual intervention. Others appear to have reported it too, for example at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6110.
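To get a feel for how often this error occurs on a given machine, the log could be scanned for the phrase. This is only a sketch: the exact wording of the log message and the "[project]" tag layout are assumptions, and `count_zero_status` is a made-up helper name.

```python
from collections import Counter

# Assumption: the failure shows up as a log line containing this phrase.
PATTERN = "zero status"


def count_zero_status(log_lines, pattern=PATTERN):
    """Count matching log lines per project tag, e.g. "[rosetta@home]".

    Lines without a bracketed tag are counted under "unknown".
    """
    counts = Counter()
    for line in log_lines:
        if pattern in line.lower():
            start = line.find("[")
            end = line.find("]", start)
            project = line[start + 1:end] if 0 <= start < end else "unknown"
            counts[project] += 1
    return counts
```

Feeding it the lines of a saved log file gives a per-project tally, which makes it easier to see whether a driver or BOINC version change actually reduced the error rate.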

Any help on how to diagnose the cause, or anything I can do to help track this down, would be appreciated.
GridRepublic - bringing BOINC mainstream: http://www.gridrepublic.org

GridRepublic Fan Page: http://www.facebook.com/GridRepublic

Progress Thru Processors Facebook: http://www.facebook.com/progressthruprocessors

Keefy
Joined: 29 Mar 06
Posts: 1
Credit: 10,224,268
RAC: 0
Message 74924 - Posted: 18 Jan 2013, 11:29:14 UTC - in response to Message 74923.  
Last modified: 18 Jan 2013, 11:45:39 UTC

I have started to notice something possibly similar. I have BOINC (I only process Rosetta) set to suspend during office hours and not to remain in memory once suspended. Sometimes this works and other times it does not.
The clue may have to do with checkpointing and duration. I exited the BOINC Manager and explicitly asked it to close and remove the apps from memory; when I restarted the BOINC Manager, the tasks started processing from scratch, as if (from my perspective) there was no checkpoint to continue from.
My tasks are set to run for 24 hours with, as I say, suspension during office hours. The only ones I have seen properly removed from memory when suspended are those that have run for longer than about 8 to 9 hours.
My local preferences say to checkpoint at most every 60 seconds. I have raised that to 30 minutes, but I don't know whether this is a BOINC or a Rosetta issue, or what determines the checkpoint frequency.
That assumes my observations are accurate. I haven't spent much time on this, but I'm sure I found another thread that mentioned checkpointing as a prerequisite for removal from memory. I may be wrong.
I have just checked the properties of a task that won't leave memory: it has run for 8 hours and has NO value for CPU time at last checkpoint.
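The cost of removing such a task from memory can be put into numbers. The sketch below assumes that a task removed from memory can only resume from its last checkpoint; the function name `work_lost_on_removal` is made up for illustration and both arguments are CPU seconds.

```python
def work_lost_on_removal(cpu_time_s, last_checkpoint_s):
    """Seconds of work that would be redone if the task left memory now.

    `last_checkpoint_s` is None when the task has never checkpointed,
    in which case everything done so far would be lost.
    """
    if last_checkpoint_s is None:
        return cpu_time_s
    return max(0.0, cpu_time_s - last_checkpoint_s)


# The task described above: 8 hours of CPU time and no checkpoint value.
lost = work_lost_on_removal(8 * 3600, None)
```

For the 8-hour task with no checkpoint, all 28,800 seconds would have to be redone, which fits the "restarted from scratch" behaviour described above.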

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 74925 - Posted: 18 Jan 2013, 12:28:39 UTC - in response to Message 74923.  

I can't help with the first part, but WE THINK the second part has been figured out. I am JUST a user, nothing more, so this is a 'fix' from a user's perspective, as Rosetta has said they are happy with the way things are. The problem seems to happen mostly to people who have Nvidia GPU drivers newer than 306.97 loaded and running on their PC. It is not happening as much with AMD GPUs anymore. It does NOT matter that Rosetta does not have a GPU application; if the GPU drivers are loaded, they interfere somehow. The GPU does NOT even have to be actively crunching; the units just crash. I am using BOINC version 7.0.40 or later on all of my machines. It used to be thought that ONLY the 6.?.? versions worked, but more and more people are finding that the newer ones work too.

In short, the Project has done nothing to help with the situation, which has been going on for a L-O-N-G time. It is only through trial and error that crunching for Rosetta is okay again for some people. For others there is the constant 'exit with zero status' problem, about which the Project itself has repeatedly said 'it doesn't happen on the beta site, so it can't be Project related'! And they are correct, it DOESN'T happen on the beta site! But as I said, rolling back the Nvidia drivers to 306.97 seems to fix the problem. EVEN if people are not crunching with their Nvidia GPU, the drivers just being loaded seems to be a problem! Laptops that have NEVER used the GPU have seen the problem; if they unload the drivers, Rosetta runs and gets credits just fine. Reload the drivers and BOOM, the same error you reported and no credits!
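For anyone collecting reports, a small check like the one below can flag which machines run a driver newer than the 306.97 version mentioned above. It is only a sketch: it assumes driver versions are plain dotted numbers, the helper names are made up, and how the version string is obtained on each machine is left open.

```python
THRESHOLD = "306.97"  # driver version named in the post above


def parse_version(text):
    """Turn a dotted version string such as "310.90" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer_than(version, threshold=THRESHOLD):
    """True if `version` is strictly newer than `threshold`."""
    return parse_version(version) > parse_version(threshold)
```

Comparing tuples of integers rather than raw strings or floats avoids mistakes such as treating "306.100" as older than "306.97".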

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,422,537
RAC: 53,087
Message 74927 - Posted: 18 Jan 2013, 12:46:47 UTC - in response to Message 74925.  

I've upgraded the driver on my laptop to 310.90 (Optimus M310 GPU) on BOINC 6.12.34 and don't have a problem with Rosetta tasks, so that would suggest the problem is one of the following:

* Only on later versions of BOINC (7+)
* Only on machines that are also crunching with the GPU for other projects
* Intermittent

I'll try upgrading BOINC later too.

Danny

Jonathan Brier
Joined: 1 Dec 05
Posts: 12
Credit: 2,732,077
RAC: 0
Message 75001 - Posted: 27 Jan 2013, 23:09:57 UTC - in response to Message 74927.  

Well, the Rosetta tasks not leaving memory are on computers running a 6.x.x version of BOINC... the Charity Engine installer is different.

The NVIDIA GPU driver theory is an interesting speculation, and I will follow up with those who reported the issue to see whether they all have NVIDIA hardware.

Even if the issue is intermittent, there is some underlying cause, whether in BOINC, Rosetta, or some other program or design, behind these issues.

Any other thoughts and test results are appreciated. The Charity Engine discussion of this is at http://www.charityengine.com/forum/show-topic/1109, and the GridRepublic/PTP discussion is at http://www.gridrepublic.org/joomla/index.php?option=com_smf&Itemid=26&topic=338.msg1365#new.

We are in email communication with the Rosetta@home team. They are looking into this issue.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,422,537
RAC: 53,087
Message 75003 - Posted: 28 Jan 2013, 10:09:40 UTC

David E K's post here might be relevant.

mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 12,464
Message 75005 - Posted: 28 Jan 2013, 12:20:18 UTC - in response to Message 74927.  

I have an AMD GPU in one of my machines crunching Moo units right now, and I am using BOINC version 7.0.45; it is getting credits just fine. I am surprised to see that you are using a driver newer than 306.97 and it is working, though! Most people are not seeing that.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,422,537
RAC: 53,087
Message 75008 - Posted: 28 Jan 2013, 14:32:19 UTC - in response to Message 75005.  

I think they are seeing the same as me if, like me, they're not running any GPU projects. I guess the next test should be for me to add a GPU project to that laptop. I'll try that later today.

©2024 University of Washington
https://www.bakerlab.org