Need help

Message boards : GPU Users Group message board : Need help

Keith Myers
Joined: 29 Mar 20
Posts: 95
Credit: 289,903
RAC: 0
Message 92504 - Posted: 29 Mar 2020, 5:06:14 UTC
Last modified: 29 Mar 2020, 5:27:49 UTC

How do you control the number of running tasks and the amount of memory allotted to Rosetta?
Right now it locks my host up as soon as I launch BOINC. I have no screen updating, no mouse movement or keyboard input. All I can do to reclaim the host is push the reset button.

The memory meter is pegged. From the forum posts it looks like each task takes a gig of memory. I only have 16GB of memory installed and it is launching 17 tasks.

[Edit]

Got it under control. I edited the global prefs and override files before I launched BOINC to prevent too many tasks from starting up and to limit the amount of memory.

The hard drive light just locked solid as the system tried to swap out to swap space when BOINC loaded the tasks previously. I had my override settings set to run 18 Seti CPU tasks, and BOINC tried to run 18 Rosetta tasks with only 16GB of memory. No good outcome there.
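
For anyone hitting the same lock-up, here is a minimal sketch of what such an override file might look like. It assumes the standard global_prefs_override.xml in the BOINC data directory; the percentages are illustrative values chosen for a 16GB host, not the settings actually used above.

```xml
<global_preferences>
    <!-- Use at most this percentage of the CPU threads (illustrative value). -->
    <max_ncpus_pct>50.0</max_ncpus_pct>
    <!-- Cap BOINC's memory use while the computer is in use (illustrative value). -->
    <ram_max_used_busy_pct>50.0</ram_max_used_busy_pct>
    <!-- Cap BOINC's memory use while the computer is idle (illustrative value). -->
    <ram_max_used_idle_pct>75.0</ram_max_used_idle_pct>
</global_preferences>
```

Editing the file before BOINC starts, as described above, means the limits are already in place when the client first schedules tasks.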

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 92512 - Posted: 29 Mar 2020, 9:37:59 UTC - in response to Message 92504.  

I am still working through some of those issues to keep my system from hanging. I have been told to leave CPU utilization at 100%, because anything less causes task switching, and there have been some issues with that. Rosetta also uses quite a bit of RAM, so I have backed the % of cores off to 60%. That is significantly less than what other projects seem to think is okay.

Freewill
Joined: 6 Apr 20
Posts: 6
Credit: 2,398
RAC: 0
Message 93691 - Posted: 6 Apr 2020, 22:51:11 UTC
Last modified: 6 Apr 2020, 23:00:50 UTC

Hi All,
I just got started over here and I have 6 Rosetta CPU jobs running, which is more than I want. How do I best cut it back to perhaps 4? I need the other cores to feed my GPUs for S@H.

*edit* Keith and Juan got me fixed. app_config.xml for the project.

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93722 - Posted: 7 Apr 2020, 11:28:52 UTC - in response to Message 93691.  

I believe you would change the coprocessor callout in cc_config to not use all.

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 10,597,198
RAC: 27,529
Message 93757 - Posted: 7 Apr 2020, 19:40:24 UTC - in response to Message 93691.  

Sounds like another job for "max_concurrent_tasks" :)

I have tried running unlimited Rosetta but can't run anything else unless I restrict it to 8 (now 12) threads.
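
A hedged sketch of how such a limit might look: in BOINC's app_config.xml the relevant elements are `<max_concurrent>` (per application) and `<project_max_concurrent>` (per project). The app name "rosetta" and the value 12 are assumptions for illustration; the real app names for a project are listed in client_state.xml.

```xml
<app_config>
    <!-- Per-application cap: at most 12 tasks of this app run at once.
         The app name is an assumption; check client_state.xml for the real one. -->
    <app>
        <name>rosetta</name>
        <max_concurrent>12</max_concurrent>
    </app>
    <!-- Project-wide cap across all of this project's apps. -->
    <project_max_concurrent>12</project_max_concurrent>
</app_config>
```

Either element alone is enough for a simple limit; the file goes in the project's own folder under the BOINC data directory.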

Tom
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93791 - Posted: 8 Apr 2020, 0:29:38 UTC - in response to Message 93757.  

Does the max concurrent tasks setting go in the Pandora file?

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93797 - Posted: 8 Apr 2020, 0:52:29 UTC - in response to Message 93791.  

I found the thread over on Seti and got the file created. Works great.

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 10,597,198
RAC: 27,529
Message 93801 - Posted: 8 Apr 2020, 2:33:17 UTC - in response to Message 93791.  

app_config.xml
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93848 - Posted: 8 Apr 2020, 10:42:06 UTC

I am having a strange issue. Since Seti is down for maintenance, I will post here. I have 2000+ Seti GPU jobs but none of them are running. They only run when I suspend all other attached projects. I have app_configs for all projects to keep resources free for Seti. What is going on? Any ideas?

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 10,597,198
RAC: 27,529
Message 93854 - Posted: 8 Apr 2020, 11:34:25 UTC - in response to Message 93848.  

What resource settings are you using?

I have Seti@Home set for 1,000 while Rosetta@Home is set for 100.

R@H has a much earlier "due date" than S@H. The scheduler pays attention to those due dates.

So my guess is if you keep R@H at say 100 and set S@H at 1,000 it will work.

I also have a similar issue if I don't use "project_max_concurrent" in the app_config.xml file for the R@H project. It starves everything, including the GPU threads.

Tom

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93855 - Posted: 8 Apr 2020, 11:46:34 UTC - in response to Message 93854.  

The only Seti jobs I have are GPU jobs, which Rosetta is not using, so they should not be in conflict. I have Seti set at a resource share of 5000 and everything else at 500; still no dice. I also have project_max_concurrent set so resources are free, but once again my GPUs are essentially idle even though Seti has jobs sitting in cache. The Seti jobs do start if I suspend all other projects, though.

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 10,597,198
RAC: 27,529
Message 93858 - Posted: 8 Apr 2020, 12:02:48 UTC - in response to Message 93855.  
Last modified: 8 Apr 2020, 12:11:40 UTC

I would set the R@H resource share to 100 then, and make sure R@H is not using more than 80% of your available cores/threads through the project max limit. Do a hand count to make sure your running R@H tasks are not exceeding your project_max_concurrent limit, and set the cores/threads percentage in the Boinc Manager to free up at least 4 cores/threads. I ended up at 87.5% on my 32 thread box to get down to 28 threads.

Don't forget to run "Read config files" in the Boinc Manager after each change to your app_config.xml file.

If you are using Pandora I plead complete ignorance, but I assume that shutting down the Boinc Manager/Tasks, waiting 15 seconds and restarting will get you a fresh set of settings.

I think you are experiencing not exactly a conflict but "competition for resources". When R@H is pushing the limits I get the same "gpu not running" issue you are getting (with Einstein@Home since I have run out of S@H tasks).

Also check your notices after you restart Boinc Manager. If there is a syntax error (yes I still get them) in your app_config.xml file it will notify you there.

My app_config.xml file is:
<app_config>
    <project_max_concurrent>17</project_max_concurrent>
</app_config>


Tom
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....

Freewill
Joined: 6 Apr 20
Posts: 6
Credit: 2,398
RAC: 0
Message 93859 - Posted: 8 Apr 2020, 12:09:42 UTC - in response to Message 93855.  

This was the problem I had when Rosetta was using up all my available CPU cores. It stopped one of my GPUs from running since there was no available CPU core to support it. I had to put an app_config file in the Rosetta folder with the following:

<app_config>
    <app>
        <name>rosetta</name>
    </app>
    <app>
        <name>minirosetta</name>
    </app>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>

(change the 4 as needed to leave some cores for S@H) Thanks to Keith and Juan for this.

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93860 - Posted: 8 Apr 2020, 12:17:23 UTC - in response to Message 93858.  

I will check when I get home. The odd thing is that GPUGrid will run GPU jobs if I unsuspend it, but Seti will not run on those same GPUs. I did not realize that Seti GPU jobs had a CPU requirement. I should have cores free, but I will double check.

Buckeye4lf
Joined: 29 Aug 08
Posts: 43
Credit: 8,323,412
RAC: 7,682
Message 93861 - Posted: 8 Apr 2020, 12:18:22 UTC - in response to Message 93859.  

Does it need one CPU core per GPU job? I did not realize it needed a CPU core.

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 10,597,198
RAC: 27,529
Message 93862 - Posted: 8 Apr 2020, 12:55:51 UTC - in response to Message 93861.  

Here is my app_config.xml file for Seti@Home:
<app_config>
    <project_max_concurrent>14</project_max_concurrent>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>astropulse_v7</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


The default for cpu usage is either 0.1 or less for S@H. If you are using Tbar/petri's All in One with the "-nobs" enabled you will get your best performance with the above app_config.xml file.

A Threadripper 3970x with 64 threads/cores should have a Boinc Manager setting of 93.75% to keep 4 threads free for "housekeeping" (60 of 64 threads in use).
If you have 3 GPUs you should be running at most 57 cores/threads in your project max for the Rosetta@Home folder (60 minus one thread per GPU).
You might try dropping the max concurrent in Rosetta@Home to as low as 48. If your GPUs suddenly start working, then you can work your way back up to 57.

I hope Freewill's and my advice will get your system back on track with maximum Rosetta@Home crunching while you finish out your S@H tasks.

Tom M
If you are running spoofed gpus it gets a bit more complicated.

Freewill
Joined: 6 Apr 20
Posts: 6
Credit: 2,398
RAC: 0
Message 93886 - Posted: 8 Apr 2020, 17:28:07 UTC - in response to Message 93861.  

In general, yes, S@H on a GPU needs 1 core per GPU for the special sauce and mutex versions. At least my setup does.



©2024 University of Washington
https://www.bakerlab.org