Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
RandyE Send message Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Running multiple GPUs (3 on each) on 2 PCs. In the past, was able to run Rosetta on those PCs with no issues. Ran all 3 GPUs and 3 Rosetta tasks at a time. However, recently I was getting messages from Rosetta that I needed to detach from Rosetta@home and reattach to URL https://boinc.bakerlab.org/rosetta/. Prior version of Rosetta cross-connected somehow? Hadn't run Rosetta since early spring. Couldn't resolve this until I deleted all references to Rosetta in my BOINC data directory. Re-added Rosetta in BOINC Manager. Now Rosetta runs multiple tasks while only 1 of my GPU tasks can run at a time. I see many messages asking for control of Rosetta for similar resource issues. Sad that my PCs with 6 or 8 cores cannot use 3 for GPUs and the rest for Rosetta? Guess Rosetta is off the list for me if this can't be resolved. At the Rosetta page, I actually set the project resource percent usage "preference" down to .001. This shows in BOINC Manager, but seems to do nothing? I have one 4-core machine with 1 GPU that I am running Rosetta on, since there is no problem there. Solution needed in application configuration! Tired of mucking around with preferences all over with no results! |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Running multiple GPUs (3 on each) on 2 PCs. In the past, was able to run Rosetta on those PCs with no issues. Ran all 3 GPUs and 3 Rosetta tasks at a time. I've had somewhat similar problems with Folding@home using the GPU (only one) on my computer. I determined how many CPU cores Folding@home uses, then subtracted that number from the number of virtual cores BOINC is allowed to use. I subtracted one more so I can do operations on the console. Setting the preference low only lowers the percentage of CPU-only work for Rosetta@home, and only if CPU tasks from some other BOINC project are available. It has no effect on GPU tasks. The https URL is due to Rosetta@home switching to a more secure method of file exchange. Some of the older versions of BOINC cannot handle this properly, so which version are you using on the computers with and without the problem? Note that rather few people run BOINC with more than one GPU on the same computer, so it could be a problem seen only on computers where BOINC uses more than one GPU. |
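The core-counting approach described in the post above can be sketched as simple arithmetic. This is only an illustration: the function names are hypothetical, the 8-thread machine is an assumed example, and the percentage is what one might enter in BOINC's "use at most X% of the CPUs" preference.

```python
def cpus_for_boinc(logical_cpus: int, cpus_used_by_gpu_app: int,
                   reserve_for_desktop: int = 1) -> int:
    """Logical CPUs left for BOINC CPU tasks.

    Subtracts the cores the GPU application keeps busy and one more
    (by default) so the desktop stays responsive. Never returns less than 0.
    """
    remaining = logical_cpus - cpus_used_by_gpu_app - reserve_for_desktop
    return max(remaining, 0)


def as_percent(allowed: int, logical_cpus: int) -> float:
    """Express the allowed CPU count as a percentage of all logical CPUs."""
    return 100.0 * allowed / logical_cpus


if __name__ == "__main__":
    # Assumed example: 8 logical CPUs, GPU application uses 1 of them.
    allowed = cpus_for_boinc(logical_cpus=8, cpus_used_by_gpu_app=1)
    print(allowed, as_percent(allowed, 8))
```

The result is a starting point to try, not a guarantee that BOINC will schedule tasks exactly that way.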
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
I actually set the project resource percent usage "preference" down to .001. What value are you referring to there? The Resource share setting is a ratio, not a percentage. And it is a longer term setting (not a short term one) for working out the balance of work between projects. If you want the work split evenly between projects, then just leave the Resource Share value for each project at the default value of 100. Solution needed in application configuration! Solution lies in using app_config.xml to reserve a CPU core to support your GPU for the project that uses the GPU. Although looking at the processing times for the Rosetta work you have returned, the systems are only slightly overcommitted. Reserving 1 CPU core for 2 or even 3 GPUs should be good enough. If you choose to use some configuration settings to get the most from your GPUs, then it will be necessary to reserve a CPU core/thread per GPU. From the Collatz forums: <app_config> <app> <name>collatz_sieve</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>0.3</cpu_usage> </gpu_versions> </app> </app_config> Change the cpu_usage value to 1 for 1 CPU core/thread per GPU. Then use BOINC Manager, Options, Read config files for it to take effect (make sure the file is in the project directory). https://boinc.thesonntags.com/collatz/forum_thread.php?id=168 Tired of mucking around with preferences all over with no results! It's the mucking around with the preferences that is causing most of your issues. That, combined with the fact that you somehow created a new machine id for your FX-8320E, so that it is starting from scratch & it would have taken several days to settle down as it worked out how much Rosetta & Collatz work it needed to do over a day to meet your Resource share settings. The fact you've been changing things randomly means it will take even longer for things to settle down, in accordance with whatever new settings you have selected. Grant Darwin NT |
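For anyone who would rather generate the file than hand-edit it, here is a minimal sketch that writes out the same app_config.xml structure quoted above with cpu_usage set to 1.0. The element names come from the post; the helper name `build_app_config` is hypothetical, and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text reserving cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One full CPU core/thread per GPU task, as suggested in the post.
    print(build_app_config("collatz_sieve", gpu_usage=1.0, cpu_usage=1.0))
```

Save the output as app_config.xml in the project's directory and have BOINC re-read its config files, as described in the post.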
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Note that rather few people run with BOINC using more than one GPU on the same computer, so it could be a problem seen only on computers where BOINC uses more than one GPU. At Seti there were many people running multi GPU setups with no issues, as long as they reserved as many CPU cores as were needed by the application. For the stock application that was 1 CPU core per GPU Task running, particularly so if they used optimised settings. For the Linux special application a single CPU core was able to handle multiple GPUs. Grant Darwin NT |
RandyE Send message Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Update on this issue. The app_config.xml for the Collatz project did not solve the problem. I've found that reducing the % of CPUs used in BAM preferences will reduce the number of Rosetta tasks that run concurrently, from 5 at 90% to 2 at 40%. Unfortunately, that does not solve the GPU problem either. All 3 GPUs run Collatz tasks with Rosetta suspended. When I click resume for Rosetta, Collatz processing pauses momentarily in the BAM display, then 2 of my GPUs go into wait mode and multiple Rosetta tasks start. I also set the resource preference on the Rosetta page back to 100, since trying to limit Rosetta resources that way doesn't work. I suppose I might find a clue in the backup files of one of my PCs from before BAM and BOINC were reinstalled recently. Last week one PC was actually running 3 Collatz GPU tasks and 3 Rosetta tasks at the same time before I cleaned out old files to clear the error for 2 connections to Rosetta. But when the Rosetta tasks completed, Rosetta went into pause and BAM told me the pause was at my request, though it wasn't? So clearly something was not right. Gonna keep trying to resolve this somehow. Thanks to the folks that gave me feedback on this. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
What messages do you get in the Event log when Tasks start running, and others suspending? There has recently been a batch of Rosetta Tasks where a single Task can use as much as 5GB of RAM. With your limited system RAM, one of those Tasks with a couple of normal RAM requirement Tasks would result in other Tasks pausing due to a lack of RAM. Once completed, then the other Tasks would start back up again. Once all the large RAM Tasks are done, then things should run as they did previously (if all changes have been reverted to their original settings). Grant Darwin NT |
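The effect described above, where one large-memory Task forces others to wait, can be illustrated with a toy model. This is only a sketch under stated assumptions: the 8 GB budget and the task names and sizes are made up for illustration, and the first-fit rule is a simplification, not BOINC's actual scheduling logic.

```python
from typing import List, Tuple


def tasks_that_fit(ram_budget_gb: float,
                   task_ram_gb: List[Tuple[str, float]]) -> Tuple[List[str], List[str]]:
    """Split tasks into (running, waiting) using a simple first-fit rule.

    A task starts only if its RAM need still fits in the remaining budget;
    otherwise it waits until memory is freed.
    """
    running: List[str] = []
    waiting: List[str] = []
    used = 0.0
    for name, need in task_ram_gb:
        if used + need <= ram_budget_gb:
            running.append(name)
            used += need
        else:
            waiting.append(name)
    return running, waiting


if __name__ == "__main__":
    # Hypothetical mix: one 5 GB Task plus four 1 GB Tasks on an 8 GB budget.
    tasks = [("big", 5.0), ("normal_a", 1.0), ("normal_b", 1.0),
             ("normal_c", 1.0), ("normal_d", 1.0)]
    print(tasks_that_fit(8.0, tasks))
```

Once the large Task finishes and is removed from the list, the waiting Task fits again, which matches the "start back up" behaviour described in the post.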
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
I wonder what happened with these two Tasks? Both marked as Invalid. dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1020463_1271_0 <core_client_version>7.6.22</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_122C3D_01_122_r2_127_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1411607 Using database: database_357d5d93529_n_methylminirosetta_database ====================================================== DONE :: 1 starting structures 28802 cpu seconds This process generated 98 decoys from 98 attempts ====================================================== BOINC :: WS_max 4.74567e+08 09:56:25 (1888): called boinc_finish(0) </stderr_txt> ]]> dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_abinitio_SAVE_ALL_OUT_1021102_1270_0 <core_client_version>7.6.33</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -beta -frag3 00001.200.3mers -frag9 00001.200.9mers -abinitio::increase_cycles 10 -mute all -abinitio::fastrelax -relax::default_repeats 5 -abinitio::rsd_wt_helix 0.5 -abinitio::rsd_wt_loop 0.5 -abinitio::use_filters false -ex1 -ex2aro -in:file:boinc_wu_zip dt_201104_hallucinated_C3D_01_129C3D_01_129_r2_36_model_fd_chA_fragments_fold_data.zip -abinitio::rg_reweight 0.5 -out:file:silent default.out -silent_gz -mute all -in:file:native 00001.pdb -out:file:silent_struct_type binary -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3454571 Using database: database_357d5d93529_n_methylminirosetta_database ====================================================== DONE :: 1 starting structures 28653.6 cpu seconds This process generated 95 decoys from 95 attempts ====================================================== BOINC :: WS_max 4.65773e+08 09:14:35 (7324): called boinc_finish(0) </stderr_txt> ]]> Grant Darwin NT |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Update on this issue. The app_config.xml for the Collatz project did not solve the problem. I've found that reducing the % of CPUs used in BAM preferences will reduce the number of Rosetta tasks that run concurrently, from 5 at 90% to 2 at 40%. Unfortunately, that does not solve the GPU problem either. All 3 GPUs run Collatz tasks with Rosetta suspended. When I click resume for Rosetta, Collatz processing pauses momentarily in the BAM display, then 2 of my GPUs go into wait mode and multiple Rosetta tasks start. I also set the resource preference on the Rosetta page back to 100, since trying to limit Rosetta resources that way doesn't work. I suppose I might find a clue in the backup files of one of my PCs from before BAM and BOINC were reinstalled recently. Last week one PC was actually running 3 Collatz GPU tasks and 3 Rosetta tasks at the same time before I cleaned out old files to clear the error for 2 connections to Rosetta. But when the Rosetta tasks completed, Rosetta went into pause and BAM told me the pause was at my request, though it wasn't? So clearly something was not right. Gonna keep trying to resolve this somehow. Thanks to the folks that gave me feedback on this. It may be time for you to add Ralph@home to one of your computers with 3 GPUs each to help debug this problem. |
RandyE Send message Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Eureka! There's gold in the BOINC Client config parameter doc page. Problem solved! Again, thanks to all the folks who assisted me in digging this out! Got it! Info follows. Shotgun approach that works. All parameters are documented at the BOINC site. All 3 GPUs run Collatz while 3 Rosetta tasks run! This works on my 6 core I7 and 8 core AMD, both running Win 10. URL for BOINC configuration parameters: https://boinc.berkeley.edu/wiki/Client_configuration ** cc_config.xml in C:windowsprogramdataboinc Use all GPUs for Collatz. Exclude NVIDIA GPUs for Rosetta. I run NVIDIA only. There are also arguments for other GPU types and for GPU by number, like 2 for the second one? cc_config.xml ------------------------------------ <cc_config> <options> <use_all_gpus>1</use_all_gpus> <skip_cpu_benchmarks>1</skip_cpu_benchmarks> <exclude_gpu> <url>https://boinc.bakerlab.org/rosetta/</url> <type>NVIDIA</type> </exclude_gpu> </options> </cc_config> ** This app_config in C:windowsprogramdataboincproject...collatz... directory. Says to use a GPU for each collatz_sieve task. I personally don't want to try fractional values for <gpu_usage>, though you may be able to with 6GB or 8GB GPUs. Supposed to allow multiple tasks per card that way? app_config.xml in collatz project directory --------------------- <app_config> <app> <name>collatz_sieve</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1.3</cpu_usage> </gpu_versions> </app> </app_config> ** This app_config in C:windowsprogramdataboincproject...rosetta... directory. Says limit Rosetta to N tasks, 3 here. app_config.xml in rosetta project directory --------------------- <app_config> <app> <name>rosetta</name> </app> <project_max_concurrent>3</project_max_concurrent> </app_config> Crunch those numbers! |
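As a rough cross-check of the settings posted above, the sketch below reads the two app_config files and adds up the CPU budget they imply. It assumes one Collatz task per GPU at a time and treats the cpu_usage value as 1.3; the function name `cpu_budget` is hypothetical, and the sum is plain arithmetic, not a model of how the BOINC client actually schedules work.

```python
import xml.etree.ElementTree as ET

# The two app_config.xml files as posted above.
COLLATZ_APP_CONFIG = """<app_config><app><name>collatz_sieve</name>
<gpu_versions><gpu_usage>1.0</gpu_usage><cpu_usage>1.3</cpu_usage></gpu_versions>
</app></app_config>"""

ROSETTA_APP_CONFIG = """<app_config><app><name>rosetta</name></app>
<project_max_concurrent>3</project_max_concurrent></app_config>"""


def cpu_budget(gpu_app_xml: str, cpu_project_xml: str, gpu_count: int) -> float:
    """CPUs budgeted for GPU support plus the capped number of CPU tasks."""
    gpu_root = ET.fromstring(gpu_app_xml)
    gpu_per_task = float(gpu_root.findtext("app/gpu_versions/gpu_usage"))
    cpu_per_gpu_task = float(gpu_root.findtext("app/gpu_versions/cpu_usage"))
    gpu_tasks = int(gpu_count / gpu_per_task)  # tasks running across all GPUs
    cpu_root = ET.fromstring(cpu_project_xml)
    max_cpu_tasks = int(cpu_root.findtext("project_max_concurrent"))
    return gpu_tasks * cpu_per_gpu_task + max_cpu_tasks


if __name__ == "__main__":
    # 3 GPUs, as in the post.
    print(cpu_budget(COLLATZ_APP_CONFIG, ROSETTA_APP_CONFIG, gpu_count=3))
```

Comparing the result with the number of logical CPUs BOINC is allowed to use gives a quick feel for how tightly the machine is committed.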
RandyE Send message Joined: 22 Sep 10 Posts: 4 Credit: 2,973,365 RAC: 0 |
Hmmm, guess the editor here don't like the "\" character in pathname strings. Oh well, we can figure it out, right? |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
It seems to be a bug in the BOINC forum software, as backslashes in user input get interpreted as escape characters instead of themselves being escaped… … multiple times… … which means if you type enough in the input to overcome that, you’ll get one in the output! Four backslashes in ⟶ one backslash out C:\Program Files\BOINC |
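The behaviour described above can be imitated with a small model. This is a sketch of what the forum software appears to do, not its actual code: the helper names are hypothetical, and the choice of two unescape passes is an assumption inferred from "four backslashes in, one backslash out".

```python
import re


def unescape_once(text: str) -> str:
    """Treat each backslash as an escape: drop it and keep the next character."""
    return re.sub(r"\\(.)", lambda m: m.group(1), text)


def unescape_passes(text: str, passes: int = 2) -> str:
    """Apply the unescape step repeatedly, as the forum seems to."""
    for _ in range(passes):
        text = unescape_once(text)
    return text


if __name__ == "__main__":
    single = "C:" + "\\" + "windows"              # one backslash typed
    quadruple = "C:" + "\\" * 4 + "Program Files"  # four backslashes typed
    print(unescape_passes(single))     # backslash is lost
    print(unescape_passes(quadruple))  # one backslash survives
```

Under this model a single typed backslash vanishes, which is consistent with the stripped path names in the earlier post.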
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
** This app_config in C:windowsprogramdataboincproject...rosetta... directory. Says limit rosetta to N tasks, 3 here. Not necessary if you reserve a CPU core to support the GPU. If you run out of GPU work, the CPU cores will pick up CPU work. If you get more GPU work, then those cores will go back to supporting the GPU. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Ah, we're back. Project was MIA for a while- no web site & uploads backing up. Edit- although still no luck with Scheduler responses, says it's down for maintenance. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
Edit- although still no luck with Scheduler responses, says it's down for maintenance. Still down. Grant Darwin NT |
MarkJ Send message Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
app_config.xml in rosetta project directory --------------------- You don't need the app tags if you are using a project_max_concurrent. It applies to the project as a whole, not to a particular app. You can simplify it to: <app_config> <project_max_concurrent>3</project_max_concurrent> </app_config> BOINC blog |
tom Send message Joined: 29 Nov 08 Posts: 10 Credit: 6,044,733 RAC: 0 |
for some reason, i have been set to ONE work unit a day for quite a while now. after literally years of processing lots of work units, trouble-free, i still don't understand why. afaik, it started when boinc switched to ssl, but since i can successfully connect to other sites over ssl, the switch over (yes, i switched, too) shouldn't have nuked my ability to communicate with the project. and no, i don't see any errors in the event log, although i'm not very expert at looking through it. currently running: boinc 7.16.11 mac os x 10.7.5 mac mini server i7 |
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
Deleted. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1725 Credit: 18,382,444 RAC: 19,446 |
for some reason, i have been set to ONE work unit a day for quite a while now. Because all you do is produce errors. If you want more than 1 Task per day, you need to start producing Valid work. Try detaching & re-attaching to the project. That will dump all your current work, but it will make the system re-download the science application. If you are still producing errors, then it will most likely be a hardware issue: memory, power supply, or CPU overheating (or memory or PSU overheating). It could possibly be an OS issue, but that is very, very unlikely, unless you recently did an update of some sort? Grant Darwin NT |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
This is the same problem you reported five months ago? The server is limiting the amount of work it sends because your computer is returning so many errors. It can’t be related to SSL, since BOINC is successfully communicating with the server and able to download tasks. The other thing that changed around the same time was the update to application version 4.20. Your recent tasks have all failed within seconds of starting, which suggests there’s some kind of fundamental incompatibility between the application and your system. Any Mac OS experts here who can offer any suggestions how to diagnose that? You could try the Mac forum, but it’s pretty quiet in there… |
nikolce Send message Joined: 28 Apr 07 Posts: 2 Credit: 2,002,356 RAC: 0 |
Hi all, Can someone tell me if I should abort the below tasks? It's a bit annoying to find your CPU crunching nothing for two days. Thank you! |
©2024 University of Washington
https://www.bakerlab.org