Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
Thank you for replying. More than enough free memory, and they are really "Ready to Start". What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs. See here: https://imgur.com/a/3iBM4DO I have this on all Gen9 servers, while I have Gen8 servers with 64 logical CPUs which are all fully used. I am now deploying another Gen9 and will see what that gives. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,380,064 RAC: 20,136 |
> What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs.

Hmm. I had something similar with my GPUs on Seti, where the driver install went very weird & it showed double the number of actual GPUs in the BOINC log.
I would check the Event log and make sure there is only 1 CPU entry in there (although, being a multi-socket system, it should probably be 2, so make sure there aren't 4 in there), e.g.:
30/03/2020 15:09:34 | | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 | | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, device version OpenCL 1.2 CUDA, 6144MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [Family 6 Model 158 Stepping 10]
30/03/2020 15:09:34 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
30/03/2020 15:09:34 | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)
30/03/2020 15:09:34 | | Memory: 31.95 GB physical, 36.70 GB virtual
30/03/2020 15:09:34 | | Disk: 930.50 GB total, 823.00 GB free
30/03/2020 15:09:34 | | Local time is UTC +9 hours
30/03/2020 15:09:34 | SETI@home | Found app_config.xml
30/03/2020 15:09:34 | SETI@home Beta Test | Found app_config.xml
When my driver issue occurred, the CUDA & OpenCL entries for each video card were doubled up, resulting in 2 Tasks running on only the 1 GPU.
Grant
Darwin NT |
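The check Grant describes can be sketched in a few lines of Python. This is only an illustration, assuming the event log has been copied into a list of strings in the format shown in his post; the function name `summarize_startup_log` and the two regular expressions are made up for this example and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the startup line "... Processor: 12 GenuineIntel ..." but not
# "Processor features: ...", because a number must follow the colon.
PROCESSOR_RE = re.compile(r"Processor: (\d+) ")
# Matches lines such as "... CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver ...".
GPU_RE = re.compile(r"(CUDA|OpenCL): NVIDIA GPU (\d+): ")


def summarize_startup_log(lines):
    """Return (logical CPU counts reported, GPU entries listed more than once)."""
    cpu_counts = []
    gpu_seen = Counter()
    for line in lines:
        cpu = PROCESSOR_RE.search(line)
        if cpu:
            cpu_counts.append(int(cpu.group(1)))
        gpu = GPU_RE.search(line)
        if gpu:
            # Key on (API, device index), e.g. ("CUDA", 0).
            gpu_seen[(gpu.group(1), int(gpu.group(2)))] += 1
    duplicates = sorted(key for key, count in gpu_seen.items() if count > 1)
    return cpu_counts, duplicates


# Abbreviated lines taken from the log above.
sample = [
    "30/03/2020 15:09:34 | | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)",
    "30/03/2020 15:09:34 | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59)",
    "30/03/2020 15:09:34 | | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)",
    "30/03/2020 15:09:34 | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz",
    "30/03/2020 15:09:34 | | Processor features: fpu vme de pse tsc msr",
]

cpu_counts, duplicates = summarize_startup_log(sample)
print("Processor entries:", cpu_counts)
print("Doubled-up GPU entries:", duplicates)
```

A healthy log should give exactly one processor entry and an empty list of doubled-up GPU entries; anything else points at the kind of driver or detection problem described above.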
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
> More than enough free memory, and they are really "Ready to Start". What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs.

Do you mean 16 cores/32 threads?
What version of BOINC are you running? Is it up to date (7.14 or later)?
Do you have local prefs limiting it to 50% of the CPUs? If so, change it to 100%.
What percentage of memory is it allowed to use? (It has one setting for in use and another for idle.) If you are logged in then it’s in use as far as BOINC is concerned.
Do you have an app_config in the Rosetta project folder limiting the number of tasks? |
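To see why a 50% CPU preference would matter here, the arithmetic can be sketched as below. This is a rough illustration, not BOINC's actual code: it assumes the local preferences are stored as XML with a `max_ncpus_pct` element (the tag used in BOINC's preference files) and that the client simply truncates the result and never goes below 1. The function name `usable_cpus` and the sample XML are invented for the example.

```python
import xml.etree.ElementTree as ET


def usable_cpus(logical_cpus, prefs_xml):
    """Estimate how many CPUs the client may use under a max_ncpus_pct limit."""
    root = ET.fromstring(prefs_xml)
    # A missing element is treated as "no limit" (100%) in this sketch.
    pct = float(root.findtext("max_ncpus_pct", default="100"))
    # Assumption: truncate toward zero, but always allow at least one CPU.
    return max(1, int(logical_cpus * pct / 100))


# Hypothetical local preferences that cap usage at half of the CPUs.
half_prefs = """
<global_preferences>
  <max_ncpus_pct>50</max_ncpus_pct>
</global_preferences>
"""

full_prefs = "<global_preferences><max_ncpus_pct>100</max_ncpus_pct></global_preferences>"

print(usable_cpus(32, half_prefs))
print(usable_cpus(32, full_prefs))
```

Note that a preference like this limits the number of running tasks. In the case reported above, 32 tasks are running on a 32-thread machine, so a 50% limit would not by itself explain tasks being doubled up on half of the logical CPUs.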
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
I checked the event log. I don't see anything special in there....
30-Mar-2020 10:14:03 [---] Starting BOINC client version 7.14.2 for windows_x86_64
30-Mar-2020 10:14:03 [---] log flags: file_xfer, sched_ops, task
30-Mar-2020 10:14:03 [---] Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
30-Mar-2020 10:14:03 [---] Running as a daemon (GPU computing disabled)
30-Mar-2020 10:14:03 [---] Data directory: C:\ProgramData\BOINC
30-Mar-2020 10:14:03 [---] Running under account boinc_master
30-Mar-2020 10:14:03 [---] No usable GPUs found
30-Mar-2020 10:14:03 [---] Creating new client state file
30-Mar-2020 10:14:03 [---] Processor: 32 GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2]
30-Mar-2020 10:14:03 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 dca pbe fsgsbase bmi1 smep bmi2
30-Mar-2020 10:14:03 [---] OS: Microsoft Windows Server 2016: Standard x64 Edition, (10.00.14393.00)
30-Mar-2020 10:14:03 [---] Memory: 383.87 GB physical, 423.87 GB virtual
30-Mar-2020 10:14:03 [---] Disk: 1023.45 GB total, 966.99 GB free
30-Mar-2020 10:14:03 [---] Local time is UTC +2 hours
30-Mar-2020 10:14:03 [---] No WSL found.
30-Mar-2020 10:14:03 [---] Last benchmark was 18351 days 08:14:03 ago
30-Mar-2020 10:14:08 [---] No general preferences found - using defaults
30-Mar-2020 10:14:08 [---] Preferences:
30-Mar-2020 10:14:08 [---] max memory usage when active: 196543.06 MB
30-Mar-2020 10:14:08 [---] max memory usage when idle: 353777.50 MB
30-Mar-2020 10:14:08 [---] max disk usage: 921.10 GB
30-Mar-2020 10:14:08 [---] don't use GPU while active
30-Mar-2020 10:14:08 [---] suspend work if non-BOINC CPU load exceeds 25%
30-Mar-2020 10:14:08 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
30-Mar-2020 10:14:08 [---] Setting up project and slot directories
30-Mar-2020 10:14:08 [---] Checking active tasks
30-Mar-2020 10:14:08 [---] Setting up GUI RPC socket
30-Mar-2020 10:14:08 [---] Checking presence of 0 project files
30-Mar-2020 10:14:08 [---] This computer is not attached to any projects
30-Mar-2020 10:43:45 [---] Using proxy info from GUI
30-Mar-2020 10:44:21 [---] Fetching configuration file from https://boinc.bakerlab.org/rosetta/get_project_config.php
30-Mar-2020 10:44:39 [---] Running CPU benchmarks
30-Mar-2020 10:44:39 [---] Suspending computation - CPU benchmarks in progress
30-Mar-2020 10:45:10 [---] Benchmark results:
30-Mar-2020 10:45:10 [---] Number of CPUs: 32
30-Mar-2020 10:45:10 [---] 2933 floating point MIPS (Whetstone) per CPU
30-Mar-2020 10:45:10 [---] 11378 integer MIPS (Dhrystone) per CPU |
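Two of the odd-looking numbers in this log can be sanity-checked with simple arithmetic. The sketch below is an interpretation rather than anything stated in the thread: it assumes the two memory limits are 50% and 90% of physical RAM (which matches the logged values to within rounding of the 383.87 GB figure), and that "Last benchmark was 18351 days ago" simply means the stored benchmark time was zero, i.e. the Unix epoch, on a fresh install.

```python
from datetime import datetime, timezone

MB_PER_GB = 1024

# "Memory: 383.87 GB physical" from the log, converted to MB.
physical_mb = 383.87 * MB_PER_GB

# Assumed default percentages; the log itself only prints the resulting MB values.
active_limit_mb = physical_mb * 0.50
idle_limit_mb = physical_mb * 0.90
print(f"active limit ~ {active_limit_mb:.2f} MB (log: 196543.06 MB)")
print(f"idle limit   ~ {idle_limit_mb:.2f} MB (log: 353777.50 MB)")

# The log line was written at 10:14:03 local time, and the log says
# "Local time is UTC +2 hours", so that is 08:14:03 UTC.
log_time_utc = datetime(2020, 3, 30, 8, 14, 3, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
since_epoch = log_time_utc - epoch
print(f"time since epoch: {since_epoch.days} days, {since_epoch.seconds} seconds")
```

Under those assumptions nothing in the startup messages is abnormal: the memory limits are ordinary percentage caps, and the benchmark age just reflects that no benchmark had been run yet.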
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
Latest BOINC. Fresh install from today. The global settings in the BOINC profile are "use at most 100% of the CPUs" and "use at most 100% CPU time". For memory, use at most 90%, but as you can see in the screenshot I attached, there is more than enough free. I have no app_config in the ProgramData\BOINC folder. |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
> Latest BOINC. Fresh install from today.

What about that suspend when non-BOINC load > 25%? Can you try setting it to zero? Are computing options set to “run always”? |
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM. "Activity" in BAM is all set to "Always". I really don't know what's wrong. I use the exact same settings on all my servers. As said, Gen8 servers, even with 64 cores, are fully loaded. Gen9 servers only take half... |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,380,064 RAC: 20,136 |
> Latest BOINC. Fresh install from today.

Those were my next couple of questions, because the startup messages there look good, and some settings in app_config.xml will result in more cores than physically exist. But the number of Tasks you have matches the number of threads available, yet they are doubled up on physical cores. Are all of the Tasks running on just the 1 CPU?
Wild speculation: a configuration setting on the OS (boot config, environment variables, etc.?) is blocking the use of 1 CPU, but since the OS is reporting all Cores & Threads, that's how many Tasks are running even though half of them aren't actually available for use???
Got me scratching my head; hopefully someone else will have come across it before. Anyway, good luck, it's past my bed time.
Grant
Darwin NT |
MarkJ Joined: 28 Mar 20 Posts: 72 Credit: 25,238,680 RAC: 0 |
> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

Have you shut it down/rebooted after BOINC install? |
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
I installed as a service, so it needs a reboot after install... |
Bryn Mawr Joined: 26 Dec 18 Posts: 399 Credit: 12,294,748 RAC: 6,222 |
> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

Silly question: could hyperthreading be turned off in the BIOS? |
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

There are no silly questions. But HT is enabled. Here are the other BIOS options about performance (last word is the current setting):
Intel(R) Turbo Boost Technology Default - Enabled Enabled
ACPI SLIT Default - Enabled Enabled
Node Interleaving Default - Disabled Disabled
Intel NIC DMA Channels (IOAT) Default - Enabled Enabled
HW Prefetcher Default - Enabled Enabled
Adjacent Sector Prefetch Default - Enabled Enabled
DCU Stream Prefetcher Default - Enabled Enabled
DCU IP Prefetcher Default - Enabled Enabled
QPI Snoop Configuration Default - Home Snoop Home Snoop
QPI Home Snoop Optimization Default - Directory + OSB Enabled
QPI Bandwidth Optimization (RTID) Default - Balanced Balanced
Memory Proximity Reporting for I/O Default - Enabled Enabled
I/O Non-posted Prefetching Default - Enabled Enabled
NUMA Group Size Optimization Default - Clustered Clustered
Intel Performance Monitoring Support Default - Disabled Disabled |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
> I installed as a service, so it needs a reboot after install...

Can you create a cc_config.xml file and save it in the BOINC data directory (usually C:/ProgramData/BOINC; you can check the event log for the correct path) with this content, changing "N" to the number of threads you want to run:
<cc_config>
  <options>
    <ncpus>N</ncpus>
  </options>
</cc_config>
I remember someone at the WCG forums with a 32C/64T AMD CPU that was running only 32 tasks. Once you save the file, go to BOINC-Options-Read Config Files or something like that. |
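For anyone who would rather generate the file than type it, here is a small sketch that builds the same cc_config.xml text for a given thread count and checks that it is well-formed XML. The helper name `build_cc_config` is invented for this example, and the script only prints the result; saving it into the data directory reported in your own event log is left as a manual step.

```python
import xml.etree.ElementTree as ET

# Same structure as the snippet in the post above, with N as a placeholder.
CC_CONFIG_TEMPLATE = """<cc_config>
  <options>
    <ncpus>{ncpus}</ncpus>
  </options>
</cc_config>
"""


def build_cc_config(ncpus):
    """Return cc_config.xml text that asks the client to act as if it had `ncpus` CPUs."""
    ncpus = int(ncpus)
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    text = CC_CONFIG_TEMPLATE.format(ncpus=ncpus)
    # Parse the result once so that a malformed file is never handed back.
    ET.fromstring(text)
    return text


config_text = build_cc_config(32)
print(config_text)
```

After saving the output as cc_config.xml in the data directory, the client still has to re-read its config files (or be restarted) before the new value takes effect, as noted in the post.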
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
What version of BOINC are you using? I've read that BOINC versions have a limit on how many logical cores they are able to use. I think it was either 32 or 64 for the 7.14.2 version I use. |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
Getting this again - Just a pre-emptive post
30/03/2020 14:28:05 | Rosetta@home | update requested by user
Task creation not meeting the very high demand again? |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
> Getting this again - Just a pre-emptive post

"Total queued jobs: 17,707"
That was updated several hours ago and has gone down since Friday. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
> Getting this again - Just a pre-emptive post

Check further down the server status page. The lower part appears to say that only 5 tasks are now available. I expect them to be gone quickly. |
HPE Belgium Joined: 27 Mar 20 Posts: 16 Credit: 367,648,439 RAC: 0 |
> I installed as a service, so it needs a reboot after install...

Doesn't make any difference. I have other servers that are running fine on 80 cores with load on all 80... So it's not that BOINC does not support it. I really don't know why it's happening here. |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
Weird. It should have worked even if all you had was a 1-Core CPU... |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
Yeah, workunits aren't being generated, I think. The queue was well over 1 million just yesterday. |
©2024 University of Washington
https://www.bakerlab.org