Problems and Technical Issues with Rosetta@home

HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92593 - Posted: 30 Mar 2020, 9:53:56 UTC - in response to Message 92591.  
Last modified: 30 Mar 2020, 9:54:45 UTC

Thank you for replying.

More than enough free memory, and they really are "Ready to Start". What I do see, however, is that I have 32 jobs running on a server with 32 logical CPUs, but it is only using half of the logical CPUs.

See here https://imgur.com/a/3iBM4DO

I see this on all my Gen9 servers, while I have Gen8 servers with 64 logical CPUs that are all fully used.

I am now deploying another Gen9 and will see what that gives.
ID: 92593 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,378,164
RAC: 20,578
Message 92594 - Posted: 30 Mar 2020, 9:56:37 UTC - in response to Message 92593.  
Last modified: 30 Mar 2020, 10:06:26 UTC

What I do see, however, is that I have 32 jobs running on a server with 32 logical CPUs, but it is only using half of the logical CPUs.
See here
Hmm.
I had something similar with my GPUs on Seti where the driver install went very weird & it showed double the number of actual GPUs in the BOINC log.

I would check the Event log and make sure there is only 1 CPU entry in there (although, being a multi-socket system, it should probably be 2; just make sure there aren't 4 in there).
e.g.:

30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, device version OpenCL 1.2 CUDA, 6144MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 |  | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [Family 6 Model 158 Stepping 10]
30/03/2020 15:09:34 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
30/03/2020 15:09:34 |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)
30/03/2020 15:09:34 |  | Memory: 31.95 GB physical, 36.70 GB virtual
30/03/2020 15:09:34 |  | Disk: 930.50 GB total, 823.00 GB free
30/03/2020 15:09:34 |  | Local time is UTC +9 hours
30/03/2020 15:09:34 | SETI@home | Found app_config.xml
30/03/2020 15:09:34 | SETI@home Beta Test | Found app_config.xml

When my driver issue occurred, the CUDA & OpenCL entries for each video card were doubled up- resulting in 2 Tasks running on only the 1 GPU.
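
If you want to check a long log for doubled-up entries without eyeballing it, here is a minimal Python sketch. It assumes the pipe-delimited format shown above; the sample text and the function names are made up for illustration, with one CUDA line deliberately repeated to mimic what my driver issue looked like.

```python
from collections import Counter

# Hypothetical sample in the same pipe-delimited format as the log above.
# The first CUDA line is repeated on purpose to mimic a doubled-up entry.
SAMPLE_LOG = """\
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)
30/03/2020 15:09:34 |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)
30/03/2020 15:09:34 |  | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
30/03/2020 15:09:34 |  | Processor features: fpu vme de pse
"""


def count_device_entries(log_text):
    """Count how often each CUDA/OpenCL/Processor entry appears."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) < 3:
            continue
        message = parts[2].strip()
        if message.startswith(("CUDA:", "OpenCL:")):
            # Drop the "(driver version ...)" details so repeats compare equal.
            counts[message.split("(", 1)[0].strip()] += 1
        elif message.startswith("Processor:"):
            counts[message] += 1
    return counts


def duplicated_entries(log_text):
    """Return only the entries that show up more than once."""
    return {name: n for name, n in count_device_entries(log_text).items() if n > 1}


print(duplicated_entries(SAMPLE_LOG))
```

An empty result means no device line is repeated; anything listed is worth a closer look.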
Grant
Darwin NT
ID: 92594 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 92595 - Posted: 30 Mar 2020, 10:13:26 UTC - in response to Message 92593.  
Last modified: 30 Mar 2020, 10:24:25 UTC

More than enough free memory, and they really are "Ready to Start". What I do see, however, is that I have 32 jobs running on a server with 32 logical CPUs, but it is only using half of the logical CPUs.
Do you mean 16 cores/32 threads?

What version of BOINC are you running? Is it up to date (7.14 or later)?

Do you have local prefs limiting it to 50% of CPU? If so, change it to 100%.

What percentage of memory is it allowed to use (it has one setting for in use and another for idle)? If you are logged in then it’s in use as far as BOINC is concerned.

Do you have an app_config in the Rosetta project folder limiting the number of tasks?
ID: 92595 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92597 - Posted: 30 Mar 2020, 10:17:23 UTC - in response to Message 92594.  

I checked the event log. I don't see anything special in there....

30-Mar-2020 10:14:03 [---] Starting BOINC client version 7.14.2 for windows_x86_64
30-Mar-2020 10:14:03 [---] log flags: file_xfer, sched_ops, task
30-Mar-2020 10:14:03 [---] Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
30-Mar-2020 10:14:03 [---] Running as a daemon (GPU computing disabled)
30-Mar-2020 10:14:03 [---] Data directory: C:\ProgramData\BOINC
30-Mar-2020 10:14:03 [---] Running under account boinc_master
30-Mar-2020 10:14:03 [---] No usable GPUs found
30-Mar-2020 10:14:03 [---] Creating new client state file
30-Mar-2020 10:14:03 [---] Processor: 32 GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2]
30-Mar-2020 10:14:03 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 dca pbe fsgsbase bmi1 smep bmi2
30-Mar-2020 10:14:03 [---] OS: Microsoft Windows Server 2016: Standard x64 Edition, (10.00.14393.00)
30-Mar-2020 10:14:03 [---] Memory: 383.87 GB physical, 423.87 GB virtual
30-Mar-2020 10:14:03 [---] Disk: 1023.45 GB total, 966.99 GB free
30-Mar-2020 10:14:03 [---] Local time is UTC +2 hours
30-Mar-2020 10:14:03 [---] No WSL found.
30-Mar-2020 10:14:03 [---] Last benchmark was 18351 days 08:14:03 ago
30-Mar-2020 10:14:08 [---] No general preferences found - using defaults
30-Mar-2020 10:14:08 [---] Preferences:
30-Mar-2020 10:14:08 [---]    max memory usage when active: 196543.06 MB
30-Mar-2020 10:14:08 [---]    max memory usage when idle: 353777.50 MB
30-Mar-2020 10:14:08 [---]    max disk usage: 921.10 GB
30-Mar-2020 10:14:08 [---]    don't use GPU while active
30-Mar-2020 10:14:08 [---]    suspend work if non-BOINC CPU load exceeds 25%
30-Mar-2020 10:14:08 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
30-Mar-2020 10:14:08 [---] Setting up project and slot directories
30-Mar-2020 10:14:08 [---] Checking active tasks
30-Mar-2020 10:14:08 [---] Setting up GUI RPC socket
30-Mar-2020 10:14:08 [---] Checking presence of 0 project files
30-Mar-2020 10:14:08 [---] This computer is not attached to any projects
30-Mar-2020 10:43:45 [---] Using proxy info from GUI
30-Mar-2020 10:44:21 [---] Fetching configuration file from https://boinc.bakerlab.org/rosetta/get_project_config.php
30-Mar-2020 10:44:39 [---] Running CPU benchmarks
30-Mar-2020 10:44:39 [---] Suspending computation - CPU benchmarks in progress
30-Mar-2020 10:45:10 [---] Benchmark results:
30-Mar-2020 10:45:10 [---]    Number of CPUs: 32
30-Mar-2020 10:45:10 [---]    2933 floating point MIPS (Whetstone) per CPU
30-Mar-2020 10:45:10 [---]    11378 integer MIPS (Dhrystone) per CPU
ID: 92597 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92598 - Posted: 30 Mar 2020, 10:22:23 UTC - in response to Message 92595.  

Latest BOINC. Fresh install from today.
Global settings in the BOINC profile are:
use at most 100% of the CPUs
use at most 100% CPU time

For memory, use at most 90%, but as you can see in the screenshot I attached, there is more than enough free.

I have no app_config in the ProgramData\BOINC folder.
ID: 92598 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 92599 - Posted: 30 Mar 2020, 10:29:02 UTC - in response to Message 92598.  
Last modified: 30 Mar 2020, 10:32:32 UTC

Latest BOINC. Fresh install from today.
Global settings in the BOINC profile are:
use at most 100% of the CPUs
use at most 100% CPU time

For memory, use at most 90%, but as you can see in the screenshot I attached, there is more than enough free.

I have no app_config in the ProgramData\BOINC folder.

What about that suspend when non-BOINC load > 25%? Can you try setting it to zero? Are computing options set to “run always”?
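
For reference, here is a rough Python sketch that pulls those preference values out of the startup log you posted, so the 25% threshold and the memory limits are easy to see at a glance. The function name is hypothetical, the log lines are copied from the earlier post, and it assumes 1 GB = 1024 MB when converting.

```python
import re

# Lines copied from the startup log posted earlier in this thread.
STARTUP_LOG = """\
30-Mar-2020 10:14:03 [---] Memory: 383.87 GB physical, 423.87 GB virtual
30-Mar-2020 10:14:08 [---]    max memory usage when active: 196543.06 MB
30-Mar-2020 10:14:08 [---]    max memory usage when idle: 353777.50 MB
30-Mar-2020 10:14:08 [---]    suspend work if non-BOINC CPU load exceeds 25%
"""


def summarize_prefs(log_text):
    """Report memory limits as a share of physical RAM, plus the suspend threshold."""
    physical_gb = float(re.search(r"Memory: ([\d.]+) GB physical", log_text).group(1))
    physical_mb = physical_gb * 1024  # assumption: 1 GB = 1024 MB
    summary = {}
    for state in ("active", "idle"):
        m = re.search(rf"max memory usage when {state}: ([\d.]+) MB", log_text)
        if m:
            summary[f"mem_{state}_pct"] = round(100 * float(m.group(1)) / physical_mb, 1)
    m = re.search(r"suspend work if non-BOINC CPU load exceeds (\d+)%", log_text)
    # None means the log has no such line, i.e. no threshold was reported.
    summary["suspend_above_pct"] = int(m.group(1)) if m else None
    return summary


print(summarize_prefs(STARTUP_LOG))
```

On those lines it works out to roughly 50% of physical memory when active and 90% when idle, with suspension above 25% non-BOINC CPU load.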
ID: 92599 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92600 - Posted: 30 Mar 2020, 10:36:28 UTC - in response to Message 92599.  

"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

"Activity" in BAM is all set to "Always"...

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...
ID: 92600 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,378,164
RAC: 20,578
Message 92601 - Posted: 30 Mar 2020, 10:37:48 UTC - in response to Message 92598.  

Latest BOINC. Fresh install from today.

I have no app_config in the ProgramData\BOINC folder.
Those were my next couple of questions, because the startup messages there look good, and some settings in app_config.xml will result in more cores than physically exist.
But the number of Tasks you have matches the number of threads available, yet they are doubled up on physical cores.

Are all of the Tasks running on just the 1 CPU?
Wild speculation- configuration setting on the OS (boot config/environment variables etc?) is blocking the use of 1 CPU, but since the OS is reporting all Cores & Threads, that's how many Tasks are running even though half of them aren't actually available for use???

Got me scratching my head, hopefully someone else will have come across it before.

Anyway- Good luck, it's past my bed time.
Grant
Darwin NT
ID: 92601 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 92602 - Posted: 30 Mar 2020, 10:41:07 UTC - in response to Message 92600.  
Last modified: 30 Mar 2020, 10:41:34 UTC

"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

"Activity" in BAM is all set to "Always"...

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...

Have you shut it down/rebooted after BOINC install?
ID: 92602 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92603 - Posted: 30 Mar 2020, 10:46:53 UTC - in response to Message 92602.  

I installed as a service, so it needs a reboot after install...
ID: 92603 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 92604 - Posted: 30 Mar 2020, 10:47:55 UTC - in response to Message 92600.  

"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

"Activity" in BAM is all set to "Always"...

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...


Silly question: could hyperthreading be turned off in the BIOS?
ID: 92604 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92606 - Posted: 30 Mar 2020, 11:01:01 UTC - in response to Message 92604.  

"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

"Activity" in BAM is all set to "Always"...

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...


Silly question: could hyperthreading be turned off in the BIOS?


There are no silly questions. But HT is enabled. Here are the other BIOS options about performance (last word is the current setting):
Intel(R) Turbo Boost Technology Default - Enabled Enabled
ACPI SLIT Default - Enabled Enabled

Node Interleaving Default - Disabled Disabled
Intel NIC DMA Channels (IOAT) Default - Enabled Enabled
HW Prefetcher Default - Enabled Enabled
Adjacent Sector Prefetch Default - Enabled Enabled
DCU Stream Prefetcher Default - Enabled Enabled
DCU IP Prefetcher Default - Enabled Enabled
QPI Snoop Configuration Default - Home Snoop Home Snoop
QPI Home Snoop Optimization Default - Directory + OSB Enabled
QPI Bandwidth Optimization (RTID) Default - Balanced Balanced
Memory Proximity Reporting for I/O Default - Enabled Enabled
I/O Non-posted Prefetching Default - Enabled Enabled
NUMA Group Size Optimization Default - Clustered Clustered
Intel Performance Monitoring Support Default - Disabled Disabled
ID: 92606 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92615 - Posted: 30 Mar 2020, 12:23:41 UTC - in response to Message 92603.  
Last modified: 30 Mar 2020, 12:26:58 UTC

I installed as a service, so it needs a reboot after install...



Can you create a cc_config.xml file and save it in the BOINC Data Directory (usually C:/ProgramData/BOINC; you can check the event log for the correct path) with this, changing "N" to the number of threads you want to run:


<cc_config>
  <options>
    <ncpus>N</ncpus>
  </options>
</cc_config>


I remember someone at the WCG forums with a 32C/64T AMD CPU that was running only 32 tasks.

Once you save the file, go to BOINC-Options-Read Config Files or something like that.
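
If you'd rather generate the file than type it, here's a minimal Python sketch that produces the same XML. The function names are hypothetical, and the example writes into a temporary directory rather than the real BOINC data directory, so you'd need to point it at the path from your event log.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(ncpus):
    """Return the cc_config XML from the post above with N filled in."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir, ncpus):
    """Write cc_config.xml into data_dir and return the file path."""
    path = Path(data_dir) / "cc_config.xml"
    path.write_text(build_cc_config(ncpus), encoding="utf-8")
    return path


# Demo only: a temporary directory stands in for the BOINC data directory.
with tempfile.TemporaryDirectory() as demo_dir:
    written = write_cc_config(demo_dir, 32)
    print(written.read_text(encoding="utf-8"))
```

The output is on a single line without indentation, which is still the same XML. BOINC still has to re-read its config files afterwards, as described above.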
ID: 92615 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 92616 - Posted: 30 Mar 2020, 12:44:22 UTC

What version of BOINC are you using? I've read that some versions have a limit on how many logical cores they are able to use.

I think it was either 32 or 64 for the 7.14.2 version I use.
ID: 92616 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 92620 - Posted: 30 Mar 2020, 13:31:28 UTC

Getting this again - Just a pre-emptive post
30/03/2020 14:28:05 | Rosetta@home | update requested by user
30/03/2020 14:28:09 | Rosetta@home | Sending scheduler request: Requested by user.
30/03/2020 14:28:09 | Rosetta@home | Requesting new tasks for CPU
30/03/2020 14:28:12 | Rosetta@home | Scheduler request completed: got 0 new tasks
30/03/2020 14:28:12 | Rosetta@home | No tasks sent

Task creation not meeting the very high demand again?
ID: 92620 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92624 - Posted: 30 Mar 2020, 13:53:33 UTC - in response to Message 92620.  
Last modified: 30 Mar 2020, 13:54:16 UTC

Getting this again - Just a pre-emptive post
30/03/2020 14:28:05 | Rosetta@home | update requested by user
30/03/2020 14:28:09 | Rosetta@home | Sending scheduler request: Requested by user.
30/03/2020 14:28:09 | Rosetta@home | Requesting new tasks for CPU
30/03/2020 14:28:12 | Rosetta@home | Scheduler request completed: got 0 new tasks
30/03/2020 14:28:12 | Rosetta@home | No tasks sent

Task creation not meeting the very high demand again?



"Total queued jobs: 17,707" That was updated several hours ago and has gone down since Friday.
ID: 92624 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 92626 - Posted: 30 Mar 2020, 14:26:16 UTC - in response to Message 92624.  

Getting this again - Just a pre-emptive post
30/03/2020 14:28:05 | Rosetta@home | update requested by user
30/03/2020 14:28:09 | Rosetta@home | Sending scheduler request: Requested by user.
30/03/2020 14:28:09 | Rosetta@home | Requesting new tasks for CPU
30/03/2020 14:28:12 | Rosetta@home | Scheduler request completed: got 0 new tasks
30/03/2020 14:28:12 | Rosetta@home | No tasks sent

Task creation not meeting the very high demand again?



"Total queued jobs: 17,707" That was updated several hours ago and has gone down since Friday.


Check further down the server status page. The lower part appears to say that only 5 tasks are now available. I expect them to be gone quickly.
ID: 92626 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 367,648,439
RAC: 0
Message 92627 - Posted: 30 Mar 2020, 14:31:29 UTC - in response to Message 92615.  

I installed as a service, so it needs a reboot after install...



Can you create a cc_config.xml file and save it in the BOINC Data Directory (usually C:/ProgramData/BOINC; you can check the event log for the correct path) with this, changing "N" to the number of threads you want to run:


<cc_config>
  <options>
    <ncpus>N</ncpus>
  </options>
</cc_config>


I remember someone at the WCG forums with a 32C/64T AMD CPU that was running only 32 tasks.

Once you save the file, go to BOINC-Options-Read Config Files or something like that.


Doesn't make any difference. I have other servers that are running fine on 80 cores with load on all 80, so it's not that BOINC does not support it. I really don't know why it's happening here.
ID: 92627 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92628 - Posted: 30 Mar 2020, 14:32:16 UTC - in response to Message 92627.  

Weird. It should have worked even if all you had was a 1-Core CPU...
ID: 92628 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 92629 - Posted: 30 Mar 2020, 14:34:04 UTC - in response to Message 92626.  
Last modified: 30 Mar 2020, 14:35:48 UTC

Yeah, workunits aren't being generated, I think.
The queue was well over 1 million just yesterday.
ID: 92629 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org