Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

bormolino
Joined: 16 May 13
Posts: 4
Credit: 160,977
RAC: 0
Message 92558 - Posted: 29 Mar 2020, 18:34:00 UTC - in response to Message 92534.  

> PS: Just after posting, I now see that bormolino might be reporting the same issue just above my post.


Yes :D

Same on my machines.
ID: 92558 · Rating: 0
EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 92572 - Posted: 30 Mar 2020, 0:17:14 UTC - in response to Message 92558.  

Follow-up to my earlier post: At the most recent screensaver invocation, the normal behavior resumed.
Note: Though subscribed to this thread, I received no notification of bormolino's post.
Eric
ID: 92572 · Rating: 0
rlpm
Joined: 23 Mar 20
Posts: 13
Credit: 84
RAC: 0
Message 92573 - Posted: 30 Mar 2020, 0:23:19 UTC - in response to Message 92572.  

> Note: Though subscribed to this thread, I received no notification of bormolino's post.


Check your community prefs from your main account page.
ID: 92573 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,909,879
RAC: 9,621
Message 92581 - Posted: 30 Mar 2020, 2:21:05 UTC

Not sure what this means at the moment:
30/03/2020 3:17:00 | Rosetta@home | Scheduler request completed: got 0 new tasks
30/03/2020 3:17:00 | Rosetta@home | Server can't open database

Also, when entering this thread I initially got a message saying the site was down. It came back after a refresh.
ID: 92581 · Rating: 0
amgthis
Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 92582 - Posted: 30 Mar 2020, 2:43:32 UTC

Getting a 'temporarily failed upload of (w/u name here xxx) transient HTTP error' message on upload failure and time-out.

I'm guessing it's just some new message I've never seen and the project is just getting updated, etc.
ID: 92582 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92589 - Posted: 30 Mar 2020, 9:22:12 UTC

Hello,

I have some servers that I want to use for R@H. Most of them use the full CPU, with all cores/logical CPUs busy; however, I have 2 servers that only use half of the available logical processors.
Both servers are ProLiant Gen9 servers.

One server is a BL660c Gen9 with 32 logical CPUs, but only half of them are working while I still have tasks "Ready to start".
The other server is a DL380 Gen9, which runs at 67% CPU load instead of 100%.
My other servers are Gen8 servers, which run at full load.

Is there something I can do to fix this? Can somebody help me troubleshoot? All my preferences are set to 100% load in my global preferences, and this setting works fine on most of my servers.
ID: 92589 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1437
Credit: 13,693,695
RAC: 0
Message 92591 - Posted: 30 Mar 2020, 9:37:39 UTC - in response to Message 92589.  

> Is there something I can do to fix this? Can somebody help me troubleshoot? All my preferences are set to 100% load in my global preferences, and this setting works fine on most of my servers.
Are they "Ready to start" or "Waiting for memory"? Do they have enough RAM to support all of those cores and threads? Have you changed any settings in the BOINC Manager on those systems (local settings override web-based ones)?
Grant
Darwin NT
ID: 92591 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92593 - Posted: 30 Mar 2020, 9:53:56 UTC - in response to Message 92591.  
Last modified: 30 Mar 2020, 9:54:45 UTC

Thank you for replying.

More than enough free memory, and they are really "Ready to start". What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs.

See here https://imgur.com/a/3iBM4DO

I have this on all Gen9 servers, while I have Gen8 servers with 64 logical CPUs which are all fully used.

I am now deploying another Gen9 and will see what that gives.
ID: 92593 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1437
Credit: 13,693,695
RAC: 0
Message 92594 - Posted: 30 Mar 2020, 9:56:37 UTC - in response to Message 92593.  
Last modified: 30 Mar 2020, 10:06:26 UTC

> What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs.
> See here
Hmm.
I had something similar with my GPUs on SETI@home, where a driver install went badly wrong and the BOINC log showed double the actual number of GPUs.

I would check the event log and make sure there is only 1 CPU entry in there (although, this being a multi-socket system, it should probably be 2; just make sure there aren't 4 in there).
e.g.:

30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, CUDA version 10.2, compute capability 6.1, 4096MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59, device version OpenCL 1.2 CUDA, 6144MB, 3556MB available, 14054 GFLOPS peak)
30/03/2020 15:09:34 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59, device version OpenCL 1.2 CUDA, 8192MB, 3556MB available, 6852 GFLOPS peak)
30/03/2020 15:09:34 |  | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [Family 6 Model 158 Stepping 10]
30/03/2020 15:09:34 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
30/03/2020 15:09:34 |  | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)
30/03/2020 15:09:34 |  | Memory: 31.95 GB physical, 36.70 GB virtual
30/03/2020 15:09:34 |  | Disk: 930.50 GB total, 823.00 GB free
30/03/2020 15:09:34 |  | Local time is UTC +9 hours
30/03/2020 15:09:34 | SETI@home | Found app_config.xml
30/03/2020 15:09:34 | SETI@home Beta Test | Found app_config.xml

When my driver issue occurred, the CUDA & OpenCL entries for each video card were doubled up- resulting in 2 Tasks running on only the 1 GPU.
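The event-log check described above can be scripted. The sketch below assumes the event log has been copied out of the BOINC Manager as plain text; `check_event_log` is a hypothetical helper, not part of BOINC. It collects the "Processor:" entries and reports any CUDA/OpenCL GPU entry (same API, vendor and index) that appears more than once, which is the doubling described for the driver problem. The sample lines are shortened from the log above.

```python
import re
from collections import Counter

# Matches GPU lines such as "CUDA: NVIDIA GPU 0: GeForce RTX 2060 (...)".
GPU_RE = re.compile(r"\b(CUDA|OpenCL): (\w+) GPU (\d+):")


def check_event_log(text: str) -> dict:
    """Collect processor entries and GPU entries that appear more than once."""
    processors = []
    gpu_counts = Counter()
    for line in text.splitlines():
        # " Processor: " matches the CPU line but not "Processor features:".
        if " Processor: " in line:
            processors.append(line.split(" Processor: ", 1)[1].strip())
        match = GPU_RE.search(line)
        if match:
            # Key is (API, vendor, GPU index), e.g. ("CUDA", "NVIDIA", "0").
            gpu_counts[match.groups()] += 1
    duplicated = sorted(key for key, count in gpu_counts.items() if count > 1)
    return {"processor_entries": processors, "duplicated_gpus": duplicated}


if __name__ == "__main__":
    sample = """\
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 442.59)
30/03/2020 15:09:34 |  | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 442.59)
30/03/2020 15:09:34 |  | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
30/03/2020 15:09:34 |  | Processor features: fpu vme de pse tsc msr
"""
    report = check_event_log(sample)
    print("Processor entries:", len(report["processor_entries"]))
    print("Duplicated GPU entries:", report["duplicated_gpus"])
```

A healthy log should give one (or, per the note above, possibly two) processor entries and an empty list of duplicated GPUs.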
Grant
Darwin NT
ID: 92594 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 24,961,617
RAC: 2,110
Message 92595 - Posted: 30 Mar 2020, 10:13:26 UTC - in response to Message 92593.  
Last modified: 30 Mar 2020, 10:24:25 UTC

> More than enough free memory, and they are really "Ready to start". What I do see, however, is that I have 32 jobs running with a total of 32 logical CPUs in my server, but it is only using half of the logical CPUs.
Do you mean 16 cores/32 threads?

What version of BOINC are you running? Is it up to date (7.14 or later)?

Do you have local prefs limiting it to 50% of the CPUs? If so, change it to 100%.

What percentage of memory is it allowed to use? (There is one setting for "in use" and another for "idle".) If you are logged in, then it’s in use as far as BOINC is concerned.

Do you have an app_config in the Rosetta project folder limiting the number of tasks?
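The arithmetic behind the 50% question, as a minimal sketch. It assumes the client applies the "use at most N% of the CPUs" percentage to the logical CPU count, rounds down, and never goes below one CPU; `usable_cpus` is a hypothetical name, not a BOINC function.

```python
import math


def usable_cpus(logical_cpus: int, max_ncpus_pct: float) -> int:
    """Estimate how many tasks a 'use at most N% of the CPUs' setting allows.

    Assumption: the percentage is applied to the logical CPU count,
    rounded down, with a minimum of one CPU.
    """
    return max(1, math.floor(logical_cpus * max_ncpus_pct / 100))


if __name__ == "__main__":
    # 32 logical CPUs, as reported for the BL660c Gen9 in this thread.
    for pct in (50, 100):
        print(f"32 logical CPUs at {pct}%: {usable_cpus(32, pct)} tasks")
```

Under that assumption, a 50% limit on a 32-thread host would be expected to cap BOINC at 16 running tasks, so a limit of that kind would normally show up as fewer tasks rather than as 32 tasks sharing half the threads.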
ID: 92595 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92597 - Posted: 30 Mar 2020, 10:17:23 UTC - in response to Message 92594.  

I checked the event log. I don't see anything special in there....

30-Mar-2020 10:14:03 [---] Starting BOINC client version 7.14.2 for windows_x86_64
30-Mar-2020 10:14:03 [---] log flags: file_xfer, sched_ops, task
30-Mar-2020 10:14:03 [---] Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
30-Mar-2020 10:14:03 [---] Running as a daemon (GPU computing disabled)
30-Mar-2020 10:14:03 [---] Data directory: C:\ProgramData\BOINC
30-Mar-2020 10:14:03 [---] Running under account boinc_master
30-Mar-2020 10:14:03 [---] No usable GPUs found
30-Mar-2020 10:14:03 [---] Creating new client state file
30-Mar-2020 10:14:03 [---] Processor: 32 GenuineIntel Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz [Family 6 Model 63 Stepping 2]
30-Mar-2020 10:14:03 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 dca pbe fsgsbase bmi1 smep bmi2
30-Mar-2020 10:14:03 [---] OS: Microsoft Windows Server 2016: Standard x64 Edition, (10.00.14393.00)
30-Mar-2020 10:14:03 [---] Memory: 383.87 GB physical, 423.87 GB virtual
30-Mar-2020 10:14:03 [---] Disk: 1023.45 GB total, 966.99 GB free
30-Mar-2020 10:14:03 [---] Local time is UTC +2 hours
30-Mar-2020 10:14:03 [---] No WSL found.
30-Mar-2020 10:14:03 [---] Last benchmark was 18351 days 08:14:03 ago
30-Mar-2020 10:14:08 [---] No general preferences found - using defaults
30-Mar-2020 10:14:08 [---] Preferences:
30-Mar-2020 10:14:08 [---]    max memory usage when active: 196543.06 MB
30-Mar-2020 10:14:08 [---]    max memory usage when idle: 353777.50 MB
30-Mar-2020 10:14:08 [---]    max disk usage: 921.10 GB
30-Mar-2020 10:14:08 [---]    don't use GPU while active
30-Mar-2020 10:14:08 [---]    suspend work if non-BOINC CPU load exceeds 25%
30-Mar-2020 10:14:08 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
30-Mar-2020 10:14:08 [---] Setting up project and slot directories
30-Mar-2020 10:14:08 [---] Checking active tasks
30-Mar-2020 10:14:08 [---] Setting up GUI RPC socket
30-Mar-2020 10:14:08 [---] Checking presence of 0 project files
30-Mar-2020 10:14:08 [---] This computer is not attached to any projects
30-Mar-2020 10:43:45 [---] Using proxy info from GUI
30-Mar-2020 10:44:21 [---] Fetching configuration file from https://boinc.bakerlab.org/rosetta/get_project_config.php
30-Mar-2020 10:44:39 [---] Running CPU benchmarks
30-Mar-2020 10:44:39 [---] Suspending computation - CPU benchmarks in progress
30-Mar-2020 10:45:10 [---] Benchmark results:
30-Mar-2020 10:45:10 [---]    Number of CPUs: 32
30-Mar-2020 10:45:10 [---]    2933 floating point MIPS (Whetstone) per CPU
30-Mar-2020 10:45:10 [---]    11378 integer MIPS (Dhrystone) per CPU
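The memory question can be checked against this log: after "No general preferences found - using defaults", the client reports 196543.06 MB (in use) and 353777.50 MB (idle) against 383.87 GB of physical memory. A minimal sketch of that arithmetic, assuming the log's GB figure is 1024-based and rounded to two decimals; `implied_percentages` is a hypothetical helper.

```python
# Assumption: the log's "GB" figure is 1024-based (1 GB = 1024 MB).
MB_PER_GB = 1024


def implied_percentages(physical_gb: float, active_mb: float, idle_mb: float) -> tuple[float, float]:
    """Return the in-use and idle memory limits as percentages of physical RAM."""
    physical_mb = physical_gb * MB_PER_GB
    return 100 * active_mb / physical_mb, 100 * idle_mb / physical_mb


if __name__ == "__main__":
    # Figures copied from the startup log above.
    in_use_pct, idle_pct = implied_percentages(383.87, 196543.06, 353777.50)
    print(f"in use: {in_use_pct:.1f}%  idle: {idle_pct:.1f}%")
```

This works out to roughly 50% in use and 90% idle, which suggests those were the default limits in effect when the log was written.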
ID: 92597 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92598 - Posted: 30 Mar 2020, 10:22:23 UTC - in response to Message 92595.  

Latest BOINC. Fresh install from today.
Global settings in the BOINC profile are:
use at most 100% of the CPUs
use at most 100% CPU time

For memory, use at most 90%, but as you can see in the screenshot I attached, there is more than enough free.

I have no app_config in the ProgramData\BOINC folder.
ID: 92598 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 24,961,617
RAC: 2,110
Message 92599 - Posted: 30 Mar 2020, 10:29:02 UTC - in response to Message 92598.  
Last modified: 30 Mar 2020, 10:32:32 UTC

> Latest BOINC. Fresh install from today.
> Global settings in the BOINC profile are:
> use at most 100% of the CPUs
> use at most 100% CPU time
>
> For memory, use at most 90%, but as you can see in the screenshot I attached, there is more than enough free.
>
> I have no app_config in the ProgramData\BOINC folder.

What about that "suspend when non-BOINC load > 25%" setting? Can you try setting it to zero? Are computing options set to “run always”?
ID: 92599 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92600 - Posted: 30 Mar 2020, 10:36:28 UTC - in response to Message 92599.  

"Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM

"Activity" in BAM is all set to "Always"...

I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...
ID: 92600 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1437
Credit: 13,693,695
RAC: 0
Message 92601 - Posted: 30 Mar 2020, 10:37:48 UTC - in response to Message 92598.  

> Latest BOINC. Fresh install from today.
>
> I have no app_config in the ProgramData\BOINC folder.
Those were my next couple of questions, because the startup messages there look good, and some settings in app_config.xml will result in more cores than physically exist.
But the number of Tasks you have matches the number of threads available, yet they are doubled up on physical cores.

Are all of the Tasks running on just the 1 CPU?
Wild speculation: a configuration setting in the OS (boot config, environment variables, etc.?) is blocking the use of 1 CPU, but since the OS is reporting all cores and threads, that's how many tasks are running, even though half of them aren't actually available for use???

Got me scratching my head, hopefully someone else will have come across it before.

Anyway- Good luck, it's past my bed time.
Grant
Darwin NT
ID: 92601 · Rating: 0
MarkJ
Joined: 28 Mar 20
Posts: 72
Credit: 24,961,617
RAC: 2,110
Message 92602 - Posted: 30 Mar 2020, 10:41:07 UTC - in response to Message 92600.  
Last modified: 30 Mar 2020, 10:41:34 UTC

> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM
>
> "Activity" in BAM is all set to "Always"...
>
> I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...

Have you shut it down/rebooted since the BOINC install?
ID: 92602 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92603 - Posted: 30 Mar 2020, 10:46:53 UTC - in response to Message 92602.  

I installed as a service, so it needs a reboot after install...
ID: 92603 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 359
Credit: 10,189,858
RAC: 4,683
Message 92604 - Posted: 30 Mar 2020, 10:47:55 UTC - in response to Message 92600.

> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM
>
> "Activity" in BAM is all set to "Always"...
>
> I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...


Silly question: could Hyper-Threading be turned off in the BIOS?
ID: 92604 · Rating: 0
HPE Belgium
Joined: 27 Mar 20
Posts: 16
Credit: 366,343,068
RAC: 11,676
Message 92606 - Posted: 30 Mar 2020, 11:01:01 UTC - in response to Message 92604.

>> "Suspend when non-BOINC load ..." is off in the "Computing Preferences" in BAM
>>
>> "Activity" in BAM is all set to "Always"...
>>
>> I really don't know what's wrong. I use the exact same setting on all my servers. As said, Gen8 servers, even with 64 cores are fully loaded. Gen9 servers only take half...
>
> Silly question: could Hyper-Threading be turned off in the BIOS?


There are no silly questions. But HT is enabled. Here are the other BIOS options about performance (last word is the current setting):
Intel(R) Turbo Boost Technology Default - Enabled Enabled
ACPI SLIT Default - Enabled Enabled

Node Interleaving Default - Disabled Disabled
Intel NIC DMA Channels (IOAT) Default - Enabled Enabled
HW Prefetcher Default - Enabled Enabled
Adjacent Sector Prefetch Default - Enabled Enabled
DCU Stream Prefetcher Default - Enabled Enabled
DCU IP Prefetcher Default - Enabled Enabled
QPI Snoop Configuration Default - Home Snoop Home Snoop
QPI Home Snoop Optimization Default - Directory + OSB Enabled
QPI Bandwidth Optimization (RTID) Default - Balanced Balanced
Memory Proximity Reporting for I/O Default - Enabled Enabled
I/O Non-posted Prefetching Default - Enabled Enabled
NUMA Group Size Optimization Default - Clustered Clustered
Intel Performance Monitoring Support Default - Disabled Disabled
ID: 92606 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 350
Credit: 1,000,634
RAC: 1
Message 92615 - Posted: 30 Mar 2020, 12:23:41 UTC - in response to Message 92603.  
Last modified: 30 Mar 2020, 12:26:58 UTC

> I installed as a service, so it needs a reboot after install...



Can you create a cc_config.xml file and save it in the BOINC data directory (usually C:\ProgramData\BOINC; you can check the event log for the correct path) with the following contents, changing "N" to the number of threads you want to run:


<cc_config>
<options>
<ncpus>N</ncpus>
</options>
</cc_config>


I remember someone at the WCG forums with a 32C/64T AMD CPU that was running only 32 tasks.

Once you've saved the file, go to Options > Read config files in the BOINC Manager (or something like that).
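The cc_config.xml step can also be scripted. A minimal sketch, assuming Python 3 is installed on the host and the script may write to the BOINC data directory (which may require administrator rights); `set_ncpus` is a hypothetical helper. It updates an existing cc_config.xml instead of overwriting it, so options already in the file are kept.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def set_ncpus(data_dir: Path, ncpus: int) -> Path:
    """Create or update <data_dir>/cc_config.xml so that <options><ncpus> is set.

    Hypothetical helper: other elements already in the file are kept,
    but XML comments are dropped because ElementTree does not preserve them.
    """
    path = data_dir / "cc_config.xml"
    if path.exists():
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    # Reuse <options> and <ncpus> if present, otherwise create them.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    ncpus_el = options.find("ncpus")
    if ncpus_el is None:
        ncpus_el = ET.SubElement(options, "ncpus")
    ncpus_el.text = str(ncpus)

    tree.write(path, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; on the real host pass the BOINC data
    # directory instead, e.g. Path(r"C:\ProgramData\BOINC").
    demo_dir = Path(tempfile.mkdtemp())
    written = set_ncpus(demo_dir, 32)
    print(written.read_text(encoding="utf-8"))
```

After running it against the real data directory, the client still has to re-read the file (Read config files, or a client restart) before the new CPU count takes effect.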
ID: 92615 · Rating: 0
©2024 University of Washington
https://www.bakerlab.org