Work Units less than 100% CPU Utilization

Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,396
Message 89747 - Posted: 23 Oct 2018, 13:56:33 UTC

I have a number of work units that are bouncing around on CPU utilization. In some cases, I see processors drop to less than 50% utilization. They don't drop for long but I am accustomed to seeing all processors at 100% almost all the time.

I need to know if other people see the same thing. If I have an issue with my platform, I want to understand that and get it resolved.
Thx!

Paul

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89749 - Posted: 23 Oct 2018, 15:21:27 UTC - in response to Message 89747.  

Interesting. I had never noticed it before. But on a Win7 64-bit machine, I now see:

3.78 Rosetta Mini DSIGG152_EFGBAHCD_s011300086a3c1c99_0003_0001_fragments_fold_SAVE_ALL_OUT_700237_690_0 1 99.27%
3.78 Rosetta Mini DSIGG158_EFGHABCD_s01160002aa336a56_0001_0001_fragments_fold_SAVE_ALL_OUT_700243_690_0 12:48:55 (06:34:39) 51.33%

I have never seen it on my Ubuntu machines, but I have not looked much either.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 89751 - Posted: 23 Oct 2018, 15:43:50 UTC

When less than 100% of CPU is the configured preference in the BOINC Manager, you will see CPU at 100% for a period of a few seconds, and then CPU at zero % for a few seconds. So, one would need to see how that host is configured to assess if this is expected or not.

Also, BOINC tasks run at low priority. Any time a higher priority task needs CPU, the % of CPU to BOINC will come from one or more active tasks.
Rosetta Moderator: Mod.Sense
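
As a rough illustration of the throttling behavior described above, the sketch below computes the average utilization a monitor would report for a task that alternates between running flat out and being paused. The interval lengths are hypothetical; the thread does not say what intervals BOINC actually uses.

```python
def average_utilization(run_seconds: float, pause_seconds: float) -> float:
    """Average CPU % over one run/pause cycle of a throttled task."""
    cycle = run_seconds + pause_seconds
    if cycle <= 0:
        raise ValueError("cycle length must be positive")
    return 100.0 * run_seconds / cycle


# Hypothetical interval lengths, chosen only for illustration.
print(average_utilization(5, 5))   # 50.0
print(average_utilization(10, 0))  # 100.0
```

A task that reads about 50% in a monitor that averages over several seconds would therefore be consistent with a throttle setting near 50%, which is why the host's preferences need checking first.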
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89752 - Posted: 23 Oct 2018, 16:43:21 UTC - in response to Message 89751.  
Last modified: 23 Oct 2018, 16:53:21 UTC

When less than 100% of CPU is the configured preference in the BOINC Manager, you will see CPU at 100% for a period of a few seconds, and then CPU at zero % for a few seconds. So, one would need to see how that host is configured to assess if this is expected or not.

Also, BOINC tasks run at low priority. Any time a higher priority task needs CPU, the % of CPU to BOINC will come from one or more active tasks.


I am running BOINC on an i7-4790 with 8 cores available; it is a dedicated machine with no other applications running.

Rosetta is running in one BOINC instance, with "use at most 25% of the processors" set, so running on 2 cores, and the "% of CPU time" is set to 100%. So that should not be limiting.

In the other BOINC instance, I am running 5 CPDN on 5 cores, and the last core is devoted to supporting an Einstein GPU project. So all 8 cores are accounted for.

I think the low CPU usage must be inherent in the Rosetta work unit, though maybe there is some interference from the other work units. But there should be enough cores for all.

EDIT: The Einstein GPU work unit is no longer running, and I have one core free. The second Rosetta is now up to 57%, though whether that is due to the free core is another question. My guess is that both Rosetta and CPDN take a lot of cache, and interfere with each other.
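
The core accounting in this post can be checked with a few lines of arithmetic. This is only a sketch: the rounding rule (round down, minimum of one core) is an assumption, not something confirmed in the thread, and the function name is made up for illustration.

```python
import math


def cores_allowed(total_cores: int, percent_of_processors: float) -> int:
    """Cores a client may use under a 'use at most N% of the processors' setting."""
    # Assumption: round down, but never below one core.
    return max(1, math.floor(total_cores * percent_of_processors / 100))


rosetta = cores_allowed(8, 25)   # first BOINC instance
cpdn = 5                         # second instance, CPDN tasks
einstein_gpu_support = 1         # core feeding the Einstein GPU task

print(rosetta)                                # 2
print(rosetta + cpdn + einstein_gpu_support)  # 8
```

With all 8 cores spoken for, nothing is left over for the operating system or for contention between tasks, which fits the later suggestion to leave a core or two free.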
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,396
Message 89753 - Posted: 23 Oct 2018, 17:06:19 UTC - in response to Message 89751.  

I have run Rosy for a long time. These are dedicated crunchers set for 100% CPU Utilization. I have never seen this many tasks bounce around on utilization. I wonder if this is a larger issue with some of the new work units.
Thx!

Paul

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89754 - Posted: 23 Oct 2018, 17:49:14 UTC - in response to Message 89753.  

I have never seen this many tasks bounce around on utilization. I wonder if this is a larger issue with some of the new work units.

Try leaving a couple of cores free (using only 6 out of 8 for example). I think the reason I don't see it on my Ubuntu machine is because that is the way I normally run it, and also I don't (can't) run CPDN on that machine anyway.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 89785 - Posted: 27 Oct 2018, 13:59:03 UTC

It looks like your 8-core machine has 32GB of memory. That should be great. However, if BOINC Manager is not configured to use it, then perhaps the BOINC Manager is struggling to keep the memory footprint within a much smaller limit. Please review how much memory BOINC is allowed to use as well.
Rosetta Moderator: Mod.Sense
LarryMajor

Joined: 1 Apr 16
Posts: 22
Credit: 31,533,212
RAC: 0
Message 89786 - Posted: 28 Oct 2018, 11:38:19 UTC - in response to Message 89753.  

I have run Rosy for a long time. These are dedicated crunchers set for 100% CPU Utilization. I have never seen this many tasks bounce around on utilization. I wonder if this is a larger issue with some of the new work units.


Paul, are you seeing this on your Intel boxes, or just on the Opterons?

What happens on my Opterons, but not on the FX boxes, is that every two or three days a job’s PID will enter a sleep state and stay there. A normal client stop and restart clears it up and the job finishes normally. The workaround is quick enough that I never looked for a cause.

Also, after seeing your post, I watched things run for a while and noticed that some of the recent jobs are using a LOT of system overhead. I use htop, which splits out job and system overhead per CPU(core), but if your utility doesn’t, it will look exactly the way you described. I have to think that this is specific to a WU since I never noticed it before.
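
For anyone whose monitor does not separate job time from system overhead, the split htop shows can be approximated on Linux from two samples of /proc/stat. The sketch below parses the per-core lines and reports the user and system share of the elapsed time. The sample text is made up for illustration, and the helper names are not from any existing tool.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (user, system, idle)} in clock ticks from /proc/stat text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        values += [0] * (7 - len(values))  # pad if a kernel reports fewer fields
        user, nice, system, idle, iowait, irq, softirq = values[:7]
        result[fields[0]] = (user + nice, system + irq + softirq, idle + iowait)
    return result


def split_usage(before, after):
    """Percent of elapsed time spent in user vs system mode, per core."""
    shares = {}
    for cpu, (u1, s1, i1) in after.items():
        u0, s0, i0 = before[cpu]
        du, ds, di = u1 - u0, s1 - s0, i1 - i0
        total = du + ds + di
        if total > 0:
            shares[cpu] = (100.0 * du / total, 100.0 * ds / total)
    return shares


# Made-up samples standing in for two reads of /proc/stat a moment apart.
sample_before = "cpu  100 0 50 850 0 0 0 0 0 0\ncpu0 100 0 50 850 0 0 0 0 0 0\n"
sample_after = "cpu  150 0 90 860 0 0 0 0 0 0\ncpu0 150 0 90 860 0 0 0 0 0 0\n"

usage = split_usage(parse_proc_stat(sample_before), parse_proc_stat(sample_after))
print(usage)  # {'cpu0': (50.0, 40.0)}
```

On a real machine the two samples would come from reading /proc/stat twice with a short sleep in between. A core showing a large system share and a small user share is busy, but a tool that only reports the job's own CPU time would show it as underused.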
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,396
Message 89791 - Posted: 28 Oct 2018, 14:44:33 UTC - in response to Message 89786.  

Thx for the response. I recently did some CPU upgrades and my Opteron 6176 processors were jammed at 100% almost all the time. I am starting to wonder if there is some thermal throttling. I will check the heat sink. It also appears to happen mostly on CPU2 processors 17 - 32.

It sounds like others are not observing these issues. Guess it is time to dive back into the hardware.

Does anyone know if I need to do anything to Ubuntu 18.04 after a CPU upgrade? I assume it will load the correct CPU drivers for the new processors.

Hope to have this machine running at 100% again soon.
Thx!

Paul
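
One way to look for throttling on a loaded machine is to compare each core's current clock with its maximum. The sketch below is only that comparison: the readings are made up, and the 80% threshold is an arbitrary assumption. On Linux systems that expose cpufreq, the real values (in kHz) can typically be read from `scaling_cur_freq` and `cpuinfo_max_freq` under `/sys/devices/system/cpu/cpuN/cpufreq/`.

```python
def throttled_cores(current_khz, max_khz, threshold=0.8):
    """Return sorted core ids whose current clock is below threshold * max clock."""
    return sorted(
        core
        for core, cur in current_khz.items()
        if core in max_khz and cur < threshold * max_khz[core]
    )


# Made-up readings in kHz, keyed by core number, for illustration only.
current = {16: 2300000, 17: 1400000, 18: 800000}
maximum = {16: 2300000, 17: 2300000, 18: 2300000}

print(throttled_cores(current, maximum))  # [17, 18]
```

A low clock only suggests throttling if the core is supposed to be fully loaded; an idle core will also clock down under normal power management.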

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89793 - Posted: 28 Oct 2018, 15:03:41 UTC - in response to Message 89791.  

Does anyone know if I need to do anything to Ubuntu 18.04 after a CPU upgrade? I assume it will load the correct CPU drivers for the new processors.

It works fine for both my Ryzen 1700 and i7-8700 without anything additional.
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,396
Message 89796 - Posted: 28 Oct 2018, 19:21:08 UTC - in response to Message 89793.  

Looks like it was a hardware issue. I re-seated two of the CPUs and moved some RAM SIMMS around. Everything appears to be back to normal.

Thanks for all the suggestions. All 64 cores are back to 100%.
Thx!

Paul





©2024 University of Washington
https://www.bakerlab.org