Really terrible experience of running Rosetta@Home with LHC@Home, which are racing for virtualboxing runtime

Cyanr & Cinny
Joined: 13 Jun 07
Posts: 2
Credit: 7,566,202
RAC: 978
Message 105421 - Posted: 12 Mar 2022, 6:02:41 UTC
Last modified: 12 Mar 2022, 6:17:20 UTC

I have been running BOINC for many years, and my impression of every BOINC client and project I have joined has been fairly average.
However, things recently went wrong after I made a new plan to reschedule the usage of my tiny computing resources:

I run Einstein@Home, Minecraft@Home, MilkyWay@home, Rosetta@Home, WCG, and LHC@Home.
My tiny toy host has an Intel CPU and two Nvidia graphics cards.

Since Minecraft@Home and WCG are both hibernating due to their project situations, I decided to:
1) Dedicate the two GPUs to Einstein and MilkyWay, which means I disabled CPU tasks for them.
2) Let the others, Rosetta and LHC, share the CPU power.

Then bad things happened: Rosetta cannot complete its vbox tasks in time, and probably 1/3 of the tasks expire.
I observed some interesting things:

1) I set CPU utilization to 50% in the BOINC global preferences, but LHC never seems to respect it. LHC always runs at 100% of CPU power.
2) Rosetta seems to respect the CPU utilization setting, so it runs slower and gets less computing power.
3) I set BOINC to switch between tasks every 60 minutes.

I installed a simple RRDtool graph to monitor the CPU and GPU temperatures, which also shows how busy the computing resources are.

My immature thought is that Rosetta and LHC are fighting each other for the CPU, and that some strange or bad behavior (or evaluation) of the BOINC manager ruins the Rosetta tasks.

For example, I wrote a tiny Linux shell script to monitor the BOINC tasks, and it often shows:


..........................................................................................................................................................................................................................................................................................................................
314 tasks scanned
                                                                      UR
  ID# Project        Deadline           Active         Sche uP Comp%  Dp App Ver/Task Name   
----- -------------- ------------------ -------------- ---- -- ------ -- ====================
   1) Rosetta@home   03/14/22_01:50:45  COPY_PENDING   sche 1   0.00% .. v103 aagb-NMPHE_pp-mPPS-GGLY-B3PHG_pp_0_2673182_3_0
   2) Rosetta@home   03/14/22_01:50:45  COPY_PENDING   sche 1   0.00% .. v103 aaae-ABU_pp-mPIP-AGLY-AMC14C_2856265_3_0
   3) Rosetta@home   03/14/22_01:51:31  UNINITIALIZED  pree 1  39.70% .. v103 aaam-PRO_pp-mTIC_pp-SAR-AMACBEN2_pp_4_2564856_3_0
 309) LHC@home       03/19/22_11:25:11  EXECUTING      sche 12 51.40% .. v287 mszNDmmlOm0nfZGDcpSWOuwoABFKDmABFKDmm5pPDmABFKDm6jTtQm_2
 152) MilkyWay@home  03/24/22_10:38:05  EXECUTING      sche G+ 22.25% .. v146 de_modfit_72_bundle5_3s_south_pt2_2_1646608780_4054127_0
 151) MilkyWay@home  03/24/22_10:38:06  EXECUTING      sche G+ 69.88% .. v146 de_modfit_72_bundle5_3s_south_pt2_2_1646608780_4054034_0
----------------
Scheduler state: aborted | uninitialized | preempted | scheduled  //  uP resources: 1..n CPUs | G+ nVidia GPU | g+ AMD/ATI GPU | i+ Intel GPU 
----------------
App Ver stats: LHC@h v287: 1 | M@h v146: 2 | R@h v103: 3 

======== Current Time ========
03/12/22 13:38:47 +08:00 CST
-------- System started --------
up 1 day, 21 hours, 56 minutes

=== Total Works / Allow Works (.|!|?) / Resource Share / Credits  ===
Project          TtW  A    ResShare    UsrTotal   UsrExpAvg   HostTotal  HostExpAvg
-------         ----  -  ----------  ----------  ----------  ----------  ----------
Einstein@Home    147  .          10    21605361       64040     8384331       63609
Minecraft@Home     0  .           5     9487683                 6277060            
MilkyWay@home    157  .          10     8130146       35577     5664819       35581
Rosetta@home       3  .          40     6624722         408      255979         366
WCG                0  .          20     5328161           1                        
LHC@home           7  .          15     3004775        3403      419601        3403
-------         ----  -  ----------  ----------  ----------  ----------  ----------
                                100    54180848      103429    21001790      102959
CPU thermal zone: 70° 69° 69° 73° 70° 76°| GPU thermal zone: 65° 66° | GPU fan zone: 53% 39% 
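
A monitor like the one that produced the listing above can be sketched in a few lines. This is not the original shell script, which was not posted. It is a Python sketch that assumes `boinccmd --get_tasks` prints one `key: value` line per field and starts each task with a numbered separator line such as `1) -----------`. The sample text and the helper names `parse_tasks` and `summarize` are made up for illustration; only the task names and percentages are borrowed from the listing.

```python
import re

def parse_tasks(text):
    """Split boinccmd-style task output into one dict per task.

    Assumes each task starts with a line like '1) -----------' and that
    every field is a 'key: value' line (an assumption about the format).
    """
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.match(r"^\d+\) -+$", line):
            current = {}
            tasks.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return tasks

def summarize(tasks):
    """Return (name, active state, scheduler state, percent done) per task."""
    rows = []
    for t in tasks:
        frac = float(t.get("fraction done", "0"))
        rows.append((
            t.get("name", "?"),
            t.get("active_task_state", "?"),
            t.get("scheduler state", "?"),
            f"{frac * 100:.2f}%",
        ))
    return rows

# Hypothetical sample input, shaped like the assumed boinccmd output.
SAMPLE = """
1) -----------
   name: aaam-PRO_pp-mTIC_pp-SAR-AMACBEN2_pp_4_2564856_3_0
   scheduler state: preempted
   active_task_state: UNINITIALIZED
   fraction done: 0.397
2) -----------
   name: mszNDmmlOm0nfZGDcpSWOuwoABFKDmABFKDmm5pPDmABFKDm6jTtQm_2
   scheduler state: scheduled
   active_task_state: EXECUTING
   fraction done: 0.514
"""

rows = summarize(parse_tasks(SAMPLE))
for row in rows:
    print(row)
```

On a real host the text would come from something like `subprocess.run(["boinccmd", "--get_tasks"], capture_output=True, text=True).stdout` instead of `SAMPLE`.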



Poor Rosetta cannot win against LHC, and it often fails to complete its tasks in time...

Has anybody else observed the same thing?
computezrmle

Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 105422 - Posted: 12 Mar 2022, 8:07:11 UTC - in response to Message 105421.  

There are lots of settings over at LHC that you obviously did not understand.
Hence, your ATLAS native tasks are completely misconfigured.
The major issue is that you force a 12-core setup on a 12-core CPU.
This collides with your BOINC setting where you try to reduce the #cores.

As a result other tasks - especially the GPU tasks - and the OS don't get enough CPU cycles to work fine.
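
To make the mismatch concrete, here is a rough back-of-the-envelope sketch. It assumes the 50% setting is the "use at most N% of the CPUs" preference, that BOINC rounds the resulting CPU count down, and that each running GPU task needs about one CPU thread to feed it. All three are assumptions for illustration, not values read from the host's actual configuration.

```python
import math

def usable_cpus(logical_cpus, max_ncpus_pct):
    """CPU budget under a 'use at most N% of the CPUs' preference.

    Rounding down with a minimum of 1 is an assumption about how the
    client applies the percentage.
    """
    return max(1, math.floor(logical_cpus * max_ncpus_pct / 100))

LOGICAL_CPUS = 12          # 12-core CPU, as stated in this thread
budget = usable_cpus(LOGICAL_CPUS, 50)

atlas_threads = 12         # the forced 12-core ATLAS setup
gpu_support_threads = 2    # assumed: one CPU thread per running GPU task
demand = atlas_threads + gpu_support_threads

print(f"CPU budget from preferences: {budget}")
print(f"Threads actually demanded:   {demand}")
print(f"Demand exceeds the hardware: {demand > LOGICAL_CPUS}")
```

Under these assumptions a single ATLAS task alone asks for twice the CPU budget, before any Rosetta task or the OS gets a turn.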

You should ask on the LHC@home message boards how to set up ATLAS native correctly.

©2024 University of Washington
https://www.bakerlab.org