Posts by Cyanr & Cinny

1) Message boards : Number crunching : No more work generated and dispatched, R@H will shutdown for maintaining? (Message 106932)
Posted 17 Sep 2022 by Cyanr & Cinny
Post:
Hi there

I'm not quite sure what happened here, and there is no news update to explain it.

Or am I missing something?
2) Message boards : Number crunching : Really terrible experience of running Rosetta@Home with LHC@Home, which are racing for virtualboxing runtime (Message 105421)
Posted 12 Mar 2022 by Cyanr & Cinny
Post:
I have been running BOINC for many years, and I have had a fairly average impression of every BOINC client and project I have joined.
However, things recently went wrong after I made a new plan to reschedule the usage of my tiny computing resources:

I run Einstein@Home, Minecraft@Home, MilkyWay@home, Rosetta@Home, WCG, and LHC@Home.
My tiny toy host has an Intel CPU and two Nvidia graphics cards.

Since Minecraft and WCG are both hibernating due to their project situations, I decided to:
1) Dedicate the two GPUs to Einstein and MilkyWay, which means I disabled CPU tasks for them.
2) Let the others, Rosetta and LHC, share the CPU power.

Then bad things happened: Rosetta cannot complete its vbox tasks in time, and probably 1/3 of the tasks expire.
I observed some interesting things:

1) I set CPU utilization to 50% in the BOINC global preferences, and LHC never seems to respect it. LHC always runs at 100% of CPU power.
2) Rosetta seems to respect the CPU utilization limit, so it runs slower and gets less computing power.
3) I set BOINC to switch between tasks every 60 minutes.

I set up a simple RRDtool graph to monitor the CPU and GPU temperatures, which also shows how busy the computing resources are.

My somewhat immature thought: Rosetta and LHC are fighting and competing with each other for the CPU, and some strange or bad behavior (or evaluation) by the BOINC manager ruins the Rosetta tasks.

For example, I wrote a tiny Linux shell script to monitor the BOINC tasks, and it often shows:


..........................................................................................................................................................................................................................................................................................................................
314 tasks scanned
                                                                      UR
  ID# Project        Deadline           Active         Sche uP Comp%  Dp App Ver/Task Name   
----- -------------- ------------------ -------------- ---- -- ------ -- ====================
   1) Rosetta@home   03/14/22_01:50:45  COPY_PENDING   sche 1   0.00% .. v103 aagb-NMPHE_pp-mPPS-GGLY-B3PHG_pp_0_2673182_3_0
   2) Rosetta@home   03/14/22_01:50:45  COPY_PENDING   sche 1   0.00% .. v103 aaae-ABU_pp-mPIP-AGLY-AMC14C_2856265_3_0
   3) Rosetta@home   03/14/22_01:51:31  UNINITIALIZED  pree 1  39.70% .. v103 aaam-PRO_pp-mTIC_pp-SAR-AMACBEN2_pp_4_2564856_3_0
 309) LHC@home       03/19/22_11:25:11  EXECUTING      sche 12 51.40% .. v287 mszNDmmlOm0nfZGDcpSWOuwoABFKDmABFKDmm5pPDmABFKDm6jTtQm_2
 152) MilkyWay@home  03/24/22_10:38:05  EXECUTING      sche G+ 22.25% .. v146 de_modfit_72_bundle5_3s_south_pt2_2_1646608780_4054127_0
 151) MilkyWay@home  03/24/22_10:38:06  EXECUTING      sche G+ 69.88% .. v146 de_modfit_72_bundle5_3s_south_pt2_2_1646608780_4054034_0
----------------
Scheduler state: aborted | uninitialized | preempted | scheduled  //  uP resources: 1..n CPUs | G+ nVidia GPU | g+ AMD/ATI GPU | i+ Intel GPU 
----------------
App Ver stats: LHC@h v287: 1 | M@h v146: 2 | R@h v103: 3 

======== Current Time ========
03/12/22 13:38:47 +08:00 CST
-------- System started --------
up 1 day, 21 hours, 56 minutes

=== Total Works / Allow Works (.|!|?) / Resource Share / Credits  ===
Project          TtW  A    ResShare    UsrTotal   UsrExpAvg   HostTotal  HostExpAvg
-------         ----  -  ----------  ----------  ----------  ----------  ----------
Einstein@Home    147  .          10    21605361       64040     8384331       63609
Minecraft@Home     0  .           5     9487683                 6277060            
MilkyWay@home    157  .          10     8130146       35577     5664819       35581
Rosetta@home       3  .          40     6624722         408      255979         366
WCG                0  .          20     5328161           1                        
LHC@home           7  .          15     3004775        3403      419601        3403
-------         ----  -  ----------  ----------  ----------  ----------  ----------
                                100    54180848      103429    21001790      102959
CPU thermal zone: 70° 69° 69° 73° 70° 76°| GPU thermal zone: 65° 66° | GPU fan zone: 53% 39% 



Poor Rosetta cannot win against LHC, and it often fails to complete its tasks in time....

Has anybody else observed the same thing?
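
The monitoring script itself is not included in the post. For readers who want a similar per-task view, here is a minimal sketch in Python rather than shell. It assumes the task list comes from `boinccmd --get_tasks` and that the output consists of numbered task blocks with `key: value` lines. The field names in `SAMPLE` (`scheduler state`, `active_task_state`, `fraction done`) and the `example.org` project URLs are illustrative assumptions, and the function names are hypothetical, so check them against the output on your own host.

```python
import re
import subprocess

# Illustrative sample only. Field names are assumed to match what
# `boinccmd --get_tasks` prints; the project URLs are placeholders.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: aaam-PRO_pp-mTIC_pp-SAR-AMACBEN2_pp_4_2564856_3_0
   project URL: https://example.org/rosetta/
   report deadline: Mon Mar 14 01:51:31 2022
   scheduler state: preempted
   active_task_state: UNINITIALIZED
   fraction done: 0.397000
2) -----------
   name: de_modfit_72_bundle5_3s_south_pt2_2_1646608780_4054034_0
   project URL: https://example.org/milkyway/
   report deadline: Thu Mar 24 10:38:06 2022
   scheduler state: scheduled
   active_task_state: EXECUTING
   fraction done: 0.698800
"""


def read_live_tasks():
    """Return the raw task listing from a locally running BOINC client."""
    result = subprocess.run(
        ["boinccmd", "--get_tasks"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_tasks(text):
    """Split the listing into one dict of key/value strings per task."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # A line such as "1) -----------" starts a new task block.
        if re.match(r"^\d+\) -+$", line):
            current = {}
            tasks.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only, so URLs and times stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return tasks


def summarize(tasks):
    """Format one row per task: active state, scheduler state, percent, name."""
    rows = []
    for task in tasks:
        percent = float(task.get("fraction done", 0)) * 100
        rows.append(
            f'{task.get("active_task_state", "?"):<14} '
            f'{task.get("scheduler state", "?"):<10} '
            f'{percent:6.2f}%  {task.get("name", "?")}'
        )
    return rows


if __name__ == "__main__":
    # Uses the embedded sample; swap in read_live_tasks() on a BOINC host.
    for row in summarize(parse_tasks(SAMPLE)):
        print(row)
```

Running it periodically (for example from cron) and logging the rows would show how often Rosetta tasks sit in the preempted state while LHC tasks keep executing.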





