Message boards : Number crunching : Running multiple covid projects on multiple machines, how do you personally go about it?
Author | Message |
---|---|
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
Hello, I've got quite a few machines, for starters - over 100 threads. That may not be a lot by some standards however it can be a lot to manage. Recently, a very good friend of mine passed away from Covid related symptoms and, while it may not be definitive, I'd still like to really put the CPU power towards a bit of everything that is fighting this thing for him. Recently I've been mainly concentrating on WCG only. For those running Rosetta along with other Covid-related projects e.g. WCG, TN grid, I think there's one more I'm missing, how do you juggle which project runs on which machine? Or do you simply add them all with resource shares of 100 each and let Boinc figure out what to do from there? I'm so used to jumping all in on a single project this is a bit new to me. I guess it's also simply a matter of which parts of the science each project is contributing to, as well. When I do go in on this, I'll certainly be setting a very small cache - 0 days + 0.1 and let things sort themselves out. Is this the right way to go about this so that workunits are not lost in the shuffle? I've got everything from core 2 duos to core i5's, i7 Haswell as well as a Ryzen 1800x, 3600 and 3700x. Probably a few more in there I'm forgetting about but they are soon being upgraded to newer Ryzens, as well, when I am able. I'm just curious on opinions regarding this. I'm sure certain CPUs run better on different projects. So I'd like to make sure I'm using what I have in the best way, if that makes sense. thanks for any help! |
Ray Murray Send message Joined: 22 Apr 20 Posts: 17 Credit: 270,864 RAC: 0 |
Hi Wolfman, As different Projects are approaching the problem from different directions, to develop new drugs or test the effectiveness of existing ones, I think it's good to back all the dogs in the fight. Because of those different strategies, I don't think it's possible to determine which hardware is "better" at which Project, other than by Boinc credits awarded per day or per hour, but the virus doesn't care about credits! Even with encouraging trials ongoing, I think there's still a long way to go, and hopefully something even better than those candidates is waiting to be discovered. With lots of hosts, the simplest way might be to have each host doing different Projects rather than setting resource shares in Boinc and letting it switch between them. I have only 2 hosts active just now, both having been switched from LHC to doing purely Covid work (for an old uncle that I lost to it). This one, running 3 of 4 threads, is doing Rosetta, WCG and Ibercivis (which is out of work again), but rather than letting Boinc switch mid-task, I have "Switch between ...." set to 1500 mins so that any individual task will run to completion. The other one is running exclusively Folding@Home on 3 of 4 threads. That is not Boinc, so Boinc doesn't see those threads as being used and tries to use them itself (unless restricted). I don't have anything with a usable graphics card so haven't tried TNG, so others would have to give input on balancing resources between CPU and GPU. Stay safe |
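As a rough sketch of the two settings described above (a long "switch between" interval so tasks run to completion, and a CPU limit that leaves threads free for a non-BOINC client), the following Python snippet builds the XML for a BOINC `global_prefs_override.xml` file. The values 1500 minutes and 75% (3 of 4 threads) come from the post; the function name `build_prefs_override` is made up for illustration, and the file would still have to be placed in the BOINC data directory and re-read by the client.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(switch_minutes: float, cpu_percent: float) -> str:
    """Return XML text for a minimal global_prefs_override.xml (hypothetical helper)."""
    root = ET.Element("global_preferences")
    # How long BOINC runs one project's task before considering a switch.
    ET.SubElement(root, "cpu_scheduling_period_minutes").text = f"{switch_minutes:g}"
    # Percentage of logical CPUs BOINC may use; 75 means 3 of 4 threads.
    ET.SubElement(root, "max_ncpus_pct").text = f"{cpu_percent:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override(switch_minutes=1500, cpu_percent=75))
```

The same settings can also be made in the BOINC Manager's computing preferences; generating the file is mainly useful when many hosts need identical values.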
Lenciviona Send message Joined: 17 Apr 20 Posts: 1 Credit: 1,431,941 RAC: 0 |
In my case I have a bunch of old to new hardware, even an old Android tablet. I started with just running one project per machine but have changed that over time. I have a couple of older machines that don't have the memory to run Rosetta, so they run just the OPN sub-project at WCG. OPN needs very little memory to run. I had trouble running Rosetta on Android, so it does just OPN now. I've seen some Rosetta tasks go well over 2GB RAM, though I think 1.5 is the listed system requirement. On all the machines that can run Rosetta, I also now run OPN. I've currently got the WCG project resource share set at 60% and that seems to give me roughly equal credits as reported on Free-DC Stats. I have to tinker with the resource share from time to time. I've got two pretty good machines with pretty good GPUs that I run Folding@home on. On those I reserve a couple of threads for F@h (needed for Nvidia GPUs) but I don't run F@h CPU projects. I can't really balance F@h against the other projects. I kind of assume the GPUs are doing more "work" than all my other machines combined, probably by a large factor. I devote the rest of the threads to both Rosetta and OPN. So I guess my strategy was to look at system requirements and run the highest-requirement project each machine is capable of, and then use the resource share to balance credits between Rosetta and OPN. I've left the 'switch between' setting at default. I've noticed it will put a running task on hold to run something else for a while, but I've never had a task fail to complete on time by letting it switch like that. I'm balancing between WCG (OPN) and Rosetta based on the BOINC credit system, which I think is supposed to balance comparable workloads. I don't think BOINC lets you pick a number of threads on one machine to devote to a project. So if you want to allocate a certain number of threads/machines to a particular project, then just set those machines to only have the one project you want. 
I kind of started that way, with some machines Rosetta only, the least capable machines WCG only, and the best machines fully Folding@home (with all threads doing F@h work). I don't know much about the science of these projects. If you did, you might want to focus on particular projects. I think all have value so I'm spreading my resources. But there are interesting things like Folding@home weekly sprints https://foldingathome.org/2020/07/28/introducing-covid-moonshot-weekly-sprints-help-us-discover-a-new-therapy/ designed to model/test antiviral compounds for treatment. I think that is mostly a GPU only project. Anyway, sounds like you have enough machines that you can set this up however you want, though it does get more complicated if you're trying to manage and/or balance more than one project. Happy crunching! |
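Resource shares in BOINC are relative weights rather than literal percentages, so what matters is each project's share divided by the sum over all attached projects. A minimal sketch of that arithmetic, assuming the 60% mentioned above corresponds to shares of 60 for WCG and 40 for Rosetta (the 40 is an assumption, as is the helper name `share_fractions`):

```python
def share_fractions(shares: dict) -> dict:
    """Convert BOINC-style resource shares into fractions of total compute time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: value / total for name, value in shares.items()}


if __name__ == "__main__":
    # Illustrative shares only; the Rosetta value is assumed, not taken from the post.
    for name, fraction in share_fractions({"WCG": 60, "Rosetta": 40}).items():
        print(f"{name}: {fraction:.1%}")
```

BOINC aims for these fractions over the long run, not at every instant, so the mix of running tasks at any given moment can look quite different.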
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
Ray Murray wrote: "Hi Wolfman," Thank you both for the recommendations. How much RAM does Ibercivis use? The other project which I was referring to can be found at TNGrid. I'll definitely take a look at all of these - I'll probably add the machines with more RAM to Rosetta as it does like to use at least 1 GB per thread. I don't have many GPUs, however Folding will likely be added to the machines that can use it. Thanks for all the suggestions again. |
Ray Murray Send message Joined: 22 Apr 20 Posts: 17 Credit: 270,864 RAC: 0 |
Folding is currently using <100MB, on 3 CPU threads, and I think Ibercivis uses something similar but I can't confirm that just now as they currently don't have work available. There was steady work for a while but recently it has come more intermittently, perhaps as they finish one study then move to the next. Project to Cruncher communication is as bad as it is here. Rosetta is definitely the most RAM-hungry, using anything from 400MB to >2GB. |
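The memory figures quoted in this thread (Rosetta using anywhere from 400MB to over 2GB per task, versus under 100MB for Folding) suggest a quick back-of-the-envelope check before attaching a machine to Rosetta. A hedged sketch, where the 16 GB / 16-thread host and the 1 GB reserved for the operating system are illustrative assumptions rather than figures from any poster:

```python
def max_tasks_by_ram(total_ram_gb, per_task_gb, reserved_gb=1.0, threads=None):
    """Estimate how many tasks fit in RAM, optionally capped by thread count."""
    if per_task_gb <= 0:
        raise ValueError("per_task_gb must be positive")
    usable = max(total_ram_gb - reserved_gb, 0.0)
    by_ram = int(usable // per_task_gb)
    return by_ram if threads is None else min(by_ram, threads)


if __name__ == "__main__":
    # Hypothetical host: 16 GB RAM, 16 threads.
    print(max_tasks_by_ram(16, 2.0, threads=16))  # worst case, 2 GB per task
    print(max_tasks_by_ram(16, 1.0, threads=16))  # roughly 1 GB per task
```

If the worst-case number is well below the thread count, the remaining threads are candidates for a low-memory project such as OPN.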
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I spent all winter upgrading machines even before COVID, and then added a couple more after it hit (11 total). They are all on Ubuntu 18.04, except my main machine on Win7 64-bit. My somewhat random thoughts:
I like short caches, and use the default of 0.1+0.5 days, except on some projects with long run times where I go to 0.25+0.5 or a little longer to ensure that they download properly. I am mainly on Rosetta (two Ryzen 3900X and two Ryzen 3950X) on almost all cores, but I am allowing WCG/ARP to run on 12 of them, so that leaves 100 for Rosetta. I do a lot of Folding too on GPUs, but am down to only one GTX 1070 at the moment until cooler weather arrives. By the way, the Ryzen 3900X and 3950X do great on the CPU version of Folding, but they are more valuable on Rosetta since I can use the GPUs on Folding. I think Folding and Rosetta do the most innovative and advanced science. However, even though they are working very hard, you won't see them in use for the first round of anti-virals. The first round is actually in testing now, and there should be several available by the end of the year (look for the monoclonal antibodies; they should do well). I do TN-Grid also on an i7-8700.
|
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
Jim1348 wrote: "I spent all winter upgrading machines even before COVID, and then added a couple more after it hit (11 total). They are all on Ubuntu 18.04, except my main machine on Win7 64-bit." Thank you for that - this is exactly what I'm looking for. Is the reason you run TN grid on the 8700 due to it having better AVX support? Those are some nice Ryzen processors there. I wish I could afford that, though I'm hoping to get another 3600 or 3800x soon. The two 3rd gen Ryzens are now paired with 64 GB of RAM, which long term should be okay, but the RAM was almost more expensive than the 3800x over here, let alone the 3600. How do you run ARP on 12 threads consistently? Do you use an app_config or do you just have WCG set to 12 at once on the machines you run it on, with the resource shares for Rosetta and WCG at their default (50% each)? I don't have much for GPUs - just an rx570 for right now. What other projects are you running, CPU wise, out of curiosity? Have you tried the QuChem project, which I believe requires VirtualBox and is also working on Covid related science? |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Good questions. wolfman1360 wrote: "Is the reason you run TN grid on the 8700 due to it having better AVX support?" Yes, it works very well there. Each work unit runs about 2 hours 45 minutes on the AVX version. However, I forgot that I have moved it over to a Ryzen 2700, where it takes 3 hours 13 minutes. But I need the i7-8700 on MLC@Home, a new machine-learning project that also works very well on the i7-8700. But TN-Grid would work very nicely on a Ryzen 3600, since it has improved AVX support also. Maybe this fall when I start up another machine I can put it there. "Those are some nice Ryzen processors there. I wish I could afford that, though I'm hoping to get another 3600 or 3800x soon. The two 3rd gen Ryzens are now paired with 64 GB of RAM, which long term should be okay, but the RAM was almost more expensive than the 3800x over here, let alone the 3600." I have a couple of Ryzen 3600s and like them a lot. Their big cache is a help on some projects. I personally would not spend more for the 3800X. "How do you run ARP on 12 threads consistently? Do you use an app_config or do you just have WCG set to 12 at once on the machines you run it on, with the resource shares for Rosetta and WCG at their default (50% each)?" I use an app_config.xml to limit the ARP to three on each machine. And then I set the resource share for WCG to 10% (while leaving it at 100% for Rosetta). That is not exactly correct, but will do for the moment. "I don't have much for GPUs - just an rx570 for right now." I have been leaning to CPUs for my most recent purchases, since there are more projects there and the new Ryzens do very well on them. "What other projects are you running, CPU wise, out of curiosity? Have you tried the QuChem project, which I believe requires VirtualBox and is also working on Covid related science?" Yes, I run QuChemPedia (the long work units) on an i7-9700, which has 8 full cores. Since I am on Linux, I don't need to use VirtualBox. It runs very well. 
I also run WCG/MCM on six cores of my Windows 7 machine, which is an Intel i7-4771. It runs even better on Windows than Linux, which is a bit unusual. On my Ryzen 2700, I run Universe (nice astrophysics I think) and now TN-Grid, as noted above. They run well together, something I always check for. Also, I have two cores of that machine on nanoHUB, which requires VirtualBox. It has short work units and a high error rate, and may not be everyone's cup of tea, but I think it is worth a little support. On a Ryzen 3600, I am running 8 cores on LHC, and 4 cores on WCG/MIP. That is because the MIP tasks are a little temperamental, and I have to limit the number I run, depending on the cache size. The LHC work includes both native ATLAS and CMS. The native ATLAS requires both "CVMFS" and "Singularity", which will take some study to install, but there are instructions on their forums. Or you can just use VirtualBox, though it is not quite as fast. The CMS does not have a native version, and runs only on VirtualBox. I have another Ryzen 3600 and a Ryzen 3700X shut down for the summer. I am hoping that BlackHoles@Home is ready by the fall, but it has been under development for some time and it may be a while longer. http://astro.phys.wvu.edu/bhathome/ I am not sure about the Ryzen 3700X. I used it for Rosetta before getting the bigger chips, and may have to look around a little to find a good use for it. Have fun. If COVID is not disrupting your life (I am retired), it is a great time to upgrade. PS - I forgot to mention GPU-Grid. It uses CUDA, and so runs on Nvidia only. I have a GTX-1060 on it. They do very nice biomedical work, though not specifically COVID at the moment. That is OK with me. And I have an RX 570 too, on my Windows machine. I use it on Einstein (the gravity wave work units) in the cooler weather. It is also a project that runs well on Windows, and the AMD card does much better than Nvidia there. You never know until you check. |
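The `app_config.xml` approach mentioned in this post can be sketched as follows. This Python snippet only builds the XML text; the application name `arp1` is an assumption about WCG's short name for ARP and should be checked against the names the client actually reports, and `build_app_config` is a hypothetical helper. A limit of three per machine across four machines would match the 12 ARP tasks described earlier in the thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return XML text for an app_config.xml that caps one app's running tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # short app name (assumed here)
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "arp1" is an assumed app name; verify it before using the file.
    print(build_app_config("arp1", 3))
```

The resulting file belongs in that project's folder under the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit takes effect.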
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 8,919,838 RAC: 1,006 |
Wolfman said:
Another Project running Covid-19 stuff is World Community Grid, the Open Pandemics sub-project. I also have over 100 cpu cores, I'm pushing 200 now in 20 pc's, and I usually set up each Project on every pc and then choose which one to run on the machine that fits it best. I then set to No New Tasks or Suspend those Projects I don't want to run on that pc. I always set up the default venue at each project with a zero resource share so I only get enough to run one task on each cpu core with no cache of extra units. That way I know if a Project works on that pc and I can see if it's a good fit or not. I also set up a 2nd venue at each Project with enough of a resource share to give me enough workunits to last a day or so, since I have always-on cable internet and it gives me a chance to change my mind without deleting 500 tasks. I usually use a resource share of 5 or 10 and set my cache at 0.5 and 0.5 so I end up with about 1 day's worth of work. Once I know the units work well on that pc I move the pc to the 2nd venue so it has a cache of workunits and I just let it run. I do check all of my pc's every day, but that's mostly because I have the time and have VNC set up on each so I can do it from my recliner. |
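To get a feel for how many tasks a given cache setting pulls in, a crude estimate is threads × buffered days × 24 / hours per task. The sketch below assumes a hypothetical 8-thread machine and 3-hour tasks; the real BOINC work-fetch logic also takes resource shares and runtime estimates into account, so this is only a rough approximation, and `tasks_in_cache` is an illustrative name.

```python
def tasks_in_cache(threads, min_days, extra_days, hours_per_task):
    """Roughly estimate how many tasks a work buffer of min+extra days holds."""
    if hours_per_task <= 0:
        raise ValueError("hours_per_task must be positive")
    buffered_thread_hours = (min_days + extra_days) * 24 * threads
    return buffered_thread_hours / hours_per_task


if __name__ == "__main__":
    # Cache of 0.5 + 0.5 days, as in the post; host and task length are assumed.
    print(tasks_in_cache(threads=8, min_days=0.5, extra_days=0.5, hours_per_task=3))
```

Scaling the same estimate to a few hundred threads shows how a one-day cache can easily reach the hundreds of queued tasks mentioned above.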
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 19,832,430 RAC: 5,730 |
I’m running 12 raspberry pis and a couple of desktops. One thing I can tell you that doesn’t seem to be mentioned here is that running multiple projects on the same host reduces performance on both projects, at least on raspberry pi. Ex: splitting a pi to run 2 threads of open pandemics and 2 threads of Rosetta will yield about 450 average credit on Rosetta. Putting all 4 threads on Rosetta will yield around 1100 average credit. The same thing happens on open pandemics. Given this, I’ve now switched to run only one project per pi. |
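Dividing the figures above by thread count makes the comparison easier to read. A small worked example using only the numbers quoted in the post (about 450 average credit on 2 Rosetta threads versus about 1100 on 4); the helper name `credit_per_thread` is illustrative:

```python
def credit_per_thread(average_credit, threads):
    """Average credit earned per thread."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return average_credit / threads


split_host = credit_per_thread(450, 2)      # Rosetta sharing the Pi with OPN
dedicated_host = credit_per_thread(1100, 4)  # Rosetta on all four threads
gain = dedicated_host / split_host - 1

if __name__ == "__main__":
    print(f"split: {split_host:.0f} per thread")          # split: 225 per thread
    print(f"dedicated: {dedicated_host:.0f} per thread")  # dedicated: 275 per thread
    print(f"per-thread gain: {gain:.1%}")                 # per-thread gain: 22.2%
```

So on these rough numbers each Rosetta thread earns about a fifth more when the Pi is not shared with a second project.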
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 8,919,838 RAC: 1,006 |
Endgame124 wrote: "I’m running 12 raspberry pis and a couple of desktops. One thing I can tell you that doesn’t seem to be mentioned here is running multiple projects on the same host reduces performance on both projects, at least on raspberry pi." That could be the cache each Project loads for the workunits; doing it for just one Project could be more efficient than trying to do it for two or more. But that's also why a lot of people have a bunch of pc's, to let them run multiple Projects at the same time or to hyper focus on one Project. |
bozz4science Send message Joined: 2 May 20 Posts: 7 Credit: 228,784 RAC: 0 |
Just my 2 cents. Personally I have far less compute power and at most 20 threads of mostly very old CPUs to share at any time. Due to the current heat wave I was forced to temporarily pause/reduce my contribution as my small apartment immediately turns into a hostile environment when running them 24/7 at the current temps. Still, I like to think that my contribution just adds up in the end so it still counts. As I have far fewer machines, there is far less to manage, so I cannot add any value here. Usually I dedicate my older GPU to Folding@Home. The CPUs run on multiple projects with small caches. I like to diversify my Covid-related projects as well, as I think each one has a valid approach but some might yield more immediate results than others. I run OPN at WCG, even though I am also anticipating the start of their GPU client, as well as TN-Grid, Rosetta and Ibercivis. What I see so far is that many of the aforementioned projects do not only conduct Covid-related research and thus increasingly send out Covid-unrelated work. To my understanding only OPN and Ibercivis have dedicated their complete WU-pipeline to Covid-only tasks. I believe that in the end all research that these projects conduct is valuable, so I don't mind that only a fraction of the completed work might have this desired link to Covid-research. In the end it all comes down to personal preference and one's own knowledge to judge the potential benefit of each project. As I don't have a good enough understanding of medicinal chemistry, immunology and virology, I split my resources among these projects. However, I think that Ibercivis and OPN mainly try to find a short-term antiviral candidate by testing newly synthesized and existing drugs to repurpose them for Covid. This might yield more immediate opportunities as a broadly deployable and low-cost therapeutic. TN-Grid tries, by extending the gene networks related to Covid-involved genes (e.g. ACE2), to find potentially interesting genetic links to the susceptibility to and severity of the disease. And lastly, Rosetta and Folding perform structural Covid-related protein prediction to find interesting binding sites for therapeutics and provide insights for further vaccine research efforts. While Rosetta is concerned with the stable final form the amino acid sequence folds into, Folding@Home is also interested in the folding process itself and models the atomic dynamics of molecular structures as well. Among the most notable efforts at F@H is the ongoing Covid Moonshot project, which you can read up on at the F@H website. And if I am not mistaken GPU.Grid also ran some Covid WUs. Earlier this year Boinc@TACC (currently #8 supercomputer in the world) also ran a larger batch of Covid-tasks but only has work sporadically. In the end I think the projects complement each other even if there might be some redundancies, but it is still only the tip of the iceberg in the ongoing race for a vaccine. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
bozz4science wrote: "Just my 2 cents." I think that is a good summary of the projects, and I am perfectly happy with more non-COVID work. Here is some interesting info on a new approach that was cited on the Folding forums. (Note that John Chodera is one of their lead investigators and works on Project Moonshot, a large international collaboration that is doing great things). https://foldingforum.org/viewtopic.php?f=17&t=35964 |
wolfman1360 Send message Joined: 18 Feb 17 Posts: 72 Credit: 18,450,036 RAC: 0 |
Thank you for such valuable input, everyone! It is very much appreciated. I've had to switch off some of my older machines, too - in particular a dual Opteron 6128. I will more than likely attempt the nanoHUB project as well as a few others Jim1348 mentioned. I was giving all my resources to MLC previously, since it was a new project and I really like that the owner gives a lot of communication in regards to the status and tries to help wherever he can. LHC - I should really give that a try again. Seems I had some issues the last time I attempted it. Thanks again, everyone. Right now I just have everything connected to this project but I'll slowly start branching out in the future, especially when WCG comes out with GPU work. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1978 Credit: 9,194,012 RAC: 3,787 |
wolfman1360 wrote: "I will more than likely attempt the nanoHUB project as well as a few others Jim1348 mentioned." I like Nanohub, but it has some problems with WUs. "I was giving all my resources to MLC previously, since it was a new project and I really like that the owner gives a lot of communication in regards to the status and tries to help wherever he can." MLC started very well, with the affiliation to a university. And the app is very stable!! "LHC - I should really give that a try again. Seems I had some issues the last time I attempted it." Which sub-project? Sixtrack? Atlas?? "Right now I just have everything connected to this project but I'll slowly start branching out in the future, especially when WCG comes out with GPU work." They started the alpha test for the GPU app. Waiting for a stable app... |
robertmiles Send message Joined: 16 Jun 08 Posts: 1229 Credit: 14,172,067 RAC: 1,295 |
I've decided that TN Grid should get a lower resource share than the others - it just doesn't have as much server capacity as WCG or Rosetta@home. I've read that https://boinc.ibercivis.es/ibercivis/ is another such project, but it appears to be set up mainly for those who can follow instructions written in Spanish. If you are interested in non-BOINC distributed computing projects, I've found two doing COVID-19 work: Folding@home (CPU and GPU work) at https://foldingathome.org/ with its forum (separate account creation required) at https://foldingforum.org/index.php and Quarantine@Home (GPU only, for Linux) at https://quarantine.infino.me/ |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 8,919,838 RAC: 1,006 |
wolfman1360 wrote: "I will more than likely attempt the nanoHUB project as well as a few others Jim1348 mentioned. I was giving all my resources to MLC previously, since it was a new project and I really like that the owner gives a lot of communication in regards to the status and tries to help wherever he can." There is a problem with some NanoHub tasks and the version of VirtualBox that's being used: not all tasks are valid when using anything newer than version 5.2.8. The newer version 6 VB files are different and are causing a problem for some machines and setups. |
©2024 University of Washington
https://www.bakerlab.org