Problems and Technical Issues with Rosetta@home

Author	Message
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,155,087 RAC: 12,086	Message 107517 - Posted: 21 Oct 2022, 0:14:19 UTC - in response to Message 107516. Last modified: 21 Oct 2022, 0:14:45 UTC
I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647
Did you do anything special to combine cards on the project?
ID: 107517 · Rating: 0

.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0	Message 107518 - Posted: 21 Oct 2022, 1:04:07 UTC - in response to Message 107516.
I would like to see what the actual use is in the resource/task monitor or the like.
ID: 107518 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107519 - Posted: 21 Oct 2022, 7:35:28 UTC - in response to Message 107517.
"I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647 Did you do anything special to combine cards on the project?"
No... everything is standard. I don't mess around with stuff like that. All projects are default.
ID: 107519 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107520 - Posted: 21 Oct 2022, 7:36:17 UTC - in response to Message 107518.
"I would like to see what the actual use is in the resource/task monitor or the like."
When MOO is up to run again, I will grab a screenshot. Right now Einstein and Prime are running.
ID: 107520 · Rating: 0

Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,155,087 RAC: 12,086	Message 107521 - Posted: 21 Oct 2022, 7:45:23 UTC - in response to Message 107519.
"I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647 Did you do anything special to combine cards on the project?"
"No... everything is standard. I don't mess around with stuff like that. All projects are default."
I just tried Moo on a computer with a Tahiti and a Fury (both AMD and not too far apart: the Tahiti is 3 GB, 4 TFLOPS SP; the Fury is 4 GB, 8 TFLOPS SP). But I got tasks for one AMD card at a time, so maybe it's a CUDA thing. I've asked here: https://moowrap.net/forum_thread.php?id=647#8359
ID: 107521 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107525 - Posted: 21 Oct 2022, 11:08:27 UTC Last modified: 21 Oct 2022, 11:23:00 UTC
And then FAH GPU.
But what is interesting is that Windows shows only 20% usage of the GPU, while MSI Afterburner shows 98%.
"A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel."
ID: 107525 · Rating: 0

.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0	Message 107528 - Posted: 21 Oct 2022, 19:04:05 UTC
Not the output I was expecting from task mangler, and it leaves me baffled. Afterburner looks to be telling it as it is.
ID: 107528 · Rating: 0

kotenok2000 Joined: 22 Feb 11 Posts: 244 Credit: 441,919 RAC: 435	Message 107529 - Posted: 21 Oct 2022, 19:43:35 UTC - in response to Message 107528.
Try switching to the Performance tab, go down to GPU and click on the title of one of the graphs. There should be a CUDA graph. It doesn't show up on my screenshot because I have Hardware accelerated GPU memory scheduler enabled.
ID: 107529 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107530 - Posted: 21 Oct 2022, 20:44:40 UTC - in response to Message 107529. Last modified: 21 Oct 2022, 20:50:13 UTC
I did have a look at the time. They were mirrored just as it says in the processes. The text below all the images I posted explains it.
"A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel."
ID: 107530 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107531 - Posted: 21 Oct 2022, 20:55:27 UTC Last modified: 21 Oct 2022, 20:56:43 UTC
So I am showing you the 1080 starting up and then the 1050 running.
As you can see, the copy box is active on both. Again, refer to the text in the previous post.
ID: 107531 · Rating: 0

kotenok2000 Joined: 22 Feb 11 Posts: 244 Credit: 441,919 RAC: 435	Message 107532 - Posted: 21 Oct 2022, 20:57:42 UTC - in response to Message 107531.
What other graphs does it support?
ID: 107532 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107533 - Posted: 21 Oct 2022, 23:17:23 UTC - in response to Message 107532.
"What other graphs does it support?"
What do you mean?
ID: 107533 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107534 - Posted: 21 Oct 2022, 23:20:30 UTC Last modified: 21 Oct 2022, 23:21:21 UTC
There are generally two ways to distribute computation across multiple devices:
Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup, that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc.
Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.
This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.
Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your code, in the following two setups:
On multiple GPUs (typically 2 to 8) installed on a single machine (single host, multi-device training). This is the most common setup for researchers and small-scale industry workflows.
On a cluster of many machines, each hosting one or multiple GPUs (multi-worker distributed training). This is a good setup for large-scale industry workflows, e.g. training high-resolution image classification models on tens of millions of images using 20-100 GPUs.
More at: https://keras.io/guides/distributed_training/
ID: 107534 · Rating: 0
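To make the "synchronous data parallelism" idea in the quoted guide concrete, here is a minimal sketch in plain Python rather than the tf.distribute API. Everything in it is an assumption made for illustration: the one-parameter model y_hat = w * x, the function names, and the toy data. It is not how Moo! or Keras implement it. It only shows that averaging per-replica gradients over equal shards gives the same update as one device processing the whole batch.

```python
from typing import List, Sequence, Tuple

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs; toy data, not from the thread


def gradient(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def split_evenly(batch: Batch, num_replicas: int) -> List[Batch]:
    """Split a batch into equal-sized shards, one per replica (device)."""
    if len(batch) % num_replicas != 0:
        raise ValueError("batch size must be divisible by the number of replicas")
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def single_device_step(w: float, batch: Batch, lr: float) -> float:
    """One gradient-descent step computed on a single device."""
    return w - lr * gradient(w, batch)


def data_parallel_step(w: float, batch: Batch, num_replicas: int, lr: float) -> float:
    """One synchronous step: every replica starts from the same w, computes a
    gradient on its own shard, and the gradients are averaged before the update."""
    shards = split_evenly(batch, num_replicas)
    local_grads = [gradient(w, shard) for shard in shards]
    avg_grad = sum(local_grads) / num_replicas
    return w - lr * avg_grad


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
    print(single_device_step(0.5, data, lr=0.01))
    print(data_parallel_step(0.5, data, num_replicas=2, lr=0.01))
```

The two printed values agree up to floating-point rounding, which is the sense in which the guide says convergence behavior matches single-device training. The equal-shard requirement is a simplification of this sketch; real frameworks handle uneven shards differently.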

kotenok2000 Joined: 22 Feb 11 Posts: 244 Credit: 441,919 RAC: 435	Message 107535 - Posted: 21 Oct 2022, 23:47:50 UTC - in response to Message 107534. Last modified: 21 Oct 2022, 23:48:31 UTC
There should be a graph called CUDA. For some reason Task Manager doesn't count it on the Processes tab.
ID: 107535 · Rating: 0

robertmiles Joined: 16 Jun 08 Posts: 1226 Credit: 14,125,667 RAC: 1,804	Message 107536 - Posted: 22 Oct 2022, 0:25:47 UTC - in response to Message 107534.
[snip]
"There are generally two ways to distribute computation across multiple devices: Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup, that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc. Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches. This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training."
NVIDIA GPUs work best with models that use few branches. The GPU cores are in groups of 16, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp. Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp, but not all of them. I've found little information on whether this also happens in AMD GPUs.
Dividing the work between two or more GPUs should work if the portion of the work on one GPU does not need to exchange more than a rather small amount of information with any other GPU, provided that the application can divide the work properly. That means the division must be written into the application rather than expected to happen automatically.
Multithreaded CPU applications can use multiple CPU cores at once if the application is written to allow this. The main restriction on these is that no two virtual cores within a physical core can execute an instruction at the same time. However, main memory speed is usually such that any virtual core waiting on a main memory access will not have to wait any longer if another virtual core for that physical core can get its inputs from a cache instead of main memory.
ID: 107536 · Rating: 0

Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,155,087 RAC: 12,086	Message 107537 - Posted: 22 Oct 2022, 2:22:46 UTC - in response to Message 107536.
"The GPU cores are in groups of 16, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp. Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp, but not all of them. I've found little information on whether this also happens in AMD GPUs."
Are warps equivalent to wavefronts? https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505
ID: 107537 · Rating: 0

robertmiles Joined: 16 Jun 08 Posts: 1226 Credit: 14,125,667 RAC: 1,804	Message 107538 - Posted: 22 Oct 2022, 2:35:10 UTC - in response to Message 107537.
"I've found little information on whether this also happens in AMD GPUs."
"Are warps equivalent to wavefronts? https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505"
I read that, and found it confusing about whether they are or not.
ID: 107538 · Rating: 0

Greg_BE Joined: 30 May 06 Posts: 5664 Credit: 5,847,457 RAC: 1,057	Message 107539 - Posted: 22 Oct 2022, 8:27:22 UTC
So what's happening, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?
ID: 107539 · Rating: 0

Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,155,087 RAC: 12,086	Message 107541 - Posted: 22 Oct 2022, 9:11:34 UTC - in response to Message 107539.
"So what's happening, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?"
Sounds like it. Might as well use both for one task if they can, then the task is completed in half the time. I'm wondering about the efficiency though. I get one done in 11 minutes on a single Radeon Fury (8 TFLOPS SP), and it uses NO CPU at all. You're taking 9 minutes (so 1.2 times faster) on 2 cards totalling 1.5 times the power, and using two CPU cores as well. Maybe NVIDIA are just rubbish. It could be their total lack of DP support, so yours is having to use the CPU for those bits.
ID: 107541 · Rating: 0
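The efficiency comparison in the post above can be worked through explicitly. This is only a sketch of the arithmetic, using the figures quoted in the post (11 minutes on one card, 9 minutes on two cards with 1.5 times the combined throughput). The function names are made up for illustration, and the sketch assumes rated TFLOPS is a fair proxy for achievable speed, which may not hold for this application.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """How many times faster the new setup finishes one task."""
    return baseline_minutes / new_minutes


def scaling_efficiency(observed_speedup: float, relative_throughput: float) -> float:
    """Observed speedup as a fraction of what the extra hardware could ideally give."""
    return observed_speedup / relative_throughput


def ideal_minutes(baseline_minutes: float, relative_throughput: float) -> float:
    """Run time if the task scaled perfectly with rated throughput."""
    return baseline_minutes / relative_throughput


if __name__ == "__main__":
    s = speedup(11, 9)
    print(f"speedup: {s:.2f}x")
    print(f"efficiency: {scaling_efficiency(s, 1.5):.2f}")
    print(f"ideal time: {ideal_minutes(11, 1.5):.2f} min")
```

With these numbers the speedup is about 1.22x against an ideal 1.5x, so roughly 81% of the rated throughput is being realised, and perfect scaling would have given about 7.33 minutes instead of 9. That figure does not include the two CPU cores the NVIDIA setup also uses.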

Link Joined: 4 May 07 Posts: 355 Credit: 382,349 RAC: 0	Message 107542 - Posted: 22 Oct 2022, 13:34:07 UTC - in response to Message 107541.
"It could be their total lack of DP support, so yours is having to use the CPU for those bits."
All GeForce 10X0 GPUs have DP, and Moo! doesn't need it at all. But I've been testing a GTX 275 since yesterday and I get the same issue with Moo! there: 100% of a CPU core to feed it, while the HD3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing.
BTW, use GPU-Z to check GPU usage, not Task Manager, which is obviously useless for that. The GPU should be near 100% when Moo! is running.
ID: 107542 · Rating: 0
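For NVIDIA cards, one scriptable alternative to reading Task Manager is the nvidia-smi command-line tool that ships with the driver. The sketch below is a hedged example, not something used in this thread: it assumes nvidia-smi is on the PATH and that the card reports utilization (older cards may return a non-numeric value such as "[N/A]", which is mapped to None). The function names and the sample text in the usage note are illustrative only.

```python
import subprocess
from typing import List, Optional, Tuple

# Real nvidia-smi options: query selected fields and print them as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]

GpuRow = Tuple[int, str, Optional[int]]  # (index, name, utilization percent or None)


def parse_utilization(csv_text: str) -> List[GpuRow]:
    """Parse lines of the form 'index, name, utilization' into tuples."""
    rows: List[GpuRow] = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        index, util = fields[0], fields[-1]
        name = ", ".join(fields[1:-1])  # rejoin in case the name contains a comma
        rows.append((int(index), name, int(util) if util.isdigit() else None))
    return rows


def read_utilization() -> List[GpuRow]:
    """Run nvidia-smi once and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


if __name__ == "__main__":
    for index, name, util in read_utilization():
        print(f"GPU {index} ({name}): {util}%")
```

As a usage example, parse_utilization("0, NVIDIA GeForce GTX 1080, 98\n1, NVIDIA GeForce GTX 1050, 97\n") returns two rows with utilizations 98 and 97; that input string is a made-up sample, not output captured from the machines discussed here. AMD cards need a different tool, so GPU-Z remains the vendor-neutral option suggested above.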