GPU WU's

Author	Message
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 92520 - Posted: 29 Mar 2020, 11:22:06 UTC - in response to Message 92511. Last modified: 29 Mar 2020, 12:06:52 UTC
> Porting that one without breaking code will not be easy. I'm totally with you regarding runtime and it being worth the effort. But this one calls for full-time developers with formal training, not scientists doing development on the side. I'm not sure how much manpower Rosetta actually has to do this, and I'm also not sure if the commercial side of Rosetta has an interest in doing this.
Have you seen the posts by rjs5? He is an expert on parallelism (AVX, etc.) and has been trying to help them along for years, but it is slow progress.

Falconet Joined: 9 Mar 09 Posts: 355 Credit: 1,669,337 RAC: 0	Message 92521 - Posted: 29 Mar 2020, 11:25:38 UTC
Read this regarding GPU work: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13533&postid=92291

Laurent Joined: 15 Mar 20 Posts: 14 Credit: 88,800 RAC: 0	Message 92531 - Posted: 29 Mar 2020, 14:42:33 UTC - in response to Message 92520.
> Have you seen the posts by rjs5? He is an expert on parallelism (AVX, etc.) and has been trying to help them along for years, but it is slow progress.
Yes, I have seen them, and the history (the no-SSE thread in 2015) makes me shiver. They could push through a lot more data just by using a competent compiler. Who knows how fast this thing would be on a good platform. I'm an OpenCL developer and have already done such ports of scientific code. I offered help; they declined politely. That's life.

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 92535 - Posted: 29 Mar 2020, 15:42:50 UTC - in response to Message 92531. Last modified: 29 Mar 2020, 16:00:36 UTC
> I'm an OpenCL developer and have already done such ports of scientific code. I offered help; they declined politely. That's life.
Thanks for your efforts. But don't feel singled out. They refuse everyone. Maybe they have reasons; I am not a programmer.
PS: Good luck on TN-Grid. I do that one too. They were making progress for a while, and then stopped.
PPS: Do you know about QuChemPedIA? It is a new project. I don't know if it is suitable for your efforts, but they have said that the open-source package they are using is ten times slower than the proprietary one they use in-house, so there could be potential. https://quchempedia.univ-angers.fr/athome/

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 30,949,009 RAC: 628	Message 93157 - Posted: 3 Apr 2020, 5:17:26 UTC - in response to Message 92497. Last modified: 3 Apr 2020, 5:52:26 UTC

>> As a result, a modern GPU is only a few times faster than a modern CPU, and only on tasks well suited to highly parallel SIMD computation. On tasks not well suited to that style of computation it can even be slower than a CPU.

> Actually, the facts say otherwise. Seti@home data is broken up into WUs that are processed serially, perfect for CPUs. Over time they made use of volunteer developers, and applications that made use of SSEx.x and eventually AVX were developed, the AVX application being around 40% faster than the early stock application (depending on the type of Task being done). Then they started making use of GPUs, and guess what? It is possible to break up some types of work that is normally processed serially to do parallel processing, then re-combine the parallel-produced results to give a final result that matches that produced by the CPU application, and it does it in much less time. For example, using the final Special Application for Linux for Nvidia GPUs (as it uses CUDA), a particular Task on a high-end CPU will take around 1hr 30min (all cores, all threads, and using the AVX application, which is 40% faster than the original stock application). The same Task on a high-end GPU is done in less than 50 seconds. And the GPU result matches that of the CPU: a Valid result. Personally, I think going from 90 min to 50 seconds to process a Task is a significant improvement. Think of how much more work could be done each hour, day, week.

That is just rubbish rather than facts. Either you misunderstood or counted something wrong: for example, you probably took the runtime of a WU on 1 thread/core of the CPU (not the whole processor) and compared it with the runtime of a job using an entire GPU (plus some CPU, as all GPU apps do). Or the SETI programmers are completely unable to use modern CPUs properly. Only a 40% boost from AVX compared with a plain app without any SIMD is pitiful and puny: on code/tasks suitable for vectorization it should gain 3x (+200%) speed or more, and if the code/task is NOT suitable for vectorization the gain can be low, but such tasks cannot work effectively on a GPU at all. That is because GPU programming and CPU SIMD programming need the same thing (all current GPU cores are wide SIMD engines inside), but SIMD on the CPU is simpler to implement.

Peak speeds of current high-end CPUs:
Intel Core i9-9900K: ~450 GFLOPS double precision (DP), ~900 GFLOPS single precision (SP)
AMD Ryzen 9 3950X: ~900 GFLOPS DP, 1800 GFLOPS SP
AMD Threadripper 3990X: 2900 GFLOPS DP, 5800 GFLOPS SP

Peak speeds of a few current high-end GPUs:
AMD Vega 64: 638 GFLOPS DP, 10215 GFLOPS SP
AMD RX 5700 XT: 513 GFLOPS DP, 8218 GFLOPS SP
NVidia RTX 2080: 278 GFLOPS DP, 8920 GFLOPS SP
NVidia RTX 2080 Ti: 367 GFLOPS DP, 11750 GFLOPS SP

And it's much easier to get real app speed close to the theoretical maximum on the CPU than on the GPU. All GPU computation also needs additional support/resources from the CPU to run. Both facts reduce the speed gap even further as we move from theoretical potential (shown above) to practical computing.

As I said: a modern GPU is only a few times faster than a modern CPU, not ~100x (if you properly use all cores and SIMD extensions), and only when used for single-precision calculations. In double precision, GPUs are usually even slower than CPUs, at least for all "consumer grade" GPUs (there are special versions of GPUs for data centers and supercomputers with high DP speeds, like NV Tesla or AMD Instinct, but they are priced a few times higher than their consumer/gamer counterparts and are usually not sold to retail customers at all).
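The disagreement above comes down to two kinds of ratio: peak GPU throughput over peak CPU throughput, and the speedup implied by two measured runtimes. A minimal Python sketch of that arithmetic follows, using only the figures quoted in the posts above; the numbers are the posters' own and are not verified here, and the function names `peak_ratio` and `reported_speedup` are made up for this illustration.

```python
# Peak throughput figures (GFLOPS) exactly as quoted in the post above.
# They are the poster's numbers, not independently verified here.
CPUS = {
    "Intel Core i9-9900K": {"dp": 450, "sp": 900},
    "AMD Ryzen 9 3950X": {"dp": 900, "sp": 1800},
    "AMD Threadripper 3990X": {"dp": 2900, "sp": 5800},
}
GPUS = {
    "AMD Vega 64": {"dp": 638, "sp": 10215},
    "AMD RX 5700 XT": {"dp": 513, "sp": 8218},
    "NVIDIA RTX 2080": {"dp": 278, "sp": 8920},
    "NVIDIA RTX 2080 Ti": {"dp": 367, "sp": 11750},
}


def peak_ratio(gpu: str, cpu: str, precision: str) -> float:
    """Return GPU peak divided by CPU peak for 'sp' or 'dp'."""
    return GPUS[gpu][precision] / CPUS[cpu][precision]


def reported_speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Speedup implied by two wall-clock runtimes for the same task."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    for gpu in GPUS:
        for cpu in CPUS:
            sp = peak_ratio(gpu, cpu, "sp")
            dp = peak_ratio(gpu, cpu, "dp")
            print(f"{gpu} vs {cpu}: SP x{sp:.1f}, DP x{dp:.2f}")
    # The SETI@home runtimes quoted in the thread: about 90 min vs 50 s.
    print(f"Reported SETI speedup: x{reported_speedup(90 * 60, 50):.0f}")
```

On these figures the single-precision peak ratios fall roughly between 1.4x and 13x, while the quoted runtimes imply about 108x. Peak numbers say nothing about how well either application was optimized, so the sketch only makes the size of the gap explicit; it does not settle which side is right.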

Grant (SSSF) Joined: 28 Mar 20 Posts: 1937 Credit: 18,534,891 RAC: 0	Message 93162 - Posted: 3 Apr 2020, 6:06:31 UTC - in response to Message 93157. Last modified: 3 Apr 2020, 6:07:59 UTC

> For example, you probably took the runtime of WU on 1 thread / core of the CPU (and not the whole processor), and compare it with the runtime of a job using an entire GPU.

Of course, that is how you compare things. You compare things that are comparable... And it takes a lot of CPU threads to match the output of 1 GPU. A lot of extremely slow processing units can match the output of a single high-performance processing unit if you have enough of them, and when it comes to CPUs v GPUs, it takes up to 100 CPU processing units (cores/threads) to match the output of an equivalent GPU (i.e. low end v low end, high end v high end).

> And it's much easier to get real app speed closer to the theoretical maximum on the CPU than on the GPU.

Yep, and still a GPU is capable of significantly greater processing rates than a CPU. Having lots of cores in a CPU helps offset its poor capabilities, but then adding GPUs helps improve their output as well (check out the hardware used in the current and future crop of supercomputers. Real-life facts, not from the world you live in, but actual facts. Reality).

> And all GPU computation need support/use of resources from CPU to run.

Yep, and in every case the loss of the CPU output is more than offset by the increase in output the GPU provides.

> As i said: modern GPU only few times faster compared to modern CPUs (if you properly use all cores and SIMD extensions).

The processing time of a CPU core is much greater than the processing time of a single GPU, with both applications optimised for maximum output. Adding more cores to the CPU improves its output, but then adding more GPUs to a system improves its output as well.

> And only if used for single precision calculations. On dual precisions GPUs usually even slower compared to CPUs at least for all "consumer grade" GPUs

AMD consumer GPUs have much higher DP (Double Precision) capabilities than NVidia.

> (there are special versions of GPUs for data centers and supercomputers with high DP speeds like NV Tesla or AMD Instinct, but they priced few times more compared to consumer/gamer counterparts GPUs and usually not sold to retail customer at all).

So what? The fact is they still well exceed a CPU's capabilities. Of course you could process data that cannot in any way be parallelised, in which case, yes, a CPU (low, mid, high end) can outperform a GPU (low, mid, high end). But for work that can be done in parallel, GPUs win every time (with a well-developed application of course; comparing an extremely optimised application with an extremely poorly written one isn't a valid comparison).

Grant
Darwin NT

dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0	Message 93365 - Posted: 4 Apr 2020, 11:28:18 UTC Last modified: 4 Apr 2020, 11:30:12 UTC

Here's a very different argument to try to explain why Rosetta doesn't have GPU WUs. To quote Spock from the original Star Trek show: "You're proceeding from a false assumption." That assumption is that any program can be converted to run on a GPU and will go faster if that happens.

OK, let's assume that's correct. If it were, Intel and AMD would go out of business tomorrow, because we wouldn't need them any more. We'd just stop using conventional CPUs and run everything on the GPU instead: OS, browser, the whole lot. But we don't do that, do we? Why not? Because there are some things that GPUs just don't do well.

Read this Q & A from the Computer Science Stack Exchange website: https://cs.stackexchange.com/questions/121080/what-are-gpus-bad-at

It does a really good job of explaining what GPUs are good at and what they are bad at. And it just so happens that while Seti, Folding and others can be made efficient on a GPU, Rosetta can't. So next time you're asking why Rosetta isn't on your GPU, ask yourself why your browser doesn't run on your GPU. The answer to both those questions is about the same.
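The distinction drawn above, between work that maps well onto many parallel units and work that does not, can be illustrated with a toy Python sketch. It is a generic illustration only and is not taken from Rosetta, Seti or Folding code; the function names are hypothetical, and the Collatz iteration is used simply as a convenient example of a chain in which every step depends on the previous one and branches on its value.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """A tiny unit of work that needs nothing but its own input."""
    return x * x


def independent_work(values: list[int]) -> list[int]:
    """Each output depends only on its own input, so the items can be
    handed to many workers at once (the shape of work that suits a GPU)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


def collatz_steps(start: int) -> int:
    """Count steps until the Collatz iteration reaches 1.

    Step k needs the result of step k-1 and takes a data-dependent
    branch, so extra workers cannot shorten a single chain."""
    state, steps = start, 0
    while state != 1:
        state = state // 2 if state % 2 == 0 else 3 * state + 1
        steps += 1
    return steps


if __name__ == "__main__":
    print(independent_work([1, 2, 3, 4]))
    print(collatz_steps(27))
```

Many independent chains could still be run side by side, which is why the question for any real application is how its work decomposes, not whether a GPU is fast in the abstract. Which category a given Rosetta protocol falls into is not something this sketch can show.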

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 93389 - Posted: 4 Apr 2020, 15:27:33 UTC - in response to Message 93365.
> Rosetta can't.
Not sure "can't" is the perfect word there, but it is certainly not trivial to get there.
Rosetta Moderator: Mod.Sense

gerardgn Joined: 5 Apr 20 Posts: 1 Credit: 168,552 RAC: 0	Message 93647 - Posted: 6 Apr 2020, 16:17:15 UTC
I also have an NVIDIA Jetson Nano. The CPU is low-end (ARM A57), but the GPU is quite large (~128 Maxwell cores), giving ~450 GFLOPS. Is it possible to enhance the client to take advantage of the NVIDIA GPU, as F@H does on PC? It would unleash more power, as I see there are a few of us with these devices.

Millenium Joined: 20 Sep 05 Posts: 68 Credit: 184,283 RAC: 0	Message 93668 - Posted: 6 Apr 2020, 19:31:41 UTC
No, Rosetta@Home is a CPU-only project. Not every problem is well suited to a GPU. So use Folding for your GPU and Rosetta for the CPU.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2193 Credit: 13,720,774 RAC: 358	Message 93678 - Posted: 6 Apr 2020, 20:54:25 UTC - in response to Message 93668.
> No, Rosetta@Home is a CPU only project.
Rosetta@Home is CPU-only. But trRosetta also runs on GPU.

markhl Joined: 18 Feb 22 Posts: 1 Credit: 2,508 RAC: 0	Message 105115 - Posted: 22 Feb 2022, 3:59:20 UTC - in response to Message 93678.
Hello! I am new to R@h. I started running R@h because WCG ran out of units due to their migration. I am not running any other BOINC project. I'd welcome your thoughts. This thread states that R@h only uses the CPU, but my BOINC event log shows GPU usage:
2/21/2022 7:30:24 PM | | Resuming GPU computation
2/21/2022 7:32:28 PM | | Suspending GPU computation - computer is in use
So, it looks like R@h does use GPU now?

Grant (SSSF) Joined: 28 Mar 20 Posts: 1937 Credit: 18,534,891 RAC: 0	Message 105116 - Posted: 22 Feb 2022, 7:07:55 UTC - in response to Message 105115.
> So, it looks like R@h does use GPU now?
Nope. It looks like if you have a GPU, and you have it set to suspend BOINC processing under certain conditions, then when those conditions occur BOINC makes note of that in the log, even though the GPU isn't actually being used.
Grant
Darwin NT
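One way to check this for yourself is to look at the middle field of each event-log line. The Python sketch below assumes the log lines have the shape shown in the question, `timestamp | project | message`, and that an empty middle field means the message comes from the BOINC client itself, not from a project such as Rosetta@home. Both points are assumptions made for this illustration, and the helper names are hypothetical.

```python
def parse_event_line(line: str) -> dict:
    """Split a 'timestamp | project | message' line into its three fields.

    Assumes exactly this three-field layout; anything after the second
    '|' is treated as the message text."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return {"timestamp": timestamp, "project": project, "message": message}


def is_client_wide(entry: dict) -> bool:
    """True when no project is named, i.e. the client logged the message."""
    return entry["project"] == ""


if __name__ == "__main__":
    lines = [
        "2/21/2022 7:30:24 PM | | Resuming GPU computation",
        "2/21/2022 7:32:28 PM | | Suspending GPU computation - computer is in use",
    ]
    for line in lines:
        entry = parse_event_line(line)
        origin = "client" if is_client_wide(entry) else entry["project"]
        print(f"{entry['timestamp']}: [{origin}] {entry['message']}")
```

Under those assumptions, both lines from the question are client-wide notices about the GPU suspend setting, and neither names Rosetta@home as the source.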

Bryn Mawr Joined: 26 Dec 18 Posts: 442 Credit: 15,697,820 RAC: 351	Message 105138 - Posted: 22 Feb 2022, 19:28:26 UTC - in response to Message 91505.
>> I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU.
> Years ago they tried a GPU app of R@H (so I think it is possible to do, even if limited to some protocols), with little benefit. But during these years a lot of things have changed, like HW and SW, so I don't know if the benefits are bigger now.
If the algorithm has not changed, then no amount of change to the HW and SW will make any difference to the benefits.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 105254 - Posted: 27 Feb 2022, 0:27:18 UTC - in response to Message 105138.
>>> I am just saying that it is not safe to presume that since project X has done some sort of protein structure prediction on a GPU, that R@h, and the algorithms it uses for the various sorts of predictions, would see a similar performance boost from GPU.
>> Years ago they tried a GPU app of R@H (so I think it is possible to do, even if limited to some protocols), with little benefit. But during these years a lot of things have changed, like HW and SW, so I don't know if the benefits are bigger now.
> If the algorithm has not changed, then no amount of change to the HW and SW will make any difference to the benefits.
The short version, from memory, is that their code is not designed for GPU use and that they are constantly changing the parameters or other things in each of the proteins they were sending out in 4.2. This project does not like change, so they don't bother writing code for GPU. Plus, with their neural network they have all the GPU power they need for deep machine learning. What the story is these days for the resistance to GPU usage, I don't know. But they have a hard enough time keeping CPU work straight sometimes, so GPU work would be really unreliable if they released it here. To sum it up, RAH has never been GPU and WILL never be GPU, at least for the next 5 years if not longer.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 105255 - Posted: 27 Feb 2022, 0:28:04 UTC - in response to Message 93678.
>> No, Rosetta@Home is a CPU only project.
> Rosetta@Home is CPU-only. But trRosetta also runs on GPU.
That does not appear to be BOINC related? It appears to be its own stand-alone app.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2193 Credit: 13,720,774 RAC: 358	Message 105278 - Posted: 28 Feb 2022, 8:24:26 UTC - in response to Message 105255.
>> But trRosetta also runs on GPU.
> That does not appear to be BOINC related? It appears to be its own stand-alone app.
Yes, it's stand-alone, but it's open and is part of the IPD/RosettaCommons/Rosetta software ecosystem. Another piece of IPD software is RoseTTAFold, which is suitable for GPU.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2193 Credit: 13,720,774 RAC: 358	Message 105282 - Posted: 28 Feb 2022, 10:04:51 UTC - in response to Message 105255.
> That does not appear to be BOINC related? It appears to be its own stand-alone app.
This tweet from the official R@H account said that TrRosetta is inside our VM WUs.

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2193 Credit: 13,720,774 RAC: 358	Message 105283 - Posted: 28 Feb 2022, 10:18:06 UTC - in response to Message 105282.
TrRosetta also runs with PyTorch.
PyTorch runs well on Nvidia and AMD GPUs....

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2193 Credit: 13,720,774 RAC: 358	Message 105285 - Posted: 28 Feb 2022, 10:19:41 UTC - in response to Message 93678.
> But trRosetta also runs on GPU.
Here is the new GitHub repository for TrRosetta, inside RosettaCommons.