Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
So I am showing you the 1080 startup and then the 1050 running. So as you see, the copy box is active on both. Again... refer to the text in the previous post.
kotenok2000 Send message Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 407 |
What other graphs does it support? |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What other graphs does it support?
What do you mean?
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
There are generally two ways to distribute computation across multiple devices:

Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup, that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc.

Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your code, in the following two setups:

On multiple GPUs (typically 2 to 8) installed on a single machine (single host, multi-device training). This is the most common setup for researchers and small-scale industry workflows.

On a cluster of many machines, each hosting one or multiple GPUs (multi-worker distributed training). This is a good setup for large-scale industry workflows, e.g. training high-resolution image classification models on tens of millions of images using 20-100 GPUs.

More at: https://keras.io/guides/distributed_training/
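For anyone wondering what the "minimal changes to your code" amount to, here is a minimal sketch of the guide's single-host, multi-GPU setup, assuming TensorFlow/Keras in Python; the toy model, layer sizes and the x_train/y_train names are placeholders for illustration, not anything Rosetta or Moo! actually run.

# Minimal sketch of synchronous data parallelism on one machine with several GPUs,
# following the keras.io guide linked above. Model, sizes and data are placeholders.
import tensorflow as tf

# MirroredStrategy replicates the model onto every visible GPU and keeps the
# replicas in sync after each batch (synchronous data parallelism).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across the devices.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Each batch is split across the replicas; x_train / y_train are assumed to exist.
# model.fit(x_train, y_train, batch_size=256, epochs=2)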
kotenok2000 Send message Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 407 |
There should be a graph called CUDA. For some reason Task Manager doesn't count it on the Processes tab.
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,456 |
[snip] There are generally two ways to distribute computation across multiple devices:
NVIDIA GPUs work best with models that use few branches. The GPU cores are in groups of 16, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp. Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp, but not all of them. I've found little information on whether this also happens in AMD GPUs.
Dividing the work between two or more GPUs should work if the portion of the work on one GPU does not need to exchange more than a rather small amount of information with any other GPU, provided that the application can divide the work properly, which means that the division must be written into the application rather than expecting it to happen automatically.
Multithreaded CPU applications can use multiple CPU cores at once if the application is written to allow this. The main restriction on these is that no two virtual cores within a physical core can execute an instruction at the same time. However, main memory speed is usually such that any virtual core waiting on a main memory access will not have to wait any longer if another virtual core for that physical core can get its inputs from a cache instead of main memory.
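To make the warp/branching point more concrete, here is a rough analogy in NumPy on the CPU (not real GPU code, and the toy formula is made up): the branch-free "predicated" form evaluates both paths for every element and selects with a mask, which is roughly how work gets arranged so that all cores in a warp or wavefront keep executing the same instruction.

import numpy as np

x = np.random.randn(1_000_000)

# Branchy version: each element takes one of two code paths, which on a GPU
# would make the lanes of a warp diverge.
def branchy(v):
    out = np.empty_like(v)
    for i, val in enumerate(v):
        if val > 0.0:
            out[i] = np.sqrt(val)   # path A
        else:
            out[i] = val * val      # path B
    return out

# Predicated version: both paths are computed for every element and a mask
# selects the result, so every lane runs the same instructions.
def predicated(v):
    return np.where(v > 0.0, np.sqrt(np.maximum(v, 0.0)), v * v)

assert np.allclose(branchy(x[:1000]), predicated(x[:1000]))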
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
The GPU cores are in groups of 16, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp.
Are warps equivalent to wavefronts? https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,456 |
I've found little information on whether this also happens in AMD GPUs.
Are warps equivalent to wavefronts?
I read that, and found it confusing as to whether they are or not.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
So what's happening then, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
So what's happening then, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?
Sounds like it. Might as well use both for one task if they can, then the task is completed in half the time. I'm wondering about the efficiency though. I get one done in 11 minutes on a single Radeon Fury (8 TFLOPS SP), and it uses NO cpu at all. You're taking 9 minutes (so 1.2 times faster) on 2 cards totalling 1.5 times the power, and using two CPU cores as well. Maybe NVidia are just rubbish. It could be their total lack of DP support, so yours is having to use the CPU for those bits.
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
It could be their total lack of DP support, so yours is having to use the CPU for those bits.
All Geforce 10X0 GPUs have DP, and Moo! doesn't need it at all. But I'm testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it; the HD3850 I had before needed 1-2%. On Milkyway the GTX 275 however does not need that much, so maybe it's a Moo! (or distributed.net) thing.
BTW, use GPU-Z to check GPU usage, not Task Manager, which is obviously useless for that; the GPU should be near 100% when Moo! is running.
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
All Geforce 10X0 GPUs have DP
At 1:32, which is laughable. My AMDs are 1:4.
and Moo! doesn't need it at all.
Sure? Most projects need some DP. MW is entirely DP, but other projects need it sometimes. At 1:32 that's gonna slow things down immensely.
But I'm testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it; the HD3850 I had before needed 1-2%. On Milkyway the GTX 275 however does not need that much, so maybe it's a Moo! (or distributed.net) thing.
They're not the best designed tasks. I have a dual GPU card and it doesn't notice the second chip, yet every other project does. Mind you, I've heard of Nvidia needing high CPU usage on a number of projects; I can't remember what the cause is. Why do you say "or distributed.net"? Are they not one and the same?
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
All Geforce 10X0 GPUs have DP
At 1:32, which is laughable. My AMDs are 1:4.
IIRC only Tahiti chips and the Radeon VII have 1:4, others are worse, down to at least 1:16, but that doesn't matter for CPU usage: if the GPU has it and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.
and Moo! doesn't need it at all.
Sure? Most projects need some DP. MW is entirely DP, but other projects need it sometimes.
Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards, doing the DP part on the CPU, is Einstein.
At 1:32 that's gonna slow things down immensely.
Still several times faster than doing it on CPU, hence Einstein is doing the DP part even on 1:64 GPUs.
But I'm testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it; the HD3850 I had before needed 1-2%. On Milkyway the GTX 275 however does not need that much, so maybe it's a Moo! (or distributed.net) thing.
They're not the best designed tasks. I have a dual GPU card and it doesn't notice the second chip, yet every other project does. Mind you, I've heard of Nvidia needing high CPU usage on a number of projects; I can't remember what the cause is.
Don't remember the cause exactly either; on Milkyway it seems to be acceptable for my card at least, but I've seen other Nvidias there using 100% of a CPU core as well, so YMMV.
Why do you say "or distributed.net"? Are they not one and the same?
No, they are separate projects (even if Moo tasks come from distributed.net); Moo! is responsible for the wrapper application, distributed.net for the actual client (and that was using all the CPU time, not the wrapper).
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
IIRC only Tahiti chips
Which is why I go for those.
and the Radeon VII have 1:4, others are worse, down to at least 1:16
But always 2 times better than Nvidia.
but that doesn't matter for CPU usage: if the GPU has it and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.
Wrong, if it needs 1/4 of its stuff done in DP, and the card can't do 1:4, then some will have to go to the CPU.
Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards, doing the DP part on the CPU, is Einstein.
There are many of them: Folding, Primegrid, World Community Grid for example.
Still several times faster than doing it on CPU, hence Einstein is doing the DP part even on 1:64 GPUs.
Not if you have a Ryzen 9 CPU, those are pretty fast even compared to GPUs. Provided you can multithread. For some reason I don't see many multithreaded CPU + GPU tasks. In fact I was surprised at Greg's 2 CPU + 2 NV Moo task. Although it probably just split its workload in two and did 1 CPU + 1 NV for each half.
Don't remember the cause exactly either; on Milkyway it seems to be acceptable for my card at least, but I've seen other Nvidias there using 100% of a CPU core as well, so YMMV.
Odd, since MW is one of the least CPU intensive GPU tasks.
No, they are separate projects (even if Moo tasks come from distributed.net); Moo! is responsible for the wrapper application, distributed.net for the actual client (and that was using all the CPU time, not the wrapper).
So Moo sometimes does stuff from places other than distributed.net? And distributed.net is a non-BOINC program you can run separately?
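A quick back-of-envelope sketch, in Python, of the trade-off being argued here: it assumes the work splits cleanly into SP and DP operations and ignores memory and CPU-feeding effects, and the 8 TFLOPS card and 25% DP share are made-up numbers for illustration, not measurements of Moo! or any real card.

def effective_tflops(peak_sp_tflops, dp_ratio, dp_fraction):
    """Rough Amdahl-style estimate: time the SP part at full speed and the DP part
    at peak/dp_ratio, then convert the total back into an effective throughput."""
    sp_time = (1.0 - dp_fraction) / peak_sp_tflops
    dp_time = dp_fraction / (peak_sp_tflops / dp_ratio)
    return 1.0 / (sp_time + dp_time)

# Hypothetical 8 TFLOPS (SP) card with 25% of the floating-point work in DP:
for ratio in (4, 16, 32):
    print(f"1:{ratio:<2} -> {effective_tflops(8.0, ratio, 0.25):.2f} effective TFLOPS")
# Prints roughly 4.6, 1.7 and 0.9 TFLOPS: a modest DP share hurts far more at 1:32
# than at 1:4, while a workload with little or no DP barely notices the ratio.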
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
But MOO is sharing the cards with FAH on my system. So maybe that drags the time down a bit? And I don't have the fastest cards: a plain 1080 and a 1050 Ti.
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
But MOO is sharing the cards with FAH on my system.
If the GPUs are also doing Folding, then Moo is using a full CPU core to feed less than a full GPU. Anyway, you're doing almost as much Moo as I'd expect from those cards; I think it was 1.2 times as much as me, and I thought it should be 1.5, so all is well. Rule 1: If heat is pouring off the chip, it's doing a lot of work.
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
but that doesn't matter for CPU usage: if the GPU has it and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.
Wrong, if it needs 1/4 of its stuff done in DP, and the card can't do 1:4, then some will have to go to the CPU.
The code of the app needs to support it, like in the case of Einstein. GPU and CPU are not one thing where the code gets executed and can choose by itself where it feels it might run faster, or do "some" stuff on the CPU if the DP performance is low on the GPU. The detection of SP/DP on the GPU and the fallback to CPU for DP calculations must be implemented, otherwise a DP app will just error out on an SP card.
Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards, doing the DP part on the CPU, is Einstein.
There are many of them: Folding, Primegrid, World Community Grid for example.
OPNG shouldn't need it IIRC; no idea about the others, but unless they implemented it like Einstein, if they need DP the app will not run on SP cards, just like Milkyway.
Still several times faster than doing it on CPU, hence Einstein is doing the DP part even on 1:64 GPUs.
Not if you have a Ryzen 9 CPU, those are pretty fast even compared to GPUs. Provided you can multithread. For some reason I don't see many multithreaded CPU + GPU tasks.
It will still run on GPU, the app doesn't know if the CPU is faster: https://einsteinathome.org/content/fgrp5-cpu-and-fgrpb1g-gpu-why-does-crunching-seem-pause-90
In fact I was surprised at Greg's 2 CPU + 2 NV Moo task. Although it probably just split its workload in two and did 1 CPU + 1 NV for each half.
Never seen that either.
So Moo sometimes does stuff from places other than distributed.net?
No, but they could if the admin wanted to, like yoyo, which runs the OGR project from distributed.net as one of its subprojects.
And distributed.net is a non-BOINC program you can run separately?
Yes. https://www.distributed.net/Download_clients
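Einstein's actual apps are C/C++ with OpenCL or CUDA, but just to illustrate the "detect DP on the GPU, otherwise fall back to the CPU" decision described above, here is a small, purely illustrative Python sketch using pyopencl; the extension checks are generic OpenCL queries, not anything taken from Einstein's code.

import pyopencl as cl

def gpu_supports_fp64(device):
    # The cl_khr_fp64 (or AMD's cl_amd_fp64) extension string signals that the
    # device has hardware double-precision support.
    return "cl_khr_fp64" in device.extensions or "cl_amd_fp64" in device.extensions

for platform in cl.get_platforms():
    for dev in platform.get_devices(device_type=cl.device_type.GPU):
        if gpu_supports_fp64(dev):
            print(dev.name, "-> run the DP kernel on the GPU")
        else:
            # Without a branch like this, a kernel that declares doubles would
            # simply fail to build on an SP-only card, as described above.
            print(dev.name, "-> do the DP part on the CPU instead")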
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Rule 1: If heat is pouring off the chip, it's doing a lot of work.
The fans get their workout every day. At the moment they are just doing the easy stuff with FAH. Apparently TN needs everything on the CPU, though I suppose I can write a script to put that down to 14 or something.
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
The code of the app needs to support it, like in the case of Einstein. GPU and CPU are not one thing where the code gets executed and can choose by itself where it feels it might run faster, or do "some" stuff on the CPU if the DP performance is low on the GPU. The detection of SP/DP on the GPU and the fallback to CPU for DP calculations must be implemented, otherwise a DP app will just error out on an SP card.
Isn't it possible to write the program so it does as much DP as possible on the GPU, but if that's not enough, use the CPU as well?
OPNG shouldn't need it IIRC; no idea about the others, but unless they implemented it like Einstein, if they need DP the app will not run on SP cards, just like Milkyway.
I'm sure I saw someone mention there's some DP in it. I would assume a decent program would use the CPU for that bit if the card was SP only (do those actually exist? I thought all cards had at least a tiny bit of DP). Anyway, judging by the speed OPNG runs on my different cards, there's DP in it.
It will still run on GPU, the app doesn't know if the CPU is faster: https://einsteinathome.org/content/fgrp5-cpu-and-fgrpb1g-gpu-why-does-crunching-seem-pause-90
Surely the app can get the benchmark data from Boinc?
Never seen that either.
Have you tried Moo on two Nvidia cards?
Yes. https://www.distributed.net/Download_clients
ROFL at "Trojan horses and other perverted versions have been known to have been circulated". Perverted maths programs? Ooooh, sexy numbers!
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 12,028 |
Rule 1: If heat is pouring off the chip, it's doing a lot of work.
The fans get their workout every day. At the moment they are just doing the easy stuff with FAH. Apparently TN needs everything on the CPU, though I suppose I can write a script to put that down to 14 or something.
Some maths projects seem to use more electricity (and therefore produce more heat) on a GPU. My Fury card seems to have been badly designed. At normal settings, it drops the clock dramatically on some projects. I found this was due to it hitting the power limit of the VRMs. So I cranked the power limit up to 150% in Afterburner, which worked for about 3 weeks, then the power connector melted, deforming the plastic shroud and oxidising the contacts. So I soldered the wires on directly :-)