Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107531 - Posted: 21 Oct 2022, 20:55:27 UTC
Last modified: 21 Oct 2022, 20:56:43 UTC




[screenshots: the 1080 at startup, then the 1050 running]

So as you can see, the copy box is active on both.

Again...refer to the text in the previous post.
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 107532 - Posted: 21 Oct 2022, 20:57:42 UTC - in response to Message 107531.  

What other graphs does it support?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107533 - Posted: 21 Oct 2022, 23:17:23 UTC - in response to Message 107532.  

What other graphs does it support?

What do you mean?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107534 - Posted: 21 Oct 2022, 23:20:30 UTC
Last modified: 21 Oct 2022, 23:21:21 UTC

There are generally two ways to distribute computation across multiple devices:

Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There are many variants of this setup that differ in how the different model replicas merge results, in whether they stay in sync at every batch or are more loosely coupled, etc.

Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your code, in the following two setups:

On multiple GPUs (typically 2 to 8) installed on a single machine (single host, multi-device training). This is the most common setup for researchers and small-scale industry workflows.
On a cluster of many machines, each hosting one or multiple GPUs (multi-worker distributed training). This is a good setup for large-scale industry workflows, e.g. training high-resolution image classification models on tens of millions of images using 20-100 GPUs.

More at: https://keras.io/guides/distributed_training/
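
For reference, a minimal sketch of that single-host, multi-GPU setup with tf.distribute.MirroredStrategy (toy model and random data, nothing Rosetta-specific):

    import tensorflow as tf

    # Synchronous data parallelism: one model replica per visible GPU.
    strategy = tf.distribute.MirroredStrategy()
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    # Variables (the model) must be created inside the strategy scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(20,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # Each global batch is split evenly across the replicas, which
    # stay in sync after every batch.
    x = tf.random.normal((1024, 20))
    y = tf.random.normal((1024, 1))
    model.fit(x, y, batch_size=64, epochs=2)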
kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 107535 - Posted: 21 Oct 2022, 23:47:50 UTC - in response to Message 107534.  
Last modified: 21 Oct 2022, 23:48:31 UTC

There should be a graph called CUDA. For some reason Task Manager doesn't count it on the Processes tab.
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 107536 - Posted: 22 Oct 2022, 0:25:47 UTC - in response to Message 107534.  

[snip]

There are generally two ways to distribute computation across multiple devices:

Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There are many variants of this setup that differ in how the different model replicas merge results, in whether they stay in sync at every batch or are more loosely coupled, etc.

Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

NVIDIA GPUs work best with models that use few branches.

The GPU cores are in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously on any combination of the GPU cores in the warp.

Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp but not all of them.

I've found little information on whether this also happens in AMD GPUs.
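
A toy NumPy model of that lane masking illustrates the cost of divergence (not real GPU code; the 32-lane width matches NVIDIA's warp size):

    import numpy as np

    WARP = 32
    lanes = np.arange(WARP)
    cond = lanes < 16          # half the lanes take each path: divergent

    # The hardware effectively runs BOTH branch bodies for the whole
    # warp and uses the condition as a lane mask, so a divergent warp
    # pays for both sides.
    path_a = np.sqrt(lanes.astype(float))   # cost paid by all 32 lanes
    path_b = lanes * 2.0                    # cost paid again by all 32 lanes
    result = np.where(cond, path_a, path_b)
    print(result)

    # If every lane agreed on cond, only one path's cost would be paid.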

Dividing the work between two or more GPUs should work if the portion of the work on one GPU does not need to exchange more than a rather small amount of information with any other GPU, provided the application can divide the work properly. That division must be written into the application rather than expected to happen automatically.
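
A sketch of such an explicit division, assuming TensorFlow and two visible GPUs ('/GPU:0' and '/GPU:1' are TensorFlow's standard device names); the split is written into the program, and only small results cross between devices:

    import tensorflow as tf

    data = tf.random.normal((1024, 256))
    half = data.shape[0] // 2

    # The programmer decides the split; nothing migrates automatically.
    with tf.device("/GPU:0"):
        part0 = tf.reduce_sum(tf.matmul(data[:half], data[:half], transpose_b=True))
    with tf.device("/GPU:1"):
        part1 = tf.reduce_sum(tf.matmul(data[half:], data[half:], transpose_b=True))

    # Only two scalars come back to the host, keeping inter-GPU
    # traffic small as described above.
    print((part0 + part1).numpy())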

Multithreaded CPU applications can use multiple CPU cores at once if they are written to allow it. The main restriction is that no two virtual cores within a physical core can execute an instruction at the same time. However, while one virtual core waits on a main-memory access, another virtual core on the same physical core can usually proceed without extra delay if its inputs are already in a cache instead of main memory.
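
A minimal sketch of a program opting into multiple CPU cores (plain Python with one worker process per core; crunch is just an illustrative stand-in for real work):

    import os
    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # Stand-in for a CPU-bound kernel.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # One worker process per core; the OS then schedules the
        # workers onto physical/virtual cores as described above.
        chunks = [2_000_000] * os.cpu_count()
        with ProcessPoolExecutor() as pool:
            print(sum(pool.map(crunch, chunks)))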
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107537 - Posted: 22 Oct 2022, 2:22:46 UTC - in response to Message 107536.  

The GPU cores are in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously on any combination of the GPU cores in the warp.

Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp but not all of them.

I've found little information on whether this also happens in AMD GPUs.
Are warps equivalent to wavefronts?
https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505
robertmiles
Joined: 16 Jun 08
Posts: 1233
Credit: 14,338,560
RAC: 2,014
Message 107538 - Posted: 22 Oct 2022, 2:35:10 UTC - in response to Message 107537.  

I've found little information on whether this also happens in AMD GPUs.
Are warps equivalent to wavefronts?
https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505

I read that, and found it unclear whether they are or not.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107539 - Posted: 22 Oct 2022, 8:27:22 UTC

So what's happening, in plain terms, is that the Moo! task is huge and complex and splits itself between the two GPUs and the CPUs to get the work done more efficiently?
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107541 - Posted: 22 Oct 2022, 9:11:34 UTC - in response to Message 107539.  

So what's happening, in plain terms, is that the Moo! task is huge and complex and splits itself between the two GPUs and the CPUs to get the work done more efficiently?
Sounds like it. Might as well use both for one task if they can; then the task is completed in half the time.

I'm wondering about the efficiency though. I get one done in 11 minutes on a single Radeon Fury (8 TFLOPS SP), and it uses NO CPU at all. You're taking 9 minutes (so 1.2 times faster) on two cards totalling 1.5 times the power, and using two CPU cores as well.

Maybe NVidia are just rubbish. It could be their total lack of DP support, so yours is having to use the CPU for those bits.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 107542 - Posted: 22 Oct 2022, 13:34:07 UTC - in response to Message 107541.  

It could be their total lack of DP support, so yours is having to use the CPU for those bits.

All GeForce 10x0 GPUs have DP, and Moo! doesn't need it at all. But I've been testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it, where the HD 3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing.

BTW, use GPU-Z to check GPU usage, not Task Manager, which is obviously useless for that; the GPU should be near 100% when Moo! is running.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107543 - Posted: 22 Oct 2022, 13:53:10 UTC - in response to Message 107542.  

All GeForce 10x0 GPUs have DP
At 1:32, which is laughable. My AMDs are 1:4.

and Moo! doesn't need it at all.
Sure? Most projects need some DP. MW is entirely DP, but other projects need it sometimes. At 1:32 that's gonna slow things down immensely.

But I've been testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it, where the HD 3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing.
They're not the best-designed tasks. I have a dual-GPU card and it doesn't notice the second chip, yet every other project does. Mind you, I've heard of Nvidia needing high CPU usage on a number of projects; I can't remember what the cause is.

Why do you say "or distributed.net"? Are they not one and the same?
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 107544 - Posted: 22 Oct 2022, 15:43:10 UTC - in response to Message 107543.  

All GeForce 10x0 GPUs have DP
At 1:32, which is laughable. My AMDs are 1:4.

IIRC only Tahiti chips and the Radeon VII have 1:4; others are worse, down to at least 1:16. But that doesn't matter for CPU usage: if the GPU has DP and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.


and Moo! doesn't need it at all.
Sure? Most projects need some DP. MW is entirely DP, but other projects need it sometimes.

Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards by doing the DP part on the CPU is Einstein.


At 1:32 that's gonna slow things down immensely.

Still several times faster than doing it on the CPU, hence Einstein is doing the DP part even on 1:64 GPUs.


But I've been testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it, where the HD 3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing.
They're not the best designed tasks. I have a dual GPU card and it doesn't notice the second chip, yet every other project does. Mind you I've heard of Nvidia needing high CPU usage on a number of projects, I can't remember what the cause is.

I don't remember the cause exactly either. On Milkyway it seems to be acceptable for my card at least, but I've seen other Nvidias there using 100% of a CPU core as well, so YMMV.


Why do you say "or distributed.net"? Are they not one and the same?

No, they are separate projects (even if Moo! tasks come from distributed.net): Moo! is responsible for the wrapper application, distributed.net for the actual client (and it was the client using all the CPU time, not the wrapper).
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107545 - Posted: 22 Oct 2022, 15:52:11 UTC - in response to Message 107544.  

IIRC only Tahiti chips
Which is why I go for those.

and the Radeon VII have 1:4; others are worse, down to at least 1:16
But always 2 times better than Nvidia.

But that doesn't matter for CPU usage: if the GPU has DP and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.
Wrong: if it needs 1/4 of its stuff done in DP and the card can't do 1:4, then some will have to go to the CPU.

Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards by doing the DP part on the CPU is Einstein.
There are many of them: Folding, PrimeGrid, and World Community Grid, for example.

Still several times faster than doing it on the CPU, hence Einstein is doing the DP part even on 1:64 GPUs.
Not if you have a Ryzen 9 CPU; those are pretty fast even compared to GPUs, provided you can multithread. For some reason I don't see many multithreaded CPU + GPU tasks. In fact I was surprised at Greg's 2 CPU + 2 NV Moo! task, although it probably just split its workload in two and did 1 CPU + 1 NV for each half.

Don't remember the cause exactly either, on Milkyway it seems to be acceptable for my card at least, but I've seen there other Nvidias using 100% of a CPU core as well, so YMMV.
Odd, since MW is one of the least CPU-intensive GPU tasks.

No, they are separate projects (even if Moo! tasks come from distributed.net): Moo! is responsible for the wrapper application, distributed.net for the actual client (and it was the client using all the CPU time, not the wrapper).
So Moo! sometimes does stuff from places other than distributed.net? And distributed.net is a non-BOINC program you can run separately?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107546 - Posted: 22 Oct 2022, 16:07:16 UTC
Last modified: 22 Oct 2022, 16:08:49 UTC

But Moo! is sharing the cards with FAH on my system.
So maybe that degrades the time a bit?

And I don't have the fastest cards:
a plain 1080 and a 1050 Ti.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107547 - Posted: 22 Oct 2022, 16:39:12 UTC - in response to Message 107546.  

But Moo! is sharing the cards with FAH on my system.
So maybe that degrades the time a bit?

And I don't have the fastest cards:
a plain 1080 and a 1050 Ti.
If the GPUs are also doing Folding, then Moo! is using a CPU core to feed less than a full GPU. Anyway, you're doing almost as much Moo! as I'd expect from those cards; I think it was 1.2 times as much as me, and I thought it should be 1.5, so all is well.

Rule 1: If heat is pouring off the chip, it's doing a lot of work.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 107548 - Posted: 22 Oct 2022, 17:19:09 UTC - in response to Message 107545.  

But that doesn't matter for CPU usage: if the GPU has DP and the app supports and needs it, it will use it regardless of the ratio, which the app doesn't even know anything about.
Wrong: if it needs 1/4 of its stuff done in DP and the card can't do 1:4, then some will have to go to the CPU.

The app's code needs to support it, as in the case of Einstein. GPU and CPU are not one thing where the code gets executed and can choose by itself where it feels it might run faster, or do "some" stuff on the CPU if the GPU's DP performance is low. The detection of SP/DP on the GPU and the fallback to the CPU for DP calculations must be implemented; otherwise a DP app will just error out on an SP card.
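
Sketched in Python, the detect-and-fall-back pattern looks something like this (gpu_supports_fp64, gpu_kernel_fp64 and cpu_fp64 are hypothetical stand-ins, not Einstein's actual code):

    import numpy as np

    def gpu_supports_fp64():
        # Hypothetical capability probe; a real app would query the
        # driver/runtime (e.g. OpenCL device extensions) here.
        return False

    def gpu_kernel_fp64(x):
        # Hypothetical GPU path, dispatched only if the probe succeeds.
        raise NotImplementedError("stand-in for a real GPU kernel")

    def cpu_fp64(x):
        # CPU fallback: the CPU always has double precision.
        return np.sqrt(x.astype(np.float64))

    def run_dp_part(x):
        # The app makes this choice explicitly; without such a branch,
        # a DP kernel simply errors out on an SP-only card.
        if gpu_supports_fp64():
            return gpu_kernel_fp64(x)
        return cpu_fp64(x)

    print(run_dp_part(np.arange(8, dtype=np.float32)))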


Sure, since it runs on SP cards and doesn't have additional CPU load there; IIRC it's doing integer only, like Collatz did. The only project I know that needs DP "sometimes" and is able to run on SP cards by doing the DP part on the CPU is Einstein.
There are many of them: Folding, PrimeGrid, and World Community Grid, for example.

OPNG shouldn't need it IIRC; no idea about the others, but unless they implemented it like Einstein, an app that needs DP will not run on SP cards, just like Milkyway.


Still several times faster than doing it on the CPU, hence Einstein is doing the DP part even on 1:64 GPUs.
Not if you have a Ryzen 9 CPU; those are pretty fast even compared to GPUs, provided you can multithread. For some reason I don't see many multithreaded CPU + GPU tasks.

It will still run on the GPU; the app doesn't know whether the CPU is faster: https://einsteinathome.org/content/fgrp5-cpu-and-fgrpb1g-gpu-why-does-crunching-seem-pause-90


In fact I was surprised at Greg's 2 CPU + 2 NV Moo! task, although it probably just split its workload in two and did 1 CPU + 1 NV for each half.

Never seen that either.


So Moo! sometimes does stuff from places other than distributed.net?

No, but like yoyo, which runs the OGR project from distributed.net as one of its subprojects, they could if the admin wanted to.


And distributed.net is a non-BOINC program you can run separately?

Yes. https://www.distributed.net/Download_clients
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107549 - Posted: 22 Oct 2022, 17:31:23 UTC - in response to Message 107547.  

Rule 1: If heat is pouring off the chip, it's doing a lot of work.

The fans get their workout every day. At the moment they are just doing the easy stuff with FAH. Apparently TN needs every CPU core, though I suppose I can write a script to put that down to 14 or something.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107550 - Posted: 22 Oct 2022, 17:38:49 UTC - in response to Message 107548.  

The app's code needs to support it, as in the case of Einstein. GPU and CPU are not one thing where the code gets executed and can choose by itself where it feels it might run faster, or do "some" stuff on the CPU if the GPU's DP performance is low. The detection of SP/DP on the GPU and the fallback to the CPU for DP calculations must be implemented; otherwise a DP app will just error out on an SP card.
Isn't it possible to write the program so it does as much DP as possible on the GPU, but if that's not enough, uses the CPU as well?

OPNG shouldn't need it IIRC; no idea about the others, but unless they implemented it like Einstein, an app that needs DP will not run on SP cards, just like Milkyway.
I'm sure I saw someone mention there's some DP in it. I would assume a decent program would use the CPU for that bit if the card was SP-only (do those actually exist? I thought all cards had at least a tiny bit of DP). Anyway, judging by the speed OPNG runs at on my different cards, there's DP in it.

It will still run on the GPU; the app doesn't know whether the CPU is faster: https://einsteinathome.org/content/fgrp5-cpu-and-fgrpb1g-gpu-why-does-crunching-seem-pause-90
Surely the app can get the benchmark data from BOINC?

Never seen that either.
Have you tried Moo! on two Nvidia cards?

Yes. https://www.distributed.net/Download_clients
ROFL at "Trojan horses and other perverted versions have been known to have been circulated" Perverted maths programs? Ooooh sexy numbers!
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 107551 - Posted: 22 Oct 2022, 17:42:06 UTC - in response to Message 107549.  
Last modified: 22 Oct 2022, 17:42:26 UTC

Rule 1: If heat is pouring off the chip, it's doing a lot of work.

The fans get their workout every day. At the moment they are just doing the easy stuff with FAH. Apparently TN needs every CPU core, though I suppose I can write a script to put that down to 14 or something.
Some maths projects seem to use more electricity (and therefore produce more heat) on a GPU. My Fury card seems to have been badly designed. At normal settings, it drops the clock dramatically on some projects. I found this was due to it hitting the power limit of the VRMs. So I cranked the power limit up to 150% in Afterburner, which worked for about 3 weeks, then the power connector melted, deforming the plastic shroud and oxidising the contacts. So I soldered the wires on directly :-)