The most efficient cruncher rig possible

Message boards : Number crunching : The most efficient cruncher rig possible

Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 17,712,330
RAC: 21,877
Message 95440 - Posted: 27 Apr 2020, 19:08:55 UTC - in response to Message 95435.  




An AMD Ryzen Threadripper 3990X will do 115k-120k ppd RAC. That's drawing 280 watts total for the CPU, although the ppd doesn't drop much when throttled to 200 W. So if you could get a Pi to run at 900 ppd, you would need, say, 133 of them to equal a 3990X.

By your figures (I'll take the mid price), the Pis would cost 88.5 * 133 = $11,770.

They would draw 480 watts in total.

This is just for comparison of course, but the 3990x is cheaper, more efficient, plus you have an awesome pc.

I’m actually guessing that 2x Ryzen 3950X would be both a better points-per-day platform and a better return in points per dollar than a Threadripper 3990X. In both cases, though, the resale value is going to drop considerably more than that of Raspberry Pis or multiple 3700Xs.
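
The Pi-versus-3990X arithmetic quoted above can be checked with a short sketch. It only uses the figures from the quoted post (900 ppd per Pi, 120k ppd for the 3990X, a mid price of $88.50 per Pi); the function and variable names are made up for illustration.

```python
import math

# Figures taken from the quoted post; names are illustrative only.
PI_PPD = 900             # hoped-for points per day of one Pi
TR_3990X_PPD = 120_000   # upper end of the quoted 115k-120k ppd
PI_PRICE_USD = 88.5      # "mid price" per Pi used in the post


def pis_needed(target_ppd: float, ppd_per_pi: float) -> int:
    """Smallest whole number of Pis whose combined ppd reaches target_ppd."""
    return math.ceil(target_ppd / ppd_per_pi)


count = pis_needed(TR_3990X_PPD, PI_PPD)  # the post rounds this to "say 133"
print(f"Pis needed: {count}")
print(f"Cost of {count} Pis: ${count * PI_PRICE_USD:,.2f}")
print(f"Cost of the post's 133 Pis: ${133 * PI_PRICE_USD:,.2f}")
```

Rounding up instead of down changes the count by one board, which does not affect the conclusion drawn in the quote.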
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,504,666
RAC: 57,087
Message 95451 - Posted: 27 Apr 2020, 23:13:15 UTC

The threadrippers are awesome, but I think some of these would be a lot more productive for the money:

https://www.ebay.co.uk/itm/143586312188

https://www.ebay.co.uk/itm/143494592135

I think Sandy Bridge servers became a lot cheaper after Meltdown and Spectre came out, because they didn't get patched, so loads of data centres had to turn off hyperthreading or get rid of them.

D
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95478 - Posted: 28 Apr 2020, 14:18:16 UTC
Last modified: 28 Apr 2020, 14:47:57 UTC

i did an experiment today: i ran r@h on my Intel haswell desktop (i7 4790 non K, 'old' by today's standards) while concurrently running r@h on a Pi4
i throttled my desktop down to 3.5 ghz to keep the fan speeds down

it turns out both my desktop and Pi received similar jobs
https://boinc.bakerlab.org/rosetta/result.php?resultid=1162955493
https://boinc.bakerlab.org/rosetta/result.php?resultid=1162850029

the Intel haswell ran 8 concurrent threads for 4 hours (3.5ghz) while the Pi4 has been running all day (overclocked slightly to 1.75ghz).
the Pi4 ran a varying number of concurrent r@h threads (up to 4), as apparently it was either waiting for memory (more common) or i had run out of disk space

it turns out the Intel haswell earned about 1000+ credits in those 4 hours, about the same as the Pi4 achieves after a whole day (24 hrs) of crunching.

while my Intel haswell desktop was running r@h, i ran Intel Power Gadget and got a reading of about 75 watts running 8 threads of r@h for 4 hours
https://software.intel.com/en-us/articles/intel-power-gadget
adding another 25 watts of overhead, that would make it 100 W for those 4 hours
so those 8 threads took 100 W x 4 h = 400 watt-hours

as i don't have a means to measure Pi4 power consumption for now, i made some estimates based on power consumption statistics posted by @Endgame124 (thanks)
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13732&postid=95458
as i overclocked to 1.75 ghz and over-volted to 0.9125 volts, a guesstimate of Pi4 power consumption may be, say, 40 watts
so 40 watts x 24 hours ~ 960 watt-hours

assuming the same number of credits, the desktop haswell is apparently about twice as efficient as the Pi4 in terms of points (credits) per watt-hour, even though i throttled my haswell down to 3.5ghz
that result is rather interesting and it seems to echo the findings of others here; processors better than this haswell probably deliver even better points per watt.
it kind of shows that lower power consumption socs / cpus do not necessarily mean higher performance efficiency
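
The energy comparison above boils down to credits divided by watt-hours. A minimal sketch of that arithmetic follows, using only the numbers from this post; the helper names are hypothetical, and the 40 W Pi4 figure is the post's own rough guess rather than a measurement.

```python
def watt_hours(avg_watts: float, hours: float) -> float:
    """Energy used at a constant average power draw."""
    return avg_watts * hours


def credits_per_wh(credits: float, avg_watts: float, hours: float) -> float:
    """Credits earned per watt-hour of energy."""
    return credits / watt_hours(avg_watts, hours)


CREDITS = 1000  # both machines earned roughly this many credits

# Haswell: 75 W reading plus an assumed 25 W of overhead, for 4 hours.
haswell = credits_per_wh(CREDITS, avg_watts=100, hours=4)
# Pi4: 40 W is the guess made in the post, for 24 hours.
pi4 = credits_per_wh(CREDITS, avg_watts=40, hours=24)

print(f"Haswell: {watt_hours(100, 4):.0f} Wh, {haswell:.2f} credits/Wh")
print(f"Pi4:     {watt_hours(40, 24):.0f} Wh, {pi4:.2f} credits/Wh")
print(f"Haswell / Pi4 efficiency ratio: {haswell / pi4:.1f}")
```

The ratio is linear in the assumed power draw, so the conclusion depends entirely on how good the Pi4 wattage estimate is.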

i've also added this to cc_config.xml on the Pi4 (inside the <options> section)
<ncpus>3</ncpus>

this limits the number of jobs run concurrently to 3 tasks at any one time. hopefully that reduces memory contention and the 'waiting for memory' occurrences.
it doesn't help to be 'waiting for memory' and hogging time on a task when, if i hadn't fetched that task, someone else could have processed it at higher performance and turned it around sooner
oh and on pi4, i think overclocking it slightly gives more bang (points) per watt. i managed just above 1000 points at 1.75ghz, but i've got too little data; this is just based on a lucky day, i.e. the 1st day it ran round the clock. but do have a good heat sink + fan to keep temperatures sensible on the Pi4
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 17,712,330
RAC: 21,877
Message 95489 - Posted: 28 Apr 2020, 17:19:51 UTC - in response to Message 95478.  

"sgaboinc" wrote:

i did an experiment today...
Snip
...
as i don't have a means to measure Pi4 power consumption for now, i made some estimates based on power consumption statistics posted by @Endgame124 (thanks)
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13732&postid=95458
as i overclocked to 1.75 ghz and over-volted to 0.9125 volts, a guesstimate of Pi4 power consumption may be, say, 40 watts
so 40 watts x 24 hours ~ 960 watt-hours

assuming the same number of credits, the desktop haswell is apparently about twice as efficient as the Pi4 in terms of points (credits) per watt-hour, even though i throttled my haswell down to 3.5ghz
that result is rather interesting and it seems to echo the findings of others here; processors better than this haswell probably deliver even better points per watt.
it kind of shows that lower power consumption socs / cpus do not necessarily mean higher performance efficiency



Note, power reduced pi 4 is using around 3.5 watts. Maybe I didn’t follow something, but a pi 4 running for 24 hours should use around 40 to 48 total watt hours, not 960
hnapel

Joined: 8 Apr 20
Posts: 8
Credit: 835,346
RAC: 0
Message 95494 - Posted: 28 Apr 2020, 18:14:43 UTC - in response to Message 94381.  

This got me thinking: the discussion about processing lots of work is right to consider how much can be done for the price of the hardware and the power required to operate it. The information on the AMD site is hilarious in that respect: "Exceptional Processors Deserve Exceptional Cooling". Well, if their processors were so exceptional, maybe they should require less cooling! The discussion in general has also shifted from sheer performance to how many FLOPS I can churn out per watt, and the high-end PCs (apart from already being overpriced) are likely not at the optimum. There are developments in opto-electronics that will hopefully shift the balance in the near future. A guy named Dyson, who recently died, had the vision that highly civilized extraterrestrial beings would try to harvest all the energy from their star with a gigantic encompassing structure, referred to as a Dyson Sphere. For all his ingenuity, Dyson did not theorize that if those beings were smart enough to get to that state, they might also have invented some more energy-efficient processors.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95518 - Posted: 29 Apr 2020, 1:46:16 UTC - in response to Message 95494.  
Last modified: 29 Apr 2020, 1:52:44 UTC

"hnapel" wrote:

This got me thinking: the discussion about processing lots of work is right to consider how much can be done for the price of the hardware and the power required to operate it. The information on the AMD site is hilarious in that respect: "Exceptional Processors Deserve Exceptional Cooling". Well, if their processors were so exceptional, maybe they should require less cooling! The discussion in general has also shifted from sheer performance to how many FLOPS I can churn out per watt, and the high-end PCs (apart from already being overpriced) are likely not at the optimum. There are developments in opto-electronics that will hopefully shift the balance in the near future. A guy named Dyson, who recently died, had the vision that highly civilized extraterrestrial beings would try to harvest all the energy from their star with a gigantic encompassing structure, referred to as a Dyson Sphere. For all his ingenuity, Dyson did not theorize that if those beings were smart enough to get to that state, they might also have invented some more energy-efficient processors.


that has to do with the end of moore's law and the end of dennard scaling
https://cs.stackexchange.com/questions/27875/moores-law-and-clock-speed


so today's extreme core count, extreme performance processors run hotter than ever, consuming more watts as more transistors are added.
it is reaching a point where it is extremely difficult to keep high core count processors cool; the heat density is ever increasing and there is no reasonable way to remove all that heat.
it'd seem one day we'd need to resort to very expensive cryogenic cooling just to keep those processors cool

or rather, most processors these days state a power limit, i.e. the TDP. you can't exceed that without adequate cooling, and going beyond TDP with extreme processors generates more heat (power consumption) than ever, so overclocking them gets very expensive due to the cooling requirements
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95520 - Posted: 29 Apr 2020, 2:26:57 UTC - in response to Message 95489.  
Last modified: 29 Apr 2020, 2:37:34 UTC

"Endgame124" wrote:

Note, power reduced pi 4 is using around 3.5 watts. Maybe I didn’t follow something, but a pi 4 running for 24 hours should use around 40 to 48 total watt hours, not 960

Thanks Endgame124

hi all,

please note this, the power consumption of Pi4 in my earlier post https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13791&postid=95478 may be grossly overstated

if this is true and the Pi4 consumes 4 watts, that would only be 4 watts x 24 hours ~ 96 watt-hours in 24 hours, and it gets about 1000+ points (overclocked to 1.75ghz) in that time frame. that would actually make the Pi4 very efficient in terms of points per watt-hour: about 4 times more efficient than, say, my haswell desktop, which takes 400 watt-hours to get the same amount of points
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 17,712,330
RAC: 21,877
Message 95522 - Posted: 29 Apr 2020, 2:50:43 UTC - in response to Message 95520.  

"sgaboinc" wrote:

"Endgame124" wrote:

Note, power reduced pi 4 is using around 3.5 watts. Maybe I didn’t follow something, but a pi 4 running for 24 hours should use around 40 to 48 total watt hours, not 960

Thanks Endgame124

hi all,

please note this, the power consumption of Pi4 in my earlier post https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13791&postid=95478 may be grossly overstated

if this is true and the Pi4 consumes 4 watts, that would only be 4 watts x 24 hours ~ 96 watt-hours in 24 hours, and it gets about 1000+ points (overclocked to 1.75ghz) in that time frame. that would actually make the Pi4 very efficient in terms of points per watt-hour: about 4 times more efficient than, say, my haswell desktop, which takes 400 watt-hours to get the same amount of points


I see where the confusion came from. I listed the total power draw on my ups, and left it to the reader to calculate the power draw of the pi. If you don’t read closely, it looks like the pi 4’s power draw is much higher. I can’t edit the post, so I can’t fix it to make it more clear. I’ll try to collect everything into a summary post later.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95578 - Posted: 30 Apr 2020, 4:59:13 UTC
Last modified: 30 Apr 2020, 5:00:56 UTC

on another lucky day my Pi4 did 1785 points in 24 hours running 3 concurrent threads overclocked to 1.75GHz
so assuming 4 watts x 24 hours ~ 96 watt-hours
this result isn't too bad really
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95582 - Posted: 30 Apr 2020, 6:49:06 UTC - in response to Message 95578.  
Last modified: 30 Apr 2020, 6:50:03 UTC

Thanks for sharing.
See here how Recent Average Credit is computed:


You basically need to run your rigs for weeks to be comparable as you've rightly concluded that you can get batches of "lucky" (uncalibrated) work every once in a while.
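
To illustrate why weeks of run time are needed, here is a simplified sketch of an exponentially decaying average of the kind BOINC uses for RAC. It assumes a one-week half-life and one update per day at a constant credit rate; the function name and the daily update schedule are illustrative, not the client's actual code.

```python
HALF_LIFE_DAYS = 7.0  # assumed: a credit half-life of one week


def update_rac(rac: float, credit_rate_per_day: float, days_elapsed: float) -> float:
    """Decay the old average and blend in the credit rate of the new interval."""
    weight = 0.5 ** (days_elapsed / HALF_LIFE_DAYS)
    return rac * weight + (1.0 - weight) * credit_rate_per_day


rac = 0.0
for day in range(1, 29):
    # A host that steadily earns 1000 credits per day, starting from zero.
    rac = update_rac(rac, credit_rate_per_day=1000.0, days_elapsed=1.0)
    if day % 7 == 0:
        print(f"day {day:2d}: RAC ~ {rac:.0f}")
```

Under these assumptions a new host shows only half of its steady-state rate after the first week, so early RAC figures understate what a rig can do, while a single lucky day overstates it.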

For a more fair comparison, I've outlined above a protocol that could work better. TL;DR:
- Grab the WU command line from `ps -e f` (I think it is also included in a file)
- Stop BOINC
- Copy away the slot directory of a running WU
- Run the given command line for each core (preferably from separate directories)
- Check how many decoys it produces after 8 hours (sleep && kill) and/or plot the decoy production progress on a graph as per the state/log (may need pipe through a timestamping tool or polling)
- The executables are not compatible between a Pi and a PC, but the data files and parameters should be (TODO).
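
The "run one copy per core, then stop after a fixed time" part of the steps above could be sketched as below. This is only an outline under stated assumptions: the command shown is a placeholder (the real one would be the work-unit command line taken from `ps`), the helper name `run_for` is made up, and counting decoys is left out because the output format is not described here.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_for(cmd: list[str], workdirs: list[Path], seconds: float) -> list[int]:
    """Start one copy of cmd per work directory and stop them all after `seconds`."""
    procs = [subprocess.Popen(cmd, cwd=d) for d in workdirs]
    time.sleep(seconds)
    for p in procs:
        if p.poll() is None:  # still running, so ask it to stop
            p.terminate()
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder workload; substitute the real work-unit command line.
    placeholder_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    with tempfile.TemporaryDirectory() as base:
        dirs = [Path(base) / f"core{i}" for i in range(2)]
        for d in dirs:
            d.mkdir()
        # For a real comparison this would be 8 * 3600 seconds.
        print(run_for(placeholder_cmd, dirs, seconds=1.0))
```

Giving each copy its own directory mirrors the "separate directories" advice, so the runs do not overwrite each other's state files.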

This could already give a hint, but to be even more fair, you would need to repeat this for each representative kind of WU (I think there are less than a dozen of them).

I'm not at all surprised that you find a legacy desktop computer much less efficient; this can be seen from the earlier numbers. But the right kind of legacy laptop is much more competitive with an rPi4, as they usually contain high-efficiency, ultra-low-power CPUs.

bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95583 - Posted: 30 Apr 2020, 6:54:14 UTC - in response to Message 95478.  
Last modified: 30 Apr 2020, 6:55:18 UTC

If you would use the fastest zram compression possible (LZ4?), you may get 6GB out of a 4GB rPi4.

I've done a dirty patch myself on BOINC on a low-memory PC so it would schedule as much work as the zram'med memory amount would allow, instead of limiting to the physical amount. This solves the "waiting for memory" messages. I had to patch it because it caps memory allowance to be under 100% when requesting jobs, although, I guess after you have the jobs the scheduler may allow > 100% settings (TODO).

It has been working pretty good for weeks now.
Raistmer

Joined: 7 Apr 20
Posts: 49
Credit: 794,064
RAC: 0
Message 95595 - Posted: 30 Apr 2020, 11:13:38 UTC - in response to Message 95583.  

"bkil" wrote:

If you would use the fastest zram compression possible (LZ4?), you may get 6GB out of a 4GB rPi4.

I've done a dirty patch myself on BOINC on a low-memory PC so it would schedule as much work as the zram'med memory amount would allow, instead of limiting to the physical amount. This solves the "waiting for memory" messages. I had to patch it because it caps memory allowance to be under 100% when requesting jobs, although, I guess after you have the jobs the scheduler may allow > 100% settings (TODO).

It has been working pretty good for weeks now.

Could you estimate how big a share of processing power the compression takes? How many cycles go to maintaining compressed data versus retrieving it directly from memory?
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95605 - Posted: 30 Apr 2020, 13:46:20 UTC - in response to Message 95595.  
Last modified: 30 Apr 2020, 13:48:36 UTC

It is quite common; most Android phones have also had it set up under the hood for many years now.

Not sure how I could answer in the most correct way, but let me share some metrics if it helps. I'm simulating 3GB RAM on a 2GB node for running on 2 cores using 2GB zram compressed with deflate and a 2.5" HDD for backup.
    - uptime = 2454001s
    - kswapd = 137:48
    - ksoftirqd = 2:59 + 0:51
    - sum of all kernel threads = 9082s
    - kernel threads' share of uptime = 9082 / 2454001 ≈ 0.37%.
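
As a quick check of the figures above, the overhead share can be recomputed like this. The sketch assumes the kswapd and ksoftirqd times are in minutes:seconds format; the helper name is made up for illustration.

```python
def mmss_to_seconds(text: str) -> int:
    """Convert a 'minutes:seconds' CPU-time string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


uptime_s = 2_454_001
kswapd_s = mmss_to_seconds("137:48")
ksoftirqd_s = mmss_to_seconds("2:59") + mmss_to_seconds("0:51")
all_kernel_threads_s = 9082  # total reported in the list above

print(f"kswapd: {kswapd_s} s, ksoftirqd: {ksoftirqd_s} s")
print(f"kernel-thread share of uptime: {all_kernel_threads_s / uptime_s:.2%}")
```

Most of the kernel-thread time is kswapd, which is where swap and zram housekeeping would be expected to show up.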



I'm keeping minute resolution stats of thermals, frequencies, load and power consumption (where available). The average I/O wait in the last 20 days was 0.3%. The median should be around 0%.

In some minutes it went up to 5-10%, but I can't be sure that it was because of BOINC, because it's also running DVR capture, regular media processing and an actively used file server in the background as well.

So the concept is that only infrequently used applications and portions of heap that aren't actively used are swapped out, so swapping shouldn't really occur that much. Under these constraints, your performance hit should be minor.

Raistmer

Joined: 7 Apr 20
Posts: 49
Credit: 794,064
RAC: 0
Message 95623 - Posted: 30 Apr 2020, 17:04:46 UTC - in response to Message 95605.  
Last modified: 30 Apr 2020, 17:05:02 UTC

Thanks for the comments.
Are you aware of something similar for the Windows world (I know that Win10 includes Superfetch, but besides that, maybe some third-party apps/drivers/services)?
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95645 - Posted: 1 May 2020, 5:39:40 UTC
Last modified: 1 May 2020, 5:53:59 UTC

i pounced on this post on raspberry pi forums
https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=271456&p=1652922#p1652682

ejolson wrote:
While Endgame seems to be talking about credit/watt-day rather than watt-hour, even after conversion the numbers are still all over the place.

               credit/W-hr
Pi4B 1750Mhz      18.6
Ryzen 3990X       17.7
Pi4B Stock        10.1
Haswell i?         3.5

While your over-clocked Pi 4B looks very power efficient, you also mention that 1785 points was the peak for a day rather than an average taken over many days. Are you regularly getting around 1785 points per day?


me wrote:
well no, that 1785 points is a prize from a jackpot on a day i got out of a curiosity to try running rosetta@home on Pi4 lol
over time the points per day are likely to go lower
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95650 - Posted: 1 May 2020, 7:22:38 UTC
Last modified: 1 May 2020, 7:58:11 UTC

the 2nd day was less lucky. Pi4 running 24 hours: 1693 points, so 4 watts x 24 hours ~ 96 watt-hours
1693 / 96 ~ 17.6 credits per watt-hour lol

working a little math
https://boinc.bakerlab.org/rosetta/cpu_list.php
CPU model                                                Number of   Avg. cores/  GFLOPS/  GFLOPS/
                                                         computers   computer     core     computer
BCM2835 [Impl 0x41 Arch 8 Variant 0x0 Part 0xd08 Rev 3]  976         4.00         2.08     8.32

this BCM2835 Part 0xd08 Rev 3 is actually a BCM2711 on Pi4

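so 8.32 gflops corresponds to a RAC of about 8.32 x 200 ~ 1664 credits per day (200 credits per day per GFLOPS, per the credit definition below)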
https://boinc.berkeley.edu/wiki/Computation_credit

1664 credits / (4 watts x 24 hours) = 1664 credits / 96 watt-hours
= 17.33 credits / watt-hour
quite close
i.e. overclock your Pi4 to rival the top in the world lol
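
The estimate above can be reproduced with a few lines. The sketch assumes the BOINC credit definition of 200 credits per day for a 1 GFLOPS machine and the 4 W power draw discussed in this thread; the function names are hypothetical.

```python
CREDITS_PER_GFLOPS_DAY = 200  # assumed from the BOINC credit definition


def expected_daily_credit(gflops: float) -> float:
    """Credits per day a host of the given speed would earn at full load."""
    return gflops * CREDITS_PER_GFLOPS_DAY


def credits_per_watt_hour(daily_credit: float, watts: float, hours: float = 24.0) -> float:
    """Credits earned per watt-hour over the given period."""
    return daily_credit / (watts * hours)


estimated = expected_daily_credit(8.32)  # project-wide average for this CPU entry
print(f"estimated: {estimated:.0f} credits/day")
print(f"estimated: {credits_per_watt_hour(estimated, watts=4):.2f} credits/Wh")
print(f"measured:  {credits_per_watt_hour(1693, watts=4):.2f} credits/Wh")
```

Both figures scale inversely with the assumed wattage, so a measured power draw would tighten the comparison.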
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 17,712,330
RAC: 21,877
Message 95665 - Posted: 1 May 2020, 12:47:30 UTC
Last modified: 1 May 2020, 12:48:13 UTC

If you want to track my Pis, here are the direct links to the systems:

Stock Pi 4 4GB:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4215281

Pi 4 4GB @2015 mhz:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4269102

Stock Pi 3B+ with Zram and 3 Rosetta Processes
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4244063
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,504,666
RAC: 57,087
Message 95674 - Posted: 1 May 2020, 18:47:28 UTC

I've got a 4GB Pi4 running 3 threads on Raspbian to add to the data pool too. Everything stock, with a passive heatsink/case:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4021327
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 17,712,330
RAC: 21,877
Message 95676 - Posted: 1 May 2020, 20:05:52 UTC - in response to Message 95674.  

"dcdc" wrote:

I've got a 4GB Pi4 running 3 threads on Raspbian to add to the data pool too. Everything stock, with a passive heatsink/case:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4021327

Yours has been running longer than mine. Has it had 3 Rosetta processes since you joined it to the project? If so, we may need to re-evaluate the points a 4 process pi 4 can do.
dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,504,666
RAC: 57,087
Message 95691 - Posted: 1 May 2020, 22:15:46 UTC - in response to Message 95676.  

Yeah, I set it to 3 threads from the start. Happy to bump it up to 4 to see the impact though.



©2024 University of Washington
https://www.bakerlab.org