Running on a 4GB Raspberry Pi 4 - How to?

Message boards : Number crunching : Running on a 4GB Raspberry Pi 4 - How to?



Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 20,367,622
RAC: 372
Message 95575 - Posted: 30 Apr 2020, 0:03:31 UTC

This article is pretty good on overclocking and power usage (see the table about two-thirds of the way through the article):

https://qengineering.eu/overclocking-the-raspberry-pi-4.html
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95585 - Posted: 30 Apr 2020, 7:47:25 UTC

On another lucky day my Pi4 did 1785 points in 24 hours running 3 concurrent threads, overclocked to 1.75 GHz:
/boot/config.txt
over_voltage=2
arm_freq=1750

/etc/boinc/cc_config.xml
<cc_config>
  <options>
    <ncpus>3</ncpus>
  </options>
</cc_config>
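For anyone scripting this across several Pis, here is a minimal Python sketch that generates the file. It assumes BOINC's usual layout, with `<options>` nested inside a `<cc_config>` root element. Save the output wherever your client looks for cc_config.xml (its data directory; the exact path varies by distro), then restart the client or run `boinccmd --read_cc_config`.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """Return cc_config.xml text that makes BOINC act as if it had `ncpus` CPUs."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    root = ET.Element("cc_config")              # root element BOINC expects
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.indent(root)                             # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Print the file contents; redirect to cc_config.xml in the client's data directory.
    print(build_cc_config(3))
```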


My heatsink and fan combo looked like this: [photo]



After running R@h loads for a day, temperatures are stable at around 42-45 °C.


The DXF template for the acrylic bracket is uploaded here:
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=271933&p=1652300#p1652300
Boulybud

Joined: 3 Apr 20
Posts: 11
Credit: 351,250
RAC: 0
Message 95586 - Posted: 30 Apr 2020, 8:37:47 UTC - in response to Message 95585.  
Last modified: 30 Apr 2020, 8:38:21 UTC

Another crash last night at 4:00 AM. I don't understand it: it worked all day yesterday without a problem, and then the Raspberry Pi hangs during the night.

I'll try putting this in


/etc/boinc/cc_config.xml

<cc_config>
<options>
<ncpus>3</ncpus>
</options>
</cc_config>


and see what happens.
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95600 - Posted: 30 Apr 2020, 12:40:45 UTC - in response to Message 95586.  

If you lose a whole core to this overclock, you would be better off, both in throughput and in energy efficiency, simply dialing back the clock a bit.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95602 - Posted: 30 Apr 2020, 12:50:25 UTC
Last modified: 30 Apr 2020, 13:09:51 UTC

Well, in a sense I don't think we'd lose a 'whole' core. The reason is this: there are various work units that are rather large and memory-intensive.
So I often see 'waiting for memory'. IMHO I'd rather not fetch that job at all than have it sit 'waiting for memory', hogging time and doing 'nothing'.
This also allows larger WUs to be processed successfully.

Another, less obvious, point is that the CPU caches (L1, L2, etc.) are limited after all, and so is the superscalar pipeline's throughput.
If the pipeline needs data and can't find it in the caches, it stalls while it goes out to main memory, and you start having to count wait states.

So by running one less thread, the chances improve that the data is already in the cache (lower cache contention), and the workload runs at the full overclocked speed with zero wait states, giving the maximum GFLOPS possible. The other thing is that running one less thread means lower power dissipation, so the chip runs cooler and you can bump up the clock speed. I'd think this is similar to what Intel calls 'Turbo Boost', lol.

Another thing is that I have quite a lot of swap space allocated. I think that may help, as I found that 'unused' processes tend to get swapped out and simply stay there while the threads run. I think that may also be true for the tmpfs mounts, which normally sit in memory.

I think there is something about the GFLOPS as well: if the GFLOPS is higher, it may squeeze in one more complete model before the deadline, which would make a difference in terms of points. But I'd guess there is a tradeoff in terms of the additional power consumption and higher temperature it generates.
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95606 - Posted: 30 Apr 2020, 13:59:55 UTC - in response to Message 95602.  

If you see "waiting for memory", configure zram and overcommit your memory. That's it! +33% performance for free (four busy cores instead of three). The amount of swap doesn't matter, because the BOINC scheduler only considers physical (non-virtual) memory, so you have to overcommit it to make use of swap/zram/zcache.
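According to this post, BOINC budgets memory against physical RAM only, so swap or zram only helps once you overcommit. As a rough aid (this is not a BOINC tool), the Python sketch below reads the standard Linux /proc/meminfo file and reports how much swap, including any zram swap device, sits on top of RAM. The "headroom" ratio is an illustrative metric only, not a BOINC setting.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def overcommit_headroom(fields: dict[str, int]) -> float:
    """Ratio of (RAM + swap, including zram swap) to RAM alone."""
    ram = fields["MemTotal"]
    swap = fields.get("SwapTotal", 0)
    return (ram + swap) / ram


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():                       # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        ram_gib = info["MemTotal"] / 2**20     # kB -> GiB
        swap_gib = info.get("SwapTotal", 0) / 2**20
        print(f"RAM {ram_gib:.2f} GiB, swap/zram {swap_gib:.2f} GiB, "
              f"(RAM + swap) / RAM = {overcommit_headroom(info):.2f}")
```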

Your theory on caches is somewhat sound, but have you made any measurements? I found a forum topic about caches not mattering as much for R@h as first thought, but I don't think it was conclusive in the end.

Have you also considered that a core's power consumption may drop while it is in a wait state? Some BOINC projects have noted this phenomenon, which also improves your performance/watt.

When you bump up the core count, the power increase should be less than linear, because there are various shared components within the CPU. On the other hand, when you are overclocking a CPU, consumption should increase at least linearly with clock and additionally with the square of voltage!
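For reference, the rule of thumb above follows from the standard first-order model of CMOS dynamic (switching) power. Here α is the activity factor, C the switched capacitance, V the core voltage and f the clock frequency. The model ignores leakage, which also rises with voltage and temperature, so treat it as a rough guide:

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_2}{P_1} \approx \frac{f_2}{f_1}\left(\frac{V_2}{V_1}\right)^{2}
```

For example, going from 1500 to 1750 MHz with a hypothetical 5% voltage bump gives (1750/1500) × 1.05² ≈ 1.29. That is roughly 29% more switching power for about 17% more clock.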

So it should rarely be worth it from a performance/watt perspective, but do share your math if you have _exact_ numbers. We must be extra cautious with our measurements, because dividing such small numbers can introduce big errors, especially if you don't do enough sampling and averaging.

"Squeezing in" more decoys within the "deadline" has absolutely no effect on your generated RAC in the end. If the app extrapolates that the last model would run past the deadline, it simply won't start it and will ask for another WU.

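The following is a simplified model of the stopping rule described above, for illustration only. It assumes (hypothetically) that the app estimates the next model's duration as the average of the models finished so far, and skips that model if it would overrun the run-time budget. Under that rule, a faster CPU simply fits more whole models into the same budget. The leftover time is always less than one model, after which the task reports and new work is fetched.

```python
def run_models(model_seconds: list[float], budget_seconds: float) -> tuple[int, float]:
    """Simulate a run-time-limited task (hypothetical stopping rule).

    model_seconds: durations of successive models.
    Returns (models completed, elapsed seconds when the task stops).
    """
    elapsed = 0.0
    done = 0
    for duration in model_seconds:
        # Estimate the next model from the average so far; the first model always runs.
        estimate = elapsed / done if done else 0.0
        if done and elapsed + estimate > budget_seconds:
            break  # would overrun: stop here rather than start another model
        elapsed += duration
        done += 1
    return done, elapsed
```

With 1,000 s models and an illustrative 8-hour (28,800 s) budget, `run_models([1000.0] * 50, 28_800)` returns `(28, 28000.0)`, because a 29th model would overrun and is never started.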
Kornnugget

Joined: 12 Apr 20
Posts: 1
Credit: 303,707
RAC: 0
Message 95607 - Posted: 30 Apr 2020, 14:32:03 UTC - in response to Message 93604.  

Here is a good place to start: https://foldforcovid.io/ It will make a custom image for the Pi to run Rosetta@home. Note that there is not always work for the Pi because of its limited memory, but if you leave it on it will eventually get work.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95609 - Posted: 30 Apr 2020, 14:50:34 UTC
Last modified: 30 Apr 2020, 15:42:39 UTC

Thanks, I think I'll explore zram.

I'm thinking another way is perhaps to make a custom boinc-client
https://github.com/BOINC/boinc
that would 'overcommit' some fraction of memory so that swap is utilised:
https://boinc.berkeley.edu/trac/wiki/MemoryManagement

I'm not too sure whether setting the standard client to use up to 100% of memory would achieve that after all.

In terms of GFLOPS, or rather CPU speed, taking a cue from here:
https://qengineering.eu/overclocking-the-raspberry-pi-4.html
current flows at the instant switching takes place. By that notion, if a core is idle, that core might literally be 'dark', i.e. have minimal switching activity, while the other cores running the threads are busy switching as they process instructions in the pipeline.
On this basis, there may be something interesting here: a higher GHz does not translate into higher power across 4 cores. Rather, 3 cores run at higher power (in terms of both frequency and voltage) while one core is dark. The limit in this case is the risk that the overvoltage may damage the SoC, since the combined frequency and overvoltage make the transistors run significantly hotter at the junctions.
But the increased frequency only fully pays off if the required data happens to be in the cache (i.e. zero wait states); then the faster clock would translate into a linear increase in GFLOPS, in line with GHz × 3.
https://www.cc.gatech.edu/~hadi/doc/paper/2011-isca-dark_silicon.pdf

I think the dilemma is between running 4 cores at a lower GHz (with possibly somewhat higher cache contention) and running 3 cores at a higher GHz. That 0.25 GHz improvement gives at most 0.25 × 3 ≈ 0.75 GHz of aggregate improvement, against the 1.5 GHz given up by idling the fourth core at stock speed. The 3-core scenario would only win if running 3 cores reduced cache contention so much that it basically runs off the cache with zero waits, which is quite impossible to achieve.
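Here is a back-of-the-envelope check of that 3-versus-4-core trade-off. Like the post, it assumes throughput scales with active cores × clock and ignores cache and memory effects. The 1.5 GHz stock and 1.75 GHz overclock figures come from this thread:

```python
def aggregate_ghz(cores: int, ghz: float) -> float:
    """Crude throughput proxy: active cores times clock (assumes perfect scaling)."""
    return cores * ghz


def break_even_ghz(base_cores: int, base_ghz: float, fewer_cores: int) -> float:
    """Clock that `fewer_cores` would need to match `base_cores` at `base_ghz`."""
    return aggregate_ghz(base_cores, base_ghz) / fewer_cores


if __name__ == "__main__":
    four_stock = aggregate_ghz(4, 1.5)   # 4 cores at stock clock
    three_oc = aggregate_ghz(3, 1.75)    # 3 cores overclocked
    print(f"4 x 1.50 GHz = {four_stock:.2f}, 3 x 1.75 GHz = {three_oc:.2f}")
    print(f"3 cores need {break_even_ghz(4, 1.5, 3):.2f} GHz to break even")
```

Under this crude model, the 3-core overclock only comes out ahead if reduced cache contention is worth more than about 14% (6.0 / 5.25 ≈ 1.14), or if the clock can reach about 2.0 GHz.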
Boulybud

Joined: 3 Apr 20
Posts: 11
Credit: 351,250
RAC: 0
Message 95613 - Posted: 30 Apr 2020, 15:35:13 UTC - in response to Message 95609.  
Last modified: 30 Apr 2020, 15:41:24 UTC

News on my setup: it crashed again last night... So I got rid of Ubuntu, put Raspbian on it, and took the opportunity to check the EEPROM firmware.

I managed to gain a few °C with Raspbian: after an hour it runs at around 38 °C without overclocking. Now I just have to wait and see whether it is more stable than with Ubuntu.



EDIT: the EEPROM had been updated before Raspbian was installed.
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 20,367,622
RAC: 372
Message 95616 - Posted: 30 Apr 2020, 15:44:04 UTC - in response to Message 95606.  


We need to define "worth it" from a performance/watt perspective, because buying dozens of Pi 4s is fairly costly, and it also requires you to manage dozens of systems. The competition, so to speak, is pretty much a Threadripper 3990X (https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3832214), which is generating about 119,000 average credit. At 280 W of power (https://www.anandtech.com/show/15483/amd-threadripper-3990x-review/2), that is 425 credit/watt without adding in extra system overhead.

If a "stock" Pi 4 4GB gets about 850 average credit based on results earlier in the thread, and uses about 3.5 W, that is only 243 credit/watt. Not only would we need 140 Pi 4s to match the Threadripper in average credit, we would also use more power (140 × 3.5 W = 490 W).

At over_voltage=4 and 2015 MHz, I'm seeing around 4.5-5 W total consumption (my UPS doesn't measure fractional watts, so it's very difficult to give an exact measurement here). If we assume a roughly 35% increase in average credit (linear with clock speed), we would expect about 1148 credit for the Pi 4 (230-255 credit/watt). We would still be less efficient than the Threadripper, but we would "only" need about 104 Pis to match it in average credit.

Compare that with a more consumer- and budget-friendly CPU, the Ryzen 9 3900X (https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3896047). It is getting about 27,500 average credit at likely 140 W (https://www.anandtech.com/show/14605/the-and-ryzen-3700x-3900x-review-raising-the-bar/19), or ~196 credit/watt, so the Pi 4 4GB still comes out ahead in credit per watt, even overclocked.
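Here is a small Python sketch reproducing the credit-per-watt arithmetic in this post. All figures are the post's own estimates: CPU-only power for the desktop chips, and whole-board power for the Pis. The overclocked Pi is evaluated at both ends of the 4.5-5 W UPS reading, which shows how much a 0.5 W measurement step moves the result.

```python
import math
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    rac: float    # average credit per day
    watts: float  # estimated power draw

    @property
    def credit_per_watt(self) -> float:
        return self.rac / self.watts


def pis_to_match(target_rac: float, pi_rac: float) -> int:
    """Whole number of Pis needed to reach at least `target_rac`."""
    return math.ceil(target_rac / pi_rac)


hosts = [
    Host("Threadripper 3990X", 119_000, 280),
    Host("Ryzen 9 3900X", 27_500, 140),
    Host("Pi 4 stock", 850, 3.5),
    Host("Pi 4 OC, 4.5 W", 1148, 4.5),
    Host("Pi 4 OC, 5 W", 1148, 5.0),
]

if __name__ == "__main__":
    for h in hosts:
        print(f"{h.name:20s} {h.credit_per_watt:6.1f} credit/W")
    print("Pis to match 3990X:", pis_to_match(119_000, 850), "stock,",
          pis_to_match(119_000, 1148), "overclocked")
```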

In summary, if you want the absolute best performance per watt, you're better off going for the top-end Threadripper. Against most other processors an overclocked Pi 4 will be more efficient, and overclocking the Pi 4 lets you spend less money and manage fewer physical systems.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95617 - Posted: 30 Apr 2020, 16:04:21 UTC
Last modified: 30 Apr 2020, 16:04:36 UTC

As for myself, Pi4 vs desktop is more a matter of different use cases, as the Pi4 runs pretty much like a server: no monitor, and in fact only the USB-C cable supplying power.
I can let it run without shutting it down frequently. The power efficiency is a delightful finding.
My current desktops can produce more points in fewer hours. In fact, in 4 hours my Haswell desktop earns as many points as a non-overclocked Pi4 running for 24 hours.
So the desktop looks 6 times better in terms of points per hour.
But in terms of power efficiency, assuming 4 W at 1.75 GHz, the Pi4 produces more points (1785) in 24 hours for about 100 Wh (4 W × 24 h = 96 Wh),
while my Haswell desktop produces 1400 points running 8 concurrent threads in 4 hours, but consumes 100 W × 4 h ≈ 400 Wh.
So by a points-per-watt-hour benchmark the Pi4 wins (probably significantly), if these are indeed the numbers.
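Using the post's own assumptions (4 W for the Pi4, 100 W for the Haswell desktop), the comparison can be expressed in points per watt-hour:

```python
def points_per_wh(points: float, watts: float, hours: float) -> float:
    """Points earned per watt-hour of energy consumed."""
    return points / (watts * hours)


pi4 = points_per_wh(1785, 4, 24)       # Pi4 at 1.75 GHz, 3 threads, 24 h
haswell = points_per_wh(1400, 100, 4)  # Haswell desktop, 8 threads, 4 h
print(f"Pi4 {pi4:.1f} pts/Wh, Haswell {haswell:.1f} pts/Wh, ratio {pi4 / haswell:.1f}x")
```

With these inputs it prints `Pi4 18.6 pts/Wh, Haswell 3.5 pts/Wh, ratio 5.3x`. The ratio is only as good as the two wattage guesses.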
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95618 - Posted: 30 Apr 2020, 16:27:25 UTC - in response to Message 95616.  

I still wouldn't declare a winner right now. There are lots of power consumption tweaks you could apply to your Raspberry Pi 4 that nobody has bothered to measure until now.

Also, you are comparing the whole-system AC power consumption of an rPi with the theoretical TDP (thermal design power) of a CPU alone, which is itself usually less than the maximum power it draws. If you add the motherboard, memory, fans and storage, and then divide the total by the 80-85% conversion efficiency of a power supply, the difference wouldn't be as great as you think.
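To illustrate the direction of that correction, here is a sketch that converts a DC-side power estimate into wall (AC) power for a given PSU efficiency. Only the 280 W CPU figure comes from this thread. The rest-of-system allowance is a hypothetical placeholder, and real PSU efficiency also varies with load:

```python
def wall_watts(dc_watts: float, efficiency: float) -> float:
    """AC power drawn from the wall to deliver `dc_watts` at the given PSU efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_watts / efficiency


CPU_W = 280   # Threadripper 3990X figure quoted earlier in the thread
REST_W = 60   # hypothetical: motherboard, RAM, fans, storage

for eff in (0.80, 0.85, 0.96):
    print(f"{eff:.0%} efficient PSU: {wall_watts(CPU_W + REST_W, eff):.0f} W at the wall")
```

The same division applies to the Pis' own power supplies, so it changes both sides of the comparison.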

Also, you may optimize your budget if you ditched the SD card, booted over PXE, and ran multiple Pis from a single power supply (an ATX one should supply a few dozen, maybe).
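Here is a rough sizing sketch for the shared-supply idea. It assumes each Pi is fed from the PSU's 5 V rail and draws about 5 W under load (the overclocked figure reported earlier in this thread). The 20 A rail rating and 80% headroom are hypothetical examples; check the label on the actual unit and leave margin for boot surges and USB peripherals:

```python
import math


def pis_per_rail(rail_volts: float, rail_amps: float, watts_per_pi: float,
                 headroom: float = 0.8) -> int:
    """How many Pis fit on one rail if it is loaded to only `headroom` of its rating."""
    usable_watts = rail_volts * rail_amps * headroom
    return math.floor(usable_watts / watts_per_pi)


# Hypothetical ATX unit rated for 20 A on its 5 V rail.
print(pis_per_rail(5.0, 20.0, 5.0))  # 16
```

Whether "a few dozen" is realistic therefore depends mostly on the rail rating of the particular supply.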

And while we are mostly talking about the Raspberry Pi 4, there are lots of other modern single-board computers (SBCs) on the market with 1-4 GB RAM that can run 64-bit Linux, and some cost less than a Pi, so you may want to look them up. You should also not forget that you may be eligible for a discount when purchasing 100 of anything. I myself got in contact with an rPi employee and got a discount of about 40% on a volume purchase.

When comparing the overclocked Pis, you omitted some important factors:
    - Overclocked systems need more expensive cooling
    - Overclocked systems carry a higher maintenance cost (lockups and potentially file system corruption every once in a while, as you can already see here, memory bit errors, etc.)
    - Overclocked systems _may_ see a higher annual failure rate, increasing costs some more



Despite all of the above advantages, I still agree that managing hundreds of hosts requires great skill and is not for everyone, so purchasing a big box is definitely the easier route for most.

Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 20,367,622
RAC: 372
Message 95619 - Posted: 30 Apr 2020, 16:34:09 UTC - in response to Message 95617.  

You could also park a threadripper 3990x in a corner, headless, and let it run like a server as well.

Now, if the goal is just to see the maximum points per watt that can be obtained on a Pi 4, undervolting it and reducing the clock is the way to go. Maybe you could hit Threadripper efficiency with low enough voltage? You would need a lot of Pis to get an appreciable amount of credit, though.
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 20,367,622
RAC: 372
Message 95620 - Posted: 30 Apr 2020, 16:43:30 UTC - in response to Message 95618.  

Quick post from a phone, but why would you use an 80% efficient power supply with a $4000 processor? You can pick up a 96% efficient, 600 W, fanless Seasonic Titanium power supply for less than $200. With a 12-year warranty, you could likely reuse it across 4 system builds as well.
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95621 - Posted: 30 Apr 2020, 16:51:10 UTC - in response to Message 95620.  
Last modified: 30 Apr 2020, 17:12:51 UTC

That actually sounds very impressive. Last time I looked around here, the PSU offering was much worse and more expensive.

Still, if you powered the horde of Pis from such a PSU, their power consumption would probably also go down in the same proportion, allowing for the same argument. Just sayin'.

Here is a link just for reference:

sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95625 - Posted: 30 Apr 2020, 17:38:59 UTC - in response to Message 95619.  
Last modified: 30 Apr 2020, 17:45:23 UTC

Well, a Threadripper 3990X is beyond what I can afford for now, let alone the other hardware that needs to go with it to leverage that performance.

For the Pi4, I'd guess overvoltage is inevitable when it comes to overclocking:
https://qengineering.eu/overclocking-the-raspberry-pi-4.html
I'm not sure about the physics or math behind it, but my guess is it's related to RC charge and discharge cycles: at overclocked speeds the square waves no longer look square, and you'd be lucky if they don't look triangular. Switching at even faster speeds may make the signal look more like DC than a wave (literally filtered to ground), so the switching event is completely lost, which causes errors and corrupts memory. So I'd guess overvoltage makes those curved RC voltages just about trigger the switching in the borderline cases, literally gambling with stochastic electrical voltages and slew rates.
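The RC intuition can be made a bit more concrete with the textbook first-order model. A node with resistance R and capacitance C, charging toward the supply voltage V_DD, crosses a logic threshold V_th after time t_th. This is a simplification (real transistor drive current also rises with voltage), but it shows why raising V_DD lets a signal cross the threshold sooner, which in turn allows a shorter clock period:

```latex
v(t) = V_{DD}\left(1 - e^{-t/RC}\right)
\quad\Longrightarrow\quad
t_{\mathrm{th}} = RC\,\ln\frac{V_{DD}}{V_{DD} - V_{\mathrm{th}}}
```

For a fixed V_th, a larger V_DD shrinks the ratio inside the logarithm and hence t_th. This is consistent with the guess that extra voltage lets borderline transitions register at a higher clock.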

Of course, and I'm experimenting with this too, every bit less voltage means less power (with the square of it, I'd guess), so it would take some experiments to find the lowest overvoltage that still runs well. I've even tried negative values, which didn't work, so I'm sticking with over_voltage=2 at 1.75 GHz for now.
At the moment, I'm happy to stay at 1.75 GHz, as it seems to run problem-free and I'm able to run at pretty low temperatures at the overclocked rate: 42-45 °C with 3 concurrent threads on my current heatsink and fan.

I went with the Pi4 partly because running on my desktop is sometimes rather disruptive: I'd either have to leave my desktop running all night, or simply suspend or cancel the jobs.

The Pi4 is ideal in this case: it runs on its own, hardly bothers me, and I just occasionally check that things are running well.
bkil

Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95626 - Posted: 30 Apr 2020, 17:48:10 UTC - in response to Message 95625.  

BOINC takes care of snapshotting for you. If you simply shut down your computer in the evening and turn it back on in the morning (either manually or automatically), it will resume where it left off.

The i686 core had a bug a few weeks ago that kept it from snapshotting properly, but things have otherwise been working correctly since I disabled the 32-bit code.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95627 - Posted: 30 Apr 2020, 18:05:09 UTC - in response to Message 95620.  
Last modified: 30 Apr 2020, 18:06:18 UTC

I'm not sure where I once came across a graph showing that at very low load, even an "80% efficient" PSU gets much less than 80% efficient,
while a power supply pushed beyond its specs loses significant efficiency as well. So I'd guess it's a trade-off: one would like to run at maximum efficiency, but the power draw is probably lower than needed to reach that 80%, so it doesn't get 80% after all.
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95628 - Posted: 30 Apr 2020, 18:27:59 UTC
Last modified: 30 Apr 2020, 18:28:52 UTC

There is something else; I kind of had a brainwave: if the Pi4 is running stable after all, the checkpoint interval can probably be extended to something like 5 minutes.
That would probably have an effect on overall power consumption, and it means fewer writes to the SD card as well.
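The checkpoint interval is the BOINC computing preference that limits how often tasks may checkpoint (commonly 60 s by default). The Python sketch below makes two assumptions: that the `disk_interval` element of global_prefs_override.xml (in the client's data directory) holds this value, and that the app honours it. It shows how a 5-minute interval cuts the maximum number of checkpoint writes for 3 concurrent tasks, and it emits a minimal override file. After saving the file, `boinccmd --read_global_prefs_override` should make the client reload it.

```python
import xml.etree.ElementTree as ET

SECONDS_PER_DAY = 24 * 60 * 60


def checkpoints_per_day(interval_s: int, tasks: int = 1) -> int:
    """Upper bound on checkpoint writes per day if each task checkpoints as often as allowed."""
    return tasks * SECONDS_PER_DAY // interval_s


def global_prefs_override(disk_interval_s: int) -> str:
    """Build a minimal global_prefs_override.xml setting only the checkpoint interval."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "disk_interval").text = str(disk_interval_s)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for interval in (60, 300):
        print(f"{interval:>3} s interval, 3 tasks: at most "
              f"{checkpoints_per_day(interval, 3)} checkpoints/day")
    print(global_prefs_override(300))
```

Under these assumptions, going from 60 s to 300 s drops the ceiling from 4320 to 864 checkpoint writes per day. The trade-off is losing up to 5 minutes of work per task after a crash.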
Endgame124

Joined: 19 Mar 20
Posts: 63
Credit: 20,367,622
RAC: 372
Message 95629 - Posted: 30 Apr 2020, 18:40:03 UTC - in response to Message 95628.  

This is an interesting point. Perhaps the power fluctuations I see are checkpoints writing to flash, not the UPS trying to pick a specific wattage to display. If I get some time tonight, I'll try increasing and decreasing the checkpoint time.




©2024 University of Washington
https://www.bakerlab.org