Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide)
Author | Message |
---|---|
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
If you look at the Raspberry Pi 3 of the original poster, you can see that it is crunching correctly without an issue:
|
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
"If you look at the Raspberry Pi 3 of the original poster, you can see that it is crunching correctly without an issue:" Any idea how power efficient Bluetooth is? That could be an ideal setup - 4 Pis connect to a fifth via Bluetooth and have WiFi, HDMI, LEDs, and USB turned off, and the fifth one has a 1 Gb wired or wireless network connection back. Also, any idea on the power consumption of the SD card? |
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
Found my own answer on the SD power usage. If the card and the OS support SD sleep commands, it’s 1 to 4 mW. Otherwise it is around 100 mW at idle. Writes are expensive power-wise on SD cards, depending on the class of card, and can consume >250 mW. On Linux, you could be writing frequently to /var/log (not to mention BOINC logs, checkpoints, WU downloads, etc.), so there could certainly be power optimizations to be had regarding SD card usage. |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
I read a test some time ago that compared WiFi vs. Bluetooth tethering from a phone, and for light browsing use cases Bluetooth consumed way less. Unfortunately I couldn't find the reference right now. Also, Bluetooth 4/5 has been designed for always-on operation and should improve power efficiency considerably further, so I wouldn't be surprised if it were still the winner. How to set it up is another question; I think you would be a pioneer in that, as very few seem to use Bluetooth. WiFi also has power saving (correct AP settings can improve this), so the claimed ~100 mW idle consumption sounds plausible. Even if Bluetooth consumed 1/10th as much, consider that if you modulated the WiFi link to connect only 1/48th of the time, the average again comes down to about 2 mW, where we are nearing diminishing returns (compared to 300 mW for Ethernet and 5 W for the Pi itself). I think the LEDs should also consume about 25 mW each. |
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
Since I'll be working with a Pi 4 4GB to start, do we have a benchmark for how a stock Pi 4 performs? I would like to consider many of the tricks mentioned in this thread, but if we don't have a baseline, my first Pi 4 will be set up with as low a memory footprint as possible (8 MB to the video card, minimal Raspbian Lite installation, etc.), and I'll turn off LEDs, HDMI, and WiFi to get a power benchmark from my UPS (not as good as a power meter, but it's what I've got). I can let it run for a few weeks to get a baseline measurement for RAC, then get another Pi 4 to experiment with: running without an SD card (PXE booted using NFS) vs. using an SD card, flipping wireless on and off, etc. Edit: Just saw the Pi 4 thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13732 Looks like the baseline is a little over 800 recent average credit. I'll be using tips from this thread to add extra RAM to my Pi 4 to try to get enough to run 4 Rosetta tasks in parallel and compare against the baseline from that thread. I'll also move my Pi 4 discussions to that thread. |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
hi, just some quick questions, i've got a pi4 recently. does r@h run well on pi 3b+ / pi 4? are there streams of jobs for it, e.g. covid, etc? it seems a pi4 with 4GB memory could be a decent board to run r@h on. i'm thinking of retrofitting a heat sink and a fan so that i can perhaps clock it up to gain performance as well. the pi4 is known to run 'hot', but its performance is quite a bit better than the pi 3b+; with a heat sink + fan retrofitted i'd think i could leave it running for hours |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
I see you have since found the separate topic for Raspberry Pi 4; I would post rPi4 specific questions there instead. Comparing the rPi3 and rPi4, the latter consumes more power, but should also produce more. We don't have concrete numbers about the performance/watt of each, but my gut feeling is that the rPi4 should be a little bit better. Both should be able to run for hours if you fit a huge heat sink, or, for the rPi4, a medium sized heat sink and a fan. I've also posted some rPi4 thermal tests in the other thread. |
lakotamm Send message Joined: 28 Jun 19 Posts: 22 Credit: 171,192 RAC: 0 |
An update after running the 3B+ for a few days of crunching. - My RPI has so far reached an average credit of 585.58. https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3710842 - I personally doubt that the RPI4 can be more efficient than a well set up 3B+. The consumption of the 3B+ should be only around 300mA with the USB hub and ethernet disabled. - I upgraded to kernel 5.4, which is currently being tested by the Raspbian team. There is almost no difference in benchmarks. - rb_04 tasks seem to have a variable rate of compressibility, and this caused some trouble... Using LZO-RLE compression, my RAM ended up completely taken by my ZRAM data, compressed by a factor of only 1.5. There was a lot of "virtual" space left on the ZRAM - only 1.45GB was taken - but there was no space left in RAM to store more compressed data. SWAP was not getting used either, and the CPU utilisation was 50%, of which 30% went to compression and constant writing to and reading from the SD card. The system was, well... almost totally unresponsive. Nasty. After this experience, I switched back to deflate compression and set the virtual ZRAM size to 1.88GB - simply double my RAM. I can tell that reading from and writing to ZRAM is noticeably slower. However, it seems to be working pretty well. zramctl: NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT /dev/zram3 deflate 481.5M 269.6M 113.7M 121.4M 4 [SWAP] /dev/zram2 deflate 481.5M 269.1M 114M 121.4M 4 [SWAP] /dev/zram1 deflate 481.5M 280.4M 119.3M 127.1M 4 [SWAP] /dev/zram0 deflate 481.5M 280.4M 118.7M 126.6M 4 [SWAP] I will update the guide to use deflate instead of lzo. https://www.reddit.com/r/BOINC/comments/g0r0wa/running_rosetta_covid19_workunits_on_raspberry_pi/ |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
pi4 is likely more efficient for a couple of reasons. the larger memory means that there can be more concurrent jobs running. pi4 uses an A72 superscalar cpu; that alone may make quite a lot of difference in mflops, and in that sense it could complete more jobs in the same time, though at a higher power consumption. i'm thinking that we can somehow estimate the 'performance per watt' based on the r@h statistics, i.e. we have the number of credits earned and time elapsed. what seems missing is the watt, or rather watt-hour, figures, but i'd guess we can base that on some isolated estimates. after all r@h is a 'benchmark' of sorts lol |
lakotamm Send message Joined: 28 Jun 19 Posts: 22 Credit: 171,192 RAC: 0 |
- I am running 4 concurrent jobs on RPI 3B+, which is the same as on RPI 4 - do we have any consumption and performance numbers from RPI4? |
bkil Send message Joined: 11 Jan 20 Posts: 97 Credit: 4,433,288 RAC: 0 |
Newer versions of zram also have an option for mem_limit, have a look here: https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html You could set it to 70-80% of RAM, for example, while also keeping the max size at 200%. This would enable utilizing the swap as well (although a kernel with zswap would be more ideal for this). After you measure the overhead that deflate causes (is it kswapd?), you may try some other compression algorithm as well. I'll soon have a look at whether I can patch out malloc via LD_PRELOAD to enable KSM. That could help a lot. You may want to see whether setting /proc/sys/vm/page-cluster to 0 could help with the zram overhead. https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#page-cluster |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
i do have a suggestion for running on Pi 3B+, or for that matter Pi 4: create a swap partition and attach it, e.g. run mkswap /dev/mmcblk0p3, then edit /etc/fstab and add the line /dev/mmcblk0p3 swap swap defaults - you'd need to check that swap really is at mmcblk0p3; that has to be set up while partitioning the sd card. i did that and committed 2 GB to it. it is somewhat a 'waste' of space on the SD card but i found it is actually useful. there are quite a lot of tmpfs being mounted and perhaps some daemons running in the background at low priority. i found that while boinc is running, the swap space is apparently used. i'd imagine that some of the tmpfs partitions are saved into swap instead. that would actually free up some precious ram and allow r@h tasks to run. in terms of zram, i'm a little concerned that it may take more cpu 'effort' than simply swap |
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
"- I am running 4 concurrent jobs on RPI 3B+, which is the same as on RPI 4" My Pi 4 4GB is running at about 4 watts per my APC UPS (I don’t have a more accurate way to measure). It only has HDMI turned off at the moment - on my next Pi 4 I’ll be playing a little more with power settings. The currently running Pi 4 can be found here: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4215281 I also went ahead and set up a Pi 3B+. Trying to run 4 processes, I ended up running out of memory and thrashing, even with zram - 3 processes seem to be working much better. |
lakotamm Send message Joined: 28 Jun 19 Posts: 22 Credit: 171,192 RAC: 0 |
"Newer versions of zram also have an option for mem_limit, have a look here:" Thanks, I have 4 days to deliver my Bachelor thesis, so I will look into these things afterwards. Right now it is running fine without swap, with at most 520MB RAM taken while using 1.35GB of ZRAM. I would like to try lz4hc compression as well, however it does not seem to be part of the Raspbian 5.4 kernel. kswapd0 is definitely using a noticeable amount of CPU time. However, the consumption comes in "peaks": once in a while it uses 100 percent, and it sits idle most of the time. According to top: the system has been up for 49h 20min = 2960min, which translates to 11840min of CPU time across the 4 cores. kswapd0 has been running for 238min 22s, boinc for 92min 38s, systemd for 23min 25s, and wpa_supplicant for 3min 39s. According to this, kswapd0 is taking about 2.01% of the CPU time. Not too bad, I would say. "i did that and committed 2 GB in there. it is somewhat a 'waste' of space on the SD card but i found it is actually useful." It could be a good idea to measure the speed of writing to zram and to flash. LZO/LZO-RLE/LZ4 compression should be fairly fast, and I doubt that the SD card is faster. When it comes to deflate, an SSD will likely be faster. However, I do not have one here, and I would really like to avoid writing to the SD card more than necessary. |
lakotamm Send message Joined: 28 Jun 19 Posts: 22 Credit: 171,192 RAC: 0 |
Let's do a performance comparison in a few days. I will keep mine running and then we can compare gained points. I had to change zram compression back to deflate and increase the size to 1.9GB. Afterwards, all seems fine. |
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
My Pi3B+ seems to have topped out at about 600 credit with 3 Rosetta processes. I’ve dropped down to 2 processes, and switched zram to lz4 to see if it affects the average credit or not. |
Endgame124 Send message Joined: 19 Mar 20 Posts: 63 Credit: 20,367,622 RAC: 5,964 |
"My Pi3B+ seems to have topped out at about 600 credit with 3 Rosetta processes. I’ve dropped down to 2 processes, and switched zram to lz4 to see if it affects the average credit or not." Dropping to 2 processes did not improve performance (it dropped to 530 average credit and still seems to be trending downward), so I think we can say with confidence that in pretty much all cases, as long as you aren’t thrashing swap to an SD card, more processes is better. I may try 3 processes again with lz4, but perhaps it would be better to switch to ZSTD? Better compression with similar overhead to lzo. |
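
For anyone who wants to sanity-check the radio duty-cycle estimate from bkil's post earlier in the thread (~100 mW WiFi idle draw, connected only 1/48th of the time), here is a minimal Python sketch. It assumes the radio draws nothing while switched off, which is my simplification and not something measured in this thread; the function name is made up for illustration, and the wattages are the rough figures quoted above, not measurements.

```python
def average_power_mw(active_mw: float, duty_cycle: float, off_mw: float = 0.0) -> float:
    """Time-weighted average power of a radio that is on for a fraction of the time.

    off_mw defaults to 0, i.e. the radio is assumed to draw nothing while off
    (an assumption, not a measured value).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return active_mw * duty_cycle + off_mw * (1.0 - duty_cycle)


# Rough figures quoted in the thread (estimates, not measurements).
WIFI_IDLE_MW = 100.0
ETHERNET_MW = 300.0
PI_BOARD_MW = 5000.0

wifi_avg = average_power_mw(WIFI_IDLE_MW, 1 / 48)
print(f"WiFi at 1/48 duty cycle: {wifi_avg:.2f} mW")
print(f"Ethernet, always on:     {ETHERNET_MW:.0f} mW")
print(f"WiFi share of a 5 W Pi:  {100 * wifi_avg / PI_BOARD_MW:.3f} %")
```

With these inputs the average lands at roughly 2 mW, which is where the "diminishing returns" remark comes from: the board itself dominates by more than three orders of magnitude.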
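
To compare the deflate setup against the factor of only 1.5 seen with LZO-RLE, the zramctl output lakotamm posted can be turned into per-device compression ratios. This is only a sketch: it assumes DATA/TOTAL is the ratio of interest (TOTAL includes zram's own overhead, so it reflects the RAM actually used) and that sizes carry the single-letter K/M/G suffixes seen in that output. The helper names are hypothetical.

```python
# zramctl output copied from the post above, laid out as a table again.
ZRAMCTL_OUTPUT = """\
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram3 deflate     481.5M 269.6M 113.7M 121.4M       4 [SWAP]
/dev/zram2 deflate     481.5M 269.1M   114M 121.4M       4 [SWAP]
/dev/zram1 deflate     481.5M 280.4M 119.3M 127.1M       4 [SWAP]
/dev/zram0 deflate     481.5M 280.4M 118.7M 126.6M       4 [SWAP]
"""

UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> float:
    """Convert a size such as '481.5M' to bytes (suffix handling is an assumption)."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)


def compression_ratios(text: str) -> list[tuple[str, float]]:
    """Return (device, DATA / TOTAL) for every device row in the output."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    ratios = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split()))
        ratios.append((fields["NAME"], to_bytes(fields["DATA"]) / to_bytes(fields["TOTAL"])))
    return ratios


def overall_ratio(text: str) -> float:
    """Total uncompressed data divided by total RAM used across all devices."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    data = sum(to_bytes(row["DATA"]) for row in rows)
    total = sum(to_bytes(row["TOTAL"]) for row in rows)
    return data / total


for device, ratio in compression_ratios(ZRAMCTL_OUTPUT):
    print(f"{device}: {ratio:.2f}x")
print(f"overall: {overall_ratio(ZRAMCTL_OUTPUT):.2f}x")
```

For the snapshot above this works out to roughly 2.2x on every device, compared with the 1.5x reported for LZO-RLE. It is a single snapshot, so the ratio will move with the mix of work units.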
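
bkil's mem_limit suggestion (cap the physical memory zram may use at 70-80% of RAM while keeping the virtual size at 200%) could be sketched roughly as below. Assumptions on my part: 4 zram devices as in lakotamm's setup, 75% picked from the middle of the suggested range, and 1 GiB of RAM. The sysfs attribute names comp_algorithm, disksize and mem_limit are the ones in the kernel zram documentation linked in that post; the function names are hypothetical. The demo writes into a temporary directory instead of /sys/block/zram0 so it can be run without root.

```python
import tempfile
from pathlib import Path


def zram_plan(ram_bytes: int, devices: int = 4,
              disksize_factor: float = 2.0,
              mem_limit_fraction: float = 0.75) -> dict:
    """Per-device zram settings in bytes (hypothetical helper).

    disksize is the virtual (uncompressed) size, mem_limit the cap on the
    physical memory a device may use for compressed data.
    """
    return {
        "disksize": int(ram_bytes * disksize_factor / devices),
        "mem_limit": int(ram_bytes * mem_limit_fraction / devices),
    }


def apply_plan(device_dir: Path, algorithm: str, plan: dict) -> None:
    """Write the settings: algorithm first, then disksize, then mem_limit."""
    (device_dir / "comp_algorithm").write_text(algorithm)
    (device_dir / "disksize").write_text(str(plan["disksize"]))
    (device_dir / "mem_limit").write_text(str(plan["mem_limit"]))


if __name__ == "__main__":
    ram = 1024 ** 3  # assumed 1 GiB; a real board reports somewhat less usable RAM
    plan = zram_plan(ram)
    # Dry run against a temporary directory standing in for /sys/block/zram0.
    with tempfile.TemporaryDirectory() as tmp:
        fake_device = Path(tmp)
        apply_plan(fake_device, "deflate", plan)
        for name in ("comp_algorithm", "disksize", "mem_limit"):
            print(name, "=", (fake_device / name).read_text())
```

On a real Pi, device_dir would be something like Path("/sys/block/zram0"), the script would need root, and the device has to be uninitialised (freshly created or reset) before the algorithm and disksize can be set, with mkswap and swapon run afterwards. I have not tested this on the hardware discussed in this thread.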
©2024 University of Washington
https://www.bakerlab.org