Running Rosetta on Raspberry Pi 3B+ (how to guide)

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95043 - Posted: 21 Apr 2020, 12:27:24 UTC - in response to Message 95041.  
Last modified: 21 Apr 2020, 12:31:42 UTC

If you look at the Raspberry Pi 3 of the original poster, you can see that it is crunching correctly without an issue:


We can only tell how the power efficiency of a Pi3 vs. Pi4 vs. Ryzen compares once somebody posts power consumption estimates taken after applying every available power optimization step.

For best results, you should have an AP in each room where you place a node. You can connect the APs either via cabling or by configuring them as WDS repeaters or mesh routers, ideally on a separate backhaul band if they are dual band. OpenWrt can do all this on many cheap routers. You can get a cheap OpenWrt-capable wifi router with an external antenna for $5-10. In most use cases it is much more reliable to spread three of these around your house on separate channels than to buy the most powerful single router available.

Just for kicks, you could even experiment with a Bluetooth piconet, Wi-Fi Direct (P2P) or mesh networking between the Pi nodes, though I'm not sure how suitable the Pi's hardware is for this.

Endgame124
Joined: 19 Mar 20
Posts: 63
Credit: 17,755,637
RAC: 23,884
Message 95044 - Posted: 21 Apr 2020, 12:39:44 UTC - in response to Message 95043.  

Any idea how power efficient Bluetooth is? That could be an ideal setup: 4 Pis connect to a 5th via Bluetooth and have WiFi, HDMI, LEDs, and USB turned off, and the 5th one has a 1 Gb wired or a wireless network connection back.

Also, any idea about the power consumption of the SD card?

Endgame124
Message 95046 - Posted: 21 Apr 2020, 13:53:47 UTC - in response to Message 95045.  

Found my own answer on the SD power usage.

If the card and the OS support SD sleep commands, it’s 1 to 4 mW. Otherwise it’s around 100 mW at idle.

Writes are expensive power-wise on SD cards and, depending on the class of card, can consume more than 250 mW.

On Linux, you could be writing frequently to /var/log (not to mention BOINC logs, checkpoints, WU downloads, etc.), so there could certainly be power optimizations to be had regarding SD card usage.
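
To see how much is actually being written to the SD card, one option is the kernel's per-device I/O counters. The sketch below is only illustrative: the function name is made up, it assumes the card shows up as mmcblk0, and it relies on field 7 of /sys/block/<dev>/stat being the number of 512-byte sectors written since boot.

```shell
# Convert a /sys/block/<dev>/stat line (read from stdin) into MiB written since boot.
# Field 7 of that file counts 512-byte sectors written.
written_mib() {
    awk '{ printf "%.1f\n", $7 * 512 / 1048576 }'
}

# Usage on a Pi whose SD card is mmcblk0 (assumed device name):
#   written_mib < /sys/block/mmcblk0/stat
```

Sampling this before and after a day of crunching should give a rough idea of how much BOINC and logging really write.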

bkil
Message 95064 - Posted: 21 Apr 2020, 19:24:58 UTC - in response to Message 95044.  

I read a test some time ago that compared wifi and Bluetooth tethering from a phone, and for light browsing use cases Bluetooth consumed way less. Unfortunately I couldn't find the reference right now.

Also, Bluetooth 4/5 has been designed for always-on operation and should improve power efficiency much further, so I wouldn't be surprised if it were still the winner. How to set it up is another question. I think you would be a pioneer there, as it seems very few people use Bluetooth for this.

Wifi also has power saving (correct AP settings can improve this), so the claimed ~100 mW idle consumption sounds plausible. Even if Bluetooth consumed 1/10th as much, consider that if you duty-cycled the wifi to be connected only 1/48th of the time, its average would also come down to about 2 mW, where we are nearing diminishing returns (compared to 300 mW for ethernet and 5 W for the Pi itself). I think the LEDs should also consume about 25 mW each.
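
The duty-cycle estimate can be checked with a line of arithmetic. This is a rough sketch that assumes the radio draws nothing while switched off and ignores reconnect overhead; the 100 mW and 1/48 figures are the ones from the post, and the function name is made up for illustration.

```shell
# Average power (mW) of a link that draws $1 mW while on
# and is on for only 1/$2 of the time (off-state draw assumed to be zero).
avg_power_mw() {
    awk -v on_mw="$1" -v frac="$2" 'BEGIN { printf "%.2f\n", on_mw / frac }'
}

avg_power_mw 100 48   # ~100 mW link, on 1/48th of the time -> 2.08
avg_power_mw 10 48    # a link drawing a tenth of that -> 0.21
```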

Endgame124
Message 95104 - Posted: 22 Apr 2020, 1:43:17 UTC
Last modified: 22 Apr 2020, 2:40:19 UTC

Since I'll be working with a Pi 4 4GB to start, do we have a benchmark for how a stock Pi 4 performs? I would like to try many of the tricks mentioned in this thread, but if we don't have a baseline, my first Pi 4 will be set up with as small a memory footprint as possible (8 MB to the video card, minimal Raspbian Lite installation, etc.), and I'll turn off LEDs, HDMI, and wifi to get a power benchmark from my UPS (not as good as a power meter, but it's what I've got). I can let it run for a few weeks to get a base measurement for RAC, then get another Pi 4 to experiment with: running without an SD card (PXE booted using NFS) vs. using an SD card, flipping wireless on and off, etc.

Edit:
Just saw the Pi 4 thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13732

Looks like the baseline is a little over 800 recent average credit.

I'll be using tips from this thread to add extra RAM to my Pi 4, to try to get enough to run 4 Rosetta tasks in parallel and compare against the baseline from that thread. I'll also move my Pi 4 discussions to that thread.

sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 95130 - Posted: 22 Apr 2020, 12:47:32 UTC
Last modified: 22 Apr 2020, 12:48:55 UTC

Hi, just some quick questions. I got a Pi 4 recently. Does R@h run well on the Pi 3B+ and Pi 4? Are there streams of jobs for it, e.g. COVID, etc.?
It seems a Pi 4 with 4 GB of memory could be a decent board to run R@h on. I'm thinking of retrofitting a heat sink and a fan so that I can perhaps clock it up to gain performance as well.
The Pi 4 is known to run 'hot', but its performance is quite a bit better than the Pi 3B+. With a heat sink + fan retrofitted, I'd think I could leave it running for hours.

bkil
Message 95136 - Posted: 22 Apr 2020, 15:54:53 UTC - in response to Message 95130.  
Last modified: 22 Apr 2020, 16:17:37 UTC

I see you have since found the separate topic for the Raspberry Pi 4. I would post rPi4-specific questions there instead.

Comparing the rPi3 and rPi4, the latter consumes more power, but should also produce more. We don't have concrete numbers about the performance/watt of each, but my gut feeling is that the rPi4 should be a little bit better. Both should be able to run for hours if you fit a huge heat sink and/or, for the rPi4, a medium-sized heat sink and a fan.

I've also posted some rPi4 thermal tests in the other thread.

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 95366 - Posted: 25 Apr 2020, 21:35:46 UTC
Last modified: 25 Apr 2020, 21:37:46 UTC

An update after a few days of crunching on the 3B+.

- My RPI has so far reached an average credit of 585.58.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3710842

- I personally doubt that the RPI4 can be more efficient than a well set up 3B+. The consumption of the 3B+ should be only around 300mA with the USB hub and ethernet disabled.

- I upgraded to kernel 5.4, which is currently being tested by the Raspbian team. There is almost no difference in benchmarks.

- rb_04 tasks seem to have a variable rate of compressibility, and this caused some trouble... Using LZO-RLE compression, my RAM ended up completely taken by ZRAM data compressed by a factor of only 1.5. There was a lot of "virtual" space left on the ZRAM (only 1.45GB was taken), but there was no physical memory left to store more compressed data. SWAP was not getting used either, and CPU utilisation was 50%, of which 30% went to compressing and to constant reading from and writing to the SD card. The system was, well... almost totally unresponsive. Nasty.

After this experience, I switched back to deflate compression and set the virtual ZRAM size to 1.88GB, simply double my RAM. I can tell that reading from and writing to ZRAM is noticeably slower. However, it seems to be working fine.

zramctl:
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram3 deflate     481.5M 269.6M 113.7M 121.4M       4 [SWAP]
/dev/zram2 deflate     481.5M 269.1M   114M 121.4M       4 [SWAP]
/dev/zram1 deflate     481.5M 280.4M 119.3M 127.1M       4 [SWAP]
/dev/zram0 deflate     481.5M 280.4M 118.7M 126.6M       4 [SWAP]
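
The effective compression ratio can be read off this listing by dividing DATA by COMPR. A small sketch, assuming zramctl-style rows on stdin with all sizes in the same unit (here M); the function name is made up for illustration:

```shell
# Sum DATA (column 4) and COMPR (column 5) over all zram devices read from stdin
# and print the overall compression ratio. Adding 0 makes awk keep only the
# numeric prefix, so "269.6M" counts as 269.6 (all sizes must share one unit).
zram_ratio() {
    awk '$1 ~ /^\/dev\/zram/ { data += $4 + 0; compr += $5 + 0 }
         END { if (compr > 0) printf "%.2f\n", data / compr }'
}

zram_ratio <<'EOF'
/dev/zram3 deflate 481.5M 269.6M 113.7M 121.4M 4 [SWAP]
/dev/zram2 deflate 481.5M 269.1M 114M 121.4M 4 [SWAP]
/dev/zram1 deflate 481.5M 280.4M 119.3M 127.1M 4 [SWAP]
/dev/zram0 deflate 481.5M 280.4M 118.7M 126.6M 4 [SWAP]
EOF
# prints 2.36
```

So deflate is compressing this workload at roughly 2.4:1, compared with the factor of 1.5 seen with LZO-RLE. On a live system the same helper could be fed with `zramctl | zram_ratio`.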

I will update the guide for using deflate instead of lzo.
https://www.reddit.com/r/BOINC/comments/g0r0wa/running_rosetta_covid19_workunits_on_raspberry_pi/

sgaboinc
Message 95395 - Posted: 26 Apr 2020, 12:32:28 UTC
Last modified: 26 Apr 2020, 12:45:39 UTC

The Pi 4 is likely more efficient for a couple of reasons. The larger memory means that more concurrent jobs can run.
The Pi 4 uses an A72 superscalar CPU, and that alone may make quite a lot of difference in MFLOPS.
In that sense it could complete more jobs in the same time, though at a higher power consumption.

I'm thinking that we can somehow estimate the 'performance per watt' based on the R@h statistics,
i.e. we have the number of credits earned and the time elapsed.
What seems to be missing is the watt, or rather watt-hour, figures, but I'd guess we can base that on some isolated estimates. After all, R@h is a 'benchmark' of sorts, lol.
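
One way to put that estimate into numbers is credits per watt-hour: recent average credit (credits per day) divided by the energy used per day. The sketch below is only illustrative. The function name is made up, and the inputs are rough figures quoted elsewhere in this thread: a Pi 4 baseline of a little over 800 RAC at about 4 W per a UPS readout, and a 3B+ at 585.58 RAC with an estimated (not measured) 300 mA, taken here as 1.5 W on an assumed 5 V supply.

```shell
# Credits earned per watt-hour, given RAC (credits/day) and average draw in watts.
credits_per_wh() {
    awk -v rac="$1" -v watts="$2" 'BEGIN { printf "%.2f\n", rac / (watts * 24) }'
}

credits_per_wh 800 4        # Pi 4: ~800 RAC at ~4 W
credits_per_wh 585.58 1.5   # Pi 3B+: 585.58 RAC at an assumed 1.5 W
```

Real wall-socket measurements under load would be needed before drawing conclusions from numbers like these.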

lakotamm
Message 95396 - Posted: 26 Apr 2020, 12:48:12 UTC - in response to Message 95395.  
Last modified: 26 Apr 2020, 12:51:05 UTC

- I am running 4 concurrent jobs on RPI 3B+, which is the same as on RPI 4
- do we have any consumption and performance numbers from RPI4?

bkil
Message 95399 - Posted: 26 Apr 2020, 14:51:29 UTC - in response to Message 95366.  
Last modified: 26 Apr 2020, 14:58:45 UTC

Newer versions of zram also have a mem_limit option, have a look here:
https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html
You could set it to 70-80% of RAM for example while also keeping the max size at 200%. This would enable utilizing the swap as well (although a kernel with zswap would be more ideal for this).
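
As a sketch of that suggestion, the helper below only prints the sysfs writes it would make, so they can be reviewed before being run as root. It assumes RAM is split evenly across the zram devices, picks 75% as the memory limit, and uses the disksize and mem_limit nodes described in the kernel documentation linked above. The function name and the MemTotal value in the example are hypothetical. Note that disksize can only be set on a device that has not been initialized yet (or has been reset).

```shell
# Print (not execute) the sysfs writes for N zram devices:
# total disksize = 200% of RAM, total mem_limit = 75% of RAM.
# $1 = MemTotal in kB (see /proc/meminfo), $2 = number of zram devices (default 1)
zram_plan() {
    total_kb=$1
    ndev=${2:-1}
    i=0
    while [ "$i" -lt "$ndev" ]; do
        echo "echo $((total_kb * 2 / ndev))K > /sys/block/zram$i/disksize"
        echo "echo $((total_kb * 75 / 100 / ndev))K > /sys/block/zram$i/mem_limit"
        i=$((i + 1))
    done
}

zram_plan 986112 4   # hypothetical MemTotal of 986112 kB, four devices
```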

After you measure the overhead that deflate causes (is it kswapd?), you may try some other compression algorithm as well.

I'll soon have a look at whether I can patch out malloc via LD_PRELOAD to enable KSM. That could help a lot.

You may want to see whether setting /proc/sys/vm/page-cluster to 0 could help with the zram overhead.
https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#page-cluster
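
For context, page-cluster is a base-2 logarithm: the kernel reads up to 2^page-cluster consecutive pages from swap in one attempt, so 0 means one page at a time. A tiny sketch (the function name is made up) of what a given setting means, with the commands for the live setting shown as comments since they need root:

```shell
# Number of pages read from swap per attempt for a given vm.page-cluster value.
swap_readahead_pages() {
    echo $((1 << $1))
}

swap_readahead_pages 3   # kernel default
swap_readahead_pages 0   # the value suggested above

# To inspect and change the live setting (as root):
#   cat /proc/sys/vm/page-cluster
#   sysctl -w vm.page-cluster=0
```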

sgaboinc
Message 95403 - Posted: 26 Apr 2020, 15:58:15 UTC
Last modified: 26 Apr 2020, 16:10:04 UTC

I do have a suggestion for running on the Pi 3B+, or for that matter the Pi 4: create a swap partition and attach it, e.g.
mkswap /dev/mmcblk0p3
swapon /dev/mmcblk0p3

then edit /etc/fstab and add
/dev/mmcblk0p3  swap    swap    defaults    0    0

You'd need to check that the swap partition really is mmcblk0p3; it has to be created while partitioning the SD card.

I did that and committed 2 GB to it. It is somewhat of a 'waste' of space on the SD card, but I found it is actually useful.
There are quite a lot of tmpfs mounts, and perhaps some daemons running in the background at low priority.
I found that while BOINC is running, the swap space is apparently used. I'd imagine that some of the tmpfs contents are moved out to swap instead.
That would actually free up some precious RAM and allow R@h tasks to run.

In terms of zram, I'm a little concerned that it may take more CPU 'effort' than plain swap.
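
Whether the swap partition is really being used can be checked from /proc/swaps, which lists each swap area with its size and used amount in KiB. A minimal sketch; the function name is made up and the sample values in the note below are hypothetical:

```shell
# Print each swap area and the percentage of it in use.
# $1 = file in /proc/swaps format (defaults to /proc/swaps itself);
# columns are: Filename Type Size Used Priority, with a header on line 1.
swap_usage() {
    awk 'NR > 1 { printf "%s %.1f%%\n", $1, ($3 > 0 ? 100 * $4 / $3 : 0) }' "${1:-/proc/swaps}"
}

# Usage on a running system:
#   swap_usage
```

For a row such as `/dev/mmcblk0p3 partition 2097148 524287 -2`, this would report `/dev/mmcblk0p3 25.0%`. With both zram and an SD card partition active, it also shows which of the two is filling up.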

Endgame124
Message 95406 - Posted: 26 Apr 2020, 18:17:54 UTC - in response to Message 95396.  

My Pi 4 4GB is running at about 4 watts according to my APC UPS (I don’t have a more accurate way to measure). It only has HDMI turned off at the moment. On my next Pi 4 I’ll be playing a little more with power settings.

The currently running Pi 4 can be found here:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4215281

I also went ahead and set up a Pi 3B+. Trying to run 4 processes, I ended up running out of memory and thrashing, even with zram. 3 processes seem to be working much better.

lakotamm
Message 95408 - Posted: 26 Apr 2020, 20:43:49 UTC - in response to Message 95399.  

Thanks. I have 4 days to deliver my Bachelor thesis, so I will look into these things afterwards. Right now it is running fine without swap, with at most 520MB of RAM taken while using 1.35GB of ZRAM. I would like to try lz4hc compression as well, but it seems it is not part of the Raspbian 5.4 kernel.

The kswapd0 is definitely using a noticeable amount of CPU time. However, the consumption comes in "peaks": once in a while it uses 100 percent, and it sits idle most of the time.
According to top:
System has been up for 49h 20min = 2960min, which on 4 cores translates to 11840min of CPU time.
kswapd0 has been running for 238min and 22s.
boinc has been running for 92min and 38s.
systemd has been running for 23min and 25s.
wpa_supplicant has been running for 3min 39s.

According to this, kswapd0 is taking about 2.01% of the CPU time. Not too bad I would say.
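
The percentage can be reproduced with a short calculation, sketched here with a made-up helper name. It assumes the 4 cores of the Pi 3B+, so 2960 minutes of uptime gives 4 × 2960 = 11840 CPU-minutes, and the result lands at roughly 2%.

```shell
# Share of total CPU time (percent) used by one process.
# $1 = uptime in minutes, $2 = number of cores, $3 and $4 = process CPU time (min, s)
cpu_share_pct() {
    awk -v up="$1" -v cores="$2" -v m="$3" -v s="$4" \
        'BEGIN { printf "%.2f\n", 100 * (m + s / 60) / (up * cores) }'
}

cpu_share_pct 2960 4 238 22   # kswapd0 -> 2.01
cpu_share_pct 2960 4 92 38    # boinc
```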

Regarding sgaboinc's concern that zram may take more CPU effort than plain swap on the SD card: it could be a good idea to measure the speed of writing to zram and to flash. LZO/LZO-RLE/LZ4 compression should be fairly fast, and I doubt that the SD card is faster. When it comes to deflate, an SSD would likely be faster. However, I do not have one here, and I would really like to avoid writing to the SD card more than necessary.

lakotamm
Message 95409 - Posted: 26 Apr 2020, 20:47:10 UTC - in response to Message 95406.  


Let's do a performance comparison in a few days. I will keep mine running and then we can compare the points gained. I had to change zram compression back to deflate and increase the size to 1.9GB. Since then, all seems fine.

Endgame124
Message 96238 - Posted: 7 May 2020, 16:02:30 UTC

My Pi3B+ seems to have topped out at about 600 credit with 3 Rosetta processes. I’ve dropped down to 2 processes, and switched zram to lz4 to see if it affects the average credit or not.

Endgame124
Message 96310 - Posted: 9 May 2020, 14:56:55 UTC - in response to Message 96238.  

Dropping to 2 processes did not improve performance (it dropped to 530 average credit and still seems to be trending downward), so I think we can say with confidence that in pretty much all cases, as long as you aren’t thrashing swap to an SD card, more processes are better.

I may try 3 processes again with lz4, but perhaps it would be better to switch to ZSTD? It offers better compression with similar overhead to lzo.
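
Whether a given compressor such as zstd (or lz4hc) is available depends on how the kernel was built. The zram driver lists the supported algorithms in /sys/block/zram0/comp_algorithm, with the active one in square brackets. A small sketch for checking that list; the function name and the sample string are made up for illustration:

```shell
# Succeeds if algorithm $1 appears in a comp_algorithm listing $2,
# e.g. the output of: cat /sys/block/zram0/comp_algorithm
zram_has_algo() {
    printf '%s\n' "$2" | tr -d '[]' | tr ' ' '\n' | grep -qxF -- "$1"
}

sample="lzo lzo-rle lz4 [deflate] zstd"   # hypothetical listing
zram_has_algo zstd "$sample" && echo "zstd available"
zram_has_algo lz4hc "$sample" || echo "lz4hc not available"

# Switching (as root) is done by writing the name to the same node, e.g.
#   echo zstd > /sys/block/zram0/comp_algorithm
# which the kernel only accepts before the device's disksize has been set.
```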

©2024 University of Washington
https://www.bakerlab.org