Running Rosetta on Raspberry Pi 3B+ (how to guide)

Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide)

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94371 - Posted: 13 Apr 2020, 20:26:34 UTC
Last modified: 13 Apr 2020, 21:03:29 UTC

Hello guys,
I would like to make a guide for getting Rosetta to run on a Raspberry Pi 3B+, or probably any other board with 1GB or more of RAM. It seems like many people are struggling with it.

My set up:
HW:
Raspberry Pi 3B+
No name USB charger 5V 1A with a very short cable
SD Card 16GB
The newest version of Raspbian

Beginning state:
I assume that you have Raspbian installed and BOINC running. I set up my BOINC client to use 100 percent RAM available and 90 percent swap.

1. In case you have less than 500MB of RAM per core, set up ZRAM (follow the guide in the readme):
https://github.com/novaspirit/rpi_zram

2. Increase swap size - I set it to 2048MB. I am not sure how much is needed or exactly how it works in combination with ZRAM; I simply went with the biggest value, since I have enough free space on the SD card.
https://wpitchoune.net/tricks/raspberry_pi3_increase_swap_size.html

3. Put your kernel into 64bit mode and set up BOINC client to receive 64bit work:
http://marksrpicluster.blogspot.com/2020/04/do-something-useful-with-your-pi4.html

4. Now you should be able to download the tasks and run them on 2-3 cores.


5. (Optional) If you are fine using only SSH, decrease the amount of RAM available to the GPU to 16MB. This will give you a few more MB of RAM:
sudo raspi-config
- Advanced Options - Memory split - 16MB

6. (Optional) Decrease the power consumption of the chip and reduce temperatures:
- Disable USB hub and Ethernet:
echo 0 | sudo tee /sys/devices/platform/soc/3f980000.usb/buspower >/dev/null
- Disable HDMI:
sudo tvservice --off
These commands need to be reapplied after each reboot.
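
Because the settings in step 6 are lost on reboot, one option is to collect them in a small script that runs at boot. The following is only a sketch: it reuses the sysfs path and the tvservice call from step 6, while the function name apply_power_savings and the DRY_RUN switch are hypothetical names introduced here. How the script gets started at boot (for example an @reboot line in root's crontab) is an assumption and not something tested in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reapply the power-saving settings from step 6.
# Run as root. Set DRY_RUN=1 to print the commands instead of running them.
USB_POWER=/sys/devices/platform/soc/3f980000.usb/buspower  # path from step 6

apply_power_savings() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "echo 0 > $USB_POWER"
        echo "tvservice --off"
        return 0
    fi
    # Disable the USB hub and Ethernet (this cuts wired network access).
    echo 0 > "$USB_POWER"
    # Disable HDMI output.
    tvservice --off
}
```

With DRY_RUN=1 set, calling apply_power_savings only prints the two commands, which is a safe way to check the script before letting it run for real.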

In the beginning, only 3 tasks were able to run at the same time. However, after a few hours of run time, my Raspberry Pi managed to run all 4 tasks. While doing so, 1.3GB of data was stored in swap. I have no clue how this happened, but it seems to be working fine.

The speed of calculations seems to be OK compared to my Android Huawei Honor 8 (I am using only the efficient A53 cores on it). It seems to take a bit over 30,000 s to complete one WU on the RPi 3B+, and around 36,000 s on the A53 cores of the Honor 8. So the RPi is even a bit faster? Maybe there are fewer background processes...
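
To make the two run times quoted above easier to compare, they can be converted to hours and minutes. This is a small sketch; the function name secs_to_hms is just an illustrative choice.

```shell
#!/bin/sh
# Convert a run time in seconds to h:mm:ss using integer arithmetic.
secs_to_hms() {
    s=$1
    printf '%d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

secs_to_hms 30000   # RPi 3B+ figure quoted above
secs_to_hms 36000   # Honor 8 (A53 cores) figure quoted above
```

That works out to about 8 h 20 min per WU on the Pi versus 10 h on the phone.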

If you have any questions, just ask.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 114,355,457
RAC: 52,151
Message 94385 - Posted: 13 Apr 2020, 21:57:44 UTC

Thanks for this - will give it a go.

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94393 - Posted: 13 Apr 2020, 22:21:38 UTC

Updates:
32MB RAM seems to be enough for the GPU to run VNC at 720p.


So far I have seen some issues with WUs repeatedly initializing and stopping - I guess that RAM is simply right at the limit. Let's see how it works in the long run.

Tom Rinehart
Joined: 28 Mar 20
Posts: 7
Credit: 1,637,467
RAC: 0
Message 94395 - Posted: 13 Apr 2020, 22:25:02 UTC

On my RPi 3B running Raspbian Buster Lite (Minimal image based on Debian Buster), I only had to put the kernel in 64 bit mode, set up BOINC client to receive 64 bit work, and lower the GPU RAM to 16MB using raspi-config. It downloaded 4 WUs and just started working.

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94397 - Posted: 13 Apr 2020, 22:32:57 UTC - in response to Message 94395.  
Last modified: 13 Apr 2020, 22:33:36 UTC

That is interesting. Am I overengineering this? I could try to disable ZRAM to see how it works.

What is your average time per WU so far? Are all your cores running?

My times so far (h:mm):
7:48
7:53

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94402 - Posted: 13 Apr 2020, 22:46:38 UTC - in response to Message 94371.  

Thanks for sharing! I like your setup. Are you connecting over wifi? Could you perhaps measure the power draw? You may use either a kill-a-watt or the built-in power gauge inside your laptop if you connect to a high power charger port.

I'm interested in how low we can get the power draw. Can you underclock the CPU/GPU? Can you switch off the LEDs? Can we switch wifi on only periodically for transfers (from cron, for example)?

3 Rosetta tasks in 1GB sounds like pushing the limit a bit, but we'll see how it performs. You may get some more space if you add a little more zram and select a better algorithm. Basically, I would adjust the end of the script like this:
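# totalmem (in kB) and cores are expected to be set earlier in the script
mem=$(( (totalmem * 4 / 3 / cores) * 1024 ))  # bytes per zram device
# try to load the compression modules; a failed modprobe just means it is absent
modprobe deflate
modprobe zlib
modprobe lz4hc_compress
core=0
while [ "$core" -lt "$cores" ]; do
  # not sure which one this kernel has, so try them in order of preference;
  # the algorithm has to be set before disksize
  echo deflate > "/sys/block/zram$core/comp_algorithm" ||
   echo zlib > "/sys/block/zram$core/comp_algorithm" ||
   echo lz4hc > "/sys/block/zram$core/comp_algorithm" ||
   echo lz4 > "/sys/block/zram$core/comp_algorithm"
  echo "$mem" > "/sys/block/zram$core/disksize"
  mkswap "/dev/zram$core"
  swapon --discard -p 5 "/dev/zram$core" # reclaim memory better
  core=$((core + 1))
done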
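
As a worked example of the mem= line in the script above: each zram device gets an equal share of 4/3 of total RAM, converted from kB to bytes. The sketch below repeats that integer arithmetic in a standalone function. The name zram_disksize_bytes is hypothetical, and the input of 1,000,000 kB is an illustrative round number, not a value measured on a Pi.

```shell
#!/bin/sh
# Per-device zram size in bytes: 4/3 of total RAM (given in kB), split across
# the cores, using the same integer arithmetic as the mem= line above.
zram_disksize_bytes() {
    totalmem_kb=$1
    cores=$2
    echo $(( (totalmem_kb * 4 / 3 / cores) * 1024 ))
}

# Illustrative input only: 1,000,000 kB of RAM and 4 cores.
zram_disksize_bytes 1000000 4
```

For that input the result is 341332992 bytes, roughly 325 MiB per device, or about 1.3GB of uncompressed swap capacity across four devices.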


You can verify swap usage by cat /proc/swaps and zramctl (or lacking that, cat /sys/block/zram0/{mem_used_total,orig_data_size}).

You should also disable "keeping apps in memory while suspended" in your computing preferences.

You may set the GPU memory as low as you like, because you can run a vncserver in a virtual framebuffer without a real X11. You may need to double check ~/.vnc/xstartup to execute an lxsession or at least a terminal. But why do you need X, if you can use boinccmd, boinctui or even remote RPC?

Tom Rinehart
Joined: 28 Mar 20
Posts: 7
Credit: 1,637,467
RAC: 0
Message 94403 - Posted: 13 Apr 2020, 22:48:02 UTC - in response to Message 94397.  

Mine appears to be running only one WU at a time. I will try the zram approach once my current set of work is done. No WUs have completed so far.

I also have a RPi 3B running 64-bit ubuntu. It is running two WUs at a time. It completed three WUs so far at 13,418.19s, 5,232.10s, and 3,171.98s.

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94404 - Posted: 13 Apr 2020, 22:48:43 UTC - in response to Message 94395.  

I don't know what kind of WU they send on ARM platforms, but it could happen that you were just lucky. On my desktops I get a huge one every once in a while. Please follow your error/invalid statistics closely and keep us posted.

ProDigit
Joined: 6 Dec 18
Posts: 27
Credit: 2,718,346
RAC: 0
Message 94406 - Posted: 13 Apr 2020, 22:51:00 UTC
Last modified: 13 Apr 2020, 22:51:53 UTC

If you go headless you can run with just 8MB of VRAM (I set my resolution to 480/512p headless).

If you only run Boinc on the desktop OS, you'll be able to run it with 24MB of VRAM, if you choose 720p resolution (and 16MB on 480p; but the Raspbian desktop OS isn't very sub-768pix friendly).

I would seriously advise against using a swap file.
Your swap file should read zero usage.
Any Boinc project uses a lot of disk reads or writes, so if you do want to run it, I would recommend booting from an SSD if possible (via USB 3.0).
It not only speeds up the process, but it also will be much less susceptible to data corruption than SD cards are.
And, especially if you're not running an A1/A2 card from Sandisk, you probably are going to experience system lags on the desktop.

From HTOP it appears you're only running 3 out of 4 cores for Rosetta.
Each one seems to use about 300-400MB of RAM.
In most scenarios, I would recommend to run only 2 instances of Rosetta on a 1GB system, and 4 on 2GB of available RAM.
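
The recommendation above can be turned into a quick estimate: divide the available RAM by the per-task footprint and cap the result at the number of cores. This is a rough sketch only; the function name max_tasks is hypothetical, and the calculation ignores the memory the OS itself needs, so the result is an optimistic upper bound.

```shell
#!/bin/sh
# Rough upper bound on concurrent tasks: limited by RAM and by core count.
# Ignores memory used by the OS itself, so treat the result as optimistic.
max_tasks() {
    avail_mb=$1
    per_task_mb=$2
    cores=$3
    n=$((avail_mb / per_task_mb))
    if [ "$n" -gt "$cores" ]; then
        n=$cores
    fi
    echo "$n"
}

max_tasks 1024 400 4   # 1GB board, upper end of the 300-400MB estimate
max_tasks 2048 400 4   # 2GB of available RAM
```

With 400MB per task this gives 2 tasks on a 1GB system and 4 (capped by the core count) on 2GB, which matches the numbers suggested above.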

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94409 - Posted: 13 Apr 2020, 22:56:31 UTC - in response to Message 94406.  

Those are some good points. Although using a compressing file system like btrfs might also help.

After you set up zram, it is normal to see zram swap usage (verify in /proc/swaps), and it isn't slowing things down. Swapping out unused portions of daemons/executables to SD can also help free up valuable space for others, so a few hundred MB there could also be considered normal. (You could also try zswap, which might even compress swap on disk as well.)

I guess I wouldn't use such a small cruncher interactively as a desktop myself, so lagging shouldn't be an issue.

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94425 - Posted: 14 Apr 2020, 9:11:25 UTC - in response to Message 94402.  
Last modified: 14 Apr 2020, 9:17:49 UTC

Thank you for the advice, guys. In the morning, after waking up, only a single task was running. During the night, the RPi downloaded Rosetta for Portable devices. Maybe it is more RAM-demanding? I managed to initialize one more task by stopping and restarting the already running task, but no more than that.

bkil wrote:
> Thanks for sharing! I like your setup. Are you connecting over wifi? Could you perhaps measure the power draw? You may use either a kill-a-watt or the built-in power gauge inside your laptop if you connect to a high power charger port.
> I'm interested in how low we can get the power draw. Can you underclock the CPU/GPU? Can you switch off the LEDs? Can we switch wifi on only periodically for transfers (from cron, for example)?

I will try to measure consumption in the evening. I have not tried any fancy turning on-off wifi yet.

bkil wrote:
> 3 Rosetta tasks in 1GB sounds like pushing the limit a bit, but we'll see how it performs. You may get some more space if you add a little more zram and select a better algorithm. Basically, I would adjust the end of the script like this:
> mem=$(( ($totalmem * 4 / 3 / $cores)* 1024 ))
> modprobe deflate
> modprobe zlib
> modprobe lz4hc_compress
> core=0
> while [ $core -lt $cores ]; do
>   echo deflate > /sys/block/zram$core/comp_algorithm ||
>    echo zlib > /sys/block/zram$core/comp_algorithm ||
>    echo lz4hc > /sys/block/zram$core/comp_algorithm ||
>    echo lz4 > /sys/block/zram$core/comp_algorithm
> # not sure which one this kernel has
>   echo $mem > /sys/block/zram$core/disksize
>   mkswap /dev/zram$core
>   swapon --discard -p 5 /dev/zram$core # reclaim memory better
>   let core=core+1
> done

I used your code, but it seems the usage of the zram partitions has actually dropped. I am not sure why. I do not quite understand how to configure zram properly.

bkil wrote:
> You may set the GPU memory as low as you like, because you can run a vncserver in a virtual framebuffer without a real X11. You may need to double check ~/.vnc/xstartup to execute an lxsession or at least a terminal. But why do you need X, if you can use boinccmd, boinctui or even remote RPC?

Thanks, I did not know about boinctui. I have not tried using remote RPC yet, but I will definitely have to learn how to do so. It seems like vncserver does not start with anything less than 32MB.
Even though I am fine using CLI in most cases, it still feels more comfortable for me to use GUI.

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94427 - Posted: 14 Apr 2020, 9:28:14 UTC
Last modified: 14 Apr 2020, 10:03:30 UTC

This is my current state:
CLI only, 4MB VRAM
ZRAM algorithm from sangaku
4 WUs running again. Maybe it just needs time? Certain order?...

https://prnt.sc/ryz2ua
https://prnt.sc/ryz560

pi@raspberrypi:~ $ cat /proc/swaps
Filename      Type        Size      Used    Priority
/var/swap     file        2024444   0       -2
/dev/zram0    partition   330040    21776   5
/dev/zram1    partition   330040    21796   5
/dev/zram2    partition   330040    21552   5
/dev/zram3    partition   330040    21660   5

Edit: 30 minutes later, all 4 tasks are still running, but the usage of the ZRAM partitions has increased significantly, to about 200,000 kB each. Swap file use is still 0.
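
To keep an eye on these numbers over time, the Used column of /proc/swaps can be summed separately for the zram devices and for everything else. This is a minimal sketch; the function name swap_usage_summary is hypothetical, and it assumes the standard five-column /proc/swaps layout shown above with no spaces in the file names.

```shell
#!/bin/sh
# Sum the "Used" column (kB) of /proc/swaps-style input, separately for
# zram devices and for everything else (file or partition swap).
# Reads /proc/swaps unless another file is given as the first argument.
swap_usage_summary() {
    awk 'NR > 1 {
             if ($1 ~ /^\/dev\/zram/) zram += $4; else other += $4
         }
         END { printf "zram_used_kb=%d other_used_kb=%d\n", zram, other }' \
        "${1:-/proc/swaps}"
}
```

For the listing above it would report zram_used_kb=86784 and other_used_kb=0, confirming that nothing has reached the SD card swap file yet.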

ProDigit
Joined: 6 Dec 18
Posts: 27
Credit: 2,718,346
RAC: 0
Message 94464 - Posted: 14 Apr 2020, 18:22:34 UTC

Not sure how compressing RAM/Swap would work out for you, however it's pretty easy to set up the swap partition on an external drive, so you don't have to worry about SD card wear as much.

However, not all Boinc projects use a lot of RAM.
You could run 2 threads running Rosetta, and 2 running other project(s).

PorkyPies
Joined: 6 Apr 20
Posts: 45
Credit: 1,650,779
RAC: 0
Message 94474 - Posted: 14 Apr 2020, 21:09:47 UTC
Last modified: 14 Apr 2020, 21:13:16 UTC

The current tasks seem to go up to ~800MB, so you should be able to get away with at least one task on the 1GB Pis.

I recommend keeping at least 1 core free on the Pi. If you’ve got a 1GB Pi only run one at a time. If using zram you might be able to run two at a time. Why? zram needs some CPU time to do its job so keeping cores free means they are available without having to swap the task out.

When the tasks start they unzip over 1.5GB of data and copy files to the slot directory. That is a killer for the SD card. I believe the unzip part is multi-threaded so it will use all available cores but it still takes a few minutes to write to the SD card.
MarksRpiCluster

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94478 - Posted: 14 Apr 2020, 22:28:22 UTC - in response to Message 94464.  
Last modified: 14 Apr 2020, 22:32:33 UTC

ProDigit wrote:
> Not sure how compressing RAM/Swap would work out for you, however it's pretty easy to set up the swap partition on an external drive, so you don't have to worry about SD card wear as much.

I have disabled swap (set the size to 0). Everything seems to be running fine.
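
For reference, changing the swap file size (2048 in the original guide, 0 here) usually comes down to one line in a config file. The sketch below assumes swap is managed by dphys-swapfile with its settings in /etc/dphys-swapfile, as on stock Raspbian; the function name set_swapsize is hypothetical, and the file path is passed in so the function can be tried on a copy first.

```shell
#!/bin/sh
# Set CONF_SWAPSIZE (in MB) in a dphys-swapfile style config file.
# Assumption: swap is managed by dphys-swapfile, as on stock Raspbian.
set_swapsize() {
    conf=$1      # normally /etc/dphys-swapfile
    size_mb=$2   # e.g. 2048 as in the guide, or 0 as used in this post
    if grep -q '^CONF_SWAPSIZE=' "$conf"; then
        # Replace the existing setting in place (GNU sed).
        sed -i "s/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=$size_mb/" "$conf"
    else
        # No active setting yet: append one.
        echo "CONF_SWAPSIZE=$size_mb" >> "$conf"
    fi
}
```

After editing the real file as root, the change takes effect once the swap file is recreated, for example with sudo dphys-swapfile swapoff, sudo dphys-swapfile setup and sudo dphys-swapfile swapon, or after a reboot.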

ProDigit wrote:
> However, not all Boinc projects use a lot of RAM.
> You could run 2 threads running Rosetta, and 2 running other project(s).

Thanks for the idea. I added Universe@Home and I might also try to join LHC@Home. There are definitely moments when Rosetta is not using all available RAM - just because it is so big. The question is how to set it up so that it runs each project on 2 cores. I changed the resource share of Universe@Home to 1, and I am testing it...

PorkyPies wrote:
> I recommend keeping at least 1 core free on the Pi. If you’ve got a 1GB Pi only run one at a time. If using zram you might be able to run two at a time. Why? zram needs some CPU time to do its job so keeping cores free means they are available without having to swap the task out.

I will try to test this over the next few days and see how long a WU takes with no cores free and with a single core free.

PorkyPies wrote:
> When the tasks start they unzip over 1.5GB of data and copy files to the slot directory. That is a killer for the SD card. I believe the unzip part is multi-threaded so it will use all available cores but it still takes a few minutes to write to the SD card.

I noticed that if I suspend Rosetta and switch, it takes over 2 minutes to return from Universe to Rosetta. I guess that demonstrates the load...

For anyone interested, here are current measurements of RPI 3B+:
https://www.raspberrypi.org/forums/viewtopic.php?p=1286716

According to these results, 4 core load + WIFI results in ca. 390mA. I did not test this myself.
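
As a quick sanity check, the current can be converted to power with P = V * I. The sketch below assumes a nominal 5V supply (the charger in the first post is rated 5V 1A); the function name power_watts is hypothetical.

```shell
#!/bin/sh
# Power in watts from current in mA and supply voltage in V (P = V * I).
power_watts() {
    awk -v ma="$1" -v v="$2" 'BEGIN { printf "%.2f\n", ma / 1000 * v }'
}

power_watts 390 5    # current from the linked measurements, nominal 5V supply
power_watts 1000 5   # rated limit of the 5V 1A charger from the first post
```

So the quoted 390mA corresponds to about 1.95 W, well within the 5 W the charger is rated for.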

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94485 - Posted: 14 Apr 2020, 22:59:18 UTC - in response to Message 94478.  
Last modified: 14 Apr 2020, 23:00:53 UTC

You probably have "keep tasks in memory on suspend" disabled. This is the right setting in such a constrained system, but it causes tasks to start/stop a bit slower when switching, for example, because the executable needs to reload and reconstruct its full state from the last checkpoint. This is normal.

Yes, zram, especially with deflate, takes a measurable amount of time, but my heuristic (yet to be proven formally) is that the heap of Rosetta is mostly static, with about half of it active at any given time, so the other half can easily be stored on slower media (be it compressed in RAM or in flash).

I use it on low-end PCs and find its performance tradeoff acceptable even for desktop use (slow disk swapping vs. more compression). I haven't tried it on an rPi yet, but it should generally not hurt as long as a given workload actively uses less than half of the memory. My RAC isn't impacted on a 2core/2GB+zram PC.

I would probably still enable at least a little bit of swap on the flash as well (let's say 0.5-1GB). By default, it creates zrams with a higher priority than disk swaps, so the system will only touch the flash if it is really desperate.

I've also noticed that work may not be available at every moment, so be patient. On my PCs I usually set a buffer of 1-2 days.

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94486 - Posted: 14 Apr 2020, 23:07:15 UTC - in response to Message 94427.  

Seeing 8 boinc processes in your listing worries me. Maybe some of them got stuck? Anyway, could you perhaps share the results after completion (time, credits, decoy count, WS_MAX)?

lakotamm
Joined: 28 Jun 19
Posts: 22
Credit: 171,192
RAC: 0
Message 94524 - Posted: 15 Apr 2020, 9:42:43 UTC - in response to Message 94486.  
Last modified: 15 Apr 2020, 10:22:41 UTC

bkil wrote:
> Seeing 8 boinc processes in your listing worries me. Maybe some of them got stuck? Anyway, could you perhaps share the results after completion (time, credits, decoy count, WS_MAX)?

It seems like it is just htop showing it in a weird way - top shows it correctly.
https://prnt.sc/rzn8ih

You can see the stats here:
https://boinc.bakerlab.org/rosetta/results.php?hostid=3710842
It seems I have had 1 error so far. Let's see whether more appear.

The biggest issue right now seems to be the start-up of a WU. If I have 4 WUs running and one finishes, another one won't start by itself. Instead, it stays waiting for memory. I need to manually suspend one of the running WUs, which allows the new WU to start. Afterwards I can resume the suspended WU.

I might make a script for this and possibly use an external drive for storing the tasks. Or use only 2-3 cores for Rosetta. However, this would mean leaving 25 percent of CPU time unused. So far I have not found a way of selecting only 2-3 cores for a project and using the rest for another project.
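
A script for the manual suspend/resume trick could look roughly like this. It is an untested sketch built on boinccmd's --task operation: the project URL is an assumption (check boinccmd --get_project_status for the exact one), the task name has to be taken from boinccmd --get_tasks, and the helper names run and nudge_task as well as the 120 second pause are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical workaround: briefly suspend one running task so that a task
# stuck "waiting for memory" can start, then resume the suspended one.
# Set DRY_RUN=1 to print the boinccmd calls instead of running them.
PROJECT_URL="https://boinc.bakerlab.org/rosetta/"   # assumed project URL

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

nudge_task() {
    task_name=$1          # name of a currently running task
    pause_secs=${2:-120}  # how long to keep it suspended (a guess)
    run boinccmd --task "$PROJECT_URL" "$task_name" suspend
    run sleep "$pause_secs"
    run boinccmd --task "$PROJECT_URL" "$task_name" resume
}
```

Running it with DRY_RUN=1 first shows exactly which boinccmd calls would be made before anything is suspended for real.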

PorkyPies
Joined: 6 Apr 20
Posts: 45
Credit: 1,650,779
RAC: 0
Message 94539 - Posted: 15 Apr 2020, 12:53:27 UTC - in response to Message 94524.  
Last modified: 15 Apr 2020, 12:54:15 UTC

lakotamm wrote:
> So far I have not found a way of selecting only 2-3 cores for a project and using the rest for another project.

I use an app_config.xml file for that.

<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>

It goes in the BOINC projects bakerlab folder. See here for details on configuring the BOINC client.
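
The file can also be created from the shell. A minimal sketch, assuming you know the project's folder inside the BOINC data directory; the function name write_app_config is hypothetical, and the example path in the note below is the Debian package default, which may differ on your install.

```shell
#!/bin/sh
# Write an app_config.xml that caps how many tasks of one project run at once.
write_app_config() {
    project_dir=$1   # the project's folder under the BOINC data directory
    max_tasks=$2     # e.g. 3, as in the example above
    cat > "$project_dir/app_config.xml" <<EOF
<app_config>
  <project_max_concurrent>$max_tasks</project_max_concurrent>
</app_config>
EOF
}
```

On a Debian-style install this might be called as root with something like /var/lib/boinc-client/projects/boinc.bakerlab.org_rosetta as the first argument. The client should pick up the file after boinccmd --read_cc_config or a restart of the client.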
MarksRpiCluster

bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 94545 - Posted: 15 Apr 2020, 14:28:03 UTC - in response to Message 94524.  
Last modified: 15 Apr 2020, 14:30:51 UTC

This looks like the dreaded "finish file present too long" bug, usually caused by I/O contention or slow disk I/O when writing the result files and cleaning up after itself. It's a pity that this happened at the very end, after it had successfully completed the computations:

Exit status	194 (0x000000C2) EXIT_ABORTED_BY_CLIENT
Run time	7 hours 27 min 34 sec
CPU time	7 hours 26 min 11 sec
Peak working set size	402.13 MB
Peak swap size	631.23 MB
Peak disk usage	961.72 MB


Could you perhaps try a development version of BOINC to see if it has been fixed already? I see that the website binaries for PC have already been updated, but ARM seems to be lagging behind (maybe compile from source?). I could think of a quick workaround script that detects when a job is finishing and temporarily suspends all others while it is saving its files, but that would be a major hack. Or perhaps trying a faster SD card or faster file system might help (maybe not).



Are you positive that the task wouldn't start by itself even if you waited for a few minutes? Maybe the things are related: it is trying to start the next job while the previous is preparing its results? (Just guessing) Could you find anything in the logs (boinccmd --get_messages)? Boinc usually explains why it is or is not doing things.

Also, could you perhaps post some stats while the Pi is fully loaded and happily crunching with BOINC? I would be interested in memory usage and compression ratio: `ps -e v` (or the relevant boinc lines of `top`), `free`, `cat /proc/swaps` and `zramctl` (or the appropriate folder in sysfs)
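
For the compression ratio, the per-device numbers can be combined into one figure. The sketch below reads pairs of uncompressed and compressed sizes in bytes, one device per line, and prints the overall ratio. The function name zram_ratio is hypothetical, and feeding it from zramctl with the DATA and COMPR columns assumes a reasonably recent util-linux.

```shell
#!/bin/sh
# Overall zram compression ratio from "DATA COMPR" byte pairs on stdin,
# one device per line. Prints "n/a" if nothing has been compressed yet.
zram_ratio() {
    awk '{ data += $1; compr += $2 }
         END {
             if (compr > 0) printf "%.2f\n", data / compr
             else print "n/a"
         }'
}
```

A possible invocation is zramctl --bytes --raw --noheadings --output DATA,COMPR | zram_ratio, where a result of 3.00 would mean the swapped-out data shrank to a third of its original size.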

©2024 University of Washington
https://www.bakerlab.org