Posts by bkil

41) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 95009)
Posted 20 Apr 2020 by bkil
Post:
I'm not sure where the author got the measurement. I can believe that the ethernet chip itself consumes that amount, but you always have to add the transformer and other components as well. This review also mentions a 0.2W power saving when using wifi instead of ethernet (measurement precision unknown):
https://www.digitec.ch/en/page/the-raspberry-pi-4-new-top-notch-and-tough-to-get-hold-of-12482

I don't have a Pi of my own, but I've been told by someone that ethernet on a Pi adds about 0.3W. However, I have measured various kinds of wifi routers in the past. The more efficient ones consume 0.9W from the wall when idling, but you need to add +0.3W for each inserted 100M ethernet jack, easily consuming more than the whole chipset if I insert 5 cables. Interestingly, I've measured the same amount on a notebook and a netbook as well. Gigabit consumes much more, on the order of a watt maybe, but that can depend on a lot of factors.

Also it is near trivial to switch wifi on/off from software (for example via cron), but it is usually not possible to achieve the same amount of power saving without physically disconnecting the ethernet cable - something not possible in software.
1-2 lines of shell maybe by brute force from cron, a few more lines if you want to do it really efficiently (check whether there are any tasks waiting for upload, network up, ping something until it works, boinccmd --network_available, then wait a minute and/or parse out the logs or query boinccmd --get_file_transfers until it is empty, network down).
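
The more careful variant could be sketched roughly as below. This is only a sketch under assumptions that are not from the measurements above: the interface name `wlan0`, the ping target, using `ip link` to toggle the radio, and the guess that every pending transfer shows up as a `name:` line in the output of `boinccmd --get_file_transfers`. Check each of these on your own system before relying on it.

```shell
#!/bin/sh
# Sketch: bring wifi up, let BOINC finish its transfers, take wifi down.
# Assumptions (hypothetical): interface wlan0, the ping target, and that
# each pending transfer appears as a "name:" line in the boinccmd output.
IFACE=${IFACE:-wlan0}
PING_TARGET=${PING_TARGET:-boinc.bakerlab.org}

# Count pending transfers in `boinccmd --get_file_transfers` output (stdin).
count_transfers() {
    grep -c 'name:' || true
}

sync_once() {
    ip link set "$IFACE" up
    # Ping until the network really works, giving up after 24 tries.
    tries=0
    until ping -c 1 -W 5 "$PING_TARGET" >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 24 ] && break
        sleep 5
    done
    # Tell the client to retry deferred network communication.
    boinccmd --network_available
    sleep 60
    # Poll until the transfer list is empty.
    while [ "$(boinccmd --get_file_transfers | count_transfers)" -gt 0 ]; do
        sleep 30
    done
    ip link set "$IFACE" down
}

# Only act when called as: script.sh run (for example from a cron entry).
if [ "${1:-}" = "run" ]; then
    sync_once
fi
```

Whether `ip link set ... up` is enough to rejoin your access point depends on how your distribution manages wifi, so the up/down lines may need to be swapped for whatever your system uses.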

So running the network for 2.5 hours is a luxury. We could measure this easily, but if you have at least a 5Mb/s Internet connection, the few megabytes Rosetta uploads/downloads shouldn't tax you that much. I wouldn't be surprised if with efficient coding, you could actually get away with less than half an hour a day in total. If you increase WU runtime to communicate less, maybe even less.

This might sound like overdoing it, but every watt counts in your rig efficiency and your temperatures.
42) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 94973)
Posted 20 Apr 2020 by bkil
Post:
In my opinion, the credit system is not broken at all, it is working well as intended. Please read the posts that explain how credits are awarded in Rosetta@home.
* http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669&postid=10377
* https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2194&postid=24612#24612

Basically, from what I understand, what really bothers you right now is WU to WU variability. Note that this will all average out on the long term, even after a few days, but definitely after 2 weeks (RAC).

Thus your aggregate credit count and RAC is still as good as anything for the purpose of keeping up the friendly competition with your peers, getting feedback about your contribution, fine tuning the performance of your hardware, checking whether your boxes are producing as expected for the given kind of hardware, etc.

Hence what you are asking (precise WU flops estimates) would require lots of development and maintenance time to be devoted to something that isn't that important at all and would take away time from research.
43) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94969)
Posted 20 Apr 2020 by bkil
Post:
That sounds like an excellent plan, do keep us posted. I'm especially interested in your future power consumption! Also don't forget to update your firmware as that decreases consumption a lot. You may even try undervolting everything and underclocking unused parts of the chips (like the GPU).

Also, depending on how willing you are to do some experimentation, as we now know that 1GB RAM + zram swap/zcache can be enough for 4 tasks at once, that means we actually have about 3GB free to play with! You could even consider creating a zram compressed RAM-disk that could hold about 5GB of data. Or at least store your I/O hungry slots directory here and store the project directory over NFS.

An added benefit of not using NFS at all would be that you could also disable ethernet (FYI. gigabit consumes extra) and only enable file transfers over wifi intermittently to save even more power. That would require PXE booting a mini-distribution onto a RAM-disk as well, similar to how Puppy Linux does it on the PC. I actually have a PC setup just like this: booting squashfs to RAM, spinning off disks, zram swap, zram scratch system (ext4 for now due to trim support).

I feel this one to be a little tight though in 4GB, but it would be doable with a deduplicating file system in RAM or doing KSM on the whole RAM by using qemu for example (would need to benchmark the impact).
44) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94954)
Posted 19 Apr 2020 by bkil
Post:
Thanks for the write up and sharing your detailed setup. Keep spreading the word!

Please read the posts that explain how credits are awarded in Rosetta@home.


TL;DR: the time a WU takes doesn't really matter, it's the number of finished decoys that count and the difficulty of the individual decoys within a given kind of WU. There are various kinds of WU's, maybe dozens of them, all having mixes of various difficulties.

The only accepted valid way of measuring the productivity of your rig is to run a given configuration without touching anything for 2 weeks and record the RAC after that point. Any other kind of determination is by definition a guesstimate.

If you do want to benchmark the effects of your tweaks scientifically, I think you could copy away a slot directory which seems to be checkpointing and producing models a lot for safe keeping. The parametrization of the executable and all input data is available next to it (or from `ps -e f`), along with the random seed I think. Hence creating a script to run the same executable for a few hours could show you the seconds per model production speed on that given work unit. To properly simulate the memory and kernel load as well, you should probably run this four times at the same time (maybe from separate places). This type of measurement usually wouldn't generalize, but if you are only using it to tweak a given computer (kernels, configs, paging setup, sysctls, cooling and whatnot), it should be useful.
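
A rough sketch of the "run it four times at once" part, assuming you have already wrapped the copied slot's command line in a script of your own (the name `./replay_slot.sh` below is hypothetical). It only measures the wall-clock time of N parallel copies; dividing that by the number of models produced is left to you.

```shell
#!/bin/sh
# Run N copies of a command in parallel and print the elapsed wall time
# in whole seconds. The command itself is whatever you pass in.
time_parallel() {
    copies=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$copies" ]; do
        "$@" >/dev/null 2>&1 &
        i=$((i + 1))
    done
    wait    # block until every copy has exited
    end=$(date +%s)
    echo $((end - start))
}
```

For example, `time_parallel 4 ./replay_slot.sh` would load all four cores the same way normal crunching does, each copy ideally running from its own copy of the slot directory.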

Still you would need to repeat the measurement a couple of times to arrive at any reasonable statistical significance that could show 5-10% of difference.
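
To judge whether repeated runs are tight enough, the mean and sample standard deviation of the measured times can be computed with a few lines of awk. This is a generic sketch and the numbers in the usage line are made up for illustration.

```shell
#!/bin/sh
# Read one measurement per line on stdin, print "mean stddev"
# (sample standard deviation, i.e. divided by n - 1).
mean_sd() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) { print "need at least 2 samples"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0    # guard against rounding noise
             printf "%.2f %.2f\n", mean, sqrt(var)
         }'
}
```

Usage: `printf '10\n12\n14\n' | mean_sd` prints `12.00 2.00`. If the standard deviation is not clearly smaller than the difference you are trying to show, you need more runs.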

Also, I think if you create a complete rewrite of your first post that encompasses all points written therein with no omission (possibly with using strike-through for outdated information) and perhaps containing some new or updated links, you may ask a moderator to copy&paste it for you into the top post (but use this wisely as mod time is costly).

45) Message boards : Number crunching : Tells us your thoughts on granting credit for large protein, long-running tasks (Message 94952)
Posted 19 Apr 2020 by bkil
Post:
The answer is +25% credit compensation.

I've simply plugged the data (4x memory requirement) into my equations for the 3700X and 3950X in the "The most efficient cruncher rig possible" thread, which amortize the cost of power and capital expenditure against produced RAC, and solved for the RAC correction needed to arrive at the same RAC/$/5years. I assumed 0.3W/GB of power consumption for DDR4 RAM.

Although, you could rephrase the same question in another way: based on supply and demand, if a great majority of volunteers have an insufficient amount of memory, how do I incentivize them to purchase more? If you put it that way, adding +50% may not be that far-fetched.
46) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94923)
Posted 19 Apr 2020 by bkil
Post:
Thanks for reporting back and for all the experiments, these are very useful insights! I'm glad you feel like you are learning useful things during the ride.

Surely `deflate` and `zlib` are the slowest choices available and they weren't meant for realtime use cases at all. On a desktop machine I use, I store my BOINC slot and project folders on a zram compressed ramdisk formatted to ext4 for a nice I/O speed boost. The space saved by `deflate` compared to `LZO` was about 20%.

How did you manage to benchmark the "speed of calculations" so precisely (close to a margin of error), what did you check? Could you perhaps consider benchmarking some other algorithms as well? LZ4 and LZ4HC were also designed for realtime use and can give you a bit more headroom. You may check what is available on your system from this, although it doesn't list all possibilities:
cat /sys/block/zram0/comp_algorithm
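
In that file the currently active algorithm is the one shown in square brackets. A small sketch for pulling it out in a script (the function name is made up for illustration):

```shell
#!/bin/sh
# Print the active zram compression algorithm, i.e. the bracketed entry
# in the contents of comp_algorithm read from stdin.
active_algorithm() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

Usage: `active_algorithm < /sys/block/zram0/comp_algorithm`. As far as I know, a different algorithm has to be written to the same file before the device's disksize is set, so switching usually means resetting the zram device first.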
47) Questions and Answers : Unix/Linux : Rosetta@home does not work on Raspberry Pi 3b + (Message 94920)
Posted 19 Apr 2020 by bkil
Post:
I don't know, but perhaps you may try to increase GPU RAM and some other boot options.
48) Questions and Answers : Unix/Linux : Rosetta@home does not work on Raspberry Pi 3b + (Message 94790)
Posted 18 Apr 2020 by bkil
Post:
See this one:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13795
Running Rosetta on Raspberry Pi 3B+ (how to guide)
49) Message boards : Number crunching : L3 Cache is not a problem, CPU Performance Counter proves it (Message 94655)
Posted 17 Apr 2020 by bkil
Post:
Thank you for the data point. Although disabling HT in the BIOS may or may not do something else as well to reduce power so much (maybe some higher order logic?). So that boils down to a 36% credit gain with HT and at least a 13% credit/Watt gain on your i7-7700K (even more if we considered total system power).
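
As a sanity check on those two percentages: a credit gain and a credit/Watt gain together imply a power increase, which can be worked out as below. This is only arithmetic on the two figures quoted above, not a new measurement.

```shell
#!/bin/sh
# Given a credit gain and a credit/watt gain (both in percent), print the
# implied increase in power draw in percent.
implied_power_gain() {
    awk -v c="$1" -v e="$2" \
        'BEGIN { printf "%.1f\n", ((1 + c / 100) / (1 + e / 100) - 1) * 100 }'
}
```

`implied_power_gain 36 13` prints `20.4`, so with "at least 13%" credit/Watt the extra power drawn with HT enabled would be at most roughly 20%.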

Interestingly, I've noticed elevated temperatures on a Haswell as well when enabling HT, so I'd need to do some measurements too. I first thought it was some kind of an anomaly.

Also, I usually compute with 30-50% HT gain in my approximations but maybe I'd need to revise that formula a bit - probably depending on architecture.
50) Message boards : Number crunching : L3 Cache is not a problem, CPU Performance Counter proves it (Message 94641)
Posted 16 Apr 2020 by bkil
Post:
Could you perhaps also measure the power consumption difference between running 8 WU vs. 16 WU on your Ryzen? My theory is that if the CPU is blocked for longer, it may consume a bit less energy. This would then increase the credit/watt ratio by more than 17%.
51) Message boards : Number crunching : The most efficient cruncher rig possible (Message 94638)
Posted 16 Apr 2020 by bkil
Post:
Well, investments can be tricky, but basically loans are never free.

Don't envision promises of 100% returns and such, but in the long run, it's actually realistic for common people to find a safe investment portfolio with 5-10% return year over year, definitely above inflation. Surely we're having a hard time right now so don't base your estimates on such temporary conditions, but don't worry, we'll eventually recover as we always have in the past.

Let's consider that you have $5000 today. It is not the same if you have to pay $5000 upfront for a computer vs. $1000/year, even if we adjust the payment for inflation. In the latter case, you could still "use" $4000 of your money for a year to run your business (ice cream stand?), trade stocks or whatever. The next year you could still make use of $3000 of your money. If investing in something with a 10% return, you gain $1215 more by the fourth year (and $552 for 5%).
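
One reading that reproduces these figures: pay $1000 at the start of each year, let the remainder earn the yearly return, and look at what is left after the fifth installment. The timing of the payments is an assumption made here to match the numbers.

```shell
#!/bin/sh
# Money left over when paying $1000/year out of $5000 while the remainder
# earns a yearly return. $1 = yearly return in percent.
leftover() {
    awk -v rate="$1" 'BEGIN {
        balance = 5000
        for (year = 1; year <= 4; year++) {
            balance -= 1000                # installment at start of year
            balance *= 1 + rate / 100      # return on what remains
        }
        balance -= 1000                    # fifth and last installment
        printf "%.0f\n", balance
    }'
}
```

`leftover 10` prints `1215` and `leftover 5` prints `552`, whereas paying the full $5000 upfront leaves nothing to invest.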

So in the end, despite the fact that you were set back by the same amount, you could make good use of your money with smart decisions, and thus you will have more money for your next computer.

I consider contributing to BOINC a long term, potentially lifetime investment that I would like to recommend to as many as possible, and it is always a good idea to optimize your investments.
52) Message boards : Number crunching : The most efficient cruncher rig possible (Message 94637)
Posted 16 Apr 2020 by bkil
Post:
Thank you for the data point. So now that you downclock the 3800X, you've just got yourself a 3700X (3.6GHz) that was binned for higher overclock (and maybe less power efficiency). I think on Linux, the following is doing the same - I've set it up on a laptop to keep BOINC quiet while I'm working on it:
 echo 90 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
 echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
The measurements are believable and you have chosen a good sweet spot (below a certain frequency, fixed system consumption parts dominate). This again shows that normal air cooling should be sufficient for a 3700X.

It is also my observation that turbo boost is pretty harmful and dialing a CPU down a notch also increases its credit/watt value - we could actually plot this for various processors.
53) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94571)
Posted 15 Apr 2020 by bkil
Post:
I see my 2GB phone has also started crunching Rosetta despite the fact that it failed with the same memory limit error. My theory is that Rosetta is generating special tasks for low-memory ARM phones and SBC, maybe along with normal-memory ones for more capable ARM's. As many such devices are sitting idle, maybe there is high competition for these tasks and they dry out too soon for you to notice.

My guess is that the life cycle of tasks in BOINC is something like:
    - fetching from server scheduler
    - got task metadata
    - download assets from server (this can be hundreds of megabytes/piece)
    - slot initialization (this copies a half gig zip and then also unpacks the thousands of files from it)
    - running
    - taking snapshots
    - suspended, in memory
    - suspended, killed
    - restarting executable, reloading snapshot, reconstructing state and resuming calculation
    - running
    - result gathering & cleanup
    - upload
    - deleted



From your logs I can see that it did not get such low-memory tasks at first, but it did get some on a later retry. After downloading, it indeed started working on them without hesitation. I'm not sure if I understand your problem correctly, but if my assessment is correct, your Pi only downloaded 2 tasks, but immediately started working on them. You may try to experiment by increasing your task queue size so that you don't sit idle too much when supply is low (maybe 0.5+0.5?), but otherwise I think we should be patient.

So, as zram wasn't kicking in at such an early execution stage on the small tasks, I think it should be the I/O. If you are really curious what makes starting slow, you may fire up `top` and `iostat 5` while you are doing the resume or the initial execution. My guess is that if you wait for the SD card too much, your %wa (I/O wait) will stay up, perhaps with a little system time as well.

Unfortunately, SD cards themselves are very slow both in linear write throughput and especially in random access. Random access is the one that kills performance the most and this is what happens when you manipulate a vast amount of files, for example when you are extracting a huge zip with heap loads of files inside. There's no wonder here - there exist some specialty high-IO cards and USB thumb drives out there, but an SSD is more suited to cope with this.

What is the make, model number and size of your card? Many ways exist to benchmark it, like: (substitute your device)

time dd if=/dev/sda bs=30M count=2 iflag=direct of=/dev/null
hdparm -t /dev/sda


I guess you could also test random I/O with iozone3 or sysbench as well.

As a workaround, you may try to look up which file system runs the fastest for such a use case on SD cards. I imagine compression could help here (like that in btrfs if mounted right), as that reduces bandwidth, and maybe a non-journaling or log structured file system could also be advantageous (just guessing, please look it up or try it yourself). Basically most budget SD cards are lacking in the number of parallel open erase blocks and some other implementation details, which can still give acceptable linear single-file transfer performance on FAT32, but appalling performance with many files and with ext4.

You could actually partition your SD card (from your computer or from a reader when booted from another SD card) so that you reduce the size of the native file system root and create another, different kind of file system next to it that you will mount under /var/lib/boinc* (via fstab).

Still overall I think because most of the time your I/O is just idling, it would make more efficient use of the existing hardware if task initialization was strictly sequential and/or if such timeouts were increased. I think they did manage to do the latter in the newer development builds, but I didn't check.
54) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94545)
Posted 15 Apr 2020 by bkil
Post:
This looks like the dreaded finish file present too long bug usually caused by I/O contention or slow disk I/O when writing the result files and cleaning up after itself. It's a pity that this was at the very end after it has successfully completed the computations:

Exit status	194 (0x000000C2) EXIT_ABORTED_BY_CLIENT
Run time	7 hours 27 min 34 sec
CPU time	7 hours 26 min 11 sec
Peak working set size	402.13 MB
Peak swap size	631.23 MB
Peak disk usage	961.72 MB


Could you perhaps try a development version of BOINC to see if it has been fixed already? I see that the website binaries for PC have already been updated, but ARM seems to be lagging behind (maybe compile from source?). I could think of a quick workaround script that detects when a job is finishing and temporarily suspends all others while it is saving its files, but that would be a major hack. Or perhaps trying a faster SD card or faster file system might help (maybe not).



Are you positive that the task wouldn't start by itself even if you waited for a few minutes? Maybe the things are related: it is trying to start the next job while the previous is preparing its results? (Just guessing) Could you find anything in the logs (boinccmd --get_messages)? Boinc usually explains why it is or is not doing things.

Also, could you perhaps post some stats while the Pi is fully loaded and happily crunching with BOINC? I would be interested in memory usage and compression ratio: `ps -e v` (or the relevant boinc lines of `top`), `free`, `cat /proc/swaps` and `zramctl` (or the appropriate folder in sysfs)
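
If you go the sysfs route, the compression ratio can be read from `mm_stat`, whose first two fields are the original and the compressed data size in bytes. A minimal sketch (the function name is made up, and `zram0` is just the usual first device):

```shell
#!/bin/sh
# Print the zram compression ratio (original / compressed) from the
# contents of /sys/block/zramN/mm_stat given on stdin.
zram_ratio() {
    awk '$2 > 0 { printf "%.2f\n", $1 / $2 }'
}
```

Usage: `zram_ratio < /sys/block/zram0/mm_stat`. Nothing is printed while the device is still empty.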

55) Message boards : Number crunching : The most efficient cruncher rig possible (Message 94500)
Posted 15 Apr 2020 by bkil
Post:
Yes, I agree that the Ryzen 7 and 9 are among the top contenders.

It's interesting how I first thought the Threadripper3 wouldn't be worth it, but seeing this review and plugging in the parameters into the formula convinced me that this one crunches well, too.

Based on FLOPS, this one should produce 100k RAC for ~$5k TCO ($8-$10/GFLOPS).

Although, we seem to have forgotten to consider that cores within a single motherboard also share memory bandwidth that could impact overall productivity. Not everything fits into the cache you know.

We also seem to have waved away the potential interest rate lost over the years on large upfront investments and the effect of inflation on resale value. Both of these favor a bit less energy efficient specimens with a bit higher power consumption. Hence we should refine the original formula measuring efficiency, I guess.
56) Message boards : Number crunching : The most efficient cruncher rig possible (Message 94495)
Posted 14 Apr 2020 by bkil
Post:
We should actually have an editable wiki entry about the most efficient cruncher of the given year as a reference ("buyer guide"). Someone else has just asked me the same thing!
57) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94486)
Posted 14 Apr 2020 by bkil
Post:
Seeing 8 boinc processes in your listing worries me. Maybe some of them got stuck? Anyway, could you perhaps share the results after completion (time, credits, decoy count, WS_MAX)?
58) Message boards : Number crunching : Running Rosetta on Raspberry Pi 3B+ (how to guide) (Message 94485)
Posted 14 Apr 2020 by bkil
Post:
You probably have "keep tasks in memory on suspend" disabled. This is the right setting in such a constrained system, but it causes tasks to start/stop a bit slower when switching, for example because the executable needs to reload and reconstruct its full state from the last checkpoint. This is normal.

Yes, zram, especially with deflate, takes a measurable amount of time, but my heuristic (yet to be proven formally) is that the heap of Rosetta is mostly static, about half of it active at any given time, so the other half can easily be stored on slower media (be it compressed in RAM or in flash).

I use it on low end PC's and find its performance tradeoff acceptable even for desktop use (slow disk swapping vs. more compression). I haven't tried it on an rPi yet, but it should generally not hurt in case a given workload uses less than half of the memory actively. My RAC isn't impacted on a 2core/2GB+zram PC.

I would probably still enable at least a little bit of swap on the flash as well (let's say 0.5-1GB). By default, it creates zrams with a higher priority than disk swaps, so the system will only touch the flash if it is really desperate.

I've also noticed that work may not be available at any given time instant, so be patient. On my PC's I usually set a buffer of 1-2 days.
59) Message boards : Number crunching : Running on a 4GB Raspberry Pi 4 - How to? (Message 94482)
Posted 14 Apr 2020 by bkil
Post:
Check these:
free
zramctl
cat /proc/swaps
ps -e v|grep -i boinc
boinccmd --get_tasks
top -n 1|head -n 20


Reduce the amount of tasks you request to the minimum and disable "keep suspended tasks in memory".
60) Message boards : Number crunching : Running on a 4GB Raspberry Pi 4 - How to? (Message 94481)
Posted 14 Apr 2020 by bkil
Post:
Have you updated your boot loader? They say it improves power consumption a lot:
https://www.jeffgeerling.com/blog/2019/raspberry-pi-4-might-not-need-fan-anymore
I found another command you may find useful to determine the clock speed hinting at throttling:
vcgencmd measure_clock arm

- via https://www.scivision.dev/raspberry-pi-check-temperature-of-cpu/. I also like their minimalistic solution of logging via cron, though I usually log power & temperatures in the same line.
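
A sketch of logging temperature and clock on one line, assuming the usual `vcgencmd` output shapes (`temp=48.3'C` and `frequency(48)=1500345728`). The script path in the cron line further down is hypothetical, and this is not the linked article's script.

```shell
#!/bin/sh
# Extract the number from "temp=48.3'C" (vcgencmd measure_temp).
parse_temp() {
    sed -n "s/^temp=\([0-9.]*\)'C.*/\1/p"
}

# Convert "frequency(48)=1500345728" (vcgencmd measure_clock arm) to MHz.
parse_clock_mhz() {
    awk -F= '/^frequency/ { printf "%d\n", $2 / 1000000 }'
}

# Print one timestamped line with both values.
log_line() {
    t=$(vcgencmd measure_temp | parse_temp)
    f=$(vcgencmd measure_clock arm | parse_clock_mhz)
    echo "$(date '+%Y-%m-%d %H:%M:%S') temp=${t}C arm=${f}MHz"
}

# Only touch the hardware when called as: script.sh run
if [ "${1:-}" = "run" ]; then
    log_line
fi
```

A cron entry such as `*/5 * * * * /usr/local/bin/pi-log.sh run >> /var/log/pi-temp.log` would then append a line every five minutes. A clock that drops under load while the temperature is high hints at throttling.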





©2024 University of Washington
https://www.bakerlab.org