x86 vs Arm Performance

Author	Message
Melodie_Manthei_Family Joined: 25 Jan 07 Posts: 15 Credit: 16,341,771 RAC: 0	Message 99574 - Posted: 5 Nov 2020, 4:46:30 UTC
I've been researching the Arm vs x86 CPU architecture shift that's starting to reveal itself in the computing industry. Apple's upcoming Arm transition is of course big news and has drawn mixed reviews. But I've also been keeping an eye out for this as the "thin client" mentality starts to re-emerge, or at least test the waters. Devices like Chromebooks, Roku, FireTV, Raspberry Pi, Nvidia Jetson, and the countless Arm single-board computer clones out there have me wondering...is Arm really going to be the next big thing or not? Will it join x86 as a powerful CPU architecture but never replace it?
I haven't learned enough about the actual architectures yet and, let's face it, the technical details are pretty serious reading. Instead, I started with a higher-level approach as a Rosetta enthusiast: I grabbed readily available statistics for Rosetta to compare Arm-powered devices to x86 and see what things look like. There are a few reasons for this approach:
1) Rosetta was somewhat recently ported to Arm, so it may not have a steady-state Arm/x86 balance yet.
2) Balena's "Fold for Covid" project made getting many Arm devices on Rosetta much faster and easier.
3) I myself have tested Rosetta on the Latte Panda Alpha, Atomic Pi, Raspberry Pi 4 4GB, RPi 4 8GB, and most recently a Jetson Nano 2GB, and I want to learn more about how this could scale.
4) I want to learn more about scaling up as a technical challenge while also scaling up my Rosetta contribution. Rather than drop several hundred dollars on a dedicated late-model PC with an AMD/Intel CPU, I'm considering a "microtransaction" approach, where it's 'easier' to spend $40 on an additional device every few months and slowly scale up a multi-node Rosetta "cluster" (though it's not a real cluster).
5) Arm seems far more power efficient so CPU performance within an order of magnitude might reveal operational cost savings.
My first bit of data collection on this topic is this graph:
[Graph: bubble chart of CPUs running Rosetta]
It shows single-core CPU performance on the Y axis, CPU release date on the X axis, and the relative number of computers running Rosetta with each CPU as of this post (bubble size). As you can see, the general shape is what you'd expect...the more common CPUs are clustered around the center of the performance range for their year, with a few outliers at the extremes. The whole graph also trends upward, which makes sense. The few Arm devices are plotted in red, with the RPi 4 being the largest bubble shown. The others are the RPi 3, Odroid N2, Odroid C2, and a few Nvidia devices like the TX2 and Jetson Nano.
For me, the takeaway is that the Raspberry Pi 4 in particular may outperform a reasonable percentage of the PCs with CPUs released in the 2009 to 2014 range. Sure, that's a while ago, but I still consider it meaningful because I think many Rosetta computers are people's older, retired PCs or hand-me-downs. The PCs are old but clearly still being used, and some of these Arm devices might be a cheap and far more power-efficient replacement. For some of these older PCs, I would think an RPi 4 could pay for itself in electricity savings within just a couple of months.
At a later date, I might post some data I've gathered about the electrical consumption of the computers I have versus a Raspberry Pi, considering their "RAC per watt" values. Generally speaking, for my specific devices, Raspberry Pi 4s would pay for themselves in under a year in electricity savings compared to the more powerful but less efficient PCs I'm using today.
These are 2012-2016 era PCs with an AMD FX-8350 "Bulldozer" or even an AMD Phenom X6, but still...these little Arm devices are offering me options that didn't exist too long ago and weren't compatible with Rosetta until recently. I think this is very encouraging and I look forward to seeing where this goes.
A bit more about the graphed data: At the time I pulled the data, Rosetta's CPU statistics page was reporting 80,005 computing devices on its network. Among those computers there were 1002 unique CPUs. I painfully matched as many of these as I could to entries in a CPU database (https://www.techpowerup.com/cpu-specs/?sort=name) to obtain release dates. That left me with 357 CPUs' worth of data, which represent about 41% of the computers running Rosetta. Imperfect, but still a decent swath of devices. Arm devices do not identify themselves well in Rosetta, but I traced some of the devices of interest back to those mentioned above.
If I decide to invest more time into it, this might prompt me to develop a custom PCB onto which one could affix multiple Raspberry Pi Compute Module 4 devices. This would be Rosetta-focused...few ports would be physically available and the intention would be headless operation. I've created some custom PCB parts for the new 100-pin socket found on the CM4, which was a big hurdle for me to even consider going further. With that handled, I'm going to look into what pins I'd like to surface and how. I'm thinking maybe a 5-device board with one CM4 position housing USB and HDMI ports to test and configure devices, and the remaining four sockets used for operational devices...only the PoE Ethernet port and 5V power would be connected...maybe. Some power switches and status LEDs might be nice. Physical packaging is an important aspect for me...I'd like this to lend itself well to shelving or rack mounting.
I'd also like to allow for a single large heatsink spanning all four devices, aligned so that it pairs well with some small fans if desired. The CM4 isn't a huge cost savings per device for the end user, but it does package better physically for scaling up CPUs.
There's a decent chance I'll encounter technical challenges, an existing product, life events, or even a lack of interest/demand that will prevent me from ever reaching the prototype board level. But I've made a few PCBs before for work projects with surprising success, and if this ended up working out for me and potentially others, I think it would be a blast to facilitate small-scale Rosetta Arm farms. The world needs it now more than ever. Perhaps more to follow.
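The payback reasoning in the post above (electricity savings recouping the price of a Pi, and "RAC per watt" as a comparison metric) can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this sketch, and every wattage, price, and electricity rate in the example is a placeholder assumption, not a measurement from this thread.

```python
def payback_months(device_cost, old_watts, new_watts, price_per_kwh,
                   hours_per_day=24.0):
    """Months of electricity savings needed to recover device_cost.

    Assumes both machines run hours_per_day, every day, and uses a
    30-day month. Only electricity is counted; lost or gained RAC is not.
    """
    saved_kwh_per_month = (old_watts - new_watts) / 1000.0 * hours_per_day * 30.0
    if saved_kwh_per_month <= 0:
        raise ValueError("the new device must draw less power than the old one")
    return device_cost / (saved_kwh_per_month * price_per_kwh)


def rac_per_watt(rac, watts):
    """Recent average credit earned per watt of wall power."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return rac / watts


# Placeholder inputs (assumptions, not figures from the thread):
# a $75 Pi kit drawing 6 W replacing an old PC drawing 150 W at $0.13/kWh.
months = payback_months(device_cost=75.0, old_watts=150.0, new_watts=6.0,
                        price_per_kwh=0.13)
print(f"Estimated payback: {months:.1f} months")
```

Note that this only compares power bills. If the old PC earns more RAC than the Pi, "RAC per watt" is the fairer way to compare the two, which is why both helpers are shown.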

PorkyPies Joined: 6 Apr 20 Posts: 45 Credit: 1,650,779 RAC: 0	Message 99577 - Posted: 5 Nov 2020, 11:56:49 UTC Last modified: 5 Nov 2020, 11:59:50 UTC
One of the things in ARM's favor is the price. For say $100 USD you can get an 8GB Pi4 up and running, compared to the cost of a traditional x64 machine (even though they have higher core counts), and that is before you've factored in the electricity usage. I run a small cluster of Pis, mostly Pi3s, which aren't suitable for Rosetta, but I do have 3 x Pi4 8GB which are.
What would make things easier for people to do a small-scale farm would be:
1. An off-the-shelf PSU that can handle 4 or 5 Pis at full load and has sockets or is cabled up ready to go.
2. Cases designed to hold multiple Pis with cooling in mind.
I addressed point 2 by designing a case that holds 4 x Pi side by side and has fans on top. I get them 3D printed. C4labs also make a couple of cases that would work.
MarksRpiCluster

[VENETO] boboviz Joined: 1 Dec 05 Posts: 2186 Credit: 13,657,164 RAC: 9,650	Message 99580 - Posted: 5 Nov 2020, 16:31:08 UTC - in response to Message 99574.
"5) Arm seems far more power efficient so CPU performance within an order of magnitude might reveal operational cost savings."
We know, we know. But when you see the benchmarks of a Ryzen 5950X, it's difficult to resist....

Melodie_Manthei_Family Joined: 25 Jan 07 Posts: 15 Credit: 16,341,771 RAC: 0	Message 99725 - Posted: 21 Nov 2020, 3:46:44 UTC - in response to Message 99580.
"5) Arm seems far more power efficient so CPU performance within an order of magnitude might reveal operational cost savings."
"We know, we know. But when you see the benchmarks of a Ryzen 5950X, it's difficult to resist...."
Oh yeah, for sure. I have a 3800X system as my main desktop and the thought crossed my mind...upgrade to the 5000 series and use my displaced 3800X to build up a Rosetta machine. I've got some spare parts around (case, smaller SSDs, PSU) so I'd just need RAM and a motherboard...no-brainer, right?

Melodie_Manthei_Family Joined: 25 Jan 07 Posts: 15 Credit: 16,341,771 RAC: 0	Message 99726 - Posted: 21 Nov 2020, 4:24:31 UTC - in response to Message 99725. Last modified: 21 Nov 2020, 4:30:23 UTC
"One of the things in ARM's favor is the price. For say $100 USD you can get an 8GB Pi4 up and running, compared to the cost of a traditional x64 machine (even though they have higher core counts), and that is before you've factored in the electricity usage. I run a small cluster of Pis, mostly Pi3s, which aren't suitable for Rosetta, but I do have 3 x Pi4 8GB which are. What would make things easier for people to do a small-scale farm would be: 1. An off-the-shelf PSU that can handle 4 or 5 Pis at full load and has sockets or is cabled up ready to go. 2. Cases designed to hold multiple Pis with cooling in mind. I addressed point 2 by designing a case that holds 4 x Pi side by side and has fans on top. I get them 3D printed. C4labs also make a couple of cases that would work."
Those are great points, thanks for bringing them up. I've considered experimenting with 3D prints but haven't gotten that far yet. I thought I'd revisit that if my PCB idea works out, so I have an idea of the form factor. I have a generic old 5V 40A Mean Well power supply I picked up pretty cheap online. I kind of take it for granted that I can just cut, splice, and wire things up to use it...you're right, some support around using such a power supply would be good. 10 devices per power supply isn't too shabby, and it's going to be more efficient and less to deal with than wall warts.
I spent some time this evening looking for a nice Passmark-like CPU benchmark list for ARM CPUs but didn't find much. Geekbench is about all I can turn up, but a search for processor:arm yields an unmanageable 3 million results and it's more phone/device focused, not purely the CPUs. Something closer to Passmark would be awesome.
I finished a really sketchy alpha version of my PCB and sent it for fabrication.
Mostly it's to test two things: how "solderable" these compute module connectors are with that tiny pin pitch, and whether my pinout is even close to correct. Differential pair trace length and impedance requirements are new territory for me and pretty much over my head. I gave it my best shot with the amateur tools and techniques I have. I went for broke and included power, CPU activity, Ethernet speed, and Ethernet activity LEDs as well as an on-off toggle switch (no logic, it just breaks the 5V rail), but I don't expect any of that to work right. If the thing boots without letting the smoke out of the CM4, that will be a big win for me and enough encouragement to sort out the Ethernet issues that will undoubtedly be present. Unfortunately I abandoned PoE for several reasons, but I don't think it's so bad to separate data and power. This PCB is only for one compute module and the only port I physically surfaced is Ethernet...I will have to pre-load the OS onto the eMMC with a dev board, ensure SSH is configured, then move it to my board.
If it works, the next challenge would be to redo the layout to optimize for multiple CM4s without breaking anything that functions on the alpha board, like the Ethernet pairs. That's where this has to go to make sense. If the CM4 plus this PCB costs more than an equivalent RPi4, then there's nearly zero benefit to the CM4 route. But if buying one PCB means you can fire up 4-6 CM4s from that one board, the cost gets split up and starts to make more sense. At the same time, the board layout needs to be conducive to cooling and packaging. In particular, I'd like all Ethernet ports, LEDs, and switches on one edge together. Maybe power too, but there might be advantages to keeping that on the backside. I'd love to get to a point where I can offer a detailed and accurate PXE booting guide that can be used to boot a handful of CM4 devices without eMMC or SD cards.
On a somewhat related note, I've let my Jetson and Pis run for a few weeks now and I'm seeing some trends. The 8GB Pi did catch up to and surpass the 4GB Pi in RAC, but I have yet to look closely at whether this really is because of the RAM or if it's just normal 'noise' in task credits and timing. The Jetson, on the other hand, is interesting. It's sitting at around 550 RAC, but there are some days of >1000 credits. I'm running the Nvidia default OS with a GUI, and BOINC is actually pausing tasks "waiting for memory" (2GB version). I think that's pretty impressive, and when I find the time I'd like to switch it over to a command-line OS to see if I can free up enough memory for one more core to become active. At $55 for the 2GB version, I have to say I'm pleasantly surprised by the Jetson Nano, which so far seems to be earning around 75% of the RAC the RPi4-4GB can pull. It has a lot of ports I don't need and some GPU hardware that doesn't get used, so there are features I'm paying for that don't benefit the project. But it was very quick to set up and comes with a reasonable heatsink and power supply.
Another week or two and I should have more progress to report. The CM4 boards and IO dev board are backordered, so probably no changes there, but maybe I can try the Jetson in CLI and report back.
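The power-supply and shared-board cost reasoning in the post above comes down to two small pieces of arithmetic, sketched here in Python. The 5V 40A supply and the 10-device count come from the post; the function names, the optional headroom factor, and the module and board prices in the example are illustrative assumptions only.

```python
def per_device_budget(supply_volts, supply_amps, devices, headroom=1.0):
    """Return (amps, watts) available to each device on a shared supply.

    headroom is the fraction of the supply's rating you are willing to
    use (an assumption of this sketch; 1.0 means the full rating).
    """
    if devices <= 0:
        raise ValueError("devices must be positive")
    amps_each = supply_amps * headroom / devices
    return amps_each, amps_each * supply_volts


def cost_per_node(module_cost, board_cost, modules_per_board):
    """Cost of one compute node when a carrier board is shared by several modules."""
    if modules_per_board <= 0:
        raise ValueError("modules_per_board must be positive")
    return module_cost + board_cost / modules_per_board


# From the post: a 5 V, 40 A supply feeding 10 devices.
amps, watts = per_device_budget(5.0, 40.0, 10)
print(f"{amps:.1f} A ({watts:.1f} W) available per device")

# Hypothetical prices, chosen only to show how the board cost is diluted.
for n in (1, 4, 6):
    node = cost_per_node(module_cost=45.0, board_cost=60.0, modules_per_board=n)
    print(f"{n} module(s) per board: ${node:.2f} per node")
```

Comparing the per-node figure against the price of a standalone RPi4 gives the break-even point the post describes: the shared board only makes sense once enough modules sit on it.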

Grant (SSSF) Joined: 28 Mar 20 Posts: 1932 Credit: 18,534,891 RAC: 0	Message 99729 - Posted: 21 Nov 2020, 6:52:30 UTC - in response to Message 99726.
"I spent some time this evening looking for a nice Passmark-like CPU benchmark list for ARM CPUs but didn't find much. Geekbench is about all I can turn up, but a search for processor:arm yields an unmanageable 3 million results and it's more phone/device focused, not purely the CPUs. Something closer to Passmark would be awesome."
You appear to be comfortable getting your hands dirty with the nitty gritty, so to speak, so the Phoronix Test Suite might be worth consideration. Here are two reviews based on its test results:
Apple M1 ARM Performance With A 2020 Mac Mini
AMD Ryzen 9 5900X + Ryzen 9 5950X Dominate On Linux
Grant
Darwin NT

PorkyPies Joined: 6 Apr 20 Posts: 45 Credit: 1,650,779 RAC: 0	Message 99741 - Posted: 22 Nov 2020, 6:52:42 UTC Last modified: 22 Nov 2020, 7:00:40 UTC
Your design sounds decidedly like the Turing Pi, which holds 7 Raspberry Pi Compute Modules (up to CM 3+). There is a Turing Pi 2 in the works that is going to use the CM 4, although they are dropping down to 4 modules per board. The board is mITX size. It also looks like they are using a standard PC power connector (the original could use that or a 12V barrel connector). See Turing Pi
MarksRpiCluster

Melodie_Manthei_Family Joined: 25 Jan 07 Posts: 15 Credit: 16,341,771 RAC: 0	Message 99793 - Posted: 27 Nov 2020, 21:14:33 UTC - in response to Message 99741.
Thanks for the link. I had heard of Turing Pi but couldn't recall what it was exactly until I looked again. Cool product. In a sense it's similar, except my skills and what I'm willing to put into this would never ever yield something as nice as their product. Theirs has a lot of features, provides a lot of physical ports, and is an actual cluster. What I'm going for is the cheapest possible way to get Ethernet and power to CM4s, each as a standalone device but sharing a PCB...it's not an actual cluster. This path is rife with compromises, disadvantages, and more complex setup on the end user's part, but it could still fulfill the needs of a distributed computing project and let people play with the idea of scaling up. Really, this little project of mine is probably best described as a very limited breakout board for the CM4 connector that exposes an Ethernet port and power connections. Limited use case but limited cost.

Melodie_Manthei_Family Joined: 25 Jan 07 Posts: 15 Credit: 16,341,771 RAC: 0	Message 99794 - Posted: 27 Nov 2020, 21:15:26 UTC - in response to Message 99729. Last modified: 27 Nov 2020, 21:16:23 UTC
"You appear to be comfortable getting your hands dirty with the nitty gritty so to speak, so the Phoronix test suite might be worth consideration."
Thanks for this, I will definitely take a look!

spRocket Joined: 23 Mar 20 Posts: 22 Credit: 3,008,018 RAC: 0	Message 99878 - Posted: 3 Dec 2020, 2:29:08 UTC Last modified: 3 Dec 2020, 2:30:33 UTC
I've certainly been impressed by the Pi 4. Seeing it get a RAC comparable to an admittedly heavily throttled i7-640LM (ThinkPad X201 Tablet) kind of blows my mind. Still, it's nothing compared to my "old" Ryzen 7 1700. And, strictly speaking, you're going to want active cooling with a Pi 4, though a Pi 400 has enough of a heat sink to use passive cooling.
One potential cluster idea: something that would hold multiple Pi 4 Compute Modules. It might be kind of awkward with the new module design, though, unless you're going to use a pizza box form factor.