Ryzen improvement with Linux 4.15.0-29

Message boards : Number crunching : Ryzen improvement with Linux 4.15.0-29

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89383 - Posted: 7 Aug 2018, 11:48:51 UTC

Having had so much bad luck with my Ryzen 1700 on Rosetta under Linux, I thought I would mention the improvement after upgrading to the latest Linux kernel (4.15.0-29). I now get much more consistent output, and a low error rate.
https://boinc.bakerlab.org/rosetta/results.php?hostid=3432628&offset=0&show_names=0&state=4&appid=

In fact, it is now better than my Intel i7-3770 and i7-4770 machines, which still show very inconsistent output unless I leave at least 3 cores (4 cores is better) free. However, Rosetta is now running on 4 cores of my Ryzen, while 11 cores are on WCG (all projects) and another core supports a GPU on Folding, so I don't have to leave any free.
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 6,319
Message 89401 - Posted: 14 Aug 2018, 13:47:30 UTC - in response to Message 89383.  

Having had so much bad luck with my Ryzen 1700 on Rosetta under Linux, I thought I would mention the improvement after upgrading to the latest Linux kernel (4.15.0-29). I now get much more consistent output, and a low error rate.


I'd be curious to try it with a new Threadripper!!
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89402 - Posted: 14 Aug 2018, 14:45:10 UTC - in response to Message 89401.  
Last modified: 14 Aug 2018, 14:52:45 UTC

I have found that it still helps to limit it to running Rosetta on only 2 cores at a time for the most consistent output.
But that is different than having to reserve cores. I can still use all my other cores on the various WCG projects without ill effects. It is quite nice.

And you could probably run with even more cores for a small reduction in output. It depends on what you want. I am suspicious that it is similar to the MIP (Microbiome Immunity Project) on WCG, where you have to limit it to running only a few cores at a time for best output. That is based on Rosetta, so there are some similarities.

PS - I will be building a Ryzen 2700 in a couple of months, and will try Rosetta on 8 full cores rather than 16 virtual cores. It might work. Threadripper would be great to try.
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,508,322
RAC: 18,740
Message 89403 - Posted: 14 Aug 2018, 16:27:03 UTC - in response to Message 89402.  

I have found that it still helps to limit it to running Rosetta on only 2 cores at a time for the most consistent output.
But that is different than having to reserve cores. I can still use all my other cores on the various WCG projects without ill effects. It is quite nice.

And you could probably run with even more cores for a small reduction in output. It depends on what you want. I am suspicious that it is similar to the MIP (Microbiome Immunity Project) on WCG, where you have to limit it to running only a few cores at a time for best output. That is based on Rosetta, so there are some similarities.

PS - I will be building a Ryzen 2700 in a couple of months, and will try Rosetta on 8 full cores rather than 16 virtual cores. It might work. Threadripper would be great to try.



The name of the WCG MIP Linux binary on my machine is "wcgrid_mip1_rosetta_7.11_x86_64-pc-linux-gnu". Notice the "rosetta" in the name. 8-)

Rosetta, like other projects, has stripped symbols from the binary. I disassembled the binary and used absolute addresses to see what Rosetta was doing while running. I ran Rosetta 4.07 on all cores and it did not even show any computation ... work being done. It seemed to spend a huge chunk of its time spinning on the availability of a "LOCK".

Maybe I should try the exercise again, but with increasing numbers of WUs to confirm my initial findings. The design of code that uses a "LOCK" is not that hard. The design of EFFICIENT code for performance is more tricky.
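
To illustrate what "spinning on a lock" means, here is a minimal Python sketch. It is not Rosetta's code (the binary is stripped, so its real locking is unknown), and the names `run_workers` and `state` and the iteration counts are made up for the example. Each thread retries a non-blocking acquire until it gets the lock, and every failed retry is counted as a wasted spin.

```python
import threading


def run_workers(num_workers, iterations):
    """Run threads that spin on one shared lock; return (counter, spins)."""
    lock = threading.Lock()
    state = {"counter": 0, "spins": 0}

    def worker():
        spins = 0
        for _ in range(iterations):
            # Busy-wait: keep retrying instead of sleeping until the lock is free.
            while not lock.acquire(blocking=False):
                spins += 1  # CPU time burned without doing useful work
            try:
                state["counter"] += 1  # the only "real" work in this sketch
            finally:
                lock.release()
        # Report this thread's wasted attempts once, under the lock.
        with lock:
            state["spins"] += spins

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], state["spins"]


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        counter, spins = run_workers(n, 20_000)
        print(f"{n} workers: {counter} increments, {spins} failed lock attempts")
```

With one worker the spin count is always 0. With more workers it depends on scheduling, so any numbers are machine-specific, and Python's GIL makes this a poor benchmark. It only shows the pattern of threads burning CPU while waiting.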
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89404 - Posted: 14 Aug 2018, 16:35:07 UTC - in response to Message 89403.  

Rosetta, like other projects, has stripped symbols from the binary. I disassembled the binary and used absolute addresses to see what Rosetta was doing while running. I ran Rosetta 4.07 on all cores and it did not even show any computation ... work being done. It seemed to spend a huge chunk of its time spinning on the availability of a "LOCK".

Maybe I should try the exercise again, but with increasing numbers of WUs to confirm my initial findings. The design of code that uses a "LOCK" is not that hard. The design of EFFICIENT code for performance is more tricky.

I am not sure what a "LOCK" is for, but it does not sound promising. Maybe you can jog them into doing the right thing.

Another issue is that BOINC has a bad habit of pausing one Rosetta work unit to run another. It could help to disable "Leave application in memory" as you suggested; I will try it in a couple of weeks.
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89405 - Posted: 15 Aug 2018, 9:44:43 UTC

I should also mention that to keep the Rosettas from being suspended at all, I have increased the BOINC "Switch between applications every" to 1600 minutes (thanks to anniet on the BOINC forum).

It seems to be working fine thus far, and all the Rosettas that have once started are running now, with none paused. I think it will help the consistency too.
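
For reference, the same setting can be written to BOINC's local override file instead of being set through the Manager. The sketch below assumes the `global_prefs_override.xml` format, where "Switch between applications every" corresponds to `<cpu_scheduling_period_minutes>`. The helper name `build_prefs_override` is made up, and the data directory is left out because it differs between installations.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(switch_minutes):
    """Return global_prefs_override.xml content that sets the task switch interval."""
    root = ET.Element("global_preferences")
    # "Switch between applications every N minutes" in the BOINC Manager.
    ET.SubElement(root, "cpu_scheduling_period_minutes").text = str(switch_minutes)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 1600 minutes is the value mentioned in the post above.
    print(build_prefs_override(1600))
```

The output would be saved as `global_prefs_override.xml` in the BOINC data directory, after which the client has to re-read its local preferences (or be restarted) for the change to apply.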
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89423 - Posted: 20 Aug 2018, 12:42:26 UTC - in response to Message 89405.  

Bad luck. My Ryzen machine is now stuck with Universe "long runners" on all the cores. Nothing is working. Even my GTX 1070 on Folding has low output due to the core that supports it being taken over by the bad BHSpin v2 work units. I will not be able to fix it for a few days.
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 21,508,322
RAC: 18,740
Message 89424 - Posted: 21 Aug 2018, 4:53:18 UTC - in response to Message 89423.  

Bad luck. My Ryzen machine is now stuck with Universe "long runners" on all the cores. Nothing is working. Even my GTX 1070 on Folding has low output due to the core that supports it being taken over by the bad BHSpin v2 work units. I will not be able to fix it for a few days.


Why won't you be able to fix it for a few days?

Seems like you can
- SUSPEND a couple of Universe long runner TASKS for a while to let other stuff run. Suspending a TASK will prevent more work coming from that PROJECT.
- define the MAX_CONCURRENT TASKS in the app_config.xml file to limit the number of those tasks that can start.
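
The second option could look like the sketch below, which generates an `app_config.xml` using the `<app>`, `<name>` and `<max_concurrent>` elements of the BOINC client configuration format. The app name `example_app` is a placeholder, not the real Universe application name; the real short name has to be looked up (for example in `client_state.xml`). The helper name `build_app_config` is also made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml content limiting how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # project's short app name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "example_app" is a hypothetical placeholder; 2 is an arbitrary limit.
    print(build_app_config("example_app", 2))
```

The file goes in that project's folder under the BOINC data directory, and the client has to re-read its config files before the limit takes effect.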
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89425 - Posted: 21 Aug 2018, 5:21:10 UTC - in response to Message 89424.  
Last modified: 21 Aug 2018, 5:59:53 UTC

Why won't you be able to fix it for a few days?

I am 800 miles from the machine. It was a gamble. I lost.

(The real solution is dumping Universe.)
PappaLitto

Joined: 14 Nov 17
Posts: 17
Credit: 28,092,012
RAC: 1,192
Message 89431 - Posted: 21 Aug 2018, 22:47:40 UTC
Last modified: 21 Aug 2018, 22:47:48 UTC

You should install TeamViewer on your crunching machines; it makes life a lot easier.
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 6,319
Message 89438 - Posted: 22 Aug 2018, 8:05:04 UTC - in response to Message 89401.  

I'd be curious to try it with a new Threadripper!!


Uh, Threadripper does better with Linux than with Win10,
and also better than with Windows Server!!
Threadripper
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 6,319
Message 89439 - Posted: 22 Aug 2018, 8:25:15 UTC - in response to Message 89438.  

I'd be curious to try it with a new Threadripper!!


Uh, Threadripper does better with Linux than with Win10,
and also better than with Windows Server!!
Threadripper


And also, on some distros, this CPU is better than a Xeon
DragonFlyBSD


"The Threadripper 2990WX is a beast. It is at *least* 50% faster than both the quad socket opteron and the dual socket Xeon system I tested against. The primary limitation for the 2990WX is likely its 4 channels of DDR4 memory, and like all Zen and Zen+ CPUs, memory performance matters more than CPU frequency (and costs almost no power to pump up the performance). That said, it still blow away a dual-socket Xeon with 3x the number of memory channels. That is impressive!"

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 89489 - Posted: 5 Sep 2018, 15:20:30 UTC

After returning home and dumping all the bad work units, the Ryzen 1700 is almost back to normal, though the BOINC scheduler is still running too many Rosettas at the moment. But the credit output is good, and returning to a consistent state.
https://boinc.bakerlab.org/rosetta/results.php?hostid=3432628

However, I have gotten 3 errors on this machine (all of the PF type) out of 10 work units completed. That compares to no errors out of a total of 24 completed on my i7-3770 and i7-8700. So I think the Ryzen is still a little error-prone. When I last tried out the Ryzen a year ago, I found that turning off SMT in the BIOS and just running on full cores eliminated the errors. Maybe someone could try it (especially on the Threadrippers), but I will be taking this Ryzen off after completing this test. Good luck.
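
As a rough check on whether 3 errors in 10 is more than bad luck, here is a small Python calculation. It assumes the 10 Ryzen results include the 3 errors and pools them with the 24 Intel results; the function names are made up. It computes the chance that, if errors struck results at random regardless of host, all 3 would land on the Ryzen.

```python
from math import comb


def error_rate(errors, completed):
    """Fraction of completed work units that ended in error."""
    return errors / completed


def prob_all_errors_on_host(errors, n_host, n_other):
    """Hypergeometric chance that every error falls in the first host's results."""
    return comb(n_host, errors) / comb(n_host + n_other, errors)


if __name__ == "__main__":
    print(f"Ryzen error rate: {error_rate(3, 10):.0%}")
    print(f"Intel error rate: {error_rate(0, 24):.0%}")
    p = prob_all_errors_on_host(errors=3, n_host=10, n_other=24)
    print(f"Chance of all 3 errors on the Ryzen by luck alone: {p:.3f}")
```

That chance comes out to about 2%, which is consistent with the Ryzen being more error-prone, though the sample is small.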



