Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home


Previous · 1 . . . 163 · 164 · 165 · 166 · 167 · 168 · 169 . . . 227 · Next

gbayler

Joined: 10 Apr 20
Posts: 14
Credit: 3,069,484
RAC: 0
Message 104400 - Posted: 22 Jan 2022, 18:09:09 UTC

For the Linux users out there: I have written a Perl script, boinc_watchdog.pl, that checks for "0 CPU" tasks (tasks with very low CPU utilization that likely won't terminate) and checks whether at least one task is executing. If it finds "0 CPU" tasks, it aborts them, and if not a single task is executing, it restarts the boinc-client. I run it every 30 minutes as a cron job, and for me it works quite well. I am perfectly aware that this doesn't solve the root cause of the current problems; it is merely a workaround. Still, I think it is an improvement over having to manually abort tasks or restart the PC every other day.

Here you can find it: https://github.com/gbayler/boinc_watchdog

Hope that it is useful for someone else too! :)
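The decision rule described above can be sketched in Python. This is not the code from boinc_watchdog.pl; it is a minimal illustration that assumes hypothetical thresholds (`MIN_ELAPSED_S`, `MIN_CPU_FRACTION`) and a hypothetical `Task` record you would fill in from `boinccmd --get_tasks` output. The only real interface it refers to is boinccmd's `--task <project URL> <task name> abort` operation, whose command line it builds but does not run.

```python
from dataclasses import dataclass

# Hypothetical thresholds; boinc_watchdog.pl may use different ones.
MIN_ELAPSED_S = 30 * 60   # ignore tasks that have only just started
MIN_CPU_FRACTION = 0.05   # CPU time / wall time below this counts as "0 CPU"


@dataclass
class Task:
    """Hypothetical task record, e.g. parsed from `boinccmd --get_tasks`."""
    name: str
    project_url: str
    executing: bool     # True if BOINC reports the task as executing
    cpu_time_s: float   # CPU seconds consumed so far
    elapsed_s: float    # wall-clock seconds since the task started


def is_zero_cpu(task: Task) -> bool:
    """True for an executing task that has used almost no CPU for its wall time."""
    if not task.executing or task.elapsed_s < MIN_ELAPSED_S:
        return False
    return task.cpu_time_s / task.elapsed_s < MIN_CPU_FRACTION


def plan_actions(tasks: list[Task]) -> tuple[list[Task], bool]:
    """Return (tasks to abort, whether the BOINC client should be restarted)."""
    to_abort = [t for t in tasks if is_zero_cpu(t)]
    restart_client = not any(t.executing for t in tasks)
    return to_abort, restart_client


def abort_command(task: Task) -> list[str]:
    """Command line that aborts one task; pass it to subprocess.run() to execute."""
    return ["boinccmd", "--task", task.project_url, task.name, "abort"]
```

Running `plan_actions` from cron every 30 minutes, and restarting the client with something like `systemctl restart boinc-client` (the service name varies between distributions), would mirror the behaviour described; the two thresholds are the part most worth tuning for your own machine.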

Günther
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104401 - Posted: 22 Jan 2022, 18:25:34 UTC - in response to Message 104383.  
Last modified: 22 Jan 2022, 18:27:59 UTC


I have 49 and change spread out over 4 slots.
Everything works as it should.
The new drive is 500 gigs and it will be dedicated to BOINC.
So there is more than enough room for swap or whatever else BOINC wants to do.
Swap files are for poor people without enough RAM :-)

If you don't have matched pairs of RAM, things can slow down. Dual channel is a great benefit for some things but not others; it depends on whether they access memory a lot. I changed my Ryzen to dual channel to make my game faster. It didn't help, but half the BOINC projects sped up a lot.
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104402 - Posted: 22 Jan 2022, 18:30:46 UTC - in response to Message 104385.  
Last modified: 22 Jan 2022, 18:34:41 UTC

Everything works better with more memory; if you're not using it, you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with Memtest before use; even quality stuff has duds.

You get a write cache only if you install one. PrimoCache is the only one that I know of for Windows, which I use to protect my SSDs.
AFAIK Windows has a write cache unless it's a removable drive. In fact I know it does, because I've copied a huge number of files from an SSD to a rotary drive, and the rotary drive kept being accessed long after the copy looked finished. Here's a cite: https://www.tenforums.com/tutorials/21904-enable-disable-disk-write-caching-windows-10-a.html

Memtest really doesn't have much to do with stability. It is mainly for errors, which might cause crashes, but more likely failures in work units.
Memtest is everything to do with stability. Every single time someone has come to me with a crashing computer, I've found dodgy memory using Memtest.

With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums.
Not in my experience. Must be dodgy memory. I can find nothing on Google suggesting 4 sticks cause problems.
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104403 - Posted: 22 Jan 2022, 18:37:24 UTC - in response to Message 104393.  

The write rates on the pythons are horrendous. I am getting well over 1 TB/day (almost 2 TB) when running 20 pythons, even with a huge 26 GB write cache. That is too much. I will do something else with this machine.
SSDs have a longer life than rotary drives nowadays; look up the expected writes allowed for your SSD model and see how long the Pythons would take to wear it out. And caching the writes won't help anyway, since they have to be done at some point.
Jim1348

Joined: 19 Jan 06
Posts: 859
Credit: 51,272,435
RAC: 4,890
Message 104404 - Posted: 22 Jan 2022, 18:50:00 UTC - in response to Message 104403.  

SSDs have a longer life than rotary drives nowadays; look up the expected writes allowed for your SSD model and see how long the Pythons would take to wear it out. And caching the writes won't help anyway, since they have to be done at some point.

You can find out the hard way about SSD lifetimes. They usually don't publish the figures now, probably because they have been going down as the chip geometries shrink.

The caching for science projects works differently than if you are copying a video file, which would all have to be transferred. But in a scientific algorithm, you usually read from a location, do a calculation, and then store the value back, either into the original location or a related one. Therefore, by storing the information in DRAM, most of the writes are done to memory. You transfer to the SSD only the residual writes remaining at the end of the cache latency period.

In fact, if you made the cache latency (write-delay) long enough, you would never have to transfer any of the writes to the SSD.
That is effectively what a ramdisk does, but it requires a lot more memory. You would have to store the entire BOINC data folder.
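The argument about residual writes can be checked with a toy write-back cache. This is a simplified model, not how Windows, PrimoCache or any real cache works: each block becomes dirty on its first write and is flushed `delay_s` seconds later, and any rewrites while it is dirty are absorbed in RAM. The workload (4 blocks rewritten once a second for 100 seconds) is made up for illustration.

```python
def device_writes(trace: list[tuple[float, int]], delay_s: float) -> int:
    """Count block writes that reach the SSD under a toy write-back cache.

    trace: (time in seconds, block number) pairs, sorted by time.
    """
    dirty_since: dict[int, float] = {}  # block -> time it first became dirty
    flushed = 0
    for t, block in trace:
        # Flush every block whose write delay has expired by time t.
        expired = [b for b, t0 in dirty_since.items() if t - t0 >= delay_s]
        for b in expired:
            del dirty_since[b]
            flushed += 1
        # A rewrite of an already-dirty block costs no extra device write.
        dirty_since.setdefault(block, t)
    # Blocks still dirty at the end are the residual writes that get flushed.
    return flushed + len(dirty_since)


# Made-up workload: 4 blocks, each rewritten once per second for 100 seconds.
trace = [(float(s), b) for s in range(100) for b in range(4)]
for delay in (0, 10, 1000):
    print(delay, device_writes(trace, delay))
```

For delays of 0, 10 and 1000 seconds this prints 400, 40 and 4 device writes: the 400 application writes shrink toward the 4 distinct blocks as the delay grows. If every write went to a new block, as when copying a video file, the count would stay at 400 whatever the delay, which is the case where caching does not help.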
.clair.

Joined: 2 Jan 07
Posts: 182
Credit: 23,131,380
RAC: 507
Message 104405 - Posted: 22 Jan 2022, 18:55:35 UTC - in response to Message 104394.  
Last modified: 22 Jan 2022, 19:52:33 UTC

By the way, I used to just put projects with high write rates on a ramdisk, and have all the writes go to main memory.
That really solves the problem. But on the Ryzen 3900X with all the pythons, the BOINC data folder is 107 GB; too much.
I might be able to pull it off on a Ryzen 3600 though; 12 virtual cores might work.
But I think they really need to develop the pythons a bit and call back when they are ready.

Yes, some of the pythons need a kick in the compilers.
Amazingly I have 31 python and 7 R4.20 tasks running ATM, and I have been through them to clear out two 0 CPU dud work units; it is a pain having to do that at least once a day.
Rosetta is using 235GB of disk space, though the most I have seen was 266GB.
RAM use right now is 59GB total system use, 71GB on `standby` and only 40MB `free` of 128GB fitted in 8 slots [crashes?? wot crashes!! . . . . tic tic tic . . . BOOM :)]
SSD write bombardment by pythons: following an idea by [Greg I think] I have put in a 500GB SATA SSD, a Samsung 870 EVO [£58 on eBay, new, still sealed].
I will see how long it lasts, though I haven't installed the additional "Samsung Magician" app yet to keep an eye on the write rate, TRIM, garbage collection etc.
I installed only BOINC on it, to speed up python work unit loading times. It looked like the fastest kid on the block in benchmarks at a low price; there is faster stuff out there at a higher cost.
I did look at M.2 NVMe drives, but getting them to work in Win7 looks like a pain of magical incantations on the command line to load the drivers; Win 8.1 onwards has them built in already [I checked the MS forum].
OK time to post this drivel on the forum and see what happens :)
Jim1348

Joined: 19 Jan 06
Posts: 859
Credit: 51,272,435
RAC: 4,890
Message 104406 - Posted: 22 Jan 2022, 18:59:27 UTC - in response to Message 104405.  

RAM use right now is 59GB total system use, 71GB on `standby` and only 40MB `free` of 128GB fitted in 8 slots [crashes?? wot crashes!! . . . . tic tic tic . . . BOOM :)]
SSD write bombardment by pythons: following an idea by [Grant I think] I have put in a 500GB SSD, a Samsung 870 EVO [£58 on eBay, new, still sealed].
I will see how long it lasts, though I haven't installed the additional "Samsung Magician" app yet to keep an eye on the write rate, TRIM, garbage collection etc.

Good. I was hoping that someone would do some real-world tests.
I don't want to do them myself.
.clair.

Joined: 2 Jan 07
Posts: 182
Credit: 23,131,380
RAC: 507
Message 104407 - Posted: 22 Jan 2022, 19:02:56 UTC - in response to Message 104401.  
Last modified: 22 Jan 2022, 19:11:51 UTC

Swap files are for poor people without enough RAM :-)

I waz tiepin while you waz postin :-),
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104408 - Posted: 22 Jan 2022, 19:16:28 UTC - in response to Message 104404.  
Last modified: 22 Jan 2022, 19:17:40 UTC

The caching for science projects works differently than if you are copying a video file, which would all have to be transferred. But in a scientific algorithm, you usually read from a location, do a calculation, and then store the value back, either into the original location or a related one. Therefore, by storing the information in DRAM, most of the writes are done to memory. You transfer to the SSD only the residual writes remaining at the end of the cache latency period.

In fact, if you made the cache latency (write-delay) long enough, you would never have to transfer any of the writes to the SSD.
That is effectively what a ramdisk does, but it requires a lot more memory. You would have to store the entire BOINC data folder.
Modern SSDs are rated for about 3000 write cycles, and pythons write about 2MB/s per task on a fast CPU, so a 1TB SSD would last about 5 years even running 10 at once, by which time you'd want to buy a bigger one anyway. However, hard disks hate moving their heads back and forth and fall apart with that much random access.
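The 5-year figure can be reproduced with a few lines of Python. The inputs (3000 P/E cycles, 2 MB/s per task, 10 tasks, 1 TB) are the figures from the post; the sketch assumes decimal units and ignores write amplification, so treat it as an upper bound. Warranted TBW ratings for consumer drives are often well below capacity × cycles, so substituting the drive's rated TBW for `capacity_tb * pe_cycles` gives a more conservative answer.

```python
SECONDS_PER_DAY = 86_400


def ssd_lifetime_years(capacity_tb: float, pe_cycles: int,
                       tasks: int, mb_per_s_per_task: float) -> float:
    """Years until capacity x P/E cycles of writes is used up (no write amplification)."""
    total_writes_tb = capacity_tb * pe_cycles
    tb_per_day = tasks * mb_per_s_per_task * SECONDS_PER_DAY / 1_000_000  # MB -> TB
    return total_writes_tb / tb_per_day / 365


print(round(ssd_lifetime_years(1.0, 3000, 10, 2.0), 2))  # 4.76
```

Ten tasks at 2 MB/s come to about 1.73 TB/day, so 3000 TB of endurance lasts roughly 1736 days, just under the 5 years quoted.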
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104409 - Posted: 22 Jan 2022, 19:25:22 UTC - in response to Message 104405.  
Last modified: 22 Jan 2022, 19:27:16 UTC

Yes, some of the pythons need a kick in the compilers.
Amazingly I have 31 python and 7 R4.20 tasks running ATM, and I have been through them to clear out two 0 CPU dud work units; it is a pain having to do that at least once a day.
Rosetta is using 235GB of disk space, though the most I have seen was 266GB.
RAM use right now is 59GB total system use, 71GB on `standby` and only 40MB `free` of 128GB fitted in 8 slots [crashes?? wot crashes!! . . . . tic tic tic . . . BOOM :)]
SSD write bombardment by pythons: following an idea by [Greg I think] I have put in a 500GB SSD, a Samsung 870 EVO [£58 on eBay, new, still sealed].
I will see how long it lasts, though I haven't installed the additional "Samsung Magician" app yet to keep an eye on the write rate, TRIM, garbage collection etc.
I installed only BOINC on it, to speed up python work unit loading times. It looked like the fastest kid on the block in benchmarks at a low price; there is faster stuff out there at a higher cost.
I did look at M.2 NVMe drives, but getting them to work in Win7 looks like a pain of magical incantations on the command line to load the drivers; Win 8.1 onwards has them built in already [I checked the MS forum].
OK time to post this drivel on the forum and see what happens :)
Standby memory? Is that disk cache? In Windows, all my RAM is always in use, but the disk cache just takes whatever is left. You can ignore that.
HWiNFO (or many other free utilities) will show you the disk SMART data so you can see how much life is left. The drive reports % life left.
Why are you still on Windows 7? 10 was free. NVMe is about 8 times faster. If your motherboard doesn't have a slot for it, you can get cards to take them.
.clair.

Joined: 2 Jan 07
Posts: 182
Credit: 23,131,380
RAC: 507
Message 104411 - Posted: 22 Jan 2022, 19:49:24 UTC - in response to Message 104409.  

Standby memory? Is that disk cache? In Windows, all my RAM is always in use, but the disk cache just takes whatever is left. You can ignore that.
HWiNFO (or many other free utilities) will show you the disk SMART data so you can see how much life is left. The drive reports % life left.
Why are you still on Windows 7? 10 was free. NVMe is about 8 times faster. If your motherboard doesn't have a slot for it, you can get cards to take them.

It probably is `Disk Cache`.
In Winders it shows in `Resource Monitor` on the `Memory` tab, an MS app built into W7 etc. if you can find it. I keep it `pinned` to the taskbar; it also does Disk, Network and CPU.
It is using Win 7 Ultimate. I don't know if that version [the one whose licence will work with twin CPU sockets] was free. I know that from Win 8 onwards the max memory that is OK is now something like 192GB,
unlike Win 7 Home, which is limited to 16GB.
No M.2 slot onboard, though I have seen PCIe-to-M.2 adapters on eBay.
Jim1348

Joined: 19 Jan 06
Posts: 859
Credit: 51,272,435
RAC: 4,890
Message 104412 - Posted: 22 Jan 2022, 19:53:13 UTC - in response to Message 104408.  

Modern SSDs are rated for about 3000 write cycles, and pythons write about 2MB/s per task on a fast CPU, so a 1TB SSD would last about 5 years even running 10 at once, by which time you'd want to buy a bigger one anyway. However, hard disks hate moving their heads back and forth and fall apart with that much random access.

On my Ryzen 3900X, I was seeing the OS write 4 TB/day (for 20 work units), or 46 MB/sec if I have my numbers right.
So you can do it if you limit the number of tasks to only a few at a time. And I think that Linux writes a bit less than Windows from what I can see, though it has other problems.

You can make it work if you are careful.
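The 46 MB/s conversion checks out, and the same daily figure can be turned into a wear-out time. A minimal sketch in decimal units; the 3000 TB endurance figure in the example is only the 1 TB × 3000 cycles assumption quoted above, not the rating of any particular drive.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily write volume in TB to an average rate in MB/s."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def days_to_wear_out(endurance_tb: float, tb_per_day: float) -> float:
    """Days until endurance_tb of writes is reached at a constant daily volume."""
    return endurance_tb / tb_per_day


print(round(tb_per_day_to_mb_per_s(4.0), 1))  # 46.3
print(days_to_wear_out(3000.0, 4.0))          # 750.0
```

At 20 work units that is about two years. If the writes scale with the number of tasks, halving the task count roughly doubles it, which is the point about limiting how many run at once.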
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104413 - Posted: 22 Jan 2022, 20:15:34 UTC - in response to Message 104411.  

It is using Win 7 Ultimate. I don't know if that version [the one whose licence will work with twin CPU sockets] was free. I know that from Win 8 onwards the max memory that is OK is now something like 192GB,
unlike Win 7 Home, which is limited to 16GB.
No M.2 slot onboard, though I have seen PCIe-to-M.2 adapters on eBay.
Some of my machines were on Win 7 Home and some on Win 7 Ultimate. They all got a free upgrade to Win 10 Home or Win 10 Pro. But I don't think they still do it, unless you fiddle with the settings and say you're disabled!
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1355
Credit: 13,624,788
RAC: 0
Message 104414 - Posted: 22 Jan 2022, 20:16:30 UTC - in response to Message 104393.  

EDIT: The only thing I see is this.
https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10
That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the pythons.
It's still system RAM, just a very small amount. Hence the need for 3rd party software if you want to make more use of your RAM for system caching.
Grant
Darwin NT
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104415 - Posted: 22 Jan 2022, 20:20:55 UTC - in response to Message 104412.  

Modern SSDs are rated for about 3000 write cycles, and pythons write about 2MB/s per task on a fast CPU, so a 1TB SSD would last about 5 years even running 10 at once, by which time you'd want to buy a bigger one anyway. However, hard disks hate moving their heads back and forth and fall apart with that much random access.

On my Ryzen 3900X, I was seeing the OS write 4 TB/day (for 20 work units), or 46 MB/sec if I have my numbers right.
That sounds right; if you multiply my numbers up, I come to 3.5TB a day for 20 work units. I was just guessing an average from watching Task Manager. It'll differ depending on CPU speed; I was watching an i5-8600K.

So you can do it if you limit the number of tasks to only a few at a time. And I think that Linux writes a bit less than Windows from what I can see, though it has other problems.

You can make it work if you are careful.
I see an SSD as a consumable (like GPUs that wear out running 24/7). I get them dirt cheap second-hand and expect to change them when they're too small or wear out. Most of my BOINC machines are on hard disks because I had loads kicking about that weren't big enough for other uses. As they break I get SSDs. My main computer, which gets thrashed all the time, has 50% life left on the SSD, but it's an ancient model with older technology that didn't last as long, and it's getting too small anyway.
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104416 - Posted: 22 Jan 2022, 20:24:00 UTC - in response to Message 104414.  

EDIT: The only thing I see is this.
https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10
That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the pythons.
It's still system RAM, just a very small amount. Hence the need for 3rd party software if you want to make more use of your RAM for system caching.
My main computer currently reads:
Memory in use: 22.1GB
Cache memory: 38.4GB
That's a big cache. OK, reading up on it, you get 10% of RAM as a disk write cache, so 6.4GB for me. Surely that's enough. Windows Server will use 50% of RAM.
Jim1348

Joined: 19 Jan 06
Posts: 859
Credit: 51,272,435
RAC: 4,890
Message 104417 - Posted: 22 Jan 2022, 20:24:26 UTC - in response to Message 104414.  

EDIT: The only thing I see is this.
https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10
That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the pythons.
It's still system RAM, just a very small amount. Hence the need for 3rd party software if you want to make more use of your RAM for system caching.

Right. That is the point. You need a lot more. But I think .clair. mentioned Samsung Magician. I used it when I was on Windows, and it includes a RAM cache of around a GB, or maybe less, which could be enough to save an SSD if you did not run too many work units.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1355
Credit: 13,624,788
RAC: 0
Message 104418 - Posted: 22 Jan 2022, 20:25:40 UTC - in response to Message 104394.  

But I think they really need to develop the pythons a bit and call back when they are ready.
To be honest, I would classify the present Python work as being at Alpha test level of development; they are still not even good enough for Beta testing. They are nowhere near being ready for actual deployment IMHO.
Excessive system requirements. Errors that result in systems being blacklisted from getting work, without even advising those systems of what has happened, let alone informing them of what they need to do to get work again. And worst of all, so many Tasks that just don't process & sit there taking up disk & RAM, blocking possibly OK Tasks from being downloaded & worked on, requiring manual intervention to remove them. Not to mention the manual intervention often needed to clean up the VirtualBox VM environments.

Yep, Alpha software, not yet remotely ready for live deployment.
Grant
Darwin NT
Jim1348

Joined: 19 Jan 06
Posts: 859
Credit: 51,272,435
RAC: 4,890
Message 104420 - Posted: 22 Jan 2022, 20:29:39 UTC - in response to Message 104418.  

I think you have accurately portrayed it.
Peter Hucker of the Scottish Boinc Team

Joined: 12 Aug 06
Posts: 1247
Credit: 5,695,274
RAC: 2,032
Message 104421 - Posted: 22 Jan 2022, 20:37:12 UTC - in response to Message 104417.  

EDIT: The only thing I see is this.
https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10
That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the pythons.
It's still system RAM, just a very small amount. Hence the need for 3rd party software if you want to make more use of your RAM for system caching.

Right. That is the point. You need a lot more. But I think .clair. mentioned Samsung Magician. I used it when I was on Windows, and it includes a RAM cache of around a GB, or maybe less, which could be enough to save an SSD if you did not run too many work units.
My Windows 10 is using 10% of my RAM (6.5GB) for a write cache. And if you tick the box to turn off write-cache buffer flushing, it helps even more: right-click the drive, then Properties, Hardware, Properties, Change settings, Policies, and tick "Turn off Windows write-cache buffer flushing on the device".




©2022 University of Washington
https://www.bakerlab.org