Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Falconet · Joined: 9 Mar 09 · Posts: 354 · Credit: 1,276,393 · RAC: 1,230

> > Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge".
> It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once.

You are correct, but these 2 million tasks will take a long time to finish at the current rate, because only 15,000 or so are running at any given point. That's why it's "huge".
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1731 · Credit: 18,495,101 · RAC: 20,601

> You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point.

Yep. Roughly 1 in 133 is being processed. Compare that to Rosetta 4.20 at its peak (20 million queued up, 400k in progress): 1 in 50 being processed. And given the huge issues with the Python tasks, such as those that sit there not actually using any CPU time so they're not actually being processed, I'd suggest that 1-in-133 value is in reality way, way, waaaay worse than that.

Grant
Darwin NT
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 12,116,986 · RAC: 6,010

> You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point.

In the same way as this microwave is huge compared to that sofa, because I'm carrying it on my bicycle instead of in the car.
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0

> I have a Ryzen with 64GB, but it's my main computer. Less than that is pitiful by today's standards. It will take 128GB.

Pythons for 12 hours? They average 2 hours here. Once I get my new drive installed this weekend, I should be able to undo the restriction I currently have on the Pythons. With the current memory I should be able to run a few more Pythons plus all my other projects, or a full load of Pythons (16) with a little bit of memory left over.
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0

> You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point.

Well, you've seen the numbers. People come and try it out and leave. Others can't get it to work and leave. Without the staff taking notice or caring, it will be a downward-to-stable trend of systems instead of an upward one. But again, they don't care about the numbers, just as long as the work gets done eventually.
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0

> Most projects don't take nearly as much as the pythons or LHC, of course. Everything works better with more memory; if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use; even quality stuff has duds.

I have 49 and change spread out over 4 slots. Everything works as it should. The new drive is 500 gigs and will be dedicated to BOINC, so there is more than enough room for swap or whatever else BOINC wants to do.
Greg_BE · Joined: 30 May 06 · Posts: 5691 · Credit: 5,859,226 · RAC: 0

Total queued jobs: 2,589,661
In progress: 53,882
Successes last 24h: 34,678

That's what the page says. Pretty small numbers against the 2.5 million.
Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0

> Everything works better with more memory; if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use; even quality stuff has duds.

You get a write cache only if you install one. PrimoCache is the only one that I know of for Windows, which I use to protect my SSDs.

Memtest really doesn't have much to do with stability. It is mainly for errors, which might cause crashes, but more likely failures in work units.

With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1731 · Credit: 18,495,101 · RAC: 20,601

> You get a write cache only if you install one. PrimoCache is the only one that I know of for Windows, which I use to protect my SSDs.

? The default setting for Windows is write caching enabled. If you want to set its size (other than doing registry hacks), then you'd need a third-party one.

> With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums.

I've had systems with only 2 slots used and memory problems. I've had systems with all slots used and no problems. While the more components, the greater the likelihood of failure, the biggest cause of issues with more than 2 modules is people pushing the RAM too hard. Yes, 2 modules allow you tighter timings and higher clocks. But as long as you use modules of the same brand and model, and don't push them beyond their rated clocks and timings, you won't have any issues. Look at server systems that may have 32 (or more) DIMM slots.

Grant
Darwin NT
Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0

> The default setting for Windows is write caching enabled.

That is just the cache on the disk drive itself, and it is relatively small. These days it is often just a faster section of the flash memory (e.g., two-level cells instead of four-level or more). Therefore it is subject to the same wear-out mechanism, just a bit more slowly. But using a portion of main memory as the write cache is much faster, and will protect the SSD from a high level of writes, such as on the Pythons. And it can be very large; I usually use at least 8 GB. I posted on it in another topic.

> I've had systems with only 2 slots used & memory problems. I've had systems with all slots used & no problems.

I have had much more experience. And the larger the CPU, the worse the problems. With two Ryzen 3900X and two Ryzen 3950X machines, I have seen them all. It saved me some grief with the 5900 series.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1731 · Credit: 18,495,101 · RAC: 20,601

> > The default setting for Windows is write caching enabled.
> That is just the cache on the disk drive itself, and it is relatively small. These days it is often just a faster section of the flash memory (e.g., two-level cells instead of four-level or more).

Every article I've seen about the Win10 write caching says it is using system RAM to cache writes; it has nothing to do with the drive's own onboard buffering.

Grant
Darwin NT
Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0

> Every article I've seen about the Win10 write caching says it is using system RAM to cache writes; it has nothing to do with the drive's own onboard buffering.

I think you are confusing that with read caches, but I will look. If it were caching writes, you would probably know it. If the cached writes were saved to disk, it would take a long time to shut down, for example. And the programs that show the writes to disk would indicate it; I don't see it. Read caches are easier to implement, but less necessary. They don't save the SSD from excessive writes, and reads from SSDs are fast anyway, so read caches are not all that necessary.

EDIT: The only thing I see is this: https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10 That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the Pythons.

The write rates on the Pythons are horrendous. I am getting well over 1 TB/day (almost 2 TB) when running 20 Pythons, even with a huge 26 GB write cache. That is too much. I will do something else with this machine.
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
By the way, I used to just put projects with high write rates on a ramdisk, and have all the writes go to main memory. That really solves the problem. But on the Ryzen 3900X with all the pythons, the BOINC data folder is 107 GB; too much. I might be able to pull it off on a Ryzen 3600 though; 12 virtual cores might work. But I think they really need to develop the pythons a bit and call back when they are ready. |
gbayler · Joined: 10 Apr 20 · Posts: 14 · Credit: 3,069,484 · RAC: 0

For the Linux users out there: I have written a Perl script, boinc_watchdog.pl, that checks for "0 CPU" tasks (tasks with a very low CPU utilization that likely won't terminate) and whether there is at least one task executing. If it finds "0 CPU" tasks, it aborts them, and if there is not a single task executing, it restarts the boinc-client. I run it every 30 minutes as a cron job; for me, it works quite well. I am perfectly aware that this doesn't solve the root cause of the current problems; it is merely a workaround. Still, I think it is an improvement over having to manually abort tasks or restart the PC every other day. You can find it here: https://github.com/gbayler/boinc_watchdog Hope it is useful for someone else too! :) Günther
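The core of such a watchdog is just comparing CPU time accrued against wall-clock time between two samples. Here is a hypothetical Python sketch of the same idea (gbayler's actual script is Perl; the 1% threshold, the project URL, and the `abort_task` wrapper here are assumptions, not taken from his code):

```python
import subprocess

STALL_RATIO = 0.01  # assumption: under 1% CPU over the window counts as a "0 CPU" task

def is_stalled(cpu_time_start, cpu_time_end, wall_seconds, ratio=STALL_RATIO):
    """A healthy task accrues CPU time at roughly the pace of wall time;
    a stuck python task accrues almost none between two samples."""
    used = cpu_time_end - cpu_time_start
    return used < ratio * wall_seconds

def abort_task(task_name):
    # hypothetical wrapper around the real boinccmd task operation:
    #   boinccmd --task <project_url> <task_name> abort
    subprocess.run(["boinccmd", "--task",
                    "https://boinc.bakerlab.org/rosetta/", task_name, "abort"],
                   check=False)
```

In use, you would parse the per-task CPU times out of `boinccmd --get_tasks`, sleep for the sampling window (e.g. the 30-minute cron interval), sample again, and call `abort_task` on anything `is_stalled` flags.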
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 12,116,986 · RAC: 6,010

Swap files are for poor people without enough RAM :-) If you don't have matched pairs of RAM, things can slow down. Dual channel is a great benefit for some things but not others; it depends on whether they're accessing the memory a lot. I changed my Ryzen to dual channel to make my game faster. It didn't help, but half the BOINC projects sped up a lot.
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 12,116,986 · RAC: 6,010

AFAIK Windows has a write cache unless it's a removable drive. In fact I know it does, because I've copied a huge number of files from an SSD to a rotary drive, and the rotary drive kept being accessed long after things looked like they'd finished copying. Here's a cite: https://www.tenforums.com/tutorials/21904-enable-disable-disk-write-caching-windows-10-a.html

> > Everything works better with more memory; if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use; even quality stuff has duds.
> Memtest really doesn't have much to do with stability. It is mainly for errors, which might cause crashes, but more likely failures in work units.

Memtest has everything to do with stability. Every single time someone has come to me with a crashing computer, I've found dodgy memory using Memtest.

> With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums.

Not in my experience; must be dodgy memory. I can find nothing on Google suggesting 4 sticks cause problems.
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 12,116,986 · RAC: 6,010

> The write rates on the Pythons are horrendous. I am getting well over 1 TB/day (almost 2 TB) when running 20 Pythons, even with a huge 26 GB write cache. That is too much. I will do something else with this machine.

SSDs have a longer life than rotary drives nowadays. Look up the expected writes allowed for your SSD model and see how long the Pythons would take to wear it out. And caching the writes won't help anyway, since they have to be done at some point.
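That wear-out estimate is simple arithmetic: divide the drive's rated endurance (TBW, terabytes written) by the daily write volume. The figures below are illustrative assumptions only (a drive rated at 300 TBW, and roughly the 1.5 TB/day reported above), not a statement about any particular model:

```python
def days_until_worn(tbw_rating_tb, writes_per_day_tb):
    """Naive endurance estimate: rated terabytes-written
    divided by observed daily write volume."""
    return tbw_rating_tb / writes_per_day_tb

# assumed figures: 300 TBW rating, ~1.5 TB/day from a full load of python tasks
print(days_until_worn(300, 1.5))  # 200.0 days, i.e. roughly 6-7 months
```

Check your own drive's published TBW rating and a SMART-reported write counter over a few days before trusting any such estimate.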
Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0

> SSDs have a longer life than rotary drives nowadays. Look up the expected writes allowed for your SSD model and see how long the Pythons would take to wear it out. And caching the writes won't help anyway, since they have to be done at some point.

You can find out the hard way about SSD lifetimes. They usually don't publish the figures now, probably because they have been going down as the chip geometries shrink.

The caching for science projects works differently than copying a video file, where everything would have to be transferred. In a scientific algorithm you usually read from a location, do a calculation, and then store the value back, either into the original location or a related one. Therefore, by storing the information in DRAM, most of the writes are done to the memory; you transfer to the SSD only the residual writes remaining at the end of the cache latency period. In fact, if you made the cache latency (write delay) long enough, you would never have to transfer any of the writes to the SSD. That is effectively what a ramdisk does, but it requires a lot more memory; you would have to store the entire BOINC data folder.
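The coalescing effect described above is easy to demonstrate: when the same hot locations are rewritten many times within the cache's flush window, only the latest version of each location ever reaches the drive. A toy write-back cache sketch (all names hypothetical, not any real cache's behaviour):

```python
def flushed_writes(write_sequence, flush_every):
    """Simulate a write-back cache: dirty blocks accumulate in RAM
    and are flushed once per window, so repeated writes to the same
    block within a window cost only one device write."""
    device_writes = 0
    dirty = set()
    for i, block in enumerate(write_sequence, start=1):
        dirty.add(block)                 # overwrite in RAM, no device I/O
        if i % flush_every == 0:
            device_writes += len(dirty)  # flush each distinct dirty block once
            dirty.clear()
    return device_writes + len(dirty)    # final flush of whatever remains

# 10,000 logical writes hammering 10 hot blocks, flushed every 1,000 writes:
workload = [i % 10 for i in range(10_000)]
print(flushed_writes(workload, 1_000))  # 100 device writes instead of 10,000
```

Lengthening the window (a larger `flush_every`) cuts device writes further, which is the "long enough write delay" case above; a ramdisk is the limit where the window never closes.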
.clair. · Joined: 2 Jan 07 · Posts: 274 · Credit: 26,399,595 · RAC: 0

> By the way, I used to just put projects with high write rates on a ramdisk, and have all the writes go to main memory.

Yes, some of the pythons need a kick in the compilers. Amazingly I have 31 Python and 7 Rosetta 4.2 tasks running ATM, and I have been through them to clear out two "0 CPU" dud work units; it is a pain having to do that at least once a day. Rosetta is using 235 GB of disk space, though the most I have seen was 266 GB. RAM use right now is 59 GB, total system use 71 GB on `standby`, and only 40 MB `free` of 128 GB fitted in 8 slots [crashes?? wot crashes!! . . . tic tic tic . . . BOOM :)]

As for SSD write bombardment by the pythons: following an idea by [Greg, I think] I have put in a 500 GB SATA SSD, a Samsung 870 EVO [£58 on eBay, new, still sealed]. I will see how long it lasts, though I haven't installed the "Samsung Magician" app yet to keep an eye on the write rate, trim, garbage cleanup etc. I installed only BOINC on it, to speed up Python work unit loading times. It looked like the fastest kid on the block in benchmarks at a low price; there is faster stuff out there at a higher cost. I did look at M.2 NVMe drives, but getting them to work in Win7 looks like a pain of magical incantations on the command line to load the drivers; Win8.1 onwards has them built in [I checked the MS forum].

OK, time to post this drivel on the forum and see what happens :)
Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0

> RAM use right now is 59 GB, total system use 71 GB on `standby`, and only 40 MB `free` of 128 GB fitted in 8 slots [crashes?? wot crashes!! . . . tic tic tic . . . BOOM :)]

Good. I was hoping that someone would do some real-world tests. I don't want to do them myself.
©2024 University of Washington
https://www.bakerlab.org