Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 190 |
I have posted this many times before. They should make it a sticky if there were any moderator around to do it. Jim, I went back to 6.1 and I do not have problems. I can run all my projects there. Going back to 5.2 is a good place to start for troubleshooting Python VM problems, but this can affect other projects. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,231,553 RAC: 1,264 |
I have a Ryzen with 64GB, but it's my main computer. Less than that is pitiful by today's standards. It will take 128GB. Pythons for 12 hours? They average 2 hours here. I have two BOINC-only machines with 36GB in them. I upped them just enough to run LHC. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I have a Ryzen with 64GB, but it's my main computer. Less than that is pitiful by today's standards. It will take 128GB. Most projects don't take nearly as much as the pythons or LHC of course. I like memory, but beyond 64 GB you have stability problems, since you have to use all four slots. Sometimes it works, but you often have to juggle memory around. You may have to spend more than you anticipated. Two slots is a lot safer. |
Falconet Joined: 9 Mar 09 Posts: 350 Credit: 1,105,396 RAC: 1 |
Robetta, as far as I can tell, is separate from Rosetta@home and is used mostly by researchers outside of the Baker Lab/IPD. It's an interface for users who wish to get computing power for their jobs. The queue we see on the Rosetta@home page represents the jobs that the IPD directly submits to run at Rosetta@home + whatever gets submitted at Robetta to run on Rosetta@home, the rb_11111_11111 jobs. If I could see the Rosetta@home queue, it would likely be close to 100% Rosetta Python jobs. The Pythons are refilled from the queue up to a max of 5,000 ready to send. I don't know why (server resource constraints?) but it's not like Rosetta@home can do a lot of these at any given time, so there is no point in increasing that value. So yeah, those 2.6 million jobs on the queue are Pythons. Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge". I think one of the features on Robetta is to make sure that everything is open - that is, any researcher can see what is being worked on both present and past and maybe avoid duplication of work. I recall during the pandemic that they asked Robetta users to make sure their jobs were visible to others so that everyone could benefit. (I have a suspicion that researchers can hide their jobs - oftentimes, I try to search for the jobs I'm running on my computers using the ID number but Robetta doesn't return any results). I don't know who said that but my impression has always been that work that comes from Robetta is labelled rb and everything else that isn't labelled rb is directly submitted to Rosetta@home. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,231,553 RAC: 1,264 |
Most projects don't take nearly as much as the pythons or LHC of course. Everything works better with more memory, if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use, even quality stuff has duds. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,231,553 RAC: 1,264 |
Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge". It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1611 Credit: 16,521,941 RAC: 3,372 |
Ok..so then where do they get the million something tasks in queue? Back when there were Rosetta 4.20 Tasks, all those millions were Rosetta 4.20 Tasks and if you checked the Total queued jobs number it would gradually run down to zero (or jump up again as new work was released). Now most of the work is Python, and that's what that number shows. Extremely occasionally it jumps up again when that extremely rare batch of Rosetta 4.20 work is released. However, most of the time it sits around the 2-2.7 million mark, because Python work is being completed at roughly the same rate as new Python work is released. The Unset value in the Application task list is the amount of work that's ready to go for that particular application (I think the ratio is 6:1 Rosetta 4.20:Python). The Tasks ready to send value under the Computing Status is the Rosetta 4.20 & Python Tasks by application values combined. The Total queued jobs value is the Rosetta 4.20 work, the Python work, and the Unset 4.20 & Python work all combined. It is the total of all types of work at all stages of yet to be processed. Grant Darwin NT |
Falconet Joined: 9 Mar 09 Posts: 350 Credit: 1,105,396 RAC: 1 |
Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge". It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once. You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point. That's why it's "huge". |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1611 Credit: 16,521,941 RAC: 3,372 |
Yep. You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point. Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge". It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once. Roughly 1 in 133 is being processed. Compared to Rosetta 4.20 at their peak (20 million queued up, 400k in progress) 1 in 50 being processed. And given the huge issues with Python Tasks, such as those that sit there not actually using any CPU time so they're not actually being processed, I'd suggest that 1 in 133 value in reality is way, way, waaaay worse than that. Grant Darwin NT |
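The "1 in 133" and "1 in 50" figures in the post above follow directly from the numbers quoted in the thread. A minimal sketch of that arithmetic, assuming the round figures given by the posters (2 million queued and about 15,000 running for Python; 20 million queued and 400k in progress for Rosetta 4.20 at its peak); the function name `one_in` is only illustrative:

```python
def one_in(queued: int, in_progress: int) -> int:
    """Return N such that roughly 1 in N queued tasks is being processed."""
    return round(queued / in_progress)


# Figures as quoted in the thread (rounded by the posters, not exact server values).
python_ratio = one_in(2_000_000, 15_000)
rosetta_ratio = one_in(20_000_000, 400_000)

print(f"Python tasks:      about 1 in {python_ratio} being processed")
print(f"Rosetta 4.20 peak: about 1 in {rosetta_ratio} being processed")
```

This only compares queue depth with tasks in progress; it says nothing about how many of those in-progress tasks are actually using CPU time.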
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,231,553 RAC: 1,264 |
You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point. In the same way as this microwave is huge compared to that sofa, because I'm carrying it on my bicycle instead of the car. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 190 |
I have a Ryzen with 64GB, but it's my main computer. Less than that is pitiful by today's standards. It will take 128GB. Pythons for 12 hours? They average 2 hours here. Once I get my new drive installed this weekend, I should be able to undo the restriction I have right now on python and with the current memory, I should be able to run a few more pythons plus all my other projects or a full load of pythons (16) and have a little bit of memory left over. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 190 |
Yep. You are correct but these 2 million tasks will take a long time to finish at the current rate because only 15,000 or so are running at any given point. Sid Celery posted something a few months ago that he received from Admin or someone like that who said that the Python job that had been submitted by one of the IPD researchers was "huge". It's not that big, it's only a few million tasks. I've seen the queue at 15 million. But maybe that was several projects at once. Well you've seen the numbers. People come and try it out and leave. Others can't get it to work and leave. Without the staff taking notice or caring, it will be a downward to stable trend of systems instead of upward. But again, they don't care about numbers, just as long as the work gets done eventually. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 190 |
Most projects don't take nearly as much as the pythons or LHC of course. Everything works better with more memory, if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use, even quality stuff has duds. I have 49 and change spread out over 4 slots. Everything works as it should. The new drive is 500 gigs and it will be dedicated to BOINC, so there is more than enough room for swap or whatever else BOINC wants to do. |
Greg_BE Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 190 |
Total queued jobs: 2,589,661. In progress: 53,882. Successes last 24h: 34,678. That's what the page says. Pretty small numbers against the 2 million. |
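To put the server status numbers above in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that no new work is released and that the "Successes last 24h" figure stays constant; neither is claimed anywhere in the thread, and earlier posts note that new Python work is released at roughly the rate it is completed, so the real queue may not shrink at all.

```python
# Values copied from the status page as quoted in the post above.
total_queued = 2_589_661
in_progress = 53_882
successes_last_24h = 34_678

# Assumption: constant completion rate and no new work added to the queue.
days_to_drain = total_queued / successes_last_24h

# Share of the queue currently in progress, expressed as "1 in N".
one_in_n = total_queued / in_progress

print(f"Days to drain the queue at the current rate: about {days_to_drain:.1f}")
print(f"In progress: about 1 in {one_in_n:.0f} queued jobs")
```

Failed or aborted tasks are ignored here, so the estimate is best read as an order-of-magnitude figure only.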
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Everything works better with more memory, if you're not using it you get a massive disk cache. Using all four slots does not cause stability problems. Always test your new memory with memtest before use, even quality stuff has duds. You get a write cache only if you install one. PrimoCache is the only one that I know of for Windows, which I use to protect my SSDs. Memtest really doesn't have much to do with stability. It is mainly for errors, which might cause crashes, but more likely failures in work units. With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1611 Credit: 16,521,941 RAC: 3,372 |
You get a write cache only if you install one. PrimoCache is the only one that I know of for Windows, which I use to protect my SSDs. ? The default setting for Windows is write caching enabled. If you want to set its size (other than doing registry hacks), then you'd need a 3rd party one. With large amounts of memory, especially the two-sided memory modules, you will see many more crashes using four slots. Check the forums. I've had systems with only 2 slots used & memory problems. I've had systems with all slots used & no problems. While the more components, the greater the likelihood of failure, the biggest cause of issues with more than 2 modules is people pushing the RAM too hard. Yes, 2 modules allows you tighter timings and higher clocks. But as long as you use modules of the same brand & model, and don't push them beyond their rated clocks & timings, you won't have any issues. Look at server systems that may have 32 (or more) DIMM slots. Grant Darwin NT |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
The default setting for Windows is write caching enabled. That is just the cache on the disk drive itself, and is relatively small. These days, it is often just a faster section of the flash memory (e.g., two-level instead of four-level or more). Therefore, it is subject to the same wearout mechanism, just a bit more slowly. But using a portion of main memory as the write cache is much faster, and will protect the SSD from the high level of writes, such as on the pythons. And it can be very large. I usually use at least 8 GB. I posted on it in another topic. I've had systems with only 2 slots used & memory problems. I've had systems with all slots used & no problems. I have had much more experience. And the larger the CPU, the worse the problems. With two Ryzen 3900X and two Ryzen 3950X, I have seen them all. It saved me some grief with the 5900 series. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1611 Credit: 16,521,941 RAC: 3,372 |
Every article I've seen about the Win10 write caching says it is using system RAM to cache writes- it has nothing to do with the drive's own onboard buffering. The default setting for Windows is write caching enabled. That is just the cache on the disk drive itself, and is relatively small. These days, it is often just a faster section of the flash memory (e.g., two-level instead of four-level or more). Grant Darwin NT |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Every article I've seen about the Win10 write caching says it is using system RAM to cache writes- it has nothing to do with the drive's own onboard buffering. I think you are confusing that with read caches, but I will look. If it were caching writes, you would probably know it. If the cached writes were saved to disk, it would take a long time to shut down, for example. And the programs that show the writes to disk would indicate it. I don't see it. Read caches are easier to implement, but less necessary. They don't save the SSD from excessive writes. And the reads from SSDs are fast anyway, so the read caches are not all that necessary. EDIT: The only thing I see is this. https://www.windowscentral.com/how-manage-disk-write-caching-external-storage-windows-10 That is just disk write caching, as I previously discussed. It uses only a small amount of memory, not the GB that you need to protect the SSDs from the pythons. The write rates on the pythons are horrendous. I am getting well over 1 TB/day (almost 2 TB) when running 20 pythons, even with a huge 26 GB write cache. That is too much. I will do something else with this machine. |
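The write rate reported above (between 1 and 2 TB/day across 20 pythons) can be turned into a per-task figure and a rough SSD wear estimate. The sketch below is illustrative only: the 600 TBW endurance rating is a hypothetical value, not one mentioned in the thread, and real drives vary widely, so check the rating of the actual SSD.

```python
def per_task_gb_per_day(total_tb_per_day: float, tasks: int) -> float:
    """Average writes per task in GB/day, using 1 TB = 1000 GB."""
    return total_tb_per_day * 1000 / tasks


def days_until_rated_endurance(tbw_rating_tb: float, writes_tb_per_day: float) -> float:
    """Days of writing at a constant rate before the rated TBW is reached."""
    return tbw_rating_tb / writes_tb_per_day


TASKS = 20                   # pythons running, as stated in the post
LOW_TB, HIGH_TB = 1.0, 2.0   # "well over 1 TB/day (almost 2 TB)"
HYPOTHETICAL_TBW = 600.0     # assumed endurance rating, NOT from the thread

per_task_low = per_task_gb_per_day(LOW_TB, TASKS)
per_task_high = per_task_gb_per_day(HIGH_TB, TASKS)
days_best = days_until_rated_endurance(HYPOTHETICAL_TBW, LOW_TB)
days_worst = days_until_rated_endurance(HYPOTHETICAL_TBW, HIGH_TB)

print(f"Per python: {per_task_low:.0f} to {per_task_high:.0f} GB/day")
print(f"Rated endurance reached in {days_worst:.0f} to {days_best:.0f} days")
```

Under these assumptions a drive would reach its rated endurance in roughly one to two years of continuous running, which is consistent with the poster's conclusion that the write load is too high for this machine.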
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
By the way, I used to just put projects with high write rates on a ramdisk, and have all the writes go to main memory. That really solves the problem. But on the Ryzen 3900X with all the pythons, the BOINC data folder is 107 GB; too much. I might be able to pull it off on a Ryzen 3600 though; 12 virtual cores might work. But I think they really need to develop the pythons a bit and call back when they are ready. |
©2024 University of Washington
https://www.bakerlab.org