Problems and Technical Issues with Rosetta@home

Garry Heather · Joined: 23 Nov 20 · Posts: 10 · Credit: 362,743 · RAC: 0
Message 100914 - Posted: 31 Mar 2021, 2:00:06 UTC

It is interesting to read how this is affecting other people. The Pi 4 rig I mentioned previously had acquired a two-day cache of work units, but has since stopped downloading more because of the insufficient memory issue. I do rather hope that this is not going to become the new normal.

Bryn Mawr · Joined: 26 Dec 18 · Posts: 442 · Credit: 15,697,820 · RAC: 2
Message 100915 - Posted: 31 Mar 2021, 3:00:34 UTC

Has anyone else noticed that, since these disk/memory space problems were reported, there have been a lot (maybe 50%) of 3-hour work units?

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1940 · Credit: 18,534,891 · RAC: 0
Message 100916 - Posted: 31 Mar 2021, 5:40:12 UTC - in response to Message 100912.

"I have a few workstations with similar parameters and they work perfectly fine for many years, so I don't think I need 16GB memory."

Why? Do you think 640K of RAM should be enough? 1MB? 4MB? And that was back in the days of single-core systems. Now we have multi-core, multi-thread systems, and each running application instance requires memory to support it. Past memory limits were due to hardware and OS limitations. These days, most limitations come down to available finances, and whether or not the work being done requires the extra RAM. It's your choice whether or not you equip your system with the resources it needs to be fully utilised.

"Moreover, Rosetta should be a project which is run in the background. So, I should not equip my computer to meet Rosetta requirements, but Rosetta should try to use my resources."

Rosetta does use your resources, and it does run in the background. If you want it to use all of your CPU resources at the same time, then it needs enough memory to do so. If you don't have enough RAM, it's not a problem: other Tasks will wait until there is enough RAM for them to run, all of it taking place in the background.

Grant
Darwin NT

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1940 · Credit: 18,534,891 · RAC: 0
Message 100917 - Posted: 31 Mar 2021, 5:50:08 UTC - in response to Message 100915. Last modified: 31 Mar 2021, 5:52:12 UTC

"Has anyone else noticed that since these problems with disk/memory space have been reported there have been a lot (maybe 50%) of 3-hour work units?"

Yep, although all the ones I had running were using less than 300MB of RAM each.

For reference, I've got 6-core/12-thread systems with 32GB of RAM and a 1TB SSD, and I never suspend processing. My settings:

Usage limits
  Use at most 100 % of the CPUs
  Use at most 100 % of CPU time
Disk
  Use no more than 20 GB
  Leave at least 2 GB free
  Use no more than 60 % of total
Memory
  When computer is in use, use at most 95 %
  When computer is not in use, use at most 95 %
  Leave non-GPU tasks in memory while suspended: no
  Page/swap file: use at most 75 %

I've had no issues with insufficient disk space or memory.

EDIT: there was a batch of RB tasks that came out before those shorter-running ones, and the RB tasks often need 1GB+ each.

Grant
Darwin NT
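
The three Disk settings in that list interact. As a rough sketch, assuming the BOINC client lets its data directory grow only as far as the tightest of the three limits allows (an approximation for illustration, not the client's actual code; the function and parameter names are hypothetical), the effective allowance works out like this:

```python
GB = 1e9  # decimal gigabytes, matching how the preferences are displayed


def allowed_boinc_disk(total, free, boinc_used,
                       max_used_gb=20.0, min_free_gb=2.0, max_used_pct=60.0):
    """Hypothetical sketch: bytes BOINC may use, given the three Disk limits.

    total      -- size of the volume holding the BOINC data directory
    free       -- space currently free on that volume
    boinc_used -- space BOINC's own files already occupy
    """
    by_cap = max_used_gb * GB                  # "Use no more than 20 GB"
    by_pct = total * max_used_pct / 100        # "Use no more than 60 % of total"
    # "Leave at least 2 GB free": BOINC may grow into free space, minus the reserve
    by_free = boinc_used + free - min_free_gb * GB
    return max(0.0, min(by_cap, by_pct, by_free))


# Grant's settings on a 1 TB SSD with plenty free: the 20 GB cap is what binds.
print(allowed_boinc_disk(total=1000 * GB, free=500 * GB, boinc_used=2.32 * GB) / GB)
# A small dedicated partition: the "leave at least 2 GB free" reserve binds instead.
print(allowed_boinc_disk(total=8 * GB, free=4 * GB, boinc_used=1 * GB) / GB)
```

Under this model, a host with a small partition set aside for BOINC can fall well short of a work unit's disk bound even when the big SSD host above never notices.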

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100922 - Posted: 31 Mar 2021, 9:20:31 UTC - in response to Message 100911. Last modified: 31 Mar 2021, 9:28:45 UTC

"Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed."

You’re not alone. Look at the recent results graphs: ‘tasks in progress’ has dropped by around 200,000 (a third)…

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1940 · Credit: 18,534,891 · RAC: 0
Message 100923 - Posted: 31 Mar 2021, 9:37:44 UTC - in response to Message 100922.

"Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed."
"You’re not alone. Look at the recent results graphs: ‘tasks in progress’ has dropped by around 200,000 (a third)…"

In the past it has taken several days for the In progress numbers to get back to their pre-work-shortage levels. And that's without running out of work again only a few hours after new work started coming through (which happened this time). If we don't run out of work again over the next few days, we should see how things actually stand by early next week.

What is odd is that these messages are appearing now, with Tasks that don't require much RAM at all (less than 300MB) compared with many of the previous Tasks (around 800MB). Every one of my current Tasks is using less than 300MB.

Grant
Darwin NT

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100924 - Posted: 31 Mar 2021, 9:49:30 UTC - in response to Message 100917. Last modified: 31 Mar 2021, 9:58:42 UTC

"I've had no issues with insufficient disk space or memory."

This points to a misconfiguration of the new batch of work units, as it seems unlikely that the project intends to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to produce the errors people are reporting, they must now be set to 7,000,000,000 and 9,000,000,000.

How big is your BOINC data directory now? Did the new batch need to download any unusually large files (such as a new protein database)? My problem is not so much disk space (though it will be a pain to have to repartition every machine) as download size, since I’m on a capped data plan.
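
To check those figures on your own host, a short script can list each work unit's rsc_memory_bound and rsc_disk_bound from client_state.xml. This is a minimal sketch, assuming the file parses as ordinary XML with <workunit> elements that carry <name>, <rsc_memory_bound> and <rsc_disk_bound> children (values in bytes). The file sits in the BOINC data directory, whose location varies by platform, so the path is passed on the command line; the function names are made up for illustration, and the script only reads the file.

```python
import sys
import xml.etree.ElementTree as ET


def workunit_bounds(xml_data):
    """Return (name, rsc_memory_bound, rsc_disk_bound) in bytes for each <workunit>."""
    root = ET.fromstring(xml_data)
    rows = []
    for wu in root.iter("workunit"):
        name = wu.findtext("name", default="?")
        mem = float(wu.findtext("rsc_memory_bound", default="0"))
        disk = float(wu.findtext("rsc_disk_bound", default="0"))
        rows.append((name, mem, disk))
    return rows


def main(path):
    with open(path, "rb") as f:  # bytes, so any XML encoding declaration is honoured
        rows = workunit_bounds(f.read())
    for name, mem, disk in rows:
        print(f"{name}: memory bound {mem / 1e9:.2f} GB, disk bound {disk / 1e9:.2f} GB")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # e.g. the path to client_state.xml in your BOINC data directory
```

A bound of 1,800,000,000 prints as 1.80 GB, so an unusually large batch should stand out at a glance.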

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1940 · Credit: 18,534,891 · RAC: 0
Message 100925 - Posted: 31 Mar 2021, 10:15:45 UTC - in response to Message 100924. Last modified: 31 Mar 2021, 10:16:27 UTC

"How big is your BOINC data directory now?"

Unchanged: 1.77GB on one system, 2.32GB on the other. The largest I've ever seen it was around 2.7GB, when I had a larger cache. So it's very much looking like some error in the memory/disk space requirement values for the newly generated Tasks.

Still odd that, with my number of cores/threads and available system RAM, I haven't had issues.

Grant
Darwin NT

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100926 - Posted: 31 Mar 2021, 10:35:17 UTC - in response to Message 100925.

"Still odd that with my number of cores/threads and available system RAM I haven't had issues."

It must be the case that the server considers only a host’s total available RAM and disk space (not per core) in deciding whether a task is suitable. So if a task tells the server it might need 6.6 GB of RAM, the server will never send it to any host with less (even if in practice it would not need anywhere near that much), but it will happily send you 24 of them, because they can run (just maybe not all at the same time). There can’t be many machines with more than 6 GB of RAM per core…

trevG · Joined: 5 Nov 13 · Posts: 9 · Credit: 687,475 · RAC: 0
Message 100928 - Posted: 31 Mar 2021, 11:34:54 UTC - in response to Message 100903. Last modified: 31 Mar 2021, 11:43:48 UTC

Without knowing the actual system requirements of the work being 'unsent' (I really mean the after-the-fact information on where it actually ran OK), it's not possible to tell whether it's a bug or a true resource mismatch. This accords with the post above. If larger-memory units are being sent out, that points to the low memory bar, which is fair enough.

All these projects are demand-driven by default, but the lack of admin response on this forum leads to many duplicated queries. Asteroids had a similar server problem recently, but the project put out an update message explaining the glitches, which saved many queries. F@H has its faults, but its forum does solve many tech issues up front.

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100931 - Posted: 31 Mar 2021, 12:35:10 UTC - in response to Message 100928.

"the post dated info where it actually ran ok"

For some examples, look at Grant’s recent valid tasks. There doesn’t seem to be anything unusual about their memory and disk usage.

Richard Sun · Joined: 19 Feb 21 · Posts: 1 · Credit: 97,269,020 · RAC: 2
Message 100932 - Posted: 31 Mar 2021, 15:04:53 UTC - in response to Message 100931.

All my Raspberry Pis (3B+, 4B 4GB and 4B 8GB) have lots of work today. If you don't have work, I suggest you go to the BOINC Manager, click on Projects, and click Update for Rosetta@home. I definitely saw all the same things others mentioned these past few days about not getting new work because of insufficient memory, etc., but today it's back to "normal".
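
On headless machines such as these Pis, the Update button has a command-line counterpart in boinccmd's `--project URL update` operation. A minimal Python wrapper is sketched below. The Rosetta@home master URL in it is an assumption (copy the exact URL your client reports via `boinccmd --get_project_status`), and depending on the setup boinccmd may need to run from the BOINC data directory, or be given `--passwd`, to authenticate to the client. The function names are hypothetical.

```python
import shutil
import subprocess

# Assumption: Rosetta@home's master URL; check yours with `boinccmd --get_project_status`.
ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def update_command(project_url=ROSETTA_URL, password=None):
    """Build the boinccmd call that mirrors Projects > Update in BOINC Manager."""
    cmd = ["boinccmd"]
    if password is not None:
        cmd += ["--passwd", password]  # GUI RPC password, if the client requires one
    cmd += ["--project", project_url, "update"]
    return cmd


def request_update(project_url=ROSETTA_URL, password=None):
    """Ask the local BOINC client to contact the project's scheduler now."""
    if shutil.which("boinccmd") is None:
        raise RuntimeError("boinccmd not found on PATH")
    subprocess.run(update_command(project_url, password), check=True)
```

Calling `request_update()` on each machine (or from a cron job) has the same effect as clicking Update, though it cannot conjure up work the server does not have to send.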

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100933 - Posted: 31 Mar 2021, 15:34:11 UTC - in response to Message 100932.

It’s largely the luck of the draw. If you happen to contact the server when it has some ‘small’ tasks ready to send, you will get them. But if it only has ‘big’ ones ready to go at that moment, and your machine is outside the limits, you won’t. In that case the client will back off (for longer and longer durations, up to 1½ days) before asking again.

Also, there doesn’t appear to be any mechanism stopping ‘small’ tasks from going to ‘big’ machines, so the more that happens, the less likely it becomes that there is work available for others…
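
The backoff described above (ever-longer waits, capped at about 1½ days) behaves like a capped doubling delay. The sketch below models it that way; only the cap comes from the post, while the 60-second starting delay and the exact doubling are illustrative assumptions rather than values taken from the BOINC client, and the function name is hypothetical.

```python
MAX_BACKOFF_S = 1.5 * 24 * 3600  # the 1½-day ceiling mentioned in the post


def backoff_delays(failures, base_s=60.0, cap_s=MAX_BACKOFF_S):
    """Hypothetical capped doubling: wait before retrying after each consecutive failed request."""
    return [min(base_s * 2 ** n, cap_s) for n in range(failures)]


delays = backoff_delays(15)
for n, d in enumerate(delays, start=1):
    print(f"after failure {n:2d}: wait {d / 3600:6.2f} h")
print(f"total time spent waiting: {sum(delays) / 86400:.1f} days")
```

Under these assumed parameters, a host that is unlucky a dozen times in a row waits more than a day before its next request, by which time any ‘small’ tasks may already have gone to other machines.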

rensie · Joined: 22 Jan 06 · Posts: 3 · Credit: 1,480,056 · RAC: 0
Message 100935 - Posted: 31 Mar 2021, 16:13:36 UTC

Just to add a data point: full-time Rosetta, host ID 6028556, 32GB of RAM, and I have 1.8GB free with 6 running tasks. About 5GB per task is a fairly heavy requirement.

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 100936 - Posted: 31 Mar 2021, 16:21:10 UTC - in response to Message 100935.

Are you certain it’s Rosetta using all 30 GB? None of your recently completed tasks shows anything close to that kind of memory usage.

rensie · Joined: 22 Jan 06 · Posts: 3 · Credit: 1,480,056 · RAC: 0
Message 100937 - Posted: 31 Mar 2021, 17:35:49 UTC - in response to Message 100936.

Yes. Debian 10 minimal, terminal only; Rosetta is the only program running. Each task has 3 processes using the same amount of RAM, but only one of those 3 is using the CPU; the other two are near zero CPU time. So that's 6 tasks and 18 processes using 1-2GB each.

This started after the work outage. I had the recent disk space issue and increased the allocation, and then the new work units started using lots of RAM. My other box (4 cores, 8GB) also shows 3 processes for each task, but its memory usage is significantly lower.

To be clear, I don't really care as long as it works. I'm just concerned about the project losing lots of CPU time because of the heavier requirements.

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1603 · Credit: 13,015,132 · RAC: 0
Message 100938 - Posted: 31 Mar 2021, 17:54:32 UTC - in response to Message 100903.

"There’s a new batch of work units today that are marked as needing 6.6 GB of memory, so the server won’t even send them to hosts it knows have less than that. I’m not sure swap space is taken into account in that decision. You might want to consider a different project until Rosetta comes back with tasks suited to smaller machines."

I got a note like that on one of my rubbish old machines that just do BOINC in the garage (seriously, who doesn't have 64GB?), but I didn't see any smaller ones downloaded as an alternative, although it could have processed them already by then. I wonder if the server just waits until those big ones have been taken by someone before it can hand out the smaller ones further down the queue?

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1603 · Credit: 13,015,132 · RAC: 0
Message 100939 - Posted: 31 Mar 2021, 17:55:53 UTC - in response to Message 100911.

"Not just memory: I’m now seeing Rosetta needs 5472.67MB more disk space (on an admittedly very small partition set aside for the BOINC data directory, but it’s been fine for the last year), so it looks like the new batch is unusually resource-hungry…"

"Rosetta needs 7437.13MB more disk space. Isn't that adorable? Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed. 'Makes use of unused CPU cycles.' Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had."

Don't you just hate folk who put @ in a sentence?

trevG · Joined: 5 Nov 13 · Posts: 9 · Credit: 687,475 · RAC: 0
Message 100945 - Posted: 1 Apr 2021, 0:43:45 UTC - in response to Message 100931.

Yes, my last unit reported working memory on a scale that many machines could handle, but it doesn't say what was requested when it was sent out!

Peak working set size: 632.91 MB
Peak swap size: 610.79 MB
Peak disk usage: 55.33 MB

trevG · Joined: 5 Nov 13 · Posts: 9 · Credit: 687,475 · RAC: 0
Message 100946 - Posted: 1 Apr 2021, 0:53:24 UTC - in response to Message 100939.

"'Makes use of unused CPU cycles.' Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had."

Yes, it seems coquettish, but it always leaves the choice up to Madame. :)