Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home


Previous · 1 . . . 85 · 86 · 87 · 88 · 89 · 90 · 91 . . . 309 · Next

Profile Garry Heather

Joined: 23 Nov 20
Posts: 10
Credit: 362,743
RAC: 0
Message 100914 - Posted: 31 Mar 2021, 2:00:06 UTC

It is interesting to read how this is affecting other people - my Pi 4 rig I mentioned previously had acquired a cache of 2 days worth of units but has since stopped downloading more due to the insufficient memory issue. I do rather hope that this is not going to become the new normal.
ID: 100914
Bryn Mawr

Joined: 26 Dec 18
Posts: 398
Credit: 12,294,748
RAC: 6,222
Message 100915 - Posted: 31 Mar 2021, 3:00:34 UTC

Has anyone else noticed that, since these problems with disk/memory space were reported, there have been a lot (maybe 50%) of 3-hour work units?
ID: 100915
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100916 - Posted: 31 Mar 2021, 5:40:12 UTC - in response to Message 100912.  

I have a few workstations with similar parameters and they work perfectly fine for many years, so I don't think I need 16GB memory.
Why, do you think 640k of RAM should be enough? 1MB? 4MB? That was back in the days of single-core systems. Now we have multi-core, multi-thread systems, and each running application instance requires memory to support it.
Past memory limits were due to hardware & OS limitations. These days, most limitations come down to available finances and whether the work being done requires the extra RAM. It's your choice whether you equip your system with the resources necessary for it to be fully utilised.



Moreover, Rosetta should be a project which is run in the background. So, I should not equip my computer to meet Rosetta requirements, but Rosetta should try to use my resources.
Rosetta does use your resources, and it does run in the background. If you want it to use all of your CPU resources at the same time, then it needs to have enough memory to do so.
If you don't have enough RAM, it's not a problem: other tasks will stop running until there is enough RAM for them to run, all taking place in the background.
Grant
Darwin NT
ID: 100916
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100917 - Posted: 31 Mar 2021, 5:50:08 UTC - in response to Message 100915.  
Last modified: 31 Mar 2021, 5:52:12 UTC

Has anyone else noticed that, since these problems with disk/memory space were reported, there have been a lot (maybe 50%) of 3-hour work units?
Yep, although all the ones I had running were using less than 300MB of RAM each.

For reference: I've got 6-core/12-thread systems with 32GB of RAM & a 1TB SSD, and I never suspend processing.
Usage limits
    Use at most 100% of the CPUs
    Use at most 100% of CPU time

Disk
    Use no more than 20 GB
    Leave at least 2 GB free
    Use no more than 60% of total

Memory
    When computer is in use, use at most 95%
    When computer is not in use, use at most 95%
    Leave non-GPU tasks in memory while suspended: No
    Page/swap file: use at most 75%
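For reference, roughly the same limits can be expressed in a global_prefs_override.xml in the BOINC data directory (a sketch using the standard BOINC override element names; the values simply mirror the settings listed above):

```xml
<!-- Sketch of a global_prefs_override.xml mirroring the settings above.
     Element names are standard BOINC override fields; values are examples. -->
<global_preferences>
   <max_ncpus_pct>100</max_ncpus_pct>
   <cpu_usage_limit>100</cpu_usage_limit>
   <disk_max_used_gb>20</disk_max_used_gb>
   <disk_min_free_gb>2</disk_min_free_gb>
   <disk_max_used_pct>60</disk_max_used_pct>
   <ram_max_used_busy_pct>95</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>95</ram_max_used_idle_pct>
   <leave_apps_in_memory>0</leave_apps_in_memory>
   <vm_max_used_pct>75</vm_max_used_pct>
</global_preferences>
```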


I've had no issues with insufficient disk space or memory.


EDIT: there was a batch of RB tasks that came out before those shorter-running ones, and the RB tasks often need 1GB+ each.
Grant
Darwin NT
ID: 100917
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100922 - Posted: 31 Mar 2021, 9:20:31 UTC - in response to Message 100911.  
Last modified: 31 Mar 2021, 9:28:45 UTC

Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.
You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…
ID: 100922
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100923 - Posted: 31 Mar 2021, 9:37:44 UTC - in response to Message 100922.  

Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.
You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…
In the past it has taken several days for the 'In progress' numbers to get back to their pre-shortage levels. And that's without running out of work again only a few hours after new work started coming through (which occurred this time).
If we don't run out of work again over the next few days, we should see how things actually are by early next week.


What is odd is that these messages are occurring now, with Tasks that don't require much RAM at all (less than 300MB) compared to many of the previous Tasks (around 800MB). Every one of my current Tasks is using less than 300MB.
Grant
Darwin NT
ID: 100923
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100924 - Posted: 31 Mar 2021, 9:49:30 UTC - in response to Message 100917.  
Last modified: 31 Mar 2021, 9:58:42 UTC

I've had no issues with insufficient disk space or memory.
This points to a misconfiguration of the new batch of work units, as it seems unlikely it would be the project’s intention to cut off a third of its capacity…

Look in client_state.xml for the rsc_memory_bound and rsc_disk_bound settings of the new work units: they used to be 1,800,000,000 each; to yield the errors people are reporting they must now be set to 7,000,000,000 and 9,000,000,000.
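As a quick sketch of how one might pull those bounds out of a client_state.xml (standard BOINC client state fields; the workunit name below is made up, and normally you would parse the real file in the BOINC data directory):

```python
import xml.etree.ElementTree as ET

def resource_bounds(root):
    """Yield (workunit name, memory bound, disk bound) in bytes
    for every <workunit> under the given parsed XML root."""
    for wu in root.iter("workunit"):
        yield (wu.findtext("name", default="?"),
               float(wu.findtext("rsc_memory_bound", default="0")),
               float(wu.findtext("rsc_disk_bound", default="0")))

# Minimal stand-in for a real client_state.xml; on a live host use
# ET.parse("client_state.xml").getroot() instead.
sample = ET.fromstring(
    "<client_state><workunit>"
    "<name>rb_03_30_12345</name>"  # hypothetical task name
    "<rsc_memory_bound>7000000000</rsc_memory_bound>"
    "<rsc_disk_bound>9000000000</rsc_disk_bound>"
    "</workunit></client_state>")

for name, mem, disk in resource_bounds(sample):
    # Prints: rb_03_30_12345: memory bound 7.0 GB, disk bound 9.0 GB
    print(f"{name}: memory bound {mem/1e9:.1f} GB, disk bound {disk/1e9:.1f} GB")
```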

How big is your BOINC data directory now? Did the new batch need to download any unusually large files (such as a new protein database)? The issue I have is not so much disk space (though it will be a pain to have to repartition every machine) as download size, since I’m on a capped data plan.
ID: 100924
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1725
Credit: 18,380,064
RAC: 20,136
Message 100925 - Posted: 31 Mar 2021, 10:15:45 UTC - in response to Message 100924.  
Last modified: 31 Mar 2021, 10:16:27 UTC

How big is your BOINC data directory now?
Unchanged.
1.77GB on one system, 2.32GB on the other; the largest I've ever seen it was around 2.7GB, when I had a larger cache.

So it's very much looking like an error in the memory/disk space requirement values for the newly generated tasks.
Still odd that, with my number of cores/threads and available system RAM, I haven't had issues.
Grant
Darwin NT
ID: 100925
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100926 - Posted: 31 Mar 2021, 10:35:17 UTC - in response to Message 100925.  

Still odd that with my number of cores/threads and available system RAM i haven't had issues.
It must be the case that the server only considers a host’s total available RAM and disk space (not per core) in deciding whether a task is suitable.

So if a task tells the server it might need 6.6 GB of RAM, the server will never send it to any host with less (even if in practice it would not need anywhere near that much), but it will happily send you 24 of them because they can run (just maybe not all at the same time).

There can’t be many machines with >6 GB RAM per core…
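That behaviour can be sketched roughly as follows (illustrative only, not actual BOINC server code; the function names are made up and the numbers are this thread's examples):

```python
# Sketch of the scheduling behaviour described above: the server compares a
# task's declared memory bound against the host's *total* RAM, not RAM per
# core, so a qualifying host can be sent more tasks than it can run at once.
GB = 1_000_000_000

def eligible(task_mem_bound, host_total_ram):
    """Server-side admission check: total RAM only, nothing per-core."""
    return host_total_ram >= task_mem_bound

def max_concurrent(task_mem_bound, host_total_ram, cores):
    """What the client can actually run at once without exceeding RAM."""
    return min(cores, int(host_total_ram // task_mem_bound))

host_ram, cores = 32 * GB, 24
task = 6.6 * GB

print(eligible(task, host_ram))               # True: the host qualifies
print(max_concurrent(task, host_ram, cores))  # 4: only 4 fit in RAM at a time
```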
ID: 100926
Profile trevG

Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 100928 - Posted: 31 Mar 2021, 11:34:54 UTC - in response to Message 100903.  
Last modified: 31 Mar 2021, 11:43:48 UTC

Without knowing the actual system requirements of the work being 'unsent' (I really mean the post-dated info from where it actually ran OK), it's not possible to tell whether it's a bug or a true resource mismatch. This accords with the post above.
As larger-memory units are being sent out, that points to the low-memory bar, which is fair enough.
All these projects are demand-driven by default, but the lack of admin response on this forum leads to many duplicated queries.
Asteroids had a similar server problem recently, but the project put out an update message explaining the glitches, saving many queries.
F@H has its faults, but its forum does solve many tech issues up front.
ID: 100928
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100931 - Posted: 31 Mar 2021, 12:35:10 UTC - in response to Message 100928.  

the post-dated info from where it actually ran OK
For some examples, look at Grant’s recent valid tasks. There doesn’t seem to be anything unusual about their memory and disk usage.
ID: 100931
Profile Richard Sun
Joined: 19 Feb 21
Posts: 1
Credit: 76,814,436
RAC: 33,558
Message 100932 - Posted: 31 Mar 2021, 15:04:53 UTC - in response to Message 100931.  

All my Raspberry Pis (3B+, 4B 4GB, and 4B 8GB) have lots of work today. If you don't have work, I suggest you go to the BOINC Manager, click on Projects, and click Update for Rosetta@home. I had definitely seen all the same things others mentioned these past few days (not getting new workloads, not enough memory, etc.), but today it's back to "normal".
ID: 100932
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100933 - Posted: 31 Mar 2021, 15:34:11 UTC - in response to Message 100932.  

It’s largely luck of the draw. If you happen to contact the server when it has some ‘small’ tasks ready to send, you will get them. But if it only has ‘big’ ones ready to go at that moment, and your machine is outside the limits, you won’t. And in that case the client will back off (for longer and longer durations, up to 1½ days) before asking again.
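That backoff can be sketched roughly like this (an illustration of exponential backoff with jitter, not the BOINC client's actual code; the base delay and jitter range are made-up parameters, with only the ~1.5-day cap taken from the behaviour described above):

```python
# Illustrative sketch: each scheduler request that returns no work roughly
# doubles the wait before the next request, with random jitter so hosts
# don't all retry in sync, capped at about 1.5 days.
import random

MAX_BACKOFF = 1.5 * 24 * 3600  # ~1.5 days, in seconds

def next_backoff(failures, base=60.0):
    """Delay before the next scheduler request after `failures`
    consecutive requests that returned no work."""
    delay = base * (2 ** failures)
    delay *= random.uniform(0.5, 1.0)  # jitter
    return min(delay, MAX_BACKOFF)

for n in range(12):
    print(f"after {n + 1} empty replies: wait up to {next_backoff(n) / 3600:.2f} h")
```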

Also there doesn’t appear to be any mechanism stopping ‘small’ tasks going to ‘big’ machines – so the more that happens the less likely it becomes that there is work available for others…
ID: 100933
rensie

Joined: 22 Jan 06
Posts: 3
Credit: 1,480,056
RAC: 0
Message 100935 - Posted: 31 Mar 2021, 16:13:36 UTC

Just to add a data point: full-time Rosetta, host ID 6028556, 32GB of RAM; I have 1.8GB free with 6 running tasks. About 5GB per task is a fairly heavy requirement.
ID: 100935
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100936 - Posted: 31 Mar 2021, 16:21:10 UTC - in response to Message 100935.  

Are you certain it’s Rosetta using all 30 GB? None of your recently-completed tasks shows anything close to that kind of memory usage.
ID: 100936
rensie

Joined: 22 Jan 06
Posts: 3
Credit: 1,480,056
RAC: 0
Message 100937 - Posted: 31 Mar 2021, 17:35:49 UTC - in response to Message 100936.  

Yes. Debian 10 minimal, terminal only; Rosetta is the only running program. Each task has 3 processes using the same amount of RAM, but only one of those 3 is using the CPU; the other two are at near-zero CPU time. So: 6 tasks, 18 processes, using 1-2GB each. This started after the work outage. I had the recent disk space issue and increased the allocation. Then came new work units using lots of RAM. My other 4-core/8GB box also shows 3 processes for each task, but its memory usage is significantly lower.

To be clear, I don't really care as long as it works. I'm just concerned about the project losing lots of cpu time due to heavier requirements.
ID: 100937
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 100938 - Posted: 31 Mar 2021, 17:54:32 UTC - in response to Message 100903.  

There’s a new batch of work units today that are marked as needing 6.6 GB of memory, so the server won’t even send them to hosts it knows have less than that. I’m not sure swap space is taken into account in that decision. You might want to consider a different project until Rosetta comes back with tasks suited to smaller machines.
I got a note like that on one of my rubbish old machines that just do BOINC in the garage (seriously, who doesn't have 64GB?), but I didn't see any smaller ones downloaded as an alternative, although it could have processed them by then. I wonder if the server just waits until those big ones have been taken by someone before it can hand out the smaller ones further down the queue?
ID: 100938
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 100939 - Posted: 31 Mar 2021, 17:55:53 UTC - in response to Message 100911.  

Not just memory: I’m now seeing
Rosetta needs 5472.67MB more disk space
(on an admittedly very small partition set aside for the BOINC data directory, but it’s been fine for the last year) – so it looks like the new batch is unusually resource-hungry…


Rosetta needs 7437.13MB more disk space.

Isn't that adorable? Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.

"Makes use of unused CPU cycles." Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher-maintenance than any turbulent girlfriend that I've ever had.

Don't you just hate folk who put @ in a sentence?
ID: 100939
Profile trevG

Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 100945 - Posted: 1 Apr 2021, 0:43:45 UTC - in response to Message 100931.  

Yes, the last unit gave this working memory, at a scale many will operate under, but it doesn't say what was requested on send-out!
Peak working set size 632.91 MB
Peak swap size 610.79 MB
Peak disk usage 55.33 MB
ID: 100945
Profile trevG

Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 100946 - Posted: 1 Apr 2021, 0:53:24 UTC - in response to Message 100939.  

"Makes use of unused CPU cycles." Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher-maintenance than any turbulent girlfriend that I've ever had.


Yes, it seems coquettish, but always leaves the choice up to Madame... :)
ID: 100946



©2024 University of Washington
https://www.bakerlab.org