Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



rensie

Joined: 22 Jan 06
Posts: 3
Credit: 1,480,056
RAC: 0
Message 100935 - Posted: 31 Mar 2021, 16:13:36 UTC

Just to add a data point: full-time Rosetta, ID 6028556, 32 GB of RAM; I have 1.8 GB free with 6 running tasks. About 5 GB per task is a fairly heavy requirement.
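The per-task figure is simply (total − free) ÷ running tasks. A minimal Python sketch of that arithmetic, assuming all memory not reported as free belongs to the six tasks. On Linux, `free`'s "free" column excludes buff/cache, so this is an upper bound rather than a measurement; `per_task_gb` is a hypothetical helper.

```python
def per_task_gb(total_gb: float, free_gb: float, running_tasks: int) -> float:
    """Average memory per task, treating everything not free as used by the tasks.

    Assumption: no other program (and no page cache) accounts for the used
    memory, so the result is an upper bound, not a measurement.
    """
    return (total_gb - free_gb) / running_tasks


print(f"{per_task_gb(32, 1.8, 6):.2f} GB per task")  # 30.2 GB / 6 tasks -> 5.03 GB per task
```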
Brian Nixon

Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 100936 - Posted: 31 Mar 2021, 16:21:10 UTC - in response to Message 100935.  

Are you certain it’s Rosetta using all 30 GB? None of your recently-completed tasks shows anything close to that kind of memory usage.
rensie

Joined: 22 Jan 06
Posts: 3
Credit: 1,480,056
RAC: 0
Message 100937 - Posted: 31 Mar 2021, 17:35:49 UTC - in response to Message 100936.  

Yes. Debian 10 minimal, terminal only; Rosetta is the only program running. Each task has 3 processes using the same amount of RAM, but only one of those 3 is using the CPU; the other two are near zero CPU time. So: 6 tasks, 18 processes using 1-2 GB each. This started after the work outage. I had the recent disk space issue and increased the allocation, and then got new work units using lots of RAM. My other 4-core/8 GB box also shows 3 processes for each task, but memory usage is significantly lower.

To be clear, I don't really care as long as it works. I'm just concerned about the project losing lots of CPU time due to heavier requirements.
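Brian's question can be checked directly by summing resident memory per process from `/proc`, instead of adding up the rows of a process list. One possibility (not established in this thread) is that the extra entries are threads: htop, for example, lists threads separately by default and shows the whole process's memory on each one, so three rows per task could be one process counted three times. A minimal Python sketch for Linux, assuming the Rosetta process names contain "rosetta" (the kernel truncates names to 15 characters); `parse_status` and `total_rss_mb` are hypothetical helpers, not part of BOINC.

```python
import os
from typing import Tuple


def parse_status(text: str) -> Tuple[str, int]:
    """Extract (process name, resident set size in kB) from /proc/<pid>/status text."""
    name, rss_kb = "", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":  # absent for kernel threads, so rss_kb stays 0
            rss_kb = int(value.split()[0])  # e.g. "\t  1048576 kB" -> 1048576
    return name, rss_kb


def total_rss_mb(pattern: str = "rosetta", proc_root: str = "/proc") -> float:
    """Sum VmRSS over processes whose name contains `pattern` (case-insensitive).

    /proc lists one numeric directory per process (threads live under
    /proc/<pid>/task), so each process is counted once however many threads
    it has. Shared pages are still counted once per process, so treat the
    total as an upper bound.
    """
    total_kb = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if pattern.lower() in name.lower():
            total_kb += rss_kb
    return total_kb / 1024
```

Calling `print(f"{total_rss_mb():.0f} MB")` on the cruncher gives one total to compare against the 30 GB. For a tighter figure, the `Pss:` line in `/proc/<pid>/smaps_rollup` splits shared pages between the processes that share them.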
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,530,219
RAC: 88
Message 100938 - Posted: 31 Mar 2021, 17:54:32 UTC - in response to Message 100903.  

There’s a new batch of work units today that are marked as needing 6.6 GB of memory, so the server won’t even send them to hosts it knows have less than that. I’m not sure swap space is taken into account in that decision. You might want to consider a different project until Rosetta comes back with tasks suited to smaller machines.
I got a note like that on one of my rubbish old machines that just do BOINC in the garage (seriously, who doesn't have 64 GB?), but I didn't see any smaller ones downloaded as an alternative, although it could have processed them by then. I wonder if the server just waits until those big ones have been taken by someone before it can hand out the smaller ones further down the queue?
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,530,219
RAC: 88
Message 100939 - Posted: 31 Mar 2021, 17:55:53 UTC - in response to Message 100911.  

Not just memory: I’m now seeing
Rosetta needs 5472.67MB more disk space
(on an admittedly very small partition set aside for the BOINC data directory, but it’s been fine for the last year) – so it looks like the new batch is unusually resource-hungry…


Rosetta needs 7437.13MB more disk space.

Isn't that adorable? Say hello to two fewer hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.

"Makes use of unused CPU cycles." Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had.

Don't you just hate folk who put @ in a sentence?
trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 100945 - Posted: 1 Apr 2021, 0:43:45 UTC - in response to Message 100931.  

Yes, the last unit reported working memory on a scale that many hosts can run within, but it doesn't say what was requested when it was sent out!
Peak working set size 632.91 MB
Peak swap size 610.79 MB
Peak disk usage 55.33 MB
trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 100946 - Posted: 1 Apr 2021, 0:53:24 UTC - in response to Message 100939.  

"Makes use of unused CPU cycles." Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had.


Yes, it seems coquettish, but it always leaves the choice up to Madame. :)
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100948 - Posted: 1 Apr 2021, 1:30:22 UTC - in response to Message 100911.  

"Makes use of unused CPU cycles." Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had.

Then you have lived a charmed life, if my experience is anything to go by (usually not...)
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100949 - Posted: 1 Apr 2021, 1:40:17 UTC - in response to Message 100913.  

Moreover, Rosetta should be a project which is run in the background. So, I should not equip my computer to meet Rosetta requirements, but Rosetta should try to use my resources.

I hear you, but all programs specify the minimum hardware requirements. My complaint would be that those requirements for @Rosetta seem to change without notice.

On the other hand, it seems like they don't have enough work to consistently utilize the hardware available to them. If that's the case, then it makes no sense to spend time fine-tuning the program so that even more capacity is available.

It would be nice if the kind of ground-breaking work done at Rosetta could be done on a basic device, but ground-breaking work kind of rarely works like that.
We all have to understand we need to make a certain commitment to this project and if parameters change (meaning increase) then we have to see if we can fall in line with that if we have it to spare.
It only need be a minimal change to our background settings, taking no more than 30 seconds, maybe once a year. If we have it available, that's no trouble at all.

If people decide that they can't continue their journey here for the sake of that change, so be it.
The project knows what it requires (as long as someone hasn't cocked up on a certain batch of tasks), and if their requirements change, they must know some hosts will drop out as a result.
It doesn't change their need - nor should it.
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100950 - Posted: 1 Apr 2021, 1:50:16 UTC - in response to Message 100914.  

It is interesting to read how this is affecting other people - my Pi 4 rig I mentioned previously had acquired a cache of 2 days worth of units but has since stopped downloading more due to the insufficient memory issue. I do rather hope that this is not going to become the new normal.

At this project, with a 3-day deadline and default 8hr runtimes, calling down any more than 2 days of task cache is excessive.
I used to target 2 days including runtime (so about 1.6 days of tasks not running), but I, and a lot of people in the last year, have reduced it to 1 day or less.
We also set up back-up projects with a zero or minimal resource share for those occasions when all tasks are completed here.
That's good advice for anyone (some people recommend far less), so consider it the top end.
One of the benefits is it very much reduces the resources you need to set aside for Rosetta, particularly when their minimum demands are increasing.
It's an equitable balance imo.
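The arithmetic behind that advice, as a small Python sketch. It assumes the host runs Rosetta on every core 24/7, every task runs the default 8 hours, and the cache length counts running plus waiting work per core (the "2 days including runtime" framing above). BOINC's own work-fetch logic is more involved, and `cache_plan` is a hypothetical helper.

```python
def cache_plan(cores: int, cache_days: float,
               runtime_hours: float = 8.0, deadline_days: float = 3.0):
    """Return (tasks held, days of work per core waiting to start, days of deadline slack).

    Simplified model: the newest task in the cache starts once everything ahead
    of it on its core has run, so it finishes about cache_days after download.
    """
    runtime_days = runtime_hours / 24
    tasks_held = cores * cache_days * 24 / runtime_hours  # running + waiting
    queued_days = cache_days - runtime_days              # work not yet started
    slack_days = deadline_days - cache_days              # margin before the 3-day deadline
    return tasks_held, queued_days, slack_days


for days in (2.0, 1.0):
    held, queued, slack = cache_plan(cores=4, cache_days=days)
    print(f"{days:.0f}-day cache: {held:.0f} tasks held, "
          f"{queued:.2f} days queued, {slack:.2f} days slack")
# 2-day cache: 24 tasks held, 1.67 days queued, 1.00 days slack
# 1-day cache: 12 tasks held, 0.67 days queued, 2.00 days slack
```

Halving the cache halves the number of tasks held (and the disk space they occupy) while doubling the margin before the deadline; the 4-core host is only an example.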
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100951 - Posted: 1 Apr 2021, 2:10:36 UTC - in response to Message 100923.  
Last modified: 1 Apr 2021, 2:27:52 UTC

Say hello to two fewer hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed.
You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…
In the past it has taken several days for in-progress numbers to get back to their pre-shortage levels. And that's without running out of work again only a few hours after new work started coming through (which occurred this time).
If we don't run out of work again over the next few days, we should see how things actually are by early next week.

What is odd is that these messages are occurring now, with Tasks that don't require much RAM at all (less than 300MB) compared to many of the previous Tasks (around 800MB). Every one of my current Tasks is using less than 300MB.

It certainly is odd, and what I've observed matches what you've both said.
But I don't agree (if it's what you're saying, which it may not be) that people are voting with their feet, nor that it's hosts being slow to restore their caches.

Much more likely is that people who don't micromanage their systems are coming up against these higher disk/RAM requirements and either not noticing or not knowing how to resolve it themselves (after all, we've struggled), and that's amounting to this one-third reduction in hosts grabbing tasks.

Someone may notice the shortfall on the project side and tie it in with the whole disk/RAM thing, or they may be aware that the queued tasks are down below 1 million again and think it's a localised issue that'll resolve itself, so not actively do anything about it. I suspect we'll find out which within another day or so.

Edit: Sorry, I was going to add one more thing.
I have one remote PC on my team, which I know is running but something's gone wrong with its video output, so I can't change anything on my one-day-per-month visits to it during the UK lockdown (building work at the house it's in has clogged the whole room with dust).
What I've recently noticed is that it hasn't downloaded a Rosetta task for a week or so, but is pulling down lots of WCG tasks. I suspect it's hit this same wall on either disk or RAM, even though it only runs 4 cores and has 16 GB of RAM, which ought to be plenty on both counts.
This is just the kind of host I'm talking about above: available for work, loads of space for work in theory, but unable to pull any tasks down, so it's running its back-up project 24/7.
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100952 - Posted: 1 Apr 2021, 2:13:41 UTC - in response to Message 100926.  

Still odd that with my number of cores/threads and available system RAM I haven't had issues.
It must be the case that the server only considers a host’s total available RAM and disk space (not per core) in deciding whether a task is suitable.

So if a task tells the server it might need 6.6 GB of RAM, the server will never send it to any host with less (even if in practice it would not need anywhere near that much), but it will happily send you 24 of them because they can run (just maybe not all at the same time).

There can’t be many machines with >6 GB RAM per core…

You think?
How does that tie in with the fact that changing the percentage of disk or RAM allocated to BOINC resolves the issue?
I may well be misunderstanding your point tbf
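A simplified model of the check being discussed, which would reconcile both points: if the server compares a task's *declared* memory bound against the host's total RAM scaled by the BOINC memory preference (not RAM per core), then core count doesn't matter, and raising the percentage of RAM allowed to BOINC can flip a host from "too small" to "eligible". This is a Python sketch under that assumption, not the actual BOINC scheduler code; `Host`, `Task` and `can_send` are hypothetical, and the real logic has more inputs (BOINC has separate in-use and not-in-use memory limits, for example). The same comparison applied to disk would produce the "needs X MB more disk space" messages.

```python
from dataclasses import dataclass


@dataclass
class Host:
    ram_mb: float           # total physical RAM the client reports
    ram_fraction: float     # "use at most X% of memory" preference, as 0..1
    disk_allowed_mb: float  # disk space BOINC is currently allowed to use


@dataclass
class Task:
    mem_bound_mb: float   # memory the task *declares* it may need (not what it uses)
    disk_bound_mb: float  # disk space the task declares it may need


def can_send(task: Task, host: Host) -> bool:
    """Hypothetical eligibility test: whole-host limits, no per-core division."""
    usable_ram_mb = host.ram_mb * host.ram_fraction
    return task.mem_bound_mb <= usable_ram_mb and task.disk_bound_mb <= host.disk_allowed_mb


big_task = Task(mem_bound_mb=6600, disk_bound_mb=500)  # 6.6 GB batch; disk figure is made up
print(can_send(big_task, Host(16384, 0.50, 10_000)))   # True:  8192 MB usable
print(can_send(big_task, Host(16384, 0.30, 10_000)))   # False: ~4915 MB usable
print(can_send(big_task, Host(4096, 0.90, 10_000)))    # False: a 4 GB host never qualifies
```

Because nothing here divides by core count, a host that passes once could be sent many such tasks, which matches the point that they "can run (just maybe not all at the same time)".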
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,530,236
RAC: 0
Message 100953 - Posted: 1 Apr 2021, 2:15:34 UTC - in response to Message 100948.  


Then you have lived a charmed life, if my experience is anything to go by (usually not...)


It didn't start out so hot, but it's getting better year by year. I'm grateful for the life that I have now.

On another topic, running Linux Mint here, should I set up any firewall rules? Or does BOINC operate thru the open ports?
mrhastyrib

Joined: 18 Feb 21
Posts: 90
Credit: 2,530,236
RAC: 0
Message 100954 - Posted: 1 Apr 2021, 2:17:52 UTC - in response to Message 100939.  


Don't you just hate folk who put @ in a sentence?

Like you just did? :^P
Sid Celery

Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 100955 - Posted: 1 Apr 2021, 2:35:04 UTC - in response to Message 100953.  

Then you have lived a charmed life, if my experience is anything to go by (usually not...)

It didn't start out so hot, but it's getting better year by year. I'm grateful for the life that I have now.

On another topic, running Linux Mint here, should I set up any firewall rules? Or does BOINC operate thru the open ports?

Good for you. Balance out that good karma with a little bit of negative karma here. You'll cope.

I'm completely ignorant on Linux (though I used MiNT on Ataris 20-30 years ago).
I'm not aware of BOINC needing anything out of the ordinary, but someone else will be along in a minute with the answer you need.
Garry Heather
Joined: 23 Nov 20
Posts: 10
Credit: 362,743
RAC: 0
Message 100956 - Posted: 1 Apr 2021, 5:08:26 UTC - in response to Message 100950.  
Last modified: 1 Apr 2021, 6:06:11 UTC

Regarding my 2 day cache, I do not consider that excessive given that my £35 SBC running off an SD card (OK, it's an SSD now but when I started it was an SD card) has a more reliable uptime than the servers running the project. In the time I've been doing work for Rosetta I have seen several periods of downtime where work units have not been deployed for days at a time. I don't mind having a Pi dedicated to the task so long as it's doing real work and not just heating the room.

I also get the idea of other projects and I looked into it; however, my current setup is severely restricted in that department. I am currently using the Balena client image mostly out of convenience, and I have not found another project compatible with that and my processor architecture. At the moment I haven't got the time to go digging around in Linux trying to make this work, as I'm still very much learning. I know enough to be dangerous, but even online guides tend to make assumptions about people's prior knowledge that are a considerable barrier to entry.
Sophie

Send message
Joined: 13 Aug 19
Posts: 5
Credit: 1,203,866
RAC: 303
Message 100966 - Posted: 1 Apr 2021, 11:07:12 UTC

Hello, in the last few hours I had 22 WUs that stopped with an error a few seconds after starting.
Examples:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217359050
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217354682
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217342042

Is there some kind of underlying problem? I didn't change anything on my system in the last few weeks, and some WUs look fine.

Sorry, I created a separate thread before reading the instruction to post in this thread.
Falconet

Send message
Joined: 9 Mar 09
Posts: 350
Credit: 1,000,634
RAC: 0
Message 100968 - Posted: 1 Apr 2021, 11:09:38 UTC - in response to Message 100967.  

Hello, in the last few hours I had 22 WUs that stopped with an error a few seconds after starting.
Examples:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217359050
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217354682
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217342042

Is there some kind of underlying problem? I didn't change anything on my system in the last few weeks, and some WUs look fine.

Sorry, I created a separate thread before reading the instruction to post in this thread.



My reply on your other thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14525&postid=100960




©2024 University of Washington
https://www.bakerlab.org