Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
rensie Send message Joined: 22 Jan 06 Posts: 3 Credit: 1,480,056 RAC: 0 |
Just to add a data point: full-time Rosetta, ID 6028556, 32 GB of RAM; I have 1.8 GB free with 6 running tasks. About 5 GB per task is a fairly heavy requirement. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Are you certain it’s Rosetta using all 30 GB? None of your recently-completed tasks shows anything close to that kind of memory usage. |
rensie Send message Joined: 22 Jan 06 Posts: 3 Credit: 1,480,056 RAC: 0 |
Yes. Debian 10 minimal, terminal only; Rosetta is the only running program. Each task has 3 processes using the same amount of RAM, but only one of those 3 is using the CPU; the other two are near zero CPU time. So, 6 tasks, 18 processes using 1-2 GB each. This started after the work outage. I had the recent disk space issue and increased the allocation. Then came new work units using lots of RAM. My other 4-core/8 GB box also shows 3 processes for each task, but memory usage is significantly lower. To be clear, I don't really care as long as it works. I'm just concerned about the project losing lots of CPU time due to heavier requirements. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,050,318 RAC: 15,849 |
"There’s a new batch of work units today that are marked as needing 6.6 GB of memory, so the server won’t even send them to hosts it knows have less than that. I’m not sure swap space is taken into account in that decision. You might want to consider a different project until Rosetta comes back with tasks suited to smaller machines." I got a note like that on one of my rubbish old machines that just do BOINC in the garage (seriously, who doesn't have 64 GB?), but I didn't see any smaller ones downloaded as an alternative, although it could have processed them by then. I wonder if the server just waits until those big ones have been taken by someone before it can hand out the smaller ones further down the queue? |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,050,318 RAC: 15,849 |
"Not just memory: I’m now seeing 'Rosetta needs 5472.67MB more disk space' (on an admittedly very small partition set aside for the BOINC data directory, but it’s been fine for the last year) – so it looks like the new batch is unusually resource-hungry…" Don't you just hate folk who put @ in a sentence? |
trevG Send message Joined: 5 Nov 13 Posts: 9 Credit: 687,475 RAC: 0 |
Yes, the last unit reported working memory on a scale that many hosts will operate under, but it doesn't say what was requested on send-out! Peak working set size: 632.91 MB; peak swap size: 610.79 MB; peak disk usage: 55.33 MB. |
trevG Send message Joined: 5 Nov 13 Posts: 9 Credit: 687,475 RAC: 0 |
"Makes use of unused CPU cycles" Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had. Yes, it seems coquettish, but always leaves the choice up to Madame... :) |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
"Makes use of unused CPU cycles" Sounds so easy, doesn't it? I know that I have been here but a short time, but @Rosetta is higher maintenance than any turbulent girlfriend that I've ever had. Then you have lived a charmed life, if my experience is anything to go by (usually not...) |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
"Moreover, Rosetta should be a project which is run in the background. So, I should not equip my computer to meet Rosetta requirements, but Rosetta should try to use my resources." It would be nice if the kind of ground-breaking work done at Rosetta could be done on a basic device, but ground-breaking work kind-of rarely works like that. We all have to understand we need to make a certain commitment to this project, and if parameters change (meaning increase) then we have to see if we can fall in line with that if we have it to spare. It only need be a minimal change to our background settings, taking no more than 30 seconds, maybe once a year. If we have it available, that's no trouble at all. If people decide that they can't continue their journey here for the sake of that change, so be it. The project knows what it requires (as long as someone hasn't cocked up on a certain batch of tasks) and if their req'ts change, they must know some hosts will drop out as a result. It doesn't change their need, nor should it. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
"It is interesting to read how this is affecting other people - my Pi 4 rig I mentioned previously had acquired a cache of 2 days' worth of units but has since stopped downloading more due to the insufficient memory issue. I do rather hope that this is not going to become the new normal." At this project, with a 3-day deadline and default 8hr runtimes, calling down any more than 2 days of task cache is excessive. I used to target 2 days including runtime (so about 1.6 days of tasks not running) but I, and a lot of people in the last year, have reduced it to 1 day or less. We also set up back-up projects with a zero or minimal resource share for those occasions when all tasks are completed here. That's good advice for anyone (some people recommend far less) so consider it the top-end. One of the benefits is it very much reduces the resources you need to set aside for Rosetta, particularly when their minimum demands are increasing. It's an equitable balance imo. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
In the past it has taken several days for In progress numbers to get back to their pre-work shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed. You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)… It certainly is odd, and what I've observed matches what you've both said. But I don't agree (if it's what you're saying, which it may not be) that people are voting with their feet, nor that it's hosts being slow to restore their caches. Much more likely is that people who don't micromanage their systems are coming up against these higher Disk/RAM req'ts and either not noticing or not knowing how to resolve it themselves (after all, we've struggled), and that's amounting to this one-third reduction in hosts grabbing tasks. Someone may notice the shortfall on the project side and tie it in with the whole Disk/RAM thing, or they may be aware that the queued tasks are down below 1m again and think it's a localised issue that'll resolve itself, so not actively do anything about it. I suspect we'll find out which within another day or so. Edit: Sorry, I was going to add one more thing. I have one remote PC on my team, which I know is running but something's gone wrong with its video output, so I can't change anything on my one-day-per-month visits to it during the UK lockdown (building work at the house it's in has clogged the whole room with dust). What I've recently noticed is that it hasn't downloaded a Rosetta task for a week or so, but is pulling down lots of WCG tasks. I suspect it's hit this same wall on either Disk or RAM, even though it only runs 4 cores but has 16 GB RAM, which ought to be plenty of space on both counts. This is just the kind of host I'm talking about above. Available for work, loads of space for work in theory, but can't pull any tasks down so running its back-up project 24/7 |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
"Still odd that with my number of cores/threads and available system RAM I haven't had issues." "It must be the case that the server only considers a host’s total available RAM and disk space (not per core) in deciding whether a task is suitable." You think? How does that tie in with the fact that if the %age of Disk or RAM allocated to BOINC is changed, then it resolves the issue? I may well be misunderstanding your point tbf |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
It didn't start out so hot, but it's getting better year by year. I'm grateful for the life that I have now. On another topic, running Linux Mint here, should I set up any firewall rules? Or does BOINC operate thru the open ports? |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
Like you just did? :^P |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
Then you have lived a charmed life, if my experience is anything to go by (usually not...) Good for you. Balance out that good karma with a little bit of negative karma here. You'll cope. I'm completely ignorant on Linux (though I used MinT on Ataris 20-30 years ago). I'm not aware of BOINC needing anything out of the ordinary, but someone else will be along in a minute with the answer you need. |
Garry Heather Send message Joined: 23 Nov 20 Posts: 10 Credit: 362,743 RAC: 0 |
Regarding my 2 day cache, I do not consider that excessive given that my £35 SBC running off an SD card (OK, it's an SSD now but when I started it was an SD card) has a more reliable uptime than the servers running the project. In the time I've been doing work for Rosetta I have seen several periods of downtime where work units have not been deployed for days at a time. I don't mind having a Pi dedicated to the task so long as it's doing real work and not just heating the room. I also get the idea of other projects and I looked into it, however my current setup is severely restricted in that department. I am currently using the Balena client image mostly out of convenience and I have not found another project compatible with that and my processor architecture. At the moment I haven't got the time to go digging around in Linux trying to make this work as I'm still very much learning. I know enough to be dangerous but even online guides tend to make assumptions about people's prior knowledge that are a considerable block to entry. |
Sophie Send message Joined: 13 Aug 19 Posts: 5 Credit: 1,410,379 RAC: 834 |
Hello, in the last few hours I had 22 WUs that stopped with an error a few seconds after starting. Examples: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217359050 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217354682 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217342042 Is there some kind of underlying problem? I didn't change anything on my system in the last few weeks, and some WUs are looking fine. Sorry, I created a separate thread before reading the instruction to post in this thread. |
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,252,725 RAC: 2,027 |
"Hello, in the last few hours I had 22 WUs that stopped with an error a few seconds after starting." My reply on your other thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14525&postid=100960 |
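
For anyone wanting to redo the sums quoted in this thread, here is a minimal Python sketch. It only reproduces arithmetic from the posts: roughly 5 GB per task on the 32 GB host, the 6.6 GB batch, and a 2-day cache with 8-hour runtimes. The function names are made up for illustration, and the eligibility check is an assumption: the thread suggests the server compares a task's stated memory need against the RAM that BOINC is allowed to use, but nobody here confirms exactly how the scheduler decides.

```python
def per_task_memory_gb(total_gb, free_gb, running_tasks):
    """Average RAM in use per running task, assuming the tasks
    account for all memory in use (as on a minimal, Rosetta-only host)."""
    return (total_gb - free_gb) / running_tasks


def host_is_eligible(total_ram_gb, boinc_ram_pct, task_needs_gb):
    """Hypothetical check, NOT the real scheduler logic: assume the server
    compares the task's stated memory need with the share of total RAM
    that the BOINC preferences allow."""
    allowed_gb = total_ram_gb * boinc_ram_pct / 100
    return allowed_gb >= task_needs_gb


def queued_days(cache_days, runtime_hours):
    """Days of tasks waiting (not running) when the cache target
    includes the runtime of the task currently in progress."""
    return cache_days - runtime_hours / 24


if __name__ == "__main__":
    # 32 GB host with 1.8 GB free and 6 running tasks
    print(f"{per_task_memory_gb(32, 1.8, 6):.2f} GB per task")
    # 8 GB host and the 6.6 GB batch, at two different BOINC RAM limits
    print(host_is_eligible(8, 50, 6.6), host_is_eligible(8, 90, 6.6))
    # 2-day cache target with the default 8-hour runtime
    print(f"{queued_days(2, 8):.2f} days queued")
```

If the assumption holds, it would explain why raising the percentage of RAM allocated to BOINC gets tasks flowing again on a host that looks big enough on paper.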
©2024 University of Washington
https://www.bakerlab.org