Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
mrhastyrib Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0
I'm... um, "flattered" that you think about that, but just for the record, I don't roll that way. Kindly limit yourself to policing my vernacular, dawg.
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0
> Who is the guilty party
We're here to help with scientific research, not to point fingers at the people doing it. Somebody made a mistake, and an experiment failed. It happens; that's how people learn.
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0
There does seem to be a backoff when a task fails. With sched_op_debug selected in your event log options, you should see it logged as:
[sched_op] Deferring communication for …
[sched_op] Reason: Unrecoverable error for task …
But as that's client-side, I would have expected it to be reset if you manually Update a project.
Bryn Mawr Joined: 26 Dec 18 Posts: 387 Credit: 11,777,758 RAC: 3,601
> There does seem to be a backoff when a task fails. With sched_op_debug selected in your event log options, you should see it logged as:
> [sched_op] Deferring communication for …
> [sched_op] Reason: Unrecoverable error for task …
> But as that's client-side, I would have expected it to be reset if you manually Update a project.
As my system naturally runs one out, one in, I was using manual update to try to get fresh tasks, and it did not appear to be resetting the backoff.
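[Editor's note] The deferral behaviour being discussed can be sketched as a toy model. This is illustrative exponential backoff only, not BOINC's actual implementation; the class name, constants, and reset rule are all assumptions:

```python
class ProjectBackoff:
    """Toy model of a client-side deferral after task failures.
    Not BOINC's real code; base/cap values are made up for illustration."""

    def __init__(self, base=60, cap=24 * 3600):
        self.base = base      # first deferral, in seconds
        self.cap = cap        # maximum deferral, in seconds
        self.failures = 0

    def on_task_error(self):
        """Each unrecoverable task error doubles the deferral, up to the cap."""
        self.failures += 1
        return min(self.base * 2 ** (self.failures - 1), self.cap)

    def on_manual_update(self):
        """What a manual project Update would be expected to do:
        clear the failure count so communication resumes immediately."""
        self.failures = 0

b = ProjectBackoff()
print(b.on_task_error())   # 60
print(b.on_task_error())   # 120
b.on_manual_update()
print(b.on_task_error())   # 60 again after the reset
```

The behaviour Bryn Mawr reports would correspond to `on_manual_update()` not clearing the counter in practice.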
Martin.Heinrich Joined: 4 May 20 Posts: 1 Credit: 396,444 RAC: 0
I have run Rosetta for a long time, but now it asks for too much RAM:
Rosetta@home: Notice from server
Rosetta needs 6675.72 MB RAM but only 1343.33 MB is available for use.
I can give it 3 GB, but 6.5 GB is not OK. Why don't I simply get tasks with a lower RAM demand? If this problem is not solved, Rosetta will not get more work done by my computers.
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0
> Why don't I simply get tasks with a lower RAM demand?
There aren't any available. Try again in a few days' time; perhaps there will be some new smaller work units.
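[Editor's note] The check behind that server notice can be sketched like this. The function names and the percentage preference are hypothetical; only the MB figures come from the notice above:

```python
def usable_ram_mb(total_ram_mb, ram_pref_percent):
    """RAM the client reports as available: total RAM scaled by the
    'use at most X% of memory' preference (illustrative sketch)."""
    return total_ram_mb * ram_pref_percent / 100.0

def can_send_task(task_ram_mb, usable_mb):
    """A scheduler only sends a task whose estimated working set fits
    in the RAM the host makes available (sketch, not BOINC source)."""
    return task_ram_mb <= usable_mb

# Figures from the notice: the task needs 6675.72 MB, the host offers
# only 1343.33 MB, so no work is sent.
print(can_send_task(6675.72, 1343.33))                  # False
# Even ~3 GB (hypothetical 4 GB host at a 75% preference) is not enough:
print(can_send_task(6675.72, usable_ram_mb(4096, 75)))  # False
```

Raising the memory preference helps only up to the host's physical RAM, which is why no setting on a small machine can satisfy a 6.5 GB task.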
jsm Joined: 4 Apr 20 Posts: 3 Credit: 74,564,295 RAC: 46,139
Thanks to all for their views. I confirm the fairly low 50 GB cap versus the 100 Mbps circuit is due to an old tariff. It is no longer available, but the ISP cannot remove it easily because of the regulator. All new users or changers automatically get unlimited data, but at a substantially higher monthly cost, which I am trying to avoid because the cap has been adequate for years. I do not stream, peer, or otherwise need substantial throughput.
I have followed the advice on run time, changing the preference from the default 8 hrs to 22 hrs, and will see whether that helps. I confirm it is the three Threadrippers that Wireshark identified straight away as the hogs; every other endpoint line was in the low MBs. Presumably if the 'bad' tasks work through or are withdrawn, this will also help.
jsm
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
> So why would this way be any faster?
Yes. But it's rather slow to happen.
> Shouldn't it have already done that when the 2nd genuine one was posted?
There's a workaround. If you use the same way to mark it as a duplicate every time, the software will see it as multiple identical posts and delete all but one of them.
> Duplicate post deleted.
You'd think there'd be a delete button. Who designs these things?
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
> > Wow. In the UK there's no data cap whatsoever. I can download at 54 Mbps 24/7.
> This is unsustainable and I will either have to shell out for an expensive unlimited contract (because I have an Ultima connection at over 100 Mbps) or cut back on Rosetta work.
I'm guessing you don't have any real options when it comes to ISPs? A 50 GB limit on a 100 Mb connection is insane, IMHO. Higher-speed plans here come with high data caps.
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
> But it would be nice if the researchers would test their models a bit more before releasing them here. The odd error is OK, but when it's a case of the odd task not being an error and all the others erroring out, it really is a bit silly.
There's Ralph@home for that. Not sure why they hardly ever use it. I think I get tasks from there once every 3 months.
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
> You said "Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering." which is a false assumption. It could take a while. My computers for example haven't tried to get more Rosetta since it ran out, since they got stuff from other projects.
That means nothing. For example, I might (manually, or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. It really is a shame you don't read all of what's posted before you feel the need to comment.
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
> I assume he meant which scientist....
> > Who is the guilty party submitting tasks that all "Error while computing"? I have 70 tasks on April 4th that have errored with no credit.
That would be you.
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,299,502 RAC: 2,235
Bigot!
Grant (SSSF) Joined: 28 Mar 20 Posts: 1631 Credit: 16,704,579 RAC: 9,753
And I'll repeat it again: it really is a shame you don't read all of what's posted before you feel the need to comment.
> You said "Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering." which is a false assumption. It could take a while. My computers for example haven't tried to get more Rosetta since it ran out, since they got stuff from other projects.
> That means nothing. For example, I might (manually, or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. It really is a shame you don't read all of what's posted before you feel the need to comment.
I addressed the point you made in the post that I quoted when I made that statement. Regardless of caches, resource share settings, and people's micro-management of their projects: if you compare the graph of the current recovery with past recoveries after a lack of work, over the same recovery time frame, the loss of almost 1/3 of the processing resources is quite obvious. The mis-configured work units are making it impossible for a large number of hosts to do any work (and that batch of tasks that produced nothing but errors in a matter of seconds didn't help things along either).
Current recovery (or lack of) after almost 3 days. Previous recovery (after a much longer outage) after 3 days.
Grant
Darwin NT
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,789,853 RAC: 1,860
No errors at all here; I have no computer that meets the 6 GB RAM requirement …
robertmiles Joined: 16 Jun 08 Posts: 1229 Credit: 14,172,067 RAC: 1,295
> > So why would this way be any faster?
> Yes. But it's rather slow to happen.
> > Shouldn't it have already done that when the 2nd genuine one was posted?
> There's a workaround. If you use the same way to mark it as a duplicate every time, the software will see it as multiple identical posts and delete all but one of them.
> > Duplicate post deleted.
> You'd think there'd be a delete button. Who designs these things?
I didn't say it would be faster. However, it will give users less to waste their time reading.
Sid Celery Joined: 11 Feb 08 Posts: 2073 Credit: 40,601,230 RAC: 5,272
> A few days in and the impact of the mis-configured Work Units is becoming clearer. Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering. In the past it has taken several days for In progress numbers to get back to their pre-work-shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta.
> I don't know if I have the time that's required to provide the space that is needed.
You're not alone. Look at the recent results graphs: 'tasks in progress' has dropped by around 200,000 (a third)…
Returning to my anecdote about a remote PC of mine being unable to download any Rosetta tasks, so running its backup project, WCG, 24/7: my local laptop is also doing weird things. It refuses to run a particular Rosetta task, so it's running those it has room for, a combination of WCG and later Rosetta tasks, but only 3 on 4 cores.
Now that I know it's definitely happening, I've set NNT and suspended all running tasks except for the one problem Rosetta task. It still refuses to run, even as the only task. No tasks are running in my experiment!
So, maintaining NNT, I've found a combination of WCG and Rosetta tasks that'll run together on all 4 cores. I'll work my way through my small cache until all are completed bar the problem task and see if it runs then. If not, I'll finally abort it and just grab fresh tasks.
Bit of a weird one. Even attempting to micromanage tasks doesn't entirely work. No wonder that graph is running so much lower than it was, if I'm any example.
Sid Celery Joined: 11 Feb 08 Posts: 2073 Credit: 40,601,230 RAC: 5,272
> Has there been a significant project change which could be the cause of this increased usage, or am I looking for another problem?
Hard to say. I haven't mentioned this before because I'm doing some experiments with overclocking and I thought the errors were being caused by me. So it was everyone? Interesting to know.
In the last day or so, these computation errors appear to have stopped. Can others confirm that too? Hopefully that stops all the re-downloading issues and bandwidth penalties. Is it stopping the excessive memory & disk space demands too?
And now I notice queued jobs have plummeted to barely more than 100k. Hmm...
Sid Celery Joined: 11 Feb 08 Posts: 2073 Credit: 40,601,230 RAC: 5,272
> Thanks to all for their views. I confirm the fairly low 50 GB cap versus the 100 Mbps circuit is due to an old tariff. It is no longer available, but the ISP cannot remove it easily because of the regulator. All new users or changers automatically get unlimited data, but at a substantially higher monthly cost, which I am trying to avoid because the cap has been adequate for years. I do not stream, peer, or otherwise need substantial throughput.
Good idea to increase the runtime, but be aware that the tasks you already hold in your cache will run for much longer than Boinc realises, so it's entirely possible, even probable, that you won't meet the deadline on the later ones. If my memory serves, unstarted tasks will still show as 8 hrs long, but will actually run for your new preference of 22 hrs. This runtime figure for unstarted tasks doesn't update, so it will be a permanent problem.
The way around this is to reduce your cache size by around two-thirds; even though Boinc continues to see the wrong expected runtime, you won't exceed deadlines in practice. You may've already noticed this on your Threadrippers.
Crazy as it seems, the solution I've described ought to prevent the problems I've pointed to. It's a feature rather than a bug... <cough>
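[Editor's note] The arithmetic behind that advice can be put in numbers. This is an illustrative sketch only; `safe_cache_days` is a made-up helper, not a BOINC setting:

```python
def safe_cache_days(cache_days, estimated_hours, actual_hours):
    """If the client schedules work using a stale per-task estimate,
    the cache is effectively inflated by actual/estimated. Scaling the
    cache setting down by the inverse ratio keeps the real amount of
    queued work at its intended size (illustrative, not a BOINC API)."""
    return cache_days * estimated_hours / actual_hours

# With tasks still estimated at 8 h but actually running 22 h, a 3-day
# cache really holds 3 * 22/8 = 8.25 days of work. Cutting the setting
# by 8/22 (~a two-thirds reduction, as suggested above) restores it:
print(round(safe_cache_days(3.0, 8, 22), 2))   # 1.09 days
```

The 8/22 ratio is about 0.36, which is where the "reduce by around two-thirds" rule of thumb comes from.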
robertmiles Joined: 16 Jun 08 Posts: 1229 Credit: 14,172,067 RAC: 1,295
> [snip] In the last day or so, these computation errors appear to have stopped. Can others confirm that too?
The computation errors due to problems with 6mers have stopped. I didn't see those other errors, so I can't tell if they have stopped.
©2024 University of Washington
https://www.bakerlab.org