Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
So why would this way be any faster? Yes. But it's rather slow to happen. Shouldn't it have already done that when the 2nd genuine one was posted? There's a workaround. If you use the same way to mark it as a duplicate every time, the software will see it as multiple identical posts, and delete all but one of them. Duplicate post deleted. You'd think there'd be a delete button. Who designs these things? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Wow. In the UK there's no data cap whatsoever. I can download at 54Mbps 24/7. This is unsustainable and I will either have to shell out for an expensive unlimited contract (because I have an Ultima connection at over 100mbps) or cut back on Rosetta work. I'm guessing you don't have any real options when it comes to ISP? 50GB limit for a 100Mb connection is insane IMHO. Higher speed plans here come with high data caps. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
But it would be nice if the researchers would test their models a bit more before releasing them here. The odd error is OK, but when it's a case of the odd Task not being an error and all others erroring out it really is a bit silly. There's Ralph@home for that. Not sure why they hardly ever use it. I think I get tasks from there once every 3 months. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
You said "Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering." which is a false assumption. It could take a while. My computers for example haven't tried to get more Rosetta since it ran out, since they got stuff from other projects. That means nothing. For example I might (manually or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. It really is a shame you don't read all of what's posted before you feel the need to comment. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
I assume he meant which scientist.... Who is the guilty party submitting tasks that all "Error while computing"? I have 70 tasks on April 4th that have errored with no credit. That would be you. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Bigot! |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
And i'll repeat it again- It really is a shame you don't read all of what's posted before you feel the need to comment. You said "Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering." which is a false assumption. It could take a while. My computers for example haven't tried to get more Rosetta since it ran out, since they got stuff from other projects. That means nothing. For example I might (manually or Boinc did it) download a load of work from another project when this one runs out. Now that has to be completed before it will get work from Rosetta again. It really is a shame you don't read all of what's posted before you feel the need to comment. I addressed the point you made in the post that i quoted when i made that statement. Regardless of caches & resource share settings & people's micro-management of their projects- if you compare the graph of the current recovery with past recoveries after a lack of work over the same recovery time frame, the loss of almost 1/3 of the processing resources is quite obvious. The mis-configured Work Units are making it impossible for a large number of hosts to do any work (and that batch of Tasks that produced nothing but errors in a matter of seconds didn't help things along either). Current recovery (or lack of) after almost 3 days. Previous recovery (after a much longer outage) after 3 days. Grant Darwin NT |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
No errors at all here; no computer with the 6 GB of RAM required ... |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
So why would this way be any faster? Yes. But it's rather slow to happen. Shouldn't it have already done that when the 2nd genuine one was posted? There's a workaround. If you use the same way to mark it as a duplicate every time, the software will see it as multiple identical posts, and delete all but one of them. Duplicate post deleted. You'd think there'd be a delete button. Who designs these things? I didn't say it would be faster. However, it will give users less to waste their time reading. |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
A few days in and the impact of the mis-configured Work Units is becoming clearer. Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering. In the past it has taken several days for In progress numbers to get back to their pre-work shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta. I don't know if I have the time that's required to provide the space that is needed. You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)… Returning to my anecdote about a remote PC I have being unable to download any Rosetta tasks, so running its backup project, WCG, 24/7, my local laptop is also doing weird things. It refuses to run a particular Rosetta task, so it's running those it has room for - a combination of WCG and later Rosetta tasks, but only 3 on 4 cores. Now I know it's definitely happening, I've set NNT and suspended all running tasks except for the one problem Rosetta task. It still refuses to run, even as the only task. No tasks are running in my experiment! So, maintaining NNT, I've found some combination of WCG and Rosetta tasks that'll run together on all 4 cores. I'll work my way through my small cache until all are completed bar the problem task and see if it runs then. If not, I'll finally abort it and just grab fresh tasks. Bit of a weird one. Even attempting to micromanage tasks doesn't entirely work. No wonder that graph is running so much lower than it was, if I'm any example |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Has there been a significant project change which could be the cause of this increased usage or am I looking for another problem? Hard to say. I haven't mentioned this because I'm doing some experiments with overclocking and I thought the errors were being caused by me. So it was everyone? Interesting to know. In the last day or so, these computation errors appear to have stopped. Can others confirm that too? Hopefully that stops all the re-downloading issues and bandwidth penalties. Is it stopping the excessive memory & disk space demands too? And now I notice queued jobs have plummeted to barely more than 100k. Hmm... |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Thanks to all for their views. I confirm the fairly low 50gb cap vs 100mbps circuit is due to an old tariff. It is no longer available but the ISP cannot remove it easily because of the regulator. All new users or changers automatically have unlimited but at a substantially higher monthly cost which I am trying to avoid because the cap has been adequate for years. I do not stream, peer or otherwise have need for substantial throughput. Good idea to increase the runtime, but be aware that the tasks you already hold in your cache will run for much longer than Boinc realises, so it's entirely possible/probable you won't meet deadline on the later ones. If my memory serves me, the unstarted tasks will still show they're 8hrs long, but will actually run for your new preference of 22hrs. This runtime figure for unstarted tasks doesn't update so it will be a permanent problem. The way around this is to reduce your cache size by around two-thirds, so even though it continues to show Boinc the wrong expected runtime, you won't exceed deadlines in practice. You may've already noticed this on your threadrippers. Crazy as it seems, the solution I've described ought to prevent the problems I've pointed to. It's a feature rather than a bug... <cough> |
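The cache arithmetic in the post above can be sketched roughly as follows. This is only an illustration, not BOINC's own scheduling logic: `adjusted_cache_days` is a hypothetical helper, the 8-hour estimate and 22-hour target runtime come from the post, and the 2-day starting cache is a made-up example value.

```python
def adjusted_cache_days(cache_days, estimated_hours, actual_hours):
    """Scale a work cache so tasks that run longer than estimated still fit.

    Hypothetical helper: assumes the client keeps estimating
    `estimated_hours` per unstarted task while each task really
    runs for `actual_hours`.
    """
    if estimated_hours <= 0 or actual_hours <= 0:
        raise ValueError("runtimes must be positive")
    return cache_days * estimated_hours / actual_hours


# 8 h estimate and 22 h target runtime are from the post;
# the 2-day cache is an assumed starting value.
old_cache = 2.0
new_cache = adjusted_cache_days(old_cache, estimated_hours=8, actual_hours=22)
reduction = 1 - new_cache / old_cache
print(f"new cache: {new_cache:.2f} days, reduction: {reduction:.0%}")
```

With these numbers the cache shrinks by about 64%, which lines up with the "around two-thirds" suggested in the post.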
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
[snip] In the last day or so, these computation errors appear to have stopped. Can others confirm that too? The computation errors due to problems with 6mers have stopped. I didn't see those other errors, so I can't tell if they have stopped. |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
Weird thing: I just got a resend, but I'm not sure whether to finish it! https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1217325166 |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
This issue was discussed recently in another thread. The work unit got resent because the first machine hadn’t completed it by its deadline. But 10 minutes later – after you’d started the resend but before you’d finished it – the other host submitted its results. I think you’ll still get credit if you complete it before the deadline, but from the science perspective there’s no point because the results are already in. |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
With a 10 GB BOINC disk space setting, BOINC still sends weird messages like this one
and nothing was downloaded... I had to reset the project and this message went away, and now BOINC is downloading tasks again... Even the first message about lack of memory is gone too, for the moment. |
Kissagogo27 Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
Haha, they're back.
|
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
This issue was discussed recently in another thread. As well as claiming they needed 6.6 GB of RAM, the recent work units were configured to require 8.5 GB of disk space. With a preference setting maximum disk usage to 10 GB, and more than 1.5 GB already in use (around 2 GB is normal for R@h), the server was unable to send those tasks and so issued that warning. Resetting the project didn’t make any difference because the disk space that freed up was immediately consumed again by the smaller tasks you were able to download. |
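The disk-space reasoning in the post above comes down to a simple comparison, sketched here. This is a simplified illustration, not the actual server-side check, which may account for other limits: `can_accept_task` is a hypothetical name, and the figures (10 GB limit, 8.5 GB requirement, roughly 2 GB typical usage) are the ones quoted in the post.

```python
def can_accept_task(max_disk_gb, used_disk_gb, task_disk_gb):
    """Return True if a task's declared disk requirement fits in the remaining allowance.

    Hypothetical helper; it only compares the task's stated requirement
    with the space left under the user's disk-usage preference.
    """
    free_gb = max_disk_gb - used_disk_gb
    return task_disk_gb <= free_gb


# 10 GB limit with about 2 GB already used leaves 8 GB, less than the 8.5 GB declared.
print(can_accept_task(max_disk_gb=10.0, used_disk_gb=2.0, task_disk_gb=8.5))

# Only at 1.5 GB used (or less) would the 8.5 GB requirement fit.
print(can_accept_task(max_disk_gb=10.0, used_disk_gb=1.5, task_disk_gb=8.5))
```

Under these assumptions the first call reports that the task does not fit, matching the warning described; raising the disk limit or freeing space would be the ways to change the outcome.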
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
Resetting the project didn’t make any difference because the disk space that freed up was immediately consumed again by the smaller tasks you were able to download. Along with the executables & support data files. Over time as different Tasks are downloaded, those support data files will be re-downloaded & the lack of disk space issue will re-occur (if the configuration issue for certain Work Units hasn't been fixed by then), as you soon found out. Since the project is out of work again, other than the odd resend, now no one will be able to get any new work. Hopefully the next batches of work system requirements will be configured more appropriately. Grant Darwin NT |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
A new batch of work has been loaded up- hopefully these have their requirements set properly, and they won't error out in a matter of seconds either. Grant Darwin NT |
©2024 University of Washington
https://www.bakerlab.org