Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something. However, it helps the crunchers to have a Plan B in mind. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something. No shortage of tasks throughout, just awarding credit. But all solved now and no tasks awaiting validation - all caught up, thanks |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Is your download server having problems? My computer has been trying to download a rather small input file for many hours, and fails every time. 10v1nmgb_c724_10mer_gb_000434.zip It looks like it won't download any more tasks until after it gets this input file. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
2/7/2020 12:22:52 PM | | Project communication failed: attempting access to reference site
2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error
2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip
2/7/2020 12:22:54 PM | | Internet access OK - project servers may be temporarily down. |
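For anyone who wants to spot these back-offs without watching the event log by hand, here is a minimal Python sketch that scans log lines in the `timestamp | project | message` layout quoted above and pulls out each "Backing off" entry. The function names are made up for illustration, and the regular expression is based only on the wording of the lines in this post, so other client versions may word the message differently.

```python
import re
from datetime import timedelta

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official BOINC log format specification).
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def parse_event_line(line):
    """Split one event-log line into (timestamp, project, message)."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None  # not in the expected three-field layout
    return tuple(parts)


def find_backoffs(lines):
    """Return (project, filename, delay) for every download back-off found."""
    found = []
    for line in lines:
        parsed = parse_event_line(line)
        if parsed is None:
            continue
        _, project, message = parsed
        match = BACKOFF_RE.search(message)
        if match:
            hours, minutes, seconds, name = match.groups()
            delay = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
            found.append((project, name, delay))
    return found


# The four lines from the post above, used as sample input.
SAMPLE_LOG = [
    "2/7/2020 12:22:52 PM | | Project communication failed: attempting access to reference site",
    "2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error",
    "2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip",
    "2/7/2020 12:22:54 PM | | Internet access OK - project servers may be temporarily down.",
]

if __name__ == "__main__":
    for project, name, delay in find_backoffs(SAMPLE_LOG):
        print(f"{project}: {name} is backed off for {delay}")
```

A file that keeps showing up with a multi-hour back-off is a likely candidate for the manual retry or abort discussed in this thread.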
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
It looks like it won't download any more tasks until after it gets this input file. If it is holding up your machine, I think I would let the current tasks finish, detach, and try again. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
It looks like it won't download any more tasks until after it gets this input file. How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start? It's doing more for all the other BOINC projects I have selected that offer CPU tasks but no GPU tasks, though. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start? You detach and end its misery. Sometimes a reboot works though. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start? A restart followed by telling BOINC to retry the download finally helped. The file downloaded, and the task is now ready to start. Previously, telling BOINC to retry the download without the Windows restart didn't help. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I had the same problem with a stuck download, and a reboot fixed it for me too. But that practically never happens. So the fact that it is happening more often now indicates to me that their servers are overloaded. I will take a machine off. And if they want to tell us otherwise, I will listen. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,559,984 RAC: 14,720 |
If retrying the download does not help, aborting the file transfer will usually work. The corresponding task will fail, but BOINC is smart enough to abort such tasks without trying to run them, so no computation is wasted. P.S. I have also had a few stuck files in the last few days (the previous such case was about a year ago). I think one of the files was exactly the same file. BOINC also stopped getting new work from R@H until I noticed it today and aborted the stuck file transfer. One of the tasks with "stuck" downloads: https://boinc.bakerlab.org/rosetta/result.php?resultid=1121514493 |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I just had to abort one on my best machine, a Ryzen 3700x. A reboot did not fix it. Rosetta is beginning to lose some of its attraction for me. It was always a set-and-forget project. The errors were minor, and did not hang anything up. An explanation would be useful, as unlikely as that is. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
I'm getting lots of downloads (always a 3kB zip file) that stick (on all 4 computers). A temporary workaround seems to be to abort the task (not the download), then update the project so the project acknowledges you don't want that task that you can't get. It will then get others instead. But it's happening quite a lot. Unless I'm on holiday, I have a permanent monitor beside me showing what all my computers are doing on Boinc (using Boinctasks), but I'm sure many people won't check their machines that often. And if that download failed for me, will it fail for the next person it gives it to, and so on? Also I seem to have quite a high percentage of "error while computing" on all 4 machines (about a third of them). Is this normal or should I be trying to tweak something? I know with LHC@home an update to virtual machine fixed it. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
And if that download failed for me, will it fail for the next person it gives it to, and so on? I am wondering whether it is related to the high memory requirements of some of the files recently. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510 Probably they are two different things, but I will monitor the amount of available memory the next time I see one stuck. |
amgthis Send message Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
Failed Downloads. I, too, have seen many downloads of ~3kb or so just hang or 'stall' at somewhere around 80-90% completion. Then they just sit and seem to rob my limited bandwidth, impeding other uploads and downloads. I delete the stalled download, then refresh, and it gets replaced by a new one. Then I watch to make sure it downloads successfully. Sometimes a stop and start of 'network access or activity' will let it resume, but usually it stalls out again. I've been noticing this for the last couple of weeks, I think. Various file names, but they are always small files, ~3kb or so in size. When you have 20 boxes sharing a 7 Mbps DSL line, bandwidth can be sketchy under the best conditions. 8^( /Mike |
Trotador Send message Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
Yes, same here, stalled downloads can only be fixed by manual intervention (abort or abort) and therefore a big pain to keep crunching the project. They require continuous attention, which is not sustainable. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
Yes, same here, stalled downloads can only be fixed by manual intervention (abort or abort) and therefore a big pain to keep crunching the project. They require continuous attention, which is not sustainable. Just had one I can't fix. Usually aborting the download, then aborting the task, then reporting it, allows me to continue. But now Boinc is still saying: "Rosetta@home 16/02/2020 11:00:16 AM Not requesting tasks: some download is stalled". I'll try a fresh post on this here, and ask in the main Boinc forum why Boinc thinks something is still stalled which isn't. P.S. For some reason I'm not getting emailed when someone posts in this thread. Another problem! Works fine in forums of all other projects. Ah, a hidden preference defaulting to a daft way - why would I subscribe to a thread if I didn't want to be told? |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 404 Credit: 12,294,748 RAC: 2,551 |
When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks. Do you mean completely self corrected, or self corrected after you aborted the task? If I don't abort the task, I've seen it still stuck after about 18 hours. It just keeps on retrying and failing to download about every 3 hours. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 404 Credit: 12,294,748 RAC: 2,551 |
When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks. I abort the transfer (not the task) and normally that is enough to allow downloads to restart when I do an update project. On the odd occasion, however, it has given the message you reported after the update. In that case I leave it an hour and redo the update, on all occasions so far the update has succeeded in bringing down new WUs. |
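The abort-the-transfer-then-update routine described above can also be driven from the command line with `boinccmd`, which ships with the BOINC client. Below is a rough Python sketch that only builds the two commands and, by default, prints them instead of running them. It assumes `boinccmd` is on the PATH and that the project URL shown is the one your client is attached with (an assumption; `boinccmd --get_project_status` lists the real one). The helper names are invented for this example.

```python
import subprocess

# Assumed project URL; confirm it against `boinccmd --get_project_status`.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def abort_transfer_cmd(project_url, filename):
    """Command that aborts one file transfer (the transfer, not the task)."""
    return ["boinccmd", "--file_transfer", project_url, filename, "abort"]


def update_project_cmd(project_url):
    """Command that asks the client to contact the project (an 'update')."""
    return ["boinccmd", "--project", project_url, "update"]


def run(cmd, dry_run=True):
    """Print the command in dry-run mode, otherwise execute it."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    stuck_file = "10v1nmgb_c724_10mer_gb_000434.zip"  # file name from this thread
    run(abort_transfer_cmd(PROJECT_URL, stuck_file))
    run(update_project_cmd(PROJECT_URL))
```

If the update still reports "some download is stalled", the advice above applies: leave it for an hour or so and run the update step again.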
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks. Ok thanks, in the future I'll just abort then leave it alone. Although the next time it happens I'm going to try to gather technical info on the problem - see this thread over at Boinc: https://boinc.berkeley.edu/dev/forum_thread.php?id=13435 I've been requested to: "1) if you see it happening, set <http_debug> in Event Log options, and retry the transfer - find out what's happening behind that 'transient HTTP error'. 2) make a careful and exact note of the file name in question. Cancel the download, and make sure it disappears from the transfers tab. Restart the client, and if the 'stalled download' message reappears, have a very careful 'read only' (no edits) peek inside client_state.xml - same folder. Find the reference (if any) to the file you cancelled, and post the whole of the <file> ... </file> section it's enclosed in." |
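For step 2 of that request, a small read-only Python sketch like the following could pull out the `<file> ... </file>` section for a given file name, so nothing in client_state.xml gets edited by accident. It assumes the file parses as well-formed XML and that each `<file>` element carries a `<name>` child; the sample document and the function name are made up for illustration and are not copied from a real client_state.xml.

```python
import xml.etree.ElementTree as ET


def find_file_sections(xml_text, filename):
    """Return the serialized <file> sections whose <name> equals filename."""
    root = ET.fromstring(xml_text)
    matches = []
    for elem in root.iter("file"):
        name = elem.findtext("name")
        if name is not None and name.strip() == filename:
            matches.append(ET.tostring(elem, encoding="unicode").strip())
    return matches


def read_state(path):
    """Open the state file read-only and return its text."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Hypothetical, heavily trimmed stand-in for client_state.xml.
SAMPLE_STATE = """<client_state>
  <file>
    <name>10v1nmgb_c724_10mer_gb_000434.zip</name>
    <status>0</status>
  </file>
  <file>
    <name>some_other_input.zip</name>
    <status>1</status>
  </file>
</client_state>"""

if __name__ == "__main__":
    for section in find_file_sections(
        SAMPLE_STATE, "10v1nmgb_c724_10mer_gb_000434.zip"
    ):
        print(section)
```

To run it against the real file, pass `read_state(...)` the path to client_state.xml in the BOINC data directory. If parsing fails on a real file, searching for the file name in a text viewer is the fallback.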
©2024 University of Washington
https://www.bakerlab.org