Problems and Technical Issues with Rosetta@home

Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91635 - Posted: 31 Jan 2020, 15:18:57 UTC - in response to Message 91634.  

Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something.
However, it helps the crunchers to have a Plan B in mind.
ID: 91635
Sid Celery

Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 91639 - Posted: 31 Jan 2020, 23:05:20 UTC - in response to Message 91635.  

Thanks for reporting. I normally would not notice. I trust it is not a big deal, but maybe maintenance on a server or something.
However, it helps the crunchers to have a Plan B in mind.

There was no shortage of tasks throughout; only the awarding of credit was affected.

But that is all solved now, with no tasks awaiting validation. All caught up, thanks.
ID: 91639
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 91659 - Posted: 7 Feb 2020, 19:32:46 UTC

Is your download server having problems?

My computer has been trying to download a rather small input file for many hours, and fails every time.

10v1nmgb_c724_10mer_gb_000434.zip

It looks like it won't download any more tasks until after it gets this input file.
ID: 91659
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 91660 - Posted: 7 Feb 2020, 19:48:42 UTC

2/7/2020 12:22:52 PM | | Project communication failed: attempting access to reference site
2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error
2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip
2/7/2020 12:22:54 PM | | Internet access OK - project servers may be temporarily down.
ID: 91660
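For anyone who wants to pull the backoff times out of an Event Log like the one above, here is a minimal Python sketch. It assumes the log has been saved as plain text and that the lines follow the "Backing off HH:MM:SS on download of <file>" wording quoted in this post; the function name `parse_backoffs` is made up for illustration and is not part of BOINC.

```python
import re
from datetime import timedelta

# Pattern modeled on the "Backing off HH:MM:SS on download of <file>" line
# quoted above; other client versions may word this differently (assumption).
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def parse_backoffs(log_text):
    """Return {file_name: timedelta} for every download backoff in log_text."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            backoffs[name] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs


# Two of the Event Log lines from the post above, used as sample input.
SAMPLE_LOG = """\
2/7/2020 12:22:52 PM | Rosetta@home | Temporarily failed download of 10v1nmgb_c724_10mer_gb_000434.zip: transient HTTP error
2/7/2020 12:22:52 PM | Rosetta@home | Backing off 03:13:23 on download of 10v1nmgb_c724_10mer_gb_000434.zip
"""

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE_LOG).items():
        print(f"{name}: next automatic retry in {delay}")
```

A backoff of a few hours like this one would explain why a stuck file can sit untouched for a long time unless the retry is triggered by hand.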
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91661 - Posted: 7 Feb 2020, 22:29:32 UTC - in response to Message 91659.  

It looks like it won't download any more tasks until after it gets this input file.

If it is holding up your machine, I think I would let the current tasks finish, detach, and try again.
ID: 91661
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 91666 - Posted: 8 Feb 2020, 23:05:51 UTC - in response to Message 91661.  

It looks like it won't download any more tasks until after it gets this input file.

If it is holding up your machine, I think I would let the current tasks finish, detach, and try again.

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

It's doing more for all the other BOINC projects I have selected that offer CPU tasks but no GPU tasks, though.
ID: 91666
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91667 - Posted: 9 Feb 2020, 1:16:23 UTC - in response to Message 91666.  
Last modified: 9 Feb 2020, 1:48:56 UTC

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

You detach and end its misery. Sometimes a reboot works though.
ID: 91667
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 91668 - Posted: 9 Feb 2020, 4:18:28 UTC - in response to Message 91667.  

How am I supposed to do that if the only current Rosetta@Home task won't finish downloading so that it can start?

You detach and end its misery. Sometimes a reboot works though.

A restart followed by telling BOINC to retry the download finally helped. The file downloaded, and the task is now ready to start. Previously, telling BOINC to retry the download without the Windows restart didn't help.
ID: 91668
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91673 - Posted: 10 Feb 2020, 17:14:49 UTC - in response to Message 91668.  
Last modified: 10 Feb 2020, 17:29:19 UTC

I had the same problem with a stuck download, and a reboot fixed it for me too. But that practically never happens. So the fact that it is happening more often now indicates to me that their servers are overloaded.
I will take a machine off.

And if they want to tell us otherwise, I will listen.
ID: 91673
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 26,559,984
RAC: 14,720
Message 91676 - Posted: 12 Feb 2020, 7:46:37 UTC - in response to Message 91668.  
Last modified: 12 Feb 2020, 7:47:32 UTC

If retrying the download does not help, aborting the file transfer will usually work.
The corresponding task will fail, but BOINC is smart enough to abort such tasks without trying to run them.
So no computation is wasted.

P.S.
I have also had a few stuck files in the last few days (the previous such case was about a year ago).
I think one of the files was exactly the same file. BOINC also stopped getting new work from R@H until I noticed it today and aborted the stuck file transfer.

One of the tasks with "stuck" downloads: https://boinc.bakerlab.org/rosetta/result.php?resultid=1121514493
ID: 91676
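The retry-or-abort step described above can also be done from the command line rather than the Transfers tab. The sketch below only builds the argument list for BOINC's `boinccmd` tool, using its `--file_transfer URL filename {retry | abort}` form. The project URL is an assumption taken from the result link in this post, and the helper names are invented for illustration. Check `boinccmd --help` on your own installation before relying on it.

```python
import subprocess

# Assumed project master URL, inferred from the result link quoted in the post.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def file_transfer_command(filename, action, project_url=PROJECT_URL):
    """Build a boinccmd argument list that retries or aborts one file transfer."""
    if action not in ("retry", "abort"):
        raise ValueError("action must be 'retry' or 'abort'")
    return ["boinccmd", "--file_transfer", project_url, filename, action]


def run(command):
    """Run the command; this needs boinccmd on the PATH and a running client."""
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Only print the command here, so nothing is aborted by accident.
    print(" ".join(file_transfer_command("10v1nmgb_c724_10mer_gb_000434.zip", "abort")))
```

Trying `retry` first and falling back to `abort` matches the order suggested in the post.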
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91684 - Posted: 12 Feb 2020, 22:53:42 UTC - in response to Message 91676.  

I just had to abort one on my best machine, a Ryzen 3700x. A reboot did not fix it.
Rosetta is beginning to lose some of its attraction for me. It was always a set-and-forget project. The errors were minor, and did not hang anything up.

An explanation would be useful, as unlikely as that is.
ID: 91684
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 91694 - Posted: 13 Feb 2020, 22:29:02 UTC
Last modified: 13 Feb 2020, 22:35:59 UTC

I'm getting lots of downloads (always a 3kB zip file) that stick (on all 4 computers). A temporary workaround seems to be to abort the task (not the download), then update the project so the project acknowledges you don't want that task that you can't get. It will then get others instead. But it's happening quite a lot. Unless I'm on holiday, I have a permanent monitor beside me showing what all my computers are doing on Boinc (using Boinctasks), but I'm sure many people won't check their machines that often. And if that download failed for me, will it fail for the next person it gives it to, and so on?

Also, I seem to have quite a high percentage of "error while computing" on all 4 machines (about a third of the tasks). Is this normal, or should I be trying to tweak something? I know that with LHC@home, an update to the virtual machine fixed it.
ID: 91694
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 91695 - Posted: 13 Feb 2020, 23:35:02 UTC - in response to Message 91694.  

And if that download failed for me, will it fail for the next person it gives it to, and so on?

I am wondering whether it is related to the high memory requirements of some of the files recently.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13510

Probably they are two different things, but I will monitor the amount of available memory the next time I see one stuck.
ID: 91695
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 91700 - Posted: 14 Feb 2020, 16:28:25 UTC

Failed downloads: I, too, have seen many downloads of roughly 3 kB files just hang or 'stall' at somewhere around 80-90% completion. Then they just sit and seem to rob my limited bandwidth, impeding other uploads and downloads. I delete the stalled download, then refresh, and it gets replaced by a new one. Then I watch to make sure it downloads successfully. Sometimes stopping and restarting 'network access or activity' will let it resume, but usually it stalls out again. I've been noticing this for the last couple of weeks, I think. The file names vary, but they are always small files of about 3 kB.

When you have 20 boxes sharing a 7 Mb/s DSL line, bandwidth can be sketchy under the best conditions. 8^(
/Mike
ID: 91700
Trotador

Joined: 30 May 09
Posts: 108
Credit: 291,214,977
RAC: 0
Message 91701 - Posted: 14 Feb 2020, 17:58:16 UTC - in response to Message 91700.  

Yes, same here. Stalled downloads can only be fixed by manual intervention (abort or abort), which makes it a big pain to keep crunching the project. They require continuous attention, which is not sustainable.
ID: 91701
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 91708 - Posted: 16 Feb 2020, 11:03:28 UTC - in response to Message 91701.  
Last modified: 16 Feb 2020, 11:05:56 UTC

Yes, same here. Stalled downloads can only be fixed by manual intervention (abort or abort), which makes it a big pain to keep crunching the project. They require continuous attention, which is not sustainable.


Just had one I can't fix. Usually aborting the download, then aborting the task, then reporting it, allows me to continue. But now Boinc is still saying:

Rosetta@home 16/02/2020 11:00:16 AM Not requesting tasks: some download is stalled

I'll try a fresh post on this here, and ask in the main Boinc forum why Boinc thinks something is still stalled when it isn't.

P.S. For some reason I'm not getting emailed when someone posts in this thread. Another problem! It works fine in the forums of all other projects. Ah, it's a hidden preference with a daft default. Why would I subscribe to a thread if I didn't want to be told?
ID: 91708
Bryn Mawr

Joined: 26 Dec 18
Posts: 404
Credit: 12,294,748
RAC: 2,551
Message 91712 - Posted: 16 Feb 2020, 14:11:16 UTC - in response to Message 91708.  

When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks.
ID: 91712
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 91713 - Posted: 16 Feb 2020, 15:12:06 UTC - in response to Message 91712.  

When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks.


Do you mean completely self corrected, or self corrected after you aborted the task? If I don't abort the task, I've seen it still stuck after about 18 hours. It just keeps on retrying and failing to download about every 3 hours.
ID: 91713
Bryn Mawr

Joined: 26 Dec 18
Posts: 404
Credit: 12,294,748
RAC: 2,551
Message 91718 - Posted: 16 Feb 2020, 19:32:03 UTC - in response to Message 91713.  

When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks.


Do you mean completely self corrected, or self corrected after you aborted the task? If I don't abort the task, I've seen it still stuck after about 18 hours. It just keeps on retrying and failing to download about every 3 hours.

I abort the transfer (not the task) and normally that is enough to allow downloads to restart when I do an update project.

On the odd occasion, however, it has given the message you reported after the update. In that case I leave it for an hour and redo the update. On all occasions so far, that second update has succeeded in bringing down new WUs.
ID: 91718
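The "update, and if it still says stalled, wait an hour and update again" routine described above can be written down as a small retry loop. This is only a sketch of the procedure, not anything BOINC provides: `update_project` and `read_recent_messages` are hypothetical callables you would supply yourself (for example, wrappers around your own client tooling), and `read_recent_messages` is assumed to return only the Event Log lines produced since the last update.

```python
import time

# Text of the Event Log message quoted earlier in this thread.
STALLED_MESSAGE = "Not requesting tasks: some download is stalled"


def update_until_unstalled(update_project, read_recent_messages,
                           wait_seconds=3600, max_attempts=3, sleep=time.sleep):
    """Update the project; while the stalled message appears, wait and retry.

    Returns the number of the attempt that no longer showed the stalled
    message, or None if every attempt still reported a stalled download.
    """
    for attempt in range(1, max_attempts + 1):
        update_project()
        messages = read_recent_messages()
        if not any(STALLED_MESSAGE in line for line in messages):
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # one hour by default, as suggested in the post
    return None
```

The one-hour wait and the limit of three attempts are just defaults chosen for the sketch; the post only reports that a single hour has been enough so far.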
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 91719 - Posted: 16 Feb 2020, 20:00:03 UTC - in response to Message 91718.  
Last modified: 16 Feb 2020, 20:00:32 UTC

When this has happened to me it has self corrected after about an hour - give it time and then go for another update and you should get some new tasks.


Do you mean completely self corrected, or self corrected after you aborted the task? If I don't abort the task, I've seen it still stuck after about 18 hours. It just keeps on retrying and failing to download about every 3 hours.

I abort the transfer (not the task) and normally that is enough to allow downloads to restart when I do an update project.

On the odd occasion, however, it has given the message you reported after the update. In that case I leave it for an hour and redo the update. On all occasions so far, that second update has succeeded in bringing down new WUs.


Ok thanks, in the future I'll just abort and then leave it alone. The next time it happens, though, I'm going to try to gather technical info on the problem (see this thread over at Boinc: https://boinc.berkeley.edu/dev/forum_thread.php?id=13435). I've been requested to:

"1) if you see it happening, set <http_debug> in Event Log options, and retry the transfer - find out what's happening behind that 'transient HTTP error'.
2) make a careful and exact note of the file name in question. Cancel the download, and make sure it disappears from the transfers tab. Restart the client, and if the 'stalled download' message reappears, have a very careful 'read only' (no edits) peek inside client_state.xml - same folder. Find the reference (if any) to the file you cancelled, and post the whole of the

<file>
...
</file>

section it's enclosed in."
ID: 91719
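For step 2 of the quoted request, a read-only way to pull the matching section out of client_state.xml is sketched below in Python. It assumes the file parses as well-formed XML and that the entry is a `<file>` element with a `<name>` child, as the quoted advice implies. The function name and the sample XML are invented for illustration, and a real client_state.xml contains far more than this.

```python
import xml.etree.ElementTree as ET


def find_file_sections(client_state_xml, file_name):
    """Return each <file>...</file> section whose <name> equals file_name.

    Works on text that has already been read, so nothing is written back.
    """
    root = ET.fromstring(client_state_xml)
    sections = []
    for element in root.iter("file"):
        if (element.findtext("name") or "").strip() == file_name:
            sections.append(ET.tostring(element, encoding="unicode").strip())
    return sections


# Hypothetical, heavily trimmed stand-in for a real client_state.xml.
SAMPLE_STATE = """<client_state>
<file>
    <name>10v1nmgb_c724_10mer_gb_000434.zip</name>
    <status>0</status>
</file>
<file>
    <name>other_input.zip</name>
</file>
</client_state>"""

if __name__ == "__main__":
    for section in find_file_sections(SAMPLE_STATE, "10v1nmgb_c724_10mer_gb_000434.zip"):
        print(section)
```

To use it on the real file, read the text with `open(path, encoding="utf-8").read()` while the client is stopped, or work on a copy, so that the "no edits" advice is respected.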

©2024 University of Washington
https://www.bakerlab.org