Stalled downloads

Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91709 - Posted: 16 Feb 2020, 11:07:52 UTC

I keep getting downloads of 3kB files getting stuck. Aborting the download, then aborting the task, then updating the project usually works. But sometimes I still can't get new work until I actually reboot the computer! Boinc thinks the download is still stalled:

Rosetta@home 16/02/2020 11:00:16 AM Not requesting tasks: some download is stalled
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91711 - Posted: 16 Feb 2020, 13:39:39 UTC - in response to Message 91709.  

Managed to find the log from before I had to reboot:

16-Feb-2020 07:49:29 [Rosetta@home] Started download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:36 [Rosetta@home] Temporarily failed download of 9v1nm_gb_c815_9mer_gb_001245.zip: transient HTTP error
16-Feb-2020 07:54:36 [Rosetta@home] Backing off 03:44:45 on download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:37 [---] Project communication failed: attempting access to reference site
16-Feb-2020 07:54:38 [---] Internet access OK - project servers may be temporarily down.
16-Feb-2020 09:29:54 [Rosetta@home] Computation for task rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0 finished
16-Feb-2020 09:29:59 [Rosetta@home] Starting task 7ub7ru9a_3h_design1_893125_1_0
16-Feb-2020 09:30:00 [Rosetta@home] Started upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 09:30:04 [Rosetta@home] Finished upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 10:30:15 [Rosetta@home] Sending scheduler request: To report completed tasks.
16-Feb-2020 10:30:15 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:30:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:01 [Rosetta@home] task 9v1nm_gb_c815_9mer_gb_001245_SAVE_ALL_OUT_892880_29_0 aborted by user
16-Feb-2020 10:59:06 [Rosetta@home] update requested by user
16-Feb-2020 10:59:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:07 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:59:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:08 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:25 [Rosetta@home] update requested by user
16-Feb-2020 10:59:28 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:28 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:30 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:00:14 [Rosetta@home] update requested by user
16-Feb-2020 11:00:16 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:00:16 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:00:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:10:02 [Rosetta@home] update requested by user
16-Feb-2020 11:10:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:10:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:10:09 [Rosetta@home] Scheduler request completed
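A log like the one above can be scanned for the failed download and the time at which the client intends to retry. The following is only a rough sketch: it assumes the bracketed event-log layout shown in this post (date, time, `[project]`, message) and English month abbreviations, and the function name `find_stalled_downloads` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations as they appear in the log excerpt (assumed English).
MONTHS = {m: i + 1 for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}

# e.g. "16-Feb-2020 07:54:36 [Rosetta@home] Backing off 03:44:45 on download of x.zip"
LINE_RE = re.compile(
    r"^(\d{2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}) \[(.*?)\] (.*)$")
FAILED_RE = re.compile(r"^Temporarily failed download of (\S+): (.*)$")
BACKOFF_RE = re.compile(r"^Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)")


def find_stalled_downloads(log_text):
    """Map each failed file name to its failure time, reason and planned retry time."""
    stalled = {}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not an event-log line in the expected layout
        day, mon, year, hh, mm, ss, project, message = m.groups()
        when = datetime(int(year), MONTHS[mon], int(day), int(hh), int(mm), int(ss))

        failed = FAILED_RE.match(message)
        if failed:
            entry = stalled.setdefault(failed.group(1), {"project": project})
            entry["failed_at"] = when
            entry["reason"] = failed.group(2)
            continue

        backoff = BACKOFF_RE.match(message)
        if backoff:
            h, mi, s, name = backoff.groups()
            entry = stalled.setdefault(name, {"project": project})
            # The retry is due once the back-off interval has elapsed.
            entry["retry_at"] = when + timedelta(
                hours=int(h), minutes=int(mi), seconds=int(s))
    return stalled
```

Running it over the excerpt above should report a single file, `9v1nm_gb_c815_9mer_gb_001245.zip`, with the reason "transient HTTP error".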
Trotador

Joined: 30 May 09
Posts: 108
Credit: 182,747,380
RAC: 334,888
Message 91715 - Posted: 16 Feb 2020, 18:31:59 UTC

Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91716 - Posted: 16 Feb 2020, 18:51:47 UTC - in response to Message 91715.  

Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again.


I haven't tried just Boinc, presumably that would be just as effective. But most of my machines are remote, so a system restart was easier than logging onto the machine and manually restarting Boinc. I don't think I can do that remotely through Boinctasks.
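For remote machines, one thing that may be worth trying before a reboot is the `boinccmd` command-line tool, which can ask a remote client to retry a file transfer or update a project. The sketch below only builds the command lines. It assumes remote GUI RPC access is already enabled on the target host, and that the Rosetta project URL is `https://boinc.bakerlab.org/rosetta/` (inferred from the download links in this thread, so check it against your own client). Nothing in this thread shows that a retry clears the "some download is stalled" state, so treat it as something to test rather than a fix.

```python
import subprocess

# Assumed project URL; verify it against the project list in your client.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def boinccmd_args(host, password, *action):
    """Build a boinccmd argument list addressed at a remote client."""
    return ["boinccmd", "--host", host, "--passwd", password, *action]


def retry_transfer(host, password, filename, project_url=PROJECT_URL):
    """Arguments asking the client to retry one pending file transfer."""
    return boinccmd_args(host, password,
                         "--file_transfer", project_url, filename, "retry")


def update_project(host, password, project_url=PROJECT_URL):
    """Arguments asking the client to contact the project scheduler."""
    return boinccmd_args(host, password, "--project", project_url, "update")


def run(args, dry_run=True):
    """Print the command in dry-run mode; otherwise execute it and return stdout."""
    if dry_run:
        return " ".join(args)
    done = subprocess.run(args, capture_output=True, text=True, check=False)
    return done.stdout
```

With `dry_run=True` (the default) `run()` just returns the command string, so it can be checked by eye before anything is sent to a remote machine. The host name and password are whatever you have configured for that client.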
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91717 - Posted: 16 Feb 2020, 18:53:21 UTC - in response to Message 91715.  

Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again.


Richard Haselgrove over at Boinc is looking into it, but needs some logs from someone with a stuck WU. See https://boinc.berkeley.edu/dev/forum_thread.php?id=13435
Mad_Max

Joined: 31 Dec 09
Posts: 197
Credit: 17,553,102
RAC: 10,754
Message 91743 - Posted: 19 Feb 2020, 11:46:43 UTC
Last modified: 19 Feb 2020, 12:23:43 UTC

Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days.
It has happened 4 or 5 times since the beginning of February.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91753 - Posted: 19 Feb 2020, 19:55:26 UTC - in response to Message 91743.  
Last modified: 19 Feb 2020, 19:57:56 UTC

Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days.
It has happened 4 or 5 times since the beginning of February.


Seems to have been fine here for a few days (on 4 computers), I should have seen more problems by now. Mind you I'm not getting any of the type of tasks that get stuck - "multistate" - are those the ones you get stuck with? Maybe they've paused those while they fix something?
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91754 - Posted: 19 Feb 2020, 20:56:06 UTC - in response to Message 91753.  

Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days.
It has happened 4 or 5 times since the beginning of February.


Seems to have been fine here for a few days (on 4 computers), I should have seen more problems by now. Mind you I'm not getting any of the type of tasks that get stuck - "multistate" - are those the ones you get stuck with? Maybe they've paused those while they fix something?


Correction, just got a multistate, and it downloaded fine.
Bryn Mawr

Joined: 26 Dec 18
Posts: 216
Credit: 7,584,609
RAC: 14,802
Message 91755 - Posted: 19 Feb 2020, 22:26:10 UTC - in response to Message 91753.  

Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days.
It has happened 4 or 5 times since the beginning of February.


Seems to have been fine here for a few days (on 4 computers), I should have seen more problems by now. Mind you I'm not getting any of the type of tasks that get stuck - "multistate" - are those the ones you get stuck with? Maybe they've paused those while they fix something?


Mine tended to be rb_02 and it was only a small portion of those.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91756 - Posted: 19 Feb 2020, 23:43:40 UTC - in response to Message 91755.  

Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days.
It has happened 4 or 5 times since the beginning of February.


Seems to have been fine here for a few days (on 4 computers), I should have seen more problems by now. Mind you I'm not getting any of the type of tasks that get stuck - "multistate" - are those the ones you get stuck with? Maybe they've paused those while they fix something?


Mine tended to be rb_02 and it was only a small portion of those.


In that case I guess it was a random fault with a Rosetta server. But once they were stuck, a retry didn't help. Corrupt disk somewhere in Rosetta?

Oh well, every time it happens I can always remove it and get it going again. While I'm not looking, it can always fall back on another project.

But ever since someone offered to help look at the problem, I've not had it to give them any logs!
Trotador

Joined: 30 May 09
Posts: 108
Credit: 182,747,380
RAC: 334,888
Message 91798 - Posted: 28 Feb 2020, 17:27:11 UTC

This issue keeps occurring every day, but it is especially annoying today. All hosts are blocked from downloading new units and some of them are ending up idle.

Has it been looked at on the project side?
Mad_Max

Joined: 31 Dec 09
Posts: 197
Credit: 17,553,102
RAC: 10,754
Message 91812 - Posted: 29 Feb 2020, 22:26:11 UTC
Last modified: 29 Feb 2020, 22:27:16 UTC

Yep, I got a bunch of stuck downloads on 28 Feb too.

Latest 2 examples:

https://boinc.bakerlab.org/rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip

https://boinc.bakerlab.org/rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip

From BOINC it looks like this (with http_debug):
01/03/2020 00:30:08 | Rosetta@home | Started download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:35:15 | Rosetta@home | Temporarily failed download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip: transient HTTP error
01/03/2020 00:35:15 | Rosetta@home | Backing off 05:44:16 on download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
-------- I noticed the stalled download (it had already been stuck for about 15-20 hours), turned http_debug on and pressed "Retry" --------
01/03/2020 00:42:31 |  | Re-reading cc_config.xml
01/03/2020 00:42:31 |  | log flags: file_xfer, sched_ops, task, http_debug, work_fetch_debug
01/03/2020 00:42:31 | Rosetta@home | Found app_config.xml
01/03/2020 00:42:31 | Rosetta@home | [work_fetch] REC 4936.494 prio -0.068 can't request work: some download is stalled
01/03/2020 00:42:31 | Rosetta@home | [work_fetch] share 0.000
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::init_get(): https://boinc.bakerlab.org/rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle 'D:\Boinc\ca-bundle.crt'
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/03/2020 00:42:59 | Rosetta@home | Started download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:42:59 | Rosetta@home | [http] [ID#10522] Info:  Connection 3013 seems to be dead!
01/03/2020 00:42:59 | Rosetta@home | [http] [ID#10522] Info:  Closing connection 3013
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Info:    Trying 128.95.160.156...
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Info:  Connected to boinc.bakerlab.org (128.95.160.156) port 80 (#3014)
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: GET /rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip HTTP/1.1
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Host: boinc.bakerlab.org
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.2)
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept: */*
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept-Encoding: deflate, gzip
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Content-Type: application/x-www-form-urlencoded
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept-Language: en_GB
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server:
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: HTTP/1.1 200 OK
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Date: Sat, 29 Feb 2020 21:42:58 GMT
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Server: Apache/2.4.18
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Last-Modified: Sat, 22 Feb 2020 18:36:23 GMT
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: ETag: "a8a-59f2e6a4792b8"
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Accept-Ranges: bytes
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Content-Length: 2698
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Content-Type: application/zip
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server:
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: PK
01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info:  Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info:  Closing connection 3014
01/03/2020 00:48:06 | Rosetta@home | [http] HTTP error: Timeout was reached
01/03/2020 00:48:06 | Rosetta@home | Temporarily failed download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip: transient HTTP error
01/03/2020 00:48:06 | Rosetta@home | Backing off 03:56:16 on download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip 



From a browser or other programs it looks the same: the R@H server responds and the download begins, but at some point it stops completely until the timeout is triggered. Retries do not help; they just repeat the loop.
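The "Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds" line in the log is a low-speed abort: the transfer is dropped once the rate has stayed under a limit for a set time. The sketch below is a simplified model of that rule, not libcurl's actual implementation. The function name and the sample format are invented for illustration, and the defaults of 10 bytes/sec and 300 seconds are taken from the log message.

```python
def first_low_speed_abort(samples, limit_bytes_per_sec=10, window_sec=300):
    """Return the time at which a low-speed rule would abort, or None.

    samples: list of (time_in_seconds, total_bytes_received), sorted by time.
    The transfer is considered dead once the rate between consecutive samples
    has stayed below the limit for a continuous stretch of window_sec seconds.
    """
    slow_since = None
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # ignore duplicate or out-of-order timestamps
        rate = (b1 - b0) / (t1 - t0)
        if rate < limit_bytes_per_sec:
            if slow_since is None:
                slow_since = t0  # start of the slow stretch
            if t1 - slow_since >= window_sec:
                return slow_since + window_sec
        else:
            slow_since = None  # progress resumed, reset the stretch
    return None


# A transfer shaped like the one in the log: most of the bytes arrive in
# the first second, then nothing at all.
stalled = [(0, 0), (1, 2610), (100, 2610), (200, 2610), (400, 2610)]
print(first_low_speed_abort(stalled))
```

In this model the abort comes 300 seconds after the last progress, which is in line with the roughly five minutes between the response headers (00:43:01) and the timeout (00:48:06) in the log above.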
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91813 - Posted: 29 Feb 2020, 22:43:01 UTC - in response to Message 91812.  

I tried both those links in my browser, the first worked fine, but the second stopped at 2610 of 2698 bytes. Seems rather random.
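A short download like this can be detected by comparing the bytes actually received against the `Content-Length` the server announced (2698 in the log above). This is only a sketch: `read_body` is a made-up helper that works on any file-like object, and the example feeds it an in-memory stream cut off at 2610 bytes instead of a real connection.

```python
import io


def read_body(stream, content_length, chunk_size=1024):
    """Read up to content_length bytes; report whether the body was complete.

    Returns (data, complete). A stream that ends early yields complete=False.
    """
    received = bytearray()
    while len(received) < content_length:
        chunk = stream.read(min(chunk_size, content_length - len(received)))
        if not chunk:
            break  # stream ended before the announced length was reached
        received.extend(chunk)
    return bytes(received), len(received) == content_length


# Simulate the case above: the server promised 2698 bytes but only 2610 arrived.
data, complete = read_body(io.BytesIO(b"x" * 2610), content_length=2698)
print(len(data), complete)
```

With a real connection the read would hang rather than end cleanly, so a socket timeout is also needed (for example the `timeout` argument of `urllib.request.urlopen`), and the length would come from the response's `Content-Length` header.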
Mad_Max

Joined: 31 Dec 09
Posts: 197
Credit: 17,553,102
RAC: 10,754
Message 91816 - Posted: 1 Mar 2020, 4:23:51 UTC

Yes, the first link is now working for me too. But it did not work at the time I was writing my previous post (29 Feb 2020, ~22:20 UTC).
Jim1348

Joined: 19 Jan 06
Posts: 619
Credit: 44,828,254
RAC: 31,926
Message 91820 - Posted: 1 Mar 2020, 18:09:08 UTC

I have a few more stalled ones today. The main problem of course is that it prevents others from downloading, so you have to babysit it.

It is fun for a while, but it is getting to be like LHC. If they can't get their servers to work, there is not much I can do for them.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91821 - Posted: 1 Mar 2020, 20:23:48 UTC - in response to Message 91820.  

I have a few more stalled ones today. The main problem of course is that it prevents others from downloading, so you have to babysit it.

It is fun for a while, but it is getting to be like LHC. If they can't get their servers to work, there is not much I can do for them.


LHC is irritating me too a little bit, but it's only CMS that screws up. You can turn CMS off completely in the website settings, or like me, just leave it running. They usually fail very quickly and don't waste much time, and I'm assuming that the failed tasks are helping them to fix the problem in some way.

Also, if you're NOT running Linux, then switch off "run native tasks" in the LHC website settings. I had that enabled (I use Windows 10), thinking it would give more options of tasks to run. But it ended up stopping me getting any Theory or Atlas tasks.
Jim1348

Joined: 19 Jan 06
Posts: 619
Credit: 44,828,254
RAC: 31,926
Message 91822 - Posted: 1 Mar 2020, 21:47:21 UTC - in response to Message 91821.  

LHC is irritating me too a little bit, but it's only CMS that screws up.

Thanks. I am running native ATLAS now. If they ever get CMS up again, I will give it a try. I think they are working on it.

I just hope the Rosetta glitch is a minor server issue that does not fall in the long-term problem category that LHC does.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91823 - Posted: 1 Mar 2020, 22:06:46 UTC - in response to Message 91822.  

LHC is irritating me too a little bit, but it's only CMS that screws up.

Thanks. I am running native ATLAS now. If they ever get CMS up again, I will give it a try. I think they are working on it.

I just hope the Rosetta glitch is a minor server issue that does not fall in the long-term problem category that LHC does.


For me, CMS only occupies a small amount of my computer's time. Failed tasks fail very early. I continue to allow them to help them figure out the problem.

And for me, Rosetta is working perfectly now, not sure why. As I said earlier, I tried some links and failed to get a download in my browser of somebody's failed task, but no tasks my computers (4 of them) have been given are going wrong any more. Whatever was wrong isn't as bad as it used to be. I used to have to manually intervene with every computer about once a day. None have failed in the last week. It was mentioned somewhere that it's just overloaded servers at their end. Maybe they upgraded something, or maybe there's less load as people go off and do other projects.
Peter Hucker
Joined: 12 Aug 06
Posts: 730
Credit: 5,404,916
RAC: 25
Message 91824 - Posted: 1 Mar 2020, 23:03:51 UTC - in response to Message 91822.  

LHC is irritating me too a little bit, but it's only CMS that screws up.

Thanks. I am running native ATLAS now. If they ever get CMS up again, I will give it a try. I think they are working on it.

I just hope the Rosetta glitch is a minor server issue that does not fall in the long-term problem category that LHC does.


Oh my god, where did you get all those Ryzens from? That's pure pornography!
Jim1348

Joined: 19 Jan 06
Posts: 619
Credit: 44,828,254
RAC: 31,926
Message 91825 - Posted: 1 Mar 2020, 23:24:32 UTC - in response to Message 91824.  
Last modified: 1 Mar 2020, 23:29:26 UTC

Oh my god, where did you get all those Ryzens from? That's pure pornography!

I just happened to spend the winter expanding my fleet. They came online just in time for the coronavirus.

I also do Folding on each one, which just recently announced a project for it.
https://foldingathome.org/2020/02/27/foldinghome-takes-up-the-fight-against-covid-19-2019-ncov/
I just reserve a core in BOINC to support each GPU (everything from a GTX 750 Ti up to an RTX 2060).

But they are really to heat my basement. You might as well have fun at the same time.



©2021 University of Washington
https://www.bakerlab.org