Message boards : Number crunching : Stalled downloads
Author | Message |
---|---|
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
Oh my god, where did you get all those Ryzens from? That's pure pornography! I think you're considerably richer than me. I'm using 2nd hand GPUs and old computers I've been given. And wondering how I'm going to pay the electric bill. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I think you're considerably richer than me." You need to check out a fellow named Doneske on WCG. He has a real layout, though unfortunately you cannot see it directly. He mentions it once in a while, and you can infer it. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
This is interesting. After being stuck for 12 hours, BOINC is willing to download new Rosetta work units. It is perhaps not coincidental that I have it set to run the 12-hour work units. Apparently all the old ones have to finish. The ones that are stuck still remain stuck (four of the 4.08 Rosetta rb_03_01_17261_17076_ab_t000__robetta_cstwt_5.0_IGNORE_THE_REST), but at least you get new ones. So you could set a backup project (zero resource share) to fill in the cores as they go empty, and get back to Rosetta automatically. You then only have to delete the stuck ones at your convenience, but they do not tie up the entire machine. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
"This is interesting. After being stuck for 12 hours, BOINC is willing to download new Rosetta work units." I always have more than one project for a processor, as I hate things sitting idle. Either I give them weights that reflect how important I think they are, or I set one to priority 0 so it only runs when the other isn't working. What's annoying me just now is that I have most of my processors assisting GPUs for Milkyway and Einstein, so Rosetta doesn't have many spare cores to work with. I want one of those 64 core Ryzen Threadrippers. It's a pity Rosetta won't run on GPUs, but I guess it depends on the calculations involved. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I want one of those 64 core Ryzen threadrippers." Save some money and still get a big boost. https://www.amazon.com/AMD-Ryzen-3900X-24-Thread-Processor/dp/B07SXMZLP9/ref=sr_1_3 I expect that sales are off due to the virus, so prices are coming down a bit. We can use the virus to fight the virus. By the way, I would rather have two of the smaller chips than a Threadripper. You have to feed it with memory, cool it, and buy expensive motherboards. Besides, a lot of projects can't use all those cores. You just buy into trouble. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
"I want one of those 64 core Ryzen threadrippers." £400 or £3700. OK, I didn't realise they cost THAT much! And surely any project can use all those cores, you just run many WUs at once? Rosetta for example is single core, yet it runs on multi core chips just fine, just several at once. The good thing about absurd chips like the Threadripper is that they make the others cheaper. Same with cars: the posh folk buy new cars, then I get their old ones cheaper. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"And surely any project can use all those cores, you just run many WUs at once?" There are limits in BOINC (somewhere), there are limits in VirtualBox, there are limits all over the place. You will find them when you get to them. Threadrippers are really for servers, where space is at a premium. Also, they have cache limitations that you avoid with the smaller chips. I think the latter are more for our use. I even use the micro-ATX motherboards, since I have dedicated machines and don't need the other stuff. They are cheap and stable, since I never overclock. (Just don't get the cheapest. They leave out the heatsinks on the voltage regulators.) |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
"There are limits in BOINC (somewhere), there are limits in VirtualBox, there are limits all over the place." Are you saying you can't get Boinc to run 64 cores? I'm sure I've seen somebody doing so successfully. "Threadrippers are really for servers, where space is at a premium. Also, they have cache limitations that you avoid with the smaller chips. I think the latter are more for our use." At the current prices, definitely. "I even use the micro-ATX motherboards, since I have dedicated machines and don't need the other stuff. They are cheap and stable, since I never overclock (Just don't get the cheapest. They leave out the heatsinks on the voltage regulators.)" I seldom buy boards; I tend to collect donated stuff and have fun cobbling it together. The only things I've bought are GPUs, and those were second hand or broken. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Are you saying you can't get Boinc to run 64 cores? I'm sure I've seen somebody doing so successfully." I don't know where the limits are, but I have seen reports of problems in various places. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
"Are you saying you can't get Boinc to run 64 cores? I'm sure I've seen somebody doing so successfully." Sounds like a CPU (memory access) problem rather than a Boinc problem: https://setiathome.berkeley.edu/forum_thread.php?id=83824&sort_style=6&start=0 |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
That speed problem may have to do with cache access. As I recall, the Threadrippers access it differently than the Ryzens, since they are intended more for server use. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Here is another curiosity. I saw a download start and then hang up for two work units. So I immediately did a "retry all", and it worked. So there is something about the time delay that causes them to stay stuck? |
Dr Who Fan Send message Joined: 28 May 06 Posts: 70 Credit: 265,580 RAC: 368 |
The stalled downloads are back. File name: rb_03_08_17831_17664_ab_t000__h002_robetta.zip. Transfer entry: Rosetta@home, rb_03_08_17831_17664_ab_t000__h002_robetta.zip, 62.399, 1.93 K, 00:31:10 - 12:23:13, 0.00 Kbps, Download pending (Retry in: 02:47:54), retried: 10. Event Log: 10-Mar-2020 07:32:01 [Rosetta@home] [http] HTTP_OP::init_get(): https://boinc.bakerlab.org/rosetta/download/1cb/rb_03_08_17831_17664_ab_t000__h002_robetta.zip |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"The stalled downloads are back." Yes, I see a whole bunch of them too (on seven machines). They are mostly rb_03, but a few others too. You would think it would be easy for them to track them down. |
crashtech Send message Joined: 22 Jan 17 Posts: 2 Credit: 36,198,102 RAC: 19,234 |
I am getting small stuck downloads that stop the client from getting any work. Was hoping to see a solution here, but there doesn't seem to be one. Am discontinuing participation until this is fixed. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Am discontinuing participation until this is fixed." I have tried many times to get it to work, but I am being forced to take machines off of Rosetta as they get stuck. They will go back to where they were. You can't fight the coronavirus if they don't send you the work. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,783,744 RAC: 10,163 |
"Am discontinuing participation until this is fixed." This is odd, as I've not had any stuck ones for over a fortnight on 4 different computers. There must be a reason they fixed it for me and not you :-) It just suddenly started working; before then, every machine got stuck once a day. I don't think I changed anything that would have fixed it. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,935,543 RAC: 12,792 |
Yep, got a bunch of stuck downloads on all of my computers today too. In addition to files from the rb_03_08_ tasks already reported above, there were a few tasks like this: 10/03/2020 23:53:17 | Rosetta@home | Temporarily failed download of twc_method_msd_cpp_10v4nme2_1719_result_0965_msd.zip: transient HTTP error From task twc_method_msd_cpp_10v4nme2_1719_result_0965_msd_SAVE_ALL_OUT_901017_596_0 11/03/2020 00:31:08 | Rosetta@home | Started download of 11v1nmgb_c17732_11mer_gb_000552.zip from 11/03/2020 00:34:27 | Rosetta@home | task 11v1nmgb_c17732_11mer_gb_000552_SAVE_ALL_OUT_893258_131_0 aborted by user It's getting really annoying; this bug has repeated every few days for at least a month already. Next time I'll probably just switch my computers from R@H to WCG. P.S. It looks like aborting the whole WU with a stalled download, instead of aborting the download itself (the internet transfer), is a faster and easier way to clear such errors. BOINC resumes work fetch almost immediately after a WU abort, while after aborting a download it usually still refuses to get new work for a few more hours (complaining about stalled downloads even after all of them have been aborted) or until BOINC is restarted. |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"This is odd, as I've not had any stuck ones for over a fortnight on 4 different computers. There must be a reason they fixed it for me and not you :-) It just suddenly started working, before then every machine stuck once a day. I don't think I changed anything that would have fixed it." I have seen that too. I wonder if it is determined by the number of machines (that is, cores) you have on Rosetta? As I go down, I will see. It could be that their server chokes up, but I would be surprised if it starts working permanently. |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Several stuck downloads recently. Here's one on Ubuntu 19.10. The offending file is twc_method_msd_cpp_c3212_9mer_gb_000182_msd.zip (workunit 1014896837). After setting http_debug in the Event Log options and retrying the transfer, the following appears in the Event Log: Tue 10 Mar 2020 02:49:18 PM PDT | | log flags: file_xfer, sched_ops, task, http_debug Tue 10 Mar 2020 03:04:06 PM PDT | Rosetta@home | [http] HTTP_OP::init_get(): https://boinc.bakerlab.org/rosetta/download/113/twc_method_msd_cpp_c3212_9mer_gb_000182_msd.zip Tue 10 Mar 2020 03:04:06 PM PDT | Rosetta@home | Started download of twc_method_msd_cpp_c3212_9mer_gb_000182_msd.zip Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Too old connection (2036 seconds), disconnect it Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connection 354 seems to be dead! Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Closing connection 354 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Too old connection (2036 seconds), disconnect it Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connection 355 seems to be dead! Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Closing connection 355 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Too old connection (2029 seconds), disconnect it Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connection 356 seems to be dead! Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Closing connection 356 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Too old connection (2029 seconds), disconnect it Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connection 357 seems to be dead! 
Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Closing connection 357 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Trying 128.95.160.157:80... Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: TCP_NODELAY set Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connected to boinc.bakerlab.org (128.95.160.157) port 80 (#358) Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: GET /rosetta/download/113/twc_method_msd_cpp_c3212_9mer_gb_000182_msd.zip HTTP/1.1 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: Host: boinc.bakerlab.org Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: User-Agent: BOINC client (x86_64-pc-linux-gnu 7.16.3) Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: Accept: */* Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: Accept-Encoding: deflate, gzip Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: Accept-Language: en_CA Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Sent header to server: Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Mark bundle as not supporting multiuse Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Received header from server: HTTP/1.1 200 OK Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Received header from server: Date: Tue, 10 Mar 2020 22:04:07 GMT Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Received header from server: Server: Apache/2.4.18 Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Received header from server: Last-Modified: Fri, 06 Mar 2020 21:55:13 GMT The download is still stuck at the same place (86.50%) |
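
The Event Log excerpts pasted in this thread share one layout: a timestamp, a project name, and a message, separated by `|`. For anyone comparing logs across several machines, a minimal Python sketch along these lines may help. It only recognizes the three message patterns quoted above ("Too old connection", "Connection N seems to be dead", "Temporarily failed download of"). The function name `summarize_event_log` and the shape of the returned dictionary are illustrative assumptions, not part of BOINC.

```python
import re

# Assumption: Event Log lines look like "<timestamp> | <project> | <message>",
# as in the excerpts quoted in this thread. The project field may be empty.
LINE_RE = re.compile(r"^(?P<stamp>[^|]*)\|(?P<project>[^|]*)\|(?P<message>.*)$")

# Message patterns copied from the log lines posted above.
STALE_RE = re.compile(r"Too old connection \((\d+) seconds\)")
DEAD_RE = re.compile(r"Connection (\d+) seems to be dead")
FAILED_RE = re.compile(r"Temporarily failed download of (\S+): (.+)$")


def summarize_event_log(lines):
    """Collect stale-connection ages, dead connection ids, and failed downloads.

    Hypothetical helper for illustration; it does not talk to the BOINC client.
    """
    summary = {"stale_seconds": [], "dead_connections": [], "failed_downloads": {}}
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # skip anything that is not in the expected layout
        message = parsed.group("message").strip()

        stale = STALE_RE.search(message)
        if stale:
            summary["stale_seconds"].append(int(stale.group(1)))

        dead = DEAD_RE.search(message)
        if dead:
            summary["dead_connections"].append(int(dead.group(1)))

        failed = FAILED_RE.search(message)
        if failed:
            # Map file name -> reason reported by the client.
            summary["failed_downloads"][failed.group(1)] = failed.group(2).strip()
    return summary


if __name__ == "__main__":
    sample = [
        "Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Too old connection (2036 seconds), disconnect it",
        "Tue 10 Mar 2020 03:04:07 PM PDT | Rosetta@home | [http] [ID#318] Info: Connection 354 seems to be dead!",
        "10/03/2020 23:53:17 | Rosetta@home | Temporarily failed download of twc_method_msd_cpp_10v4nme2_1719_result_0965_msd.zip: transient HTTP error",
    ]
    print(summarize_event_log(sample))
```

Feeding it the saved Event Log text from each affected machine would show whether the same files and the same connection ages recur, which is the kind of comparison the posts above were doing by eye.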
©2024 University of Washington
https://www.bakerlab.org