Message boards : Number crunching : Stuck on Uploading
Author | Message |
---|---|
Bill Kozorra Joined: 25 Jan 11 Posts: 5 Credit: 86,550,972 RAC: 0 |
All my computers, both Mac and Windows, have work units stuck on "Uploading". Is anyone else having this issue? |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
Yes, I answered in your previous thread. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6849&nowrap=true#80231 |
XAVER Joined: 11 Jun 16 Posts: 1 Credit: 685,308 RAC: 90 |
Yes, same with me since yesterday evening (CEST). Download also (successful only after several tries). |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
"Yes, same with me since yesterday evening (CEST). Download also (successful only after several tries)." Same thing here. I was starting to get nervous that it was only me, so I started clearing DNS caches and even reset my modem and router. One thing I'm noticing now is that the Server Status page is not loading, so I suspect it's something on the server side, but we cannot even check from our end. Maybe the servers are finally buckling under the load. 38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research |
hjdghjdghjghjjggh Joined: 10 May 16 Posts: 7 Credit: 9,749 RAC: 0 |
I can confirm as well. |
hjdghjdghjghjjggh Joined: 10 May 16 Posts: 7 Credit: 9,749 RAC: 0 |
Well, they fixed the status page, and it seems like someone broke something... |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
I'll add a Linux box to the list of failing OSes, though I'm not absolutely certain right now. I can see a Mac and a Windows 10 box doing it, but the closest Linux box doesn't feel like showing the BOINC manager at the moment... Good to see they noticed the server status needed fixing. Small step, but something. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
hjdghjdghjghjjggh Joined: 10 May 16 Posts: 7 Credit: 9,749 RAC: 0 |
I probably should have said earlier that the machine is a Linux box being remote-controlled locally from a Mac. It can download and run tasks just fine, but every upload is failing. |
Murodoch Joined: 10 Apr 07 Posts: 6 Credit: 21,947,500 RAC: 1,448 |
I've run into the same problem since yesterday afternoon (time zone +8). Downloading is OK. |
sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
Some of the WUs I finished yesterday are stuck in the upload queue. If I start the upload manually, I always get the same message: 24.06.2016 10:45:49 | rosetta@home | Started upload of rb_06_22_66335_110608_ab_stage0_h002___robetta_IGNORE_THE_REST_12_19_383620_11_0_0 However, all recently finished WUs are uploaded on the spot. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
I enabled some of the advanced logging features of BOINC Manager, and it looks like the issue is with srv1.bakerlab.org. Currently, my personal DNS cache maps this server to 128.95.160.142, and indeed BOINC attempts to connect to that IP:
I can also confirm that other machines on the internet resolve srv1.bakerlab.org to 128.95.160.142. Maybe this DNS entry is invalid on some major DNS servers, or, much more simply, that particular server is busy. I will enable http_debug logging on my other hosts to see if I can confirm which server(s) are accepting work; maybe there is more than one server and it depends on which one the task is trying to get sent back to. All speculation at this point. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
My suspicions confirmed;
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Thanks Timo! I just tried it and it works for uploads. To work around the server that is not responding, add the line "128.95.160.145 srv1.bakerlab.org" to the Windows hosts file. If you don't know what to do with this information, it would be best to just wait for the issue to be resolved on the server side. Rosetta Moderator: Mod.Sense |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
What about manually modifying client_state.xml? I'm wondering whether it would be safe or have any effect. Something like... ***PLEASE DON'T DO THIS*** Replace <upload_url>http://srv1.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url> with <upload_url>http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler</upload_url> |
hjdghjdghjghjjggh Joined: 10 May 16 Posts: 7 Credit: 9,749 RAC: 0 |
Well, I see the status page says all the servers are online, but I still have the same tasks failing to upload. And this appears to be why: |
Luigi R. Joined: 7 Feb 14 Posts: 39 Credit: 2,045,527 RAC: 0 |
"Thanks Timo! I just tried it and it works for uploads." Thanks, it works on Linux *buntu too. Put that line in /etc/hosts. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
"Thanks Timo! I just tried it and it works for uploads." Yep, this is a solution. I just hope it doesn't mess anything up on the server end if a result lands on a different server than the one the cluster was expecting to receive it on. |
sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
Changing hosts.txt is a stupid idea! There should be another way. Replacing "srv1.bakerlab.org" in client_state.xml is useless. Whenever I restart the BOINC manager, it is there again. I can't find where BOINC restores it from. I have also replaced it in client_state_prev.xml. Thanks to you lazy guys, I have now lost two WUs! 24 hours of work for nothing. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
"Changing hosts.txt is a stupid idea! There should be another way." On the contrary, I think both ideas are resourceful and creative, and a testament to the better elements of our community working together. I also highly doubt anyone is being 'lazy'; most of the Baker Lab researchers and systems people are incredibly dedicated. Your frustration is understandable but misdirected, we're all on the same team here man :) |
sinspin Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
"..., we're all on the same team here man :)" No! That is wrong: they need us, we don't need them. They get money for the job. I spend money on hardware and power. I spend my spare time maintaining my systems to keep them always crunching. Later, in a hopefully not so distant future, when the first results come to market, I will pay again to get the medicine or whatever. All I get is useless credits, and the hope that I am not spending my money and time on weapons. And all they do in this case is say "you should change your hosts file"!! There is no information on the website that would let everyone find out there is a problem. There is no really useful workaround which works for everyone. And again: replacing srv1 in client_state.xml and client_state_prev.xml is no solution. After a restart of BOINC, the previous srv1 entry is back. |
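
For readers who want to check the hosts-file workaround discussed above before touching a real system file, here is a minimal Python sketch. It only works on text in memory and writes nothing to disk. The function names `lookup_hosts_entry` and `with_hosts_entry` are hypothetical helpers made up for this illustration, and the sample hosts content is invented; only the hostname srv1.bakerlab.org and the two IP addresses come from the thread.

```python
from typing import Optional


def lookup_hosts_entry(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the IP that hosts-file text maps `hostname` to, or None."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname.lower() in (name.lower() for name in names):
            return ip  # first matching line wins
    return None


def with_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return new hosts text with `hostname` pinned to `ip`.

    Any existing line that names the host is dropped (including other
    aliases on that line, a simplification of this sketch).
    """
    kept = []
    for raw_line in hosts_text.splitlines():
        fields = raw_line.split("#", 1)[0].split()
        if hostname.lower() in (name.lower() for name in fields[1:]):
            continue
        kept.append(raw_line)
    kept.append(f"{ip} {hostname}")
    return "\n".join(kept) + "\n"


# Invented sample content; a real hosts file will look different.
sample = "127.0.0.1 localhost\n128.95.160.142 srv1.bakerlab.org\n"
updated = with_hosts_entry(sample, "128.95.160.145", "srv1.bakerlab.org")
print(lookup_hosts_entry(updated, "srv1.bakerlab.org"))  # prints 128.95.160.145
```

As Mod.Sense notes, anyone unsure what to do with this is probably better off waiting for the server-side fix. An override like this should also be removed once the server is working again, since it hides the real DNS answer for that hostname.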