Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,125,385 RAC: 3,471 |
Well, that there is something wrong with the servers, that's kind of the obvious part. I think this may be more like about the fact nobody from the project cares to inform its donors about the problem. But what has been missing from this project (for years!) is basic communication that those responsible for those servers are aware that there is something wrong with them, and at least some basic info that they will be working on it. I am working as a sysadmin myself, and I know that it isn't THAT hard to be kept up to date on server status and in particular on issues that require attention. There are plenty of Open Source tools out there for things like this. It's about noon on Monday; by this time someone should have been able to take a look at it and post a message "Hey guys, we're on it, it just might take a while" instead of just remaining incommunicado... Ralf |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, which is highly likely due to the power outage we had this weekend here in Seattle caused by wind storms. The lab's FoldIt project has also been affected. |
bk.newton09 Send message Joined: 29 Jul 13 Posts: 1 Credit: 4,107,796 RAC: 0 |
I just noticed this. If any of you see issues with our servers etc, please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive and is highly likely due to the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected. One question I would have is how any major university can have IT systems without UPS capability and diesel or natural gas generators to keep the IT infrastructure resilient against short-term outages. Even if the outage was lengthy, they would be able to perform a soft shutdown to preserve all data and make the restart an easier task to complete once mains power is flowing again. Am I off base with this? That infrastructure is not that expensive, so it should be in place to support computing capabilities. |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I just noticed this. If any of you see issues with our servers etc, please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive and is highly likely due to the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected. Now that the server status shows everything up and over a million tasks available for download, I am wondering why the only machine of mine that received new tasks since the servers came back online was one that I just installed BOINC on today. All my other machines have dozens of completed tasks from over the weekend waiting to upload. While I can understand that there may be a bottleneck for uploads and downloads, the fact that the machine I just started up today downloaded new tasks right away has me worried. Will the system accept all the tasks my systems did over the weekend while the servers were down, or did the system crash make all of my downloaded tasks worthless? Am I going to have to abort all of these tasks stuck in my queue as “Uploading” before any new tasks download? I have a lot of time invested in these tasks, so I do not want to abort them unless absolutely necessary. |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I just noticed this. If any of you see issues with our servers etc, please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive and is highly likely due to the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected. I tried the email address you provided, but the email was kicked back to me. I then posted the question on this forum. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I just noticed this. If any of you see issues with our servers etc, please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive and is highly likely due to the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected. I'm not sure. They surely are backed up by a UPS or generator system. Only 2 out of 6 servers were unresponsive when I checked this morning. During the weekend storm, all seemed fine. So I'm not sure exactly what happened. And I'm not sure why/how the FoldIt site went down due to the storm. |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise. Ok, thank you! I'll wait and won't abort the completed tasks. Thank you for the reply! As for the email getting kicked back, I sent it to dekim@u.washington.edu, which I believe is what you spelled out in your forum note. I was watching the server status all day yesterday. Four of eight rah_make_work servers showed "Not Running", and in addition the file_deleter and db_purge servers were "Not Running." |
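The tally Dusty describes above (four of eight rah_make_work daemons down, plus file_deleter and db_purge) can be sketched in a few lines of Python. This is only an illustration: the input format, the function name `summarize_daemons`, and the numbering of the daemons are assumptions, and which four rah_make_work daemons were down is made up for the example.

```python
import re
from collections import defaultdict


def summarize_daemons(daemons):
    """Group daemons by name with any trailing number removed.

    `daemons` is an iterable of (name, is_running) pairs (hypothetical
    input format). Returns {family: (not_running, total)}.
    """
    counts = defaultdict(lambda: [0, 0])  # family -> [not running, total]
    for name, is_running in daemons:
        family = re.sub(r"\d+$", "", name)  # rah_make_work5 -> rah_make_work
        counts[family][1] += 1
        if not is_running:
            counts[family][0] += 1
    return {family: tuple(pair) for family, pair in counts.items()}


# Hypothetical snapshot matching the counts reported in the post above.
snapshot = [(f"rah_make_work{i}", i > 4) for i in range(1, 9)]
snapshot += [("file_deleter", False), ("db_purge", False)]

for family, (down, total) in summarize_daemons(snapshot).items():
    print(f"{family}: {down} of {total} not running")
```

Grouping by name rather than listing every daemon keeps the summary short enough to paste into a forum post.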
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise. You were definitely more attentive than I! A couple of the webservers were not started up. I just did, so maybe that will get things moving. |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise. Great! At the same time you were doing that, I started rebooting all my machines, and I watched as the "Uploading" status changed to "Ready to Report." Many thanks! |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,125,385 RAC: 3,471 |
Well, at least two hosts were able to upload again. Let's see how it all looks in a few hours... Ralf |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Uploads are timing out again. Tue 01 Sep 2015 20:54:29 AEST | | Project communication failed: attempting access to reference site Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Temporarily failed upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0: transient HTTP error Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Backing off 2 min 29 sec on upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0 Tue 01 Sep 2015 20:54:31 AEST | | Internet access OK - project servers may be temporarily down. |
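For anyone who wants to pull the failed uploads and back-off times out of a longer event log, here is a minimal Python sketch based only on the message format visible in the log lines above. The function name `parse_event` and the returned dictionary layout are hypothetical, and the optional "hr" field is an assumption, since the log above only shows "min" and "sec".

```python
import re

# Patterns modelled on the log lines quoted above; other BOINC client
# versions may word these messages differently.
FAILED_RE = re.compile(
    r"Temporarily failed upload of (?P<file>[^:\s]+): (?P<reason>.+)"
)
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<hr>\d+) hr )?(?:(?P<min>\d+) min )?"
    r"(?:(?P<sec>\d+) sec )?on upload of (?P<file>\S+)"
)


def parse_event(line):
    """Return a dict for upload failures and back-offs, else None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    _timestamp, project, message = parts
    m = FAILED_RE.search(message)
    if m:
        return {"event": "upload_failed", "project": project,
                "file": m["file"], "reason": m["reason"]}
    m = BACKOFF_RE.search(message)
    if m:
        seconds = (int(m["hr"] or 0) * 3600
                   + int(m["min"] or 0) * 60
                   + int(m["sec"] or 0))
        return {"event": "backoff", "project": project,
                "file": m["file"], "seconds": seconds}
    return None  # lines about other things are ignored


sample = ("Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | "
          "Backing off 2 min 29 sec on upload of EXAMPLE_TASK_0_0")
print(parse_event(sample))
```

The task name in `sample` is a shortened placeholder. Lines with an empty project column, such as the "Internet access OK" message, simply return None.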
zemanek Send message Joined: 17 Feb 10 Posts: 2 Credit: 76,470 RAC: 0 |
1.9.2015 13:40:00 | rosetta@home | Temporarily failed upload of FFD__aff6632f40081937d45cd9e76bc7c9df_abinitioDocking_15_08_09_04_44_localDocking_9_SAVE_ALL_OUT_299938_1_0_0: transient HTTP error |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 16,102 |
Uploads are timing out again. Ditto and the Server Status page isn't working either. Downloads still coming down though. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Two of our servers went down again. We are currently looking into it. Hold tight. |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Two of our servers went down again. We are currently looking into it. Hold tight. Thanks for the heads up. I couldn't access the server status page to check, so that server must be down too. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 16,102 |
Two of our servers went down again. We are currently looking into it. Hold tight. Everything just went through for me - uploads and downloads - and the server status page is back (1 red - rah_make_work5 bsrv3 Not running) |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Two of our servers went down again. We are currently looking into it. Hold tight. All server status shows disabled except for the data-driven web pages, yet my completed tasks are being successfully uploaded. However, I am not receiving credit for any of the uploads. Will I receive credit for them later? |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,125,385 RAC: 3,471 |
Two of our servers went down again. We are currently looking into it. Hold tight. Well, server status shows all green but for "file deleter". The account info shows updated stats but it looks like the stats XML files aren't generated as none of the external stats sites are showing any updates for now... Ralf |