Problems and Technical Issues with Rosetta@home

TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 4,991,034
RAC: 879
Message 78641 - Posted: 31 Aug 2015, 18:27:27 UTC - in response to Message 78634.  

I think this may be more about the fact that nobody from the project cares to inform its donors about the problem.

The server status page already informs everybody who cares to look at it that something is wrong with the servers (especially since it's not even updating anymore). That's what this page is for: to inform us about the servers.

What could be improved are the messages from the servers. Currently they just say "No work sent". Yeah, I can see that from the previous line, "Scheduler request completed: got 0 new tasks". So instead of this completely useless message, they should send "Project has no tasks available"; then I wouldn't even need to look at the SSP to find out why I'm not getting any work.
Well, that there is something wrong with the servers, that's kind of the obvious part.

But what has been missing from this project (for years!) is basic communication that those responsible for those servers are aware that there is something wrong with them, and at least some basic info that they will be working on it.
I work as a sysadmin myself, and I know that it isn't THAT hard to be kept up to date on server status and, in particular, on issues that require attention. There are plenty of open source tools out there for things like this.

It's about noon on Monday. By this time someone should have been able to take a look at it and post a message, "Hey guys, we're on it, it just might take a while", instead of just remaining incommunicado...

Ralf
ID: 78641 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 78643 - Posted: 31 Aug 2015, 18:48:52 UTC

I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, most likely because of the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected.
ID: 78643 · Rating: 0
bk.newton09

Joined: 29 Jul 13
Posts: 1
Credit: 4,107,796
RAC: 0
Message 78646 - Posted: 31 Aug 2015, 21:18:15 UTC - in response to Message 78643.  

I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, most likely because of the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected.


One question I have is how any major university can have IT systems without UPS capability and diesel or natural gas generators to keep the IT infrastructure resilient against short-term outages. Even if the outage was lengthy, they would be able to perform a soft shutdown to preserve all data and make the restart an easier task to complete once the mains are flowing.

Am I off base with this? That infrastructure is not that expensive so should be in place to support computing capabilities.
ID: 78646 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78647 - Posted: 31 Aug 2015, 21:27:21 UTC - in response to Message 78643.  
Last modified: 31 Aug 2015, 21:27:59 UTC

I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, most likely because of the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected.


Now that the server status shows everything up and over a million tasks available for download, I am wondering why the only machine of mine that received new tasks since the servers came back online was one that I just installed BOINC on today.

All my other machines have dozens of completed tasks from over the weekend waiting to upload. While I can understand that there may be a bottleneck for uploads and downloads, the fact that the machine I just started up today downloaded new tasks right away has me worried.

Will the system accept all the tasks my systems did over the weekend while the servers were down, or did the system crash make all of my downloaded tasks worthless? Am I going to have to abort all of these tasks stuck in my queue as “Uploading” before any new tasks download?

I have a lot of time invested in these tasks, so I do not want to abort them unless absolutely necessary.
ID: 78647 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78649 - Posted: 31 Aug 2015, 21:40:30 UTC - in response to Message 78643.  

I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, most likely because of the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected.



I tried the email address you provided, but the email was kicked back to me. I then posted the question on this forum.
ID: 78649 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 78650 - Posted: 31 Aug 2015, 21:49:53 UTC

Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise.
ID: 78650 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 78651 - Posted: 31 Aug 2015, 21:54:48 UTC - in response to Message 78646.  

I just noticed this. If any of you see issues with our servers etc., please feel free to email me directly at dekim at u dot washington dot edu. I sent a trouble ticket to our sys admins and am waiting for a response. It looks like a couple of our servers are unresponsive, most likely because of the power outage we had this weekend here in Seattle due to wind storms. The lab's FoldIt project has also been affected.


One question I have is how any major university can have IT systems without UPS capability and diesel or natural gas generators to keep the IT infrastructure resilient against short-term outages. Even if the outage was lengthy, they would be able to perform a soft shutdown to preserve all data and make the restart an easier task to complete once the mains are flowing.

Am I off base with this? That infrastructure is not that expensive so should be in place to support computing capabilities.



I'm not sure. They surely are backed up by a UPS or generator system. Only 2 out of 6 servers were unresponsive when I checked this morning. During the weekend storm, all seemed fine. So I'm not sure exactly what happened. And I'm not sure why/how the FoldIt site went down due to the storm.

ID: 78651 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78652 - Posted: 31 Aug 2015, 22:05:44 UTC - in response to Message 78650.  

Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise.



Ok, thank you! I'll wait and won't abort the completed tasks. Thank you for the reply!

As for the email getting kicked back, I sent it to dekim@u.washington.edu, which I believe is what you spelled out in your forum note.

I was watching the server status all day yesterday. Four of the eight rah_make_work servers showed "Not Running", and the file_deleter and db_purge servers were also "Not Running."
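
For anyone who wants to keep a similar tally without watching the page all day, here is a minimal Python sketch. It assumes you have already collected the status table as (program, status) pairs by whatever means you prefer; the row layout, the numbered names rah_make_work1 through rah_make_work8, and the function name `summarize` are illustrative assumptions, not the real format of the server status page.

```python
from collections import Counter

def summarize(rows):
    """Return one line per daemon family that has at least one instance down.

    rows is an iterable of (program, status) pairs. Trailing digits are
    stripped so rah_make_work1..8 are counted as one family.
    """
    total = Counter()
    down = Counter()
    for program, status in rows:
        family = program.rstrip("0123456789")
        total[family] += 1
        if status.strip().lower() != "running":
            down[family] += 1
    return [
        f"{family}: {down[family]} of {total[family]} not running"
        for family in total
        if down[family]
    ]

# Hypothetical rows that mirror the situation described above.
ROWS = [(f"rah_make_work{i}", "Running" if i <= 4 else "Not Running")
        for i in range(1, 9)]
ROWS += [("file_deleter", "Not Running"), ("db_purge", "Not Running")]

if __name__ == "__main__":
    for line in summarize(ROWS):
        print(line)
```

Grouping by family keeps the output short enough to paste into a forum post or an email to the admins.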
ID: 78652 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 78653 - Posted: 31 Aug 2015, 22:08:20 UTC - in response to Message 78652.  

Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise.



Ok, thank you! I'll wait and won't abort the completed tasks. Thank you for the reply!

As for the email getting kicked back, I sent it to dekim@u.washington.edu, which I believe is what you spelled out in your forum note.

I was watching the server status all day yesterday. Four of the eight rah_make_work servers showed "Not Running", and the file_deleter and db_purge servers were also "Not Running."



You were definitely more attentive than I! A couple of the webservers were not started up. I just did, so maybe that will get things moving.
ID: 78653 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78654 - Posted: 31 Aug 2015, 22:32:09 UTC - in response to Message 78653.  

Dusty, I'd wait and give it some time. The servers should eventually take your results. I'm pretty sure it was a campus wide power outage we had and maybe the mail servers had issues also. It's odd that the email to me was bounced back otherwise.



Ok, thank you! I'll wait and won't abort the completed tasks. Thank you for the reply!

As for the email getting kicked back, I sent it to dekim@u.washington.edu, which I believe is what you spelled out in your forum note.

I was watching the server status all day yesterday. Four of the eight rah_make_work servers showed "Not Running", and the file_deleter and db_purge servers were also "Not Running."



You were definitely more attentive than I! A couple of the webservers were not started up. I just did, so maybe that will get things moving.


Great! At the same time you were doing that, I started rebooting all my machines, and I watched as the "Uploading" status changed to "Ready to Report."

Many thanks!
ID: 78654 · Rating: 0
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 4,991,034
RAC: 879
Message 78655 - Posted: 31 Aug 2015, 22:36:59 UTC

Well, at least two hosts were able to upload again. Let's see how it all looks in a few hours...

Ralf
ID: 78655 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 78659 - Posted: 1 Sep 2015, 10:59:24 UTC

Uploads are timing out again.

Tue 01 Sep 2015 20:54:29 AEST | | Project communication failed: attempting access to reference site
Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Temporarily failed upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0: transient HTTP error
Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Backing off 2 min 29 sec on upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0
Tue 01 Sep 2015 20:54:31 AEST | | Internet access OK - project servers may be temporarily down.
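
If you want to count failures like these across a longer event log, the following is a minimal Python sketch. It assumes each line has the three pipe-separated fields seen above (timestamp, project, message) and that failures use the "Temporarily failed upload of NAME: REASON" wording; the function names `parse_line` and `failed_uploads` are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the failure wording in the excerpt above. Task names contain no
# whitespace, so \S+ stops at the colon that precedes the reason.
FAILED_UPLOAD = re.compile(r"Temporarily failed upload of (?P<task>\S+): (?P<reason>.+)")

def parse_line(line):
    """Split one log line into (timestamp, project, message), or None."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)

def failed_uploads(lines):
    """Count failed uploads per (project, reason)."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, project, message = parsed
        match = FAILED_UPLOAD.match(message)
        if match:
            counts[(project, match.group("reason"))] += 1
    return counts

# The four lines from the post above, used as sample input.
TASK = ("FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51"
        "_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0")
SAMPLE = [
    "Tue 01 Sep 2015 20:54:29 AEST | | Project communication failed: attempting access to reference site",
    f"Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Temporarily failed upload of {TASK}: transient HTTP error",
    f"Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Backing off 2 min 29 sec on upload of {TASK}",
    "Tue 01 Sep 2015 20:54:31 AEST | | Internet access OK - project servers may be temporarily down.",
]

if __name__ == "__main__":
    for (project, reason), n in failed_uploads(SAMPLE).items():
        print(f"{project}: {n} x {reason}")
```

Splitting on the first two pipes only means a pipe inside the message text would not break the parse, and lines with an empty project field (the client's own messages) are simply skipped by the upload pattern.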

ID: 78659 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 78660 - Posted: 1 Sep 2015, 10:59:27 UTC

Uploads are timing out again.

Tue 01 Sep 2015 20:54:29 AEST | | Project communication failed: attempting access to reference site
Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Temporarily failed upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0: transient HTTP error
Tue 01 Sep 2015 20:54:29 AEST | rosetta@home | Backing off 2 min 29 sec on upload of FFD__f0bc89e7fe6046579a922ef0133c6fb0_abinitioDocking_15_08_07_25_51_globalDocking_4_SAVE_ALL_OUT_300917_2_0_0
Tue 01 Sep 2015 20:54:31 AEST | | Internet access OK - project servers may be temporarily down.

ID: 78660 · Rating: 0
zemanek

Joined: 17 Feb 10
Posts: 2
Credit: 76,470
RAC: 0
Message 78661 - Posted: 1 Sep 2015, 11:51:06 UTC

1.9.2015 13:40:00 | rosetta@home | Temporarily failed upload of FFD__aff6632f40081937d45cd9e76bc7c9df_abinitioDocking_15_08_09_04_44_localDocking_9_SAVE_ALL_OUT_299938_1_0_0: transient HTTP error
ID: 78661 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2073
Credit: 40,602,258
RAC: 5,342
Message 78664 - Posted: 1 Sep 2015, 15:38:59 UTC - in response to Message 78660.  

Uploads are timing out again.

Ditto and the Server Status page isn't working either.

Downloads still coming down though.
ID: 78664 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 78665 - Posted: 1 Sep 2015, 17:15:12 UTC

Two of our servers went down again. We are currently looking into it. Hold tight.
ID: 78665 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78666 - Posted: 1 Sep 2015, 17:20:26 UTC - in response to Message 78665.  

Two of our servers went down again. We are currently looking into it. Hold tight.


Thanks for the heads up. I couldn't access the server status page to check, so that server must be down too.
ID: 78666 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2073
Credit: 40,602,258
RAC: 5,342
Message 78669 - Posted: 1 Sep 2015, 18:59:36 UTC - in response to Message 78665.  

Two of our servers went down again. We are currently looking into it. Hold tight.

Everything just went through for me - uploads and downloads - and the server status page is back (1 red - rah_make_work5 bsrv3 Not running)
ID: 78669 · Rating: 0
Dusty

Joined: 1 Mar 08
Posts: 41
Credit: 2,667,354
RAC: 0
Message 78670 - Posted: 1 Sep 2015, 20:23:27 UTC - in response to Message 78665.  

Two of our servers went down again. We are currently looking into it. Hold tight.


All server status shows disabled except for the data-driven web pages, yet my completed tasks are being successfully uploaded. However, I am not receiving credit for any of the uploads. Will I receive credit for them later?
ID: 78670 · Rating: 0
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 4,991,034
RAC: 879
Message 78673 - Posted: 2 Sep 2015, 2:07:38 UTC - in response to Message 78670.  

Two of our servers went down again. We are currently looking into it. Hold tight.


All server status shows disabled except for the data-driven web pages, yet my completed tasks are being successfully uploaded. However, I am not receiving credit for any of the uploads. Will I receive credit for them later?
Well, server status shows all green but for "file deleter".

The account info shows updated stats but it looks like the stats XML files aren't generated as none of the external stats sites are showing any updates for now...

Ralf
ID: 78673 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org