SERVER PROBLEMS.

Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58501 - Posted: 4 Jan 2009, 21:03:13 UTC - in response to Message 58497.  

Do you mean that there's no input to the rah_make_work* servers

That puts it well Sid, yes. I just didn't feel most people would understand a statement like that. Also, depending on what the Project Team has in progress, they may need to test new tasks on Ralph first as well, before releasing new work here.

All the tasks you are seeing come out are just the WUs that are being reissued due to errors by the first party, or expiring deadlines. This is why there are so few.

Amazed that an untechnical person like me got that right. But I'm a process engineer so I guess I appreciate the process even if I'm unqualified in the detail.

So this takes the problem to a whole other level. Best to just wait.

I'm just reminded of a remark made by a manager of mine a few years ago:
"There's 24 hours in a day and it's not essential to sleep or take holidays in the short-term..."

He wasn't popular either...



interesting quote for sure....

Michael G.R.
Joined: 11 Nov 05
Posts: 264
Credit: 11,098,930
RAC: 0
Message 58502 - Posted: 4 Jan 2009, 21:06:49 UTC

Am crunching for Folding@home and World Community Grid (Human Proteome Folding phase 2) in the meantime. Hope this delay is because they're working on a new improved version, or a new project, and not just some random hardware failure.

LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 58563 - Posted: 6 Jan 2009, 12:20:19 UTC

Hate to be the bearer of bad news but:
As of 6 Jan 2009 12:11:20 UTC
rah_make_work1 srv3 Not running
rah_make_work2 srv4 Not running

Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58571 - Posted: 6 Jan 2009, 15:31:41 UTC - in response to Message 58563.  

Same status as of 6 Jan 2009 15:25:37 UTC (updated every 10 minutes).
It is now 7:30 am Pacific time. Hopefully the team will see this problem and get things running again in a few hours.


Hate to be the bearer of bad news but:
As of 6 Jan 2009 12:11:20 UTC
rah_make_work1 srv3 Not running
rah_make_work2 srv4 Not running


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 58582 - Posted: 6 Jan 2009, 22:03:53 UTC
Last modified: 6 Jan 2009, 22:09:25 UTC

Hi.

It's all green but i'm still having problems!

Wed 07 Jan 2009 08:51:48 EST||Project communication failed: attempting access to reference site

Wed 07 Jan 2009 08:51:48 EST|rosetta@home|Temporarily failed upload of abinitio_norelax_homfrag_129_B_1vkkA_SAVE_ALL_OUT_4626_4788_0_0: HTTP error

Wed 07 Jan 2009 08:51:48 EST|rosetta@home|Backing off 40 min 0 sec on upload of abinitio_norelax_homfrag_129_B_1vkkA_SAVE_ALL_OUT_4626_4788_0_0

Wed 07 Jan 2009 08:51:50 EST||Internet access OK - project servers may be temporarily down.

Been getting this all morning.

pete.

Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58583 - Posted: 6 Jan 2009, 22:08:52 UTC - in response to Message 58582.  
Last modified: 6 Jan 2009, 22:12:26 UTC

Hi.

It's all green but i'm still having problems!

Wed 07 Jan 2009 08:51:48 EST||Project communication failed: attempting access to reference site

Wed 07 Jan 2009 08:51:48 EST|rosetta@home|Temporarily failed upload of abinitio_norelax_homfrag_129_B_1vkkA_SAVE_ALL_OUT_4626_4788_0_0: HTTP error

Wed 07 Jan 2009 08:51:48 EST|rosetta@home|Backing off 40 min 0 sec on upload of abinitio_norelax_homfrag_129_B_1vkkA_SAVE_ALL_OUT_4626_4788_0_0

Wed 07 Jan 2009 08:51:50 EST||Internet access OK - project servers may be temporarily down.

Been getting this all morning.

pete.

just let it ride, or suspend communication for a few hours and try again.
i see the same thing on my messages as well.

home page had this note,

Sunday, January 06, 2008 9:00 AM
Our database server lost power accidentally as work was being done on the rack. We are back up and running now.


Now it looks like something else is going on: the server status page shows 73,580 tasks ready to send, and the main page says [ Scheduler running ] Queued: 45,698. So there must be some sort of comms issue now.

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 58584 - Posted: 6 Jan 2009, 22:14:59 UTC

I'm currently seeing:

Tue 06 Jan 2009 05:02:47 PM EST|rosetta@home|Sending scheduler request to http://srv4.bakerlab.org/rosetta_cgi/cgi
Tue 06 Jan 2009 05:02:47 PM EST|rosetta@home|Reason: To fetch work
Tue 06 Jan 2009 05:02:47 PM EST|rosetta@home|Requesting 191267 seconds of new work
Tue 06 Jan 2009 05:04:47 PM EST||Attempting to communicate with [srv4.bakerlab.org] timed out
Tue 06 Jan 2009 05:04:49 PM EST|rosetta@home|Scheduler request to http://srv4.bakerlab.org/rosetta_cgi/cgi failed with a return value of -182
Tue 06 Jan 2009 05:04:49 PM EST|rosetta@home|No schedulers responded
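
For anyone who wants to sift a long message log like this one, the lines quoted in this thread appear to follow a pipe-delimited "timestamp|project|message" layout, with an empty project field for client-wide messages. The following is a minimal sketch under that assumption; the function names and the list of failure hints are hypothetical, chosen only to match the messages posted here, and are not part of BOINC.

```python
def parse_log_line(line):
    """Split 'timestamp|project|message' into its three fields."""
    # maxsplit=2 keeps any further '|' characters inside the message text.
    timestamp, project, message = line.split("|", 2)
    # An empty project field is treated as a client-wide message (None).
    return timestamp.strip(), project.strip() or None, message.strip()


# Hypothetical hints, taken from the wording of the messages in this thread.
FAILURE_HINTS = ("failed", "timed out", "no schedulers responded")


def failures(lines):
    """Return the parsed entries whose message contains a failure hint."""
    parsed = [parse_log_line(l) for l in lines if l.count("|") >= 2]
    return [p for p in parsed if any(h in p[2].lower() for h in FAILURE_HINTS)]


sample = [
    "Tue 06 Jan 2009 05:02:47 PM EST|rosetta@home|Reason: To fetch work",
    "Tue 06 Jan 2009 05:04:47 PM EST||Attempting to communicate with [srv4.bakerlab.org] timed out",
    "Tue 06 Jan 2009 05:04:49 PM EST|rosetta@home|No schedulers responded",
]

for ts, project, msg in failures(sample):
    print(ts, project, msg)
```

Run on the three sample lines, this keeps the timeout and the "No schedulers responded" entries and drops the routine "Reason: To fetch work" line.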


Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58586 - Posted: 6 Jan 2009, 23:34:45 UTC

Just a repeat of the weekend's failure.

Sit back and wait it out..that's all there is to do.
When they get the problem fixed then you will get work.
BOINC Manager will keep delaying the next contact every time it encounters an error. I have had failures all night (European time), so it's backed off to 2 hrs before trying to communicate again.
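
The growing delay described here is the general idea of capped exponential backoff: each failed attempt lengthens the wait before the next one, up to a ceiling. The sketch below is a generic illustration of that idea and not BOINC's actual algorithm; the 60-second base, the doubling factor, and the 2-hour cap are all assumptions, with the cap picked only to match the 2 hrs mentioned in the post.

```python
import random


def backoff_delays(failures, base_s=60.0, cap_s=2 * 3600.0, jitter=0.0, rng=None):
    """Return the wait (in seconds) after each of `failures` consecutive errors.

    The delay doubles after every failure and is clamped to cap_s.
    jitter is an optional +/- fraction to spread retries from many clients.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(failures):
        delay = min(cap_s, base_s * (2 ** n))
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays


# With the assumed defaults and no jitter, the wait reaches the cap
# after the eighth consecutive failure and stays there.
print(backoff_delays(10))
```

The point of the cap and the jitter is the one David E K hints at later in the thread: when the servers come back, many clients retry at once, and spreading those retries out keeps the servers from being swamped.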

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 58588 - Posted: 7 Jan 2009, 0:37:34 UTC - in response to Message 58586.  

Just a repeat of the weekend's failure.


No, the symptoms are different. According to the status page, there is now plenty of work and everything is green, but my machines are getting "Scheduler request to http://srv4.bakerlab.org/rosetta_cgi/cgi failed with a return value of -182" errors when they try to download work.

Sit back and wait it out..that's all there is to do.


On the home page they say "We are back up and running now." It sounds like they may think things are running properly, in which case some people need to post and tell them things are not running properly.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,003,213
RAC: 82
Message 58589 - Posted: 7 Jan 2009, 0:40:19 UTC

The external interface to our upload/download servers is currently down. Keith (our sys admin guru) is aware of the issue but cannot fix it until he gets in tomorrow. It is related to the database server issue we had. The connection must have been pulled accidentally when work was being done on the rack.

Sorry for any inconvenience. I imagine our servers are going to be quite busy tomorrow.

Gray Handcock
Joined: 26 Sep 05
Posts: 14
Credit: 1,635,867
RAC: 0
Message 58596 - Posted: 7 Jan 2009, 8:48:38 UTC

Hi

10:47 am here - still getting this error:

2009/01/07 10:36:00 AM|rosetta@home|Sending scheduler request: Requested by user. Requesting 259201 seconds of work, reporting 0 completed tasks
2009/01/07 10:36:08 AM||Project communication failed: attempting access to reference site
2009/01/07 10:36:10 AM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/07 10:36:11 AM||Internet access OK - project servers may be temporarily down.

I assume the issue is still to be worked on ?

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 58603 - Posted: 7 Jan 2009, 10:29:26 UTC

I've now gotten some work (although the servers do indeed seem pretty busy at the moment).

It's clear someone must have been working very hard into the wee hours of the morning to get things working. Thanks.

Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58604 - Posted: 7 Jan 2009, 10:31:03 UTC - in response to Message 58596.  


Guys, the project is based in Seattle, which is 8 hours behind GMT/UTC.
Current time in Belgium is 11:28 am and back in Seattle it is 2:28 am.
There won't be a resolution to the problem described below for at least another 6 hours, and perhaps longer. I would set your project status to 'no new tasks' for now to save yourself a long list of failure messages. Take that setting off later today once they have solved the problem listed in DEK's message.
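
The time arithmetic above can be checked with a few lines of Python. This is a small sketch using fixed winter offsets (UTC+1 for Belgium, UTC-8 for Seattle), which is an assumption that holds for January but ignores daylight saving; the constant and function names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets (assumption: no daylight saving in effect).
BELGIUM_WINTER = timezone(timedelta(hours=1), "CET")
SEATTLE_WINTER = timezone(timedelta(hours=-8), "PST")


def local_times(utc_dt):
    """Return the wall-clock time in Belgium and Seattle for a UTC datetime."""
    return {
        "Belgium": utc_dt.astimezone(BELGIUM_WINTER).strftime("%H:%M"),
        "Seattle": utc_dt.astimezone(SEATTLE_WINTER).strftime("%H:%M"),
    }


# 10:28 UTC on 7 Jan 2009, i.e. the 11:28 am Belgian time quoted above.
when = datetime(2009, 1, 7, 10, 28, tzinfo=timezone.utc)
print(local_times(when))  # {'Belgium': '11:28', 'Seattle': '02:28'}
```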



Hi

10:47 am here - still getting this error:

2009/01/07 10:36:00 AM|rosetta@home|Sending scheduler request: Requested by user. Requesting 259201 seconds of work, reporting 0 completed tasks
2009/01/07 10:36:08 AM||Project communication failed: attempting access to reference site
2009/01/07 10:36:10 AM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/07 10:36:11 AM||Internet access OK - project servers may be temporarily down.

I assume the issue is still to be worked on ?


Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58637 - Posted: 7 Jan 2009, 18:59:59 UTC
Last modified: 7 Jan 2009, 19:00:08 UTC

system is back to normal.
i just restarted communications and got 6 new tasks.

Gray Handcock
Joined: 26 Sep 05
Posts: 14
Credit: 1,635,867
RAC: 0
Message 58651 - Posted: 7 Jan 2009, 21:07:37 UTC

Guess it will still be a while...

---------------------------------------------
2009/01/07 10:49:28 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 30240 seconds of work, reporting 0 completed tasks
2009/01/07 10:49:31 PM||Project communication failed: attempting access to reference site
2009/01/07 10:49:33 PM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/07 10:49:34 PM||Internet access OK - project servers may be temporarily down.
2009/01/07 11:05:33 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 30240 seconds of work, reporting 0 completed tasks
2009/01/07 11:05:38 PM||Project communication failed: attempting access to reference site
2009/01/07 11:05:39 PM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/07 11:05:41 PM||Internet access OK - project servers may be temporarily down.
----------------------------------------------

cheers

Gray Handcock
Joined: 26 Sep 05
Posts: 14
Credit: 1,635,867
RAC: 0
Message 58652 - Posted: 7 Jan 2009, 21:10:08 UTC

...and as soon as I posted the message before this the downloads started !!

Thanks !

Gray Handcock
Joined: 26 Sep 05
Posts: 14
Credit: 1,635,867
RAC: 0
Message 58695 - Posted: 9 Jan 2009, 7:19:13 UTC

are there still server issues ?

-----------------------------------------
2009/01/09 08:30:08 AM|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009/01/09 08:30:13 AM||Project communication failed: attempting access to reference site
2009/01/09 08:30:13 AM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/09 08:30:16 AM||Internet access OK - project servers may be temporarily down.
------------------------------------------

thanks

Greg_BE
Joined: 30 May 06
Posts: 4871
Credit: 4,176,261
RAC: 2,313
Message 58696 - Posted: 9 Jan 2009, 9:22:47 UTC - in response to Message 58695.  
Last modified: 9 Jan 2009, 9:24:31 UTC

are there still server issues ?

-----------------------------------------
2009/01/09 08:30:08 AM|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009/01/09 08:30:13 AM||Project communication failed: attempting access to reference site
2009/01/09 08:30:13 AM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/09 08:30:16 AM||Internet access OK - project servers may be temporarily down.
------------------------------------------

thanks



yes.. looks like the make work servers are offline or out of jobs to send, we are stuck with no new work for at least 7 hours. I suggest the next time you get connected you download about 3-4 days of extra work. Then you can survive this problem if it happens again.

As of 9 Jan 2009 9:11:47 UTC (updated every 10 minutes)
rah_make_work1 srv3 Not running
rah_make_work2 srv4 Not running
Ready to send 74

main page shows 0 tasks ready to send

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 58705 - Posted: 9 Jan 2009, 20:53:51 UTC

Hi.

I just got two of these out of four D/L's.

Sat 10 Jan 2009 07:46:04 EST|rosetta@home|[error] Checksum or signature error for homfragments_2ci2.zip

Sat 10 Jan 2009 07:46:37 EST|rosetta@home|[error] Checksum or signature error for homfragments_2d4f.zip

pete.



ComfortablyNumb
Joined: 6 Jul 07
Posts: 8
Credit: 658,196
RAC: 0
Message 58712 - Posted: 10 Jan 2009, 1:47:03 UTC - in response to Message 58696.  

are there still server issues ?

-----------------------------------------
2009/01/09 08:30:08 AM|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
2009/01/09 08:30:13 AM||Project communication failed: attempting access to reference site
2009/01/09 08:30:13 AM|rosetta@home|Scheduler request failed: Transferred a partial file
2009/01/09 08:30:16 AM||Internet access OK - project servers may be temporarily down.
------------------------------------------

thanks



yes.. looks like the make work servers are offline or out of jobs to send, we are stuck with no new work for at least 7 hours. I suggest the next time you get connected you download about 3-4 days of extra work. Then you can survive this problem if it happens again.

As of 9 Jan 2009 9:11:47 UTC (updated every 10 minutes)
rah_make_work1 srv3 Not running
rah_make_work2 srv4 Not running
Ready to send 74

main page shows 0 tasks ready to send

I've tried to download 3-4 days' worth of work before. Both times (in the last week) my PC crashed. Couldn't even use ASR to recover. Had to reformat and start over.

I have 3 pages of WUs that my PC will not do. I'm sure they'll reassign them to other people, eventually.



©2020 University of Washington
https://www.bakerlab.org