SERVER PROBLEMS.

Message boards : Number crunching : SERVER PROBLEMS.

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 62606 - Posted: 29 Jul 2009, 23:55:52 UTC - in response to Message 62605.  

Why is it that when the server goes "down" the status on the home page does not change? Is it an internal issue with the server that is not bad enough to trigger a status change, or is it that the server has to be "offline" completely?

It's confusing to see in messages that the server is down but the server status page is all green.

Any ideas?



The status page is cached and updates every 10 minutes or so. On the homepage under "Server Status", you will see either "Scheduler running" or "Scheduler disabled." This page is not cached so it should be up-to-date. It obviously doesn't give specifics about each server daemon, but it will tell you if the project/scheduler is down.
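For anyone who would rather check this from a script than by eye, here is a minimal sketch. It assumes the homepage HTML contains one of the two literal strings quoted above; the function name and the sample strings are made up for illustration, and fetching the page itself is left out.

```python
def scheduler_state(homepage_html: str) -> str:
    """Classify homepage text as 'running', 'disabled', or 'unknown'.

    Assumption: the page contains the literal text "Scheduler running"
    or "Scheduler disabled", as described in the post above.
    """
    text = homepage_html.lower()
    # Check "disabled" first so a page that mentions both is treated as down.
    if "scheduler disabled" in text:
        return "disabled"
    if "scheduler running" in text:
        return "running"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical snippets standing in for the real homepage HTML.
    print(scheduler_state("<h3>Server Status</h3> Scheduler running"))
    print(scheduler_state("<h3>Server Status</h3> Scheduler disabled"))
```

An "unknown" result would most likely mean the page layout differs from what is assumed here, not that the project is down.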
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62615 - Posted: 30 Jul 2009, 6:58:18 UTC

This one doesn't seem to want to go home. All the failed ones went back no problem,

but this result isn't moving. It's been trying for hours.

Thu 30 Jul 2009 16:52:14 EST||Project communication failed: attempting access to reference site
Thu 30 Jul 2009 16:52:14 EST|rosetta@home|Temporarily failed upload of lr8_B_seq_score12_ss5.0_rlbd_1l6p_IGNORE_THE_REST_DECOY_14598_298_0_0: HTTP error
Thu 30 Jul 2009 16:52:14 EST|rosetta@home|Backing off 3 hr 53 min 24 sec on upload of lr8_B_seq_score12_ss5.0_rlbd_1l6p_IGNORE_THE_REST_DECOY_14598_298_0_0
Thu 30 Jul 2009 16:52:16 EST||Internet access OK - project servers may be temporarily down.


Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 62618 - Posted: 30 Jul 2009, 9:56:24 UTC - in response to Message 62606.  

Why is it that when the server goes "down" the status on the home page does not change? Is it an internal issue with the server that is not bad enough to trigger a status change, or is it that the server has to be "offline" completely?

It's confusing to see in messages that the server is down but the server status page is all green.

Any ideas?



The status page is cached and updates every 10 minutes or so. On the homepage under "Server Status", you will see either "Scheduler running" or "Scheduler disabled." This page is not cached so it should be up-to-date. It obviously doesn't give specifics about each server daemon, but it will tell you if the project/scheduler is down.



Ahh, thanks. I was referring to the server status page in my post.
So what has been happening recently is that a daemon goes down and causes the error messages we have been seeing, but the "whole" server is still running, which is why we still see green on the server status page?
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 62707 - Posted: 1 Aug 2009, 23:46:29 UTC
Last modified: 1 Aug 2009, 23:47:33 UTC

Come on guys... this is getting old.
Do the lr tasks have bugs in them?
I haven't had much luck with them lately.

8/2/2009 1:39:29 AM|rosetta@home|Started download of relax_options_lr10_seq_score12_mtyka
8/2/2009 1:43:10 AM||Project communication failed: attempting access to reference site
8/2/2009 1:43:10 AM|rosetta@home|Temporarily failed download of lr5_5cro.out.zip: HTTP error
8/2/2009 1:43:10 AM|rosetta@home|Started download of boinc_rb1_1a19.pdb
8/2/2009 1:43:12 AM||Internet access OK - project servers may be temporarily down.
8/2/2009 1:44:31 AM||Project communication failed: attempting access to reference site
8/2/2009 1:44:31 AM|rosetta@home|Temporarily failed download of relax_options_lr10_seq_score12_mtyka: HTTP error
8/2/2009 1:44:31 AM|rosetta@home|Started download of lr10_1a19.out.zip
8/2/2009 1:44:32 AM||Internet access OK - project servers may be temporarily down.
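Log excerpts like the one above are pipe-delimited (timestamp, project, message), so pulling out just the failed transfers is straightforward. The following is a rough sketch, assuming that three-field layout and the "Temporarily failed ... of NAME: reason" wording seen in this thread; the function names are made up for illustration.

```python
def parse_line(line: str):
    """Split 'timestamp|project|message' into a 3-tuple."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def failed_transfers(lines):
    """Return the file names mentioned in 'Temporarily failed ...' messages."""
    prefix = "Temporarily failed "
    failed = []
    for line in lines:
        if line.count("|") < 2:
            continue  # not in the assumed three-field format
        _, _, message = parse_line(line)
        if not message.startswith(prefix):
            continue
        rest = message[len(prefix):]  # e.g. "download of NAME: HTTP error"
        if " of " not in rest:
            continue
        # Drop the trailing ": reason" part to leave only the file name.
        name = rest.split(" of ", 1)[1].rsplit(":", 1)[0]
        failed.append(name)
    return failed


log = [
    "8/2/2009 1:43:10 AM||Project communication failed: attempting access to reference site",
    "8/2/2009 1:43:10 AM|rosetta@home|Temporarily failed download of lr5_5cro.out.zip: HTTP error",
    "8/2/2009 1:44:31 AM|rosetta@home|Temporarily failed download of relax_options_lr10_seq_score12_mtyka: HTTP error",
]
print(failed_transfers(log))
# ['lr5_5cro.out.zip', 'relax_options_lr10_seq_score12_mtyka']
```

Note that both failures here are input file downloads, which by themselves do not say whether the lr tasks are buggy.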
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 62708 - Posted: 2 Aug 2009, 0:12:31 UTC - in response to Message 62707.  

Come on guys... this is getting old.
Do the lr tasks have bugs in them?
I haven't had much luck with them lately.


I have completed 1 lr task under 1.87 and another under 1.88. Both came out valid.

I have a third lr underway on 1.90 and it seems to be going fine so far.
jay
Joined: 12 Jan 08
Posts: 20
Credit: 195,801
RAC: 0
Message 62751 - Posted: 3 Aug 2009, 21:09:56 UTC

Greetings!

I am having intermittent problems uploading.
Sometimes the WUs go up on the first try.
Other times it may take several retries.

I live in Florida and use DSL, and *assumed* that the network was not at fault.

I would like to test this with a ping.

First of all, what is the address of the upload server?
boinc.bakerlab.org
or
srv4.bakerlab.org

I tried a short test on each:




PING boinc.bakerlab.org (140.142.20.103): 100 data bytes
108 bytes from 140.142.20.103: icmp_seq=0 ttl=45 time=93 ms
108 bytes from 140.142.20.103: icmp_seq=1 ttl=45 time=93 ms
108 bytes from 140.142.20.103: icmp_seq=2 ttl=45 time=93 ms

----boinc.bakerlab.org PING Statistics----
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip (ms) min/avg/max/med = 93/93/93/93

PING srv4.bakerlab.org (140.142.20.112): 100 data bytes
108 bytes from 140.142.20.112: icmp_seq=0 ttl=45 time=93 ms
108 bytes from 140.142.20.112: icmp_seq=1 ttl=45 time=203 ms
108 bytes from 140.142.20.112: icmp_seq=2 ttl=45 time=406 ms

----srv4.bakerlab.org PING Statistics----
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip (ms) min/avg/max/med = 93/234/406/203
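
The two summary lines can be reproduced from the individual round-trip times, which makes the difference between the hosts easier to see. This is a minimal sketch; the helper name is made up, and rounding the average to a whole millisecond is an assumption about how the ping tool reports it.

```python
from statistics import mean, median


def rtt_summary(times_ms):
    """Return (min, avg, max, median) of round-trip times in milliseconds."""
    return min(times_ms), round(mean(times_ms)), max(times_ms), median(times_ms)


print(rtt_summary([93, 93, 93]))    # boinc.bakerlab.org -> (93, 93, 93, 93)
print(rtt_summary([93, 203, 406]))  # srv4.bakerlab.org  -> (93, 234, 406, 203)
```

Three packets is a very small sample, so the spread on srv4 (93 to 406 ms) is only a hint, not a measurement.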





I turned on debug on BOINC file transfer.
Here is what it said:


8/3/2009 4:32:59 PM rosetta@home Scheduler request completed
8/3/2009 4:35:50 PM Project communication failed: attempting access to reference site
8/3/2009 4:35:50 PM rosetta@home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
8/3/2009 4:35:50 PM rosetta@home [file_xfer_debug] file transfer status -184
8/3/2009 4:35:50 PM rosetta@home Temporarily failed upload of lr8_newhb_run02_rlbn_1t2i_IGNORE_THE_REST_NATIVE_NOCON_14611_66_0_0: HTTP error
8/3/2009 4:35:50 PM rosetta@home Backing off 2 min 27 sec on upload of lr8_newhb_run02_rlbn_1t2i_IGNORE_THE_REST_NATIVE_NOCON_14611_66_0_0
8/3/2009 4:35:51 PM Internet access OK - project servers may be temporarily down.
8/3/2009 4:38:18 PM rosetta@home Started upload of lr8_newhb_run02_rlbn_1t2i_IGNORE_THE_REST_NATIVE_NOCON_14611_66_0_0
8/3/2009 4:38:18 PM rosetta@home [file_xfer_debug] URL: http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
8/3/2009 4:38:39 PM Project communication failed: attempting access to reference site
8/3/2009 4:38:39 PM rosetta@home [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -107
8/3/2009 4:38:39 PM rosetta@home [file_xfer_debug] file transfer status -107
8/3/2009 4:38:39 PM rosetta@home Temporarily failed upload of lr8_newhb_run02_rlbn_1t2i_IGNORE_THE_REST_NATIVE_NOCON_14611_66_0_0: connect() failed
8/3/2009 4:38:39 PM rosetta@home Backing off 6 min 22 sec on upload of lr8_newhb_run02_rlbn_1t2i_IGNORE_THE_REST_NATIVE_NOCON_14611_66_0_0
8/3/2009 4:38:41 PM Internet access OK - project servers may be temporarily down.



Suggestions??

As I take the 30 minutes to look at other posts and write this up, I find
that the WU uploaded..

Still, I would like some insight on the process....

Many thanks!!
Jay
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 62759 - Posted: 4 Aug 2009, 2:33:01 UTC
Last modified: 4 Aug 2009, 2:38:10 UTC

Yes Jay, srv4 is the upload server, and as you can see from your PING, its responsiveness is rather inconsistent. It is under heavy load and is still recovering from recent difficulties.

The BOINC software that you run on your machine is all set up for these eventualities. It will retry sending the file for you.
Rosetta Moderator: Mod.Sense
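
The growing "Backing off ..." intervals in the logs above are what a retry scheme with randomized, increasing delays looks like. The sketch below is a generic randomized exponential backoff for illustration only. It is not taken from the BOINC source, and the base delay, cap, and function name are all assumptions.

```python
import random


def backoff_delay(failures: int, base: float = 60.0,
                  cap: float = 4 * 3600.0, rng=random) -> float:
    """Seconds to wait after `failures` consecutive failed attempts.

    Hypothetical parameters: 60 s base, 4 h cap. Not BOINC's real values.
    """
    # Limit the exponent so the intermediate value stays a sane float.
    exponent = min(max(failures, 0), 20)
    ceiling = min(cap, base * (2 ** exponent))
    # Pick a random point in the upper half of the window so that many
    # clients hitting the same server do not all retry at the same moment.
    return rng.uniform(ceiling / 2, ceiling)


for attempt in range(5):
    print(attempt, round(backoff_delay(attempt)), "seconds")
```

The randomness is the important part of the design: it spreads retries out so an overloaded server is not hit by every client at once.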
jay
Joined: 12 Jan 08
Posts: 20
Credit: 195,801
RAC: 0
Message 62762 - Posted: 4 Aug 2009, 7:29:04 UTC - in response to Message 62759.  


Thank you Mod.Sense for the informative response!!
I appreciate the time you take giving answers...
I don't always stay connected and usually connect;
enable network activity; get some coffee; do updates;
disable network activity; and disconnect....

Thanks again,
Jay
Sid Celery
Joined: 11 Feb 08
Posts: 1966
Credit: 38,188,338
RAC: 11,005
Message 62765 - Posted: 4 Aug 2009, 13:41:53 UTC - in response to Message 62759.  

Yes Jay, srv4 is the upload server, and as you can see from your PING, its responsiveness is rather inconsistent. It is under heavy load and is still recovering from recent difficulties.

I'm not sure if I just got lucky, but I'd guess the servers finally caught up round about the time you posted. No error messages for upload or download in the last 15 hours and no re-tries.
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62788 - Posted: 6 Aug 2009, 1:11:41 UTC
Last modified: 6 Aug 2009, 1:13:27 UTC

I have copied these from the mini 1.90 thread so they might be seen by the admins,

as some people are still having problems.

Message 62781 - Posted 5 Aug 2009 16:52:59 UTC - in response to Message ID 62773.

05/08/2009 17:06:15|rosetta@home|Started download of boinc_rb1_1aiu.pdb
05/08/2009 17:06:17|rosetta@home|Finished download of boinc_rb1_1aiu.pdb
05/08/2009 17:06:17|rosetta@home|Started download of lr8_1aiu.out.zip
05/08/2009 17:06:45|rosetta@home|Finished download of minirosetta_database_rev31588.zip
05/08/2009 17:06:45|rosetta@home|Started download of boinc_rb1_1acf.pdb
05/08/2009 17:06:45|rosetta@home|[error] Signature verification failed for minirosetta_database_rev31588.zip
05/08/2009 17:06:45|rosetta@home|[error] Checksum or signature error for minirosetta_database_rev31588.zip




How do I fix these errors? :(

https://boinc.bakerlab.org/rosetta/results.php?hostid=986605&offset=20

================================================================================

Message 62787 - Posted 6 Aug 2009 0:22:00 UTC - in response to Message ID 62786.

05/08/2009 17:06:15|rosetta@home|Started download of boinc_rb1_1aiu.pdb
05/08/2009 17:06:17|rosetta@home|Finished download of boinc_rb1_1aiu.pdb
05/08/2009 17:06:17|rosetta@home|Started download of lr8_1aiu.out.zip
05/08/2009 17:06:45|rosetta@home|Finished download of minirosetta_database_rev31588.zip
05/08/2009 17:06:45|rosetta@home|Started download of boinc_rb1_1acf.pdb
05/08/2009 17:06:45|rosetta@home|[error] Signature verification failed for minirosetta_database_rev31588.zip
05/08/2009 17:06:45|rosetta@home|[error] Checksum or signature error for minirosetta_database_rev31588.zip




How do I fix these errors? :(

https://boinc.bakerlab.org/rosetta/results.php?hostid=986605&offset=20



You could try to reset project if you have not tried already, could not hurt!



I have just added Rosetta to the list on two boxes, one Windoze, one Linux and have this error on both. The project hasn't even started, not much to reset. Running BOINC 5.10.45
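
One way to narrow down a "Checksum or signature error" is to compute the digest of the downloaded file locally and compare it with the value the client expected. The sketch below assumes the checksum in question is an MD5 digest, which is my understanding of how BOINC describes files but is not confirmed anywhere in this thread; the function name and the file path in the usage line are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in 1 MiB pieces so a large zip does not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical path inside the project's data directory):
# print(md5_of_file("minirosetta_database_rev31588.zip"))
```

If two different machines compute the same digest and both still fail, the problem is more likely on the server side than in either download.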
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62806 - Posted: 6 Aug 2009, 21:59:17 UTC

It looks like the system has slowed to a crawl once again

with the release of the new app!

My three are all having problems with U/L & D/L.



Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 62808 - Posted: 6 Aug 2009, 22:27:06 UTC
Last modified: 6 Aug 2009, 22:28:59 UTC

8/7/2009 12:25:33 AM rosetta@home update requested by user
8/7/2009 12:25:36 AM rosetta@home Sending scheduler request: Requested by user.
8/7/2009 12:25:36 AM rosetta@home Reporting 2 completed tasks, not requesting new tasks
8/7/2009 12:25:58 AM Project communication failed: attempting access to reference site
8/7/2009 12:25:59 AM Internet access OK - project servers may be temporarily down.
8/7/2009 12:26:01 AM rosetta@home Scheduler request failed: Failure when receiving data from the peer

then:

8/7/2009 12:28:05 AM rosetta@home update requested by user
8/7/2009 12:28:06 AM rosetta@home Sending scheduler request: Requested by user.
8/7/2009 12:28:06 AM rosetta@home Reporting 2 completed tasks, not requesting new tasks
8/7/2009 12:28:41 AM rosetta@home Scheduler request completed
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62888 - Posted: 11 Aug 2009, 21:47:20 UTC

There seem to be a few of us having this problem.

Wed 12 Aug 2009 07:42:34 EST|rosetta@home|Temporarily failed upload of abinitio_withrelax_homfrag__no_native_1uzc_0001A__SAVE_ALL_OUT_14620_1950_0_0: HTTP error
Wed 12 Aug 2009 07:43:38 EST|rosetta@home|Temporarily failed upload of abinitio_withrelax_homfrag__no_native_1uzc_0001A__SAVE_ALL_OUT_14620_2922_0_0: HTTP error


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62900 - Posted: 12 Aug 2009, 7:38:55 UTC

Hi.

Is there a problem with the validator now? I have a couple of tasks that have

been sitting there for hours waiting. Is something else broken?

B.T.W. Thanks for fixing the other problem with the U/L's.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62979 - Posted: 20 Aug 2009, 2:45:11 UTC

Hi.

Is there a problem again? I looked at the server page and there is a lot of red

there. Nothing on the front page about any work to be done on the servers!

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63002 - Posted: 21 Aug 2009, 22:57:46 UTC
Last modified: 21 Aug 2009, 22:58:14 UTC

It looks like you are missing the DB file for some or all tasks with the new app!

ERROR: in::file::zip minirosetta_database.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 97
BOINC:: Error reading and gzipping output datafile: default.out

What happened to not releasing new things just before a weekend?
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 173
Message 63003 - Posted: 21 Aug 2009, 23:16:23 UTC

According to the front page, the last time they brought out a new application was Aug 6, and there's no mention of a new application over on RALPH either.
Have a crunching good day!!
Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 63007 - Posted: 22 Aug 2009, 1:52:14 UTC

Aug 21, 2009
The project is offline for the moment as we deal with an error in the recent application update. Hopefully we will have the project back online within the next hour or so. Sorry for any inconvenience.
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63008 - Posted: 22 Aug 2009, 3:00:07 UTC

A shiny new app for all 1.97, plus a few files!

sorry, couldn't help myself ;)


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63182 - Posted: 6 Sep 2009, 21:55:03 UTC

Morning all.

Is there a problem? None of my rigs are getting any work this morning. I see

some others are having the same issues.

Mon 07 Sep 2009 07:24:25 EST|rosetta@home|Sending scheduler request: To fetch work. Requesting 6794 seconds of work, reporting 0 completed tasks
Mon 07 Sep 2009 07:24:53 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks







©2024 University of Washington
https://www.bakerlab.org