Welcome Back!

Message boards : Number crunching : Welcome Back!

Mike Francis
Joined: 24 Nov 05
Posts: 8
Credit: 623,519
RAC: 0
Message 45753 - Posted: 9 Sep 2007, 1:13:11 UTC

Sep 08, 2007
Rosetta@home has experienced a horrendous hardware/firmware failure. We essentially lost the SAN partition upon which the project was running! The newest edition of our SAN hardware was shipped with a firmware revision that contained an insidious bug - one which caused the new SAN disks to vanish after roughly 45 days of service. We - or rather I (KEL) - apologize for the inconvenience, lost time and lost effort that you have endured during our outage. We know full well that your contribution hinges on the understanding that we make maximum use of your valuable resources - that we not waste your time, CPU cycles or good humor. We are planning to express our disappointment to our vendors in clear terms, specifically citing the importance of this project to our research effort. We'll keep you abreast of the outcome.

You People at the Project have been doing one heck of a GREAT JOB! We will see you when we see you.

Mike F,
ID: 45753 · Rating: 2
Keith E. Laidig
Volunteer moderator
Project developer
Joined: 1 Jul 05
Posts: 154
Credit: 117,189,961
RAC: 0
Message 45754 - Posted: 9 Sep 2007, 1:19:26 UTC - in response to Message 45753.  

You People at the Project have been doing one heck of a GREAT JOB! We will see you when we see you.

Mike F,


I appreciate your patience but we're embarrassed.... I plan to pass along my discomfort to a couple of OEM vice presidents next week!

Post if anything doesn't work as you expect. -KEL

ID: 45754 · Rating: 0
BitSpit
Joined: 5 Nov 05
Posts: 33
Credit: 4,147,344
RAC: 0
Message 45756 - Posted: 9 Sep 2007, 1:21:49 UTC

Can't do any file transfers yet. I spotted this in one of the message logs here:

aurora rosetta@home 9/8/2007 8:09:40 PM Message from server: Server can't open log file (../log_boinc/cgi.log)
ID: 45756 · Rating: 0
michaelgwynn
Joined: 10 Apr 06
Posts: 8
Credit: 1,055,837
RAC: 0
Message 45757 - Posted: 9 Sep 2007, 1:44:44 UTC

Same error that I've seen since yesterday, on all 8 PCs.
ID: 45757 · Rating: 0
googloo
Joined: 15 Sep 06
Posts: 119
Credit: 16,947,245
RAC: 7,519
Message 45758 - Posted: 9 Sep 2007, 2:02:08 UTC

KEL, thanks for your hard work. Give those OEM executives hell! Since you asked for it, here are two message sets. The first is an attempt to get new work, and the second is an attempt to report a completed work unit.

9/8/2007 9:54:19 PM|rosetta@home|Sending scheduler request: Requested by user
9/8/2007 9:54:19 PM|rosetta@home|Requesting 28580 seconds of new work
9/8/2007 9:54:24 PM|rosetta@home|Scheduler RPC succeeded
9/8/2007 9:54:24 PM|rosetta@home|Message from server: Server can't open log file (../log_boinc/cgi.log)
9/8/2007 9:54:24 PM|rosetta@home|Deferring communication for 1 hr 0 min 0 sec
9/8/2007 9:54:24 PM|rosetta@home|Reason: project is down


9/8/2007 9:56:47 PM|rosetta@home|[file_xfer] Started upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_9296_0_0
9/8/2007 9:56:48 PM|rosetta@home|[error] Error on file upload: can't open log file
9/8/2007 9:56:48 PM|rosetta@home|[file_xfer] Temporarily failed upload of Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_9296_0_0: transient upload error
9/8/2007 9:56:48 PM|rosetta@home|Backing off 2 hr 49 min 7 sec on upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_9296_0_0
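
For anyone comparing notes across machines, the client messages above appear to follow a `timestamp|project|message` layout. Below is a minimal Python sketch, assuming that pipe-delimited layout, that splits a line into its three fields and converts a "Backing off ... hr ... min ... sec" message into seconds. The names `LogEntry`, `parse_line`, and `backoff_seconds` are hypothetical helpers for illustration, not part of BOINC, and the pattern only covers the "hr/min/sec" wording seen in these logs.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str  # kept as text; client versions format dates differently
    project: str    # empty for client-wide messages such as network errors
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


# Hours and minutes are optional; seconds are required.
_BACKOFF = re.compile(r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec")


def backoff_seconds(message: str) -> Optional[int]:
    """Return the back-off delay in seconds, or None if the message has none."""
    match = _BACKOFF.search(message)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


entry = parse_line(
    "9/8/2007 9:56:48 PM|rosetta@home|Backing off 2 hr 49 min 7 sec "
    "on upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_9296_0_0"
)
delay = backoff_seconds(entry.message)
```

Keeping the timestamp as a plain string is deliberate here, since the logs in this thread already show more than one date format.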

ID: 45758 · Rating: 0
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 45759 - Posted: 9 Sep 2007, 2:04:48 UTC

Same errors here. Thanks for your hard work on what I'm sure was a nightmare.
ID: 45759 · Rating: 0
googloo
Joined: 15 Sep 06
Posts: 119
Credit: 16,947,245
RAC: 7,519
Message 45760 - Posted: 9 Sep 2007, 2:05:34 UTC

Sorry, my terminology was incorrect. The second set of messages was an attempt to upload a completed work unit (not report).
ID: 45760 · Rating: 0
TA_JC
Joined: 7 Nov 05
Posts: 13
Credit: 3,179,160
RAC: 7,987
Message 45761 - Posted: 9 Sep 2007, 2:11:20 UTC - in response to Message 45760.  

9/8/2007 18.55.18|rosetta@home|Started download of boinc_mfr_aaPROF_09_05.200_v1_3.gz
9/8/2007 18.55.19||Network error: Transferred a partial file
9/8/2007 18.55.20|rosetta@home|Temporarily failed download of boinc_mfr_aaPROF_09_05.200_v1_3.gz: http error
9/8/2007 18.55.21|rosetta@home|Started download of boinc_mfr_aaPROF_09_05.200_v1_3.gz
9/8/2007 18.55.23||Network error: couldn't connect to server
9/8/2007 18.55.23|rosetta@home|Temporarily failed download of boinc_mfr_aaPROF_09_05.200_v1_3.gz: http error
9/8/2007 18.55.24|rosetta@home|Started download of boinc_mfr_aaPROF_09_05.200_v1_3.gz
9/8/2007 18.55.26||Network error: couldn't connect to server
9/8/2007 18.55.26|rosetta@home|Temporarily failed download of boinc_mfr_aaPROF_09_05.200_v1_3.gz: http error
9/8/2007 18.55.26|rosetta@home|Backing off 8 minutes and 38 seconds on download of file boinc_mfr_aaPROF_09_05.200_v1_3.gz
9/8/2007 18.55.32|rosetta@home|Started upload of truncbeat__BOINC_JUMPRELAX_BARCODE3_CONSTRAINT_DISULF-beat_-_2056_36555_0_0
9/8/2007 18.55.35|rosetta@home|Error on file upload: can't open log file


This is what I'm getting. The 'partial file' errors started on 9/3 for me.
ID: 45761 · Rating: 0
Mike Francis
Joined: 24 Nov 05
Posts: 8
Credit: 623,519
RAC: 0
Message 45763 - Posted: 9 Sep 2007, 2:38:46 UTC

Am also receiving log file errors on transfers.

9/8/2007 10:35:42 PM|rosetta@home|[error] Error on file upload: can't open log file
9/8/2007 10:35:42 PM|rosetta@home|[file_xfer] Temporarily failed upload of CNTRL_01ABRELAX_SAVE_ALL_OUT_-1opd_-_filters_1782_486037_0_0: transient upload error
9/8/2007 10:35:42 PM|rosetta@home|Backing off 1 hr 44 min 48 sec on upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1opd_-_filters_1782_486037_0_0
9/8/2007 10:35:51 PM|rosetta@home|[file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1opd_-_filters_1782_485795_0_0
9/8/2007 10:35:52 PM|rosetta@home|[error] Error on file upload: can't open log file
9/8/2007 10:35:52 PM|rosetta@home|[file_xfer] Temporarily failed upload of CNTRL_01ABRELAX_SAVE_ALL_OUT_-1opd_-_filters_1782_485795_0_0: transient upload error
9/8/2007 10:35:52 PM|rosetta@home|Backing off 3 hr 20 min 4 sec on upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1opd_-_filters_1782_485795_0_0
9/8/2007 10:35:59 PM|rosetta@home|[file_xfer] Started upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_26154_0_0
9/8/2007 10:36:00 PM|rosetta@home|[error] Error on file upload: can't open log file
9/8/2007 10:36:00 PM|rosetta@home|[file_xfer] Temporarily failed upload of Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_26154_0_0: transient upload error
9/8/2007 10:36:00 PM|rosetta@home|Backing off 2 hr 39 min 28 sec on upload of file Ly49A_BOINC_MFR_ABRELAX_PICKED_2065_26154_0_0

ID: 45763 · Rating: 0
Yank
Joined: 18 Apr 06
Posts: 71
Credit: 1,752,514
RAC: 0
Message 45765 - Posted: 9 Sep 2007, 2:42:09 UTC - in response to Message 45759.  

Same errors here. Thanks for your hard work on what I'm sure was a nightmare.



Same errors. A nightmare for all of us, especially for the working staff at Rosetta@home.

ID: 45765 · Rating: 0
Scottatron
Joined: 20 Sep 05
Posts: 23
Credit: 591,959
RAC: 0
Message 45770 - Posted: 9 Sep 2007, 3:53:48 UTC

Give the project time, and uploads etc. will all work again.
ID: 45770 · Rating: 0
MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 45771 - Posted: 9 Sep 2007, 4:03:26 UTC

Does this mean the project lost a lot of the information that we have crunched?
ID: 45771 · Rating: 0
Scottatron
Joined: 20 Sep 05
Posts: 23
Credit: 591,959
RAC: 0
Message 45772 - Posted: 9 Sep 2007, 4:07:20 UTC

I doubt it. The scientific results would be pumped into a database on a regular basis, and there would be backups done (hopefully!).
ID: 45772 · Rating: 0
BarryAZ
Joined: 27 Dec 05
Posts: 153
Credit: 30,725,963
RAC: 1,838
Message 45775 - Posted: 9 Sep 2007, 4:20:35 UTC - in response to Message 45765.  

Agreed, regarding giving the project time to recover -- I'm assuming that work unit deadlines are going to be pushed out so we don't end up with a bunch of overdue work units.



Same errors here. Thanks for your hard work on what I'm sure was a nightmare.



Same errors. Nightmare for all of us especially for the working staff at Rosetta at Home.


ID: 45775 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 45781 - Posted: 9 Sep 2007, 5:39:48 UTC

Just restarted my old girls after some T.L.C. (Total Little Cleanout).

Getting the same problem, guess it'll take some time.

9/9/2007 3:32:22 PM|rosetta@home|Message from server: Server can't open log file (../log_boinc/cgi.log)

Pete.


ID: 45781 · Rating: 0
larry1186
Joined: 18 Apr 06
Posts: 7
Credit: 329,257
RAC: 0
Message 45782 - Posted: 9 Sep 2007, 5:40:40 UTC

Well, it's good to see things are getting fixed. Thanks for all the hard work, even on a weekend!!

I'm getting "Can't open log file" and a "Project is down" so I assume we aren't out of the woods just yet...

Looking forward to a full recovery!
Don't get distracted by shiny objects.
ID: 45782 · Rating: 0
Rohan
Joined: 19 Aug 07
Posts: 6
Credit: 75,560
RAC: 0
Message 45783 - Posted: 9 Sep 2007, 5:42:53 UTC - in response to Message 45781.  

Just restarted my oldgirls after some T.L.C. (Total Little Cleanout).

Getting the same problem, guess it'll take some time.

9/9/2007 3:32:22 PM|rosetta@home|Message from server: Server can't open log file (../log_boinc/cgi.log)

Pete.



Same errors (can't open log file). Any ideas on how long until it's up and running again?
Thanks Rohan
ID: 45783 · Rating: 0
Emigdio Lopez Laburu
Joined: 25 Feb 06
Posts: 61
Credit: 40,240,061
RAC: 0
Message 45785 - Posted: 9 Sep 2007, 6:20:45 UTC

Hi.

Still not possible to send/receive work.
ID: 45785 · Rating: 1
Stevea
Joined: 19 Dec 05
Posts: 50
Credit: 738,655
RAC: 0
Message 45786 - Posted: 9 Sep 2007, 7:17:57 UTC

Welcome back?

Still not uploading any WUs, giving a "can't find file" error.

I have 4 rigs that have not uploaded a single wu yet. Last contact was on Sept. 4th.

I can see us not getting credit for the work that was completed before the servers went down, if the file in question was on the server and cannot be recovered.

I can see a lot of people not returning after finding how much credit other projects are giving out compared to rosetta.

I can say for sure one of my machines will not be returning, as it's getting over 100 ppd more on another project. Seems like the dreaded fair credit question will be brought back up after this fiasco has been resolved.

So much for the industry standard 99.9% uptime....for critical systems.
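
For scale, a 99.9% availability target works out to about 8.76 hours of allowed downtime per year. A quick sketch of that arithmetic, assuming a 365-day year (the function name `allowed_downtime_hours` is hypothetical and only for illustration):

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted in a period at a given availability target."""
    return period_hours * (1 - availability_percent / 100)


three_nines = allowed_downtime_hours(99.9)  # about 8.76 hours per year
```

By that measure, an outage lasting several days exceeds a 99.9% yearly budget many times over.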
BETA = Bahhh

Way too many errors, killing both the credit & RAC.

And I still think the (New and Improved) credit system is not ready for prime time...
ID: 45786 · Rating: -2
hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 45789 - Posted: 9 Sep 2007, 8:08:24 UTC

Stevea

It's frustrating, isn't it?

I have been checking R@H frequently, and I noticed that when they first came back online the main page showed 4-Sep at ~60 TF (I believe). Hopefully that means everything was backed up, so at a guess I would say there should be no problems with the credit. But then again, we are talking about computers :)
ID: 45789 · Rating: 1



©2020 University of Washington
https://www.bakerlab.org