Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



AuthorMessage
krypton
Volunteer moderator
Project developer
Project scientist

Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 80597 - Posted: 7 Sep 2016, 5:49:38 UTC - in response to Message 80596.  

We are in the process of buying new hardware, which should help! In the meantime, we are enabling the old hardware one piece at a time, allowing the system to catch up. Hopefully all will be ready in a day!

I have to ask what's going on atm.

Servers all seemed to be reset the other day, lots of tasks showing on the homepage and server status page now but little or nothing coming down, though at the same time lots of tasks seem to be in progress.

Meanwhile validation is anything up to a day and a half behind with some of my team.


I share your concern.
The server status page bears little resemblance to what we experience: even reporting tasks has become a lottery, validation is way behind and external statistics reporting is sporadic.

Has Rosetta@home become a victim of its own success?

ID: 80597 · Rating: 0
bonami2

Joined: 13 Nov 15
Posts: 1
Credit: 2,707,056
RAC: 9
Message 80598 - Posted: 7 Sep 2016, 9:56:27 UTC

Oh, so I'm not crazy. I did think something was wrong.

I've got an FX 8300, an i7 2720QM, a Core 2 T4500 and an i7 4790K.

Waiting for work units. I'm going to switch them to Folding@home for now and switch back when I see units being sent.

:)
ID: 80598 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,922,931
RAC: 10,814
Message 80599 - Posted: 7 Sep 2016, 14:27:40 UTC - in response to Message 80597.  

We are in the process of buying new hardware, which should help! In the meantime, we are enabling the old hardware one piece at a time, allowing the system to catch up. Hopefully all will be ready in a day!


Better hurry up!! :-P

ID: 80599 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 1960
Credit: 38,076,311
RAC: 6,958
Message 80601 - Posted: 7 Sep 2016, 15:48:14 UTC - in response to Message 80597.  

Good news. I hope adequate leeway is being allowed for the growth we've been seeing recently. One order of magnitude seems to be the minimum so that the upgrade can have any kind of longevity.

Is it possible to indicate what the recent stresses and failures have been and what the pathway is to relieve them, including a timescale? A little bit of explanation can go a long way in terms of the patience of users.
We are in the process of buying new hardware, which should help! In the meantime, we are enabling the old hardware one piece at a time, allowing the system to catch up. Hopefully all will be ready in a day!
I have to ask what's going on atm.

Servers all seemed to be reset the other day, lots of tasks showing on the homepage and server status page now but little or nothing coming down, though at the same time lots of tasks seem to be in progress.

Meanwhile validation is anything up to a day and a half behind with some of my team.

I share your concern.
The server status page bears little resemblance to what we experience: even reporting tasks has become a lottery, validation is way behind and external statistics reporting is sporadic.

Has Rosetta@home become a victim of its own success?

ID: 80601 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,922,931
RAC: 10,814
Message 80603 - Posted: 8 Sep 2016, 15:22:31 UTC
Last modified: 8 Sep 2016, 15:25:20 UTC

Two favors/suggestions/advice:
- With the new disks, please do NOT clone the existing setup. A fresh installation (with updated OS and software) is an opportunity... in for a penny, in for a pound!! (In Italy we say "abbiamo fatto 30, facciamo 31": we've done 30, let's do 31.)
- If you have a "rush job" to do, divert it to Ralph (it is up and empty).
ID: 80603 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80604 - Posted: 8 Sep 2016, 16:26:46 UTC

Here's an update, sorry for the delay.

Our database server is running out of disk space. We had to reconfigure it, which took a long time because it was over 140 GB; however, it is operating at a very sluggish pace. Our project has been quite busy lately, mainly due to Charity Engine providing thousands of new hosts each day. This has been going on for quite some time, and our database finally reached its space limit with the current project configuration. We are working on a temporary solution since our full upgrade will take some time, on the order of months I am told.

It will take some time to settle as there are a lot of jobs (millions) that need to be processed. We plan to have another long period of down time when we transition to the temporary upgrade for the database server. Keith, Darwin, and Patrick, our sys admins, are working on getting it set up now.

So in the near future expect intermittent down time. The project status page may be incorrect, and data dumps and credit granting for failed jobs may also be delayed. Expect this to improve as we catch up on things.

On a side note, I was told that Charity Engine is going to detach from our project soon due to commercial interests/projects. This obviously will help our servers but unfortunately we'll see a huge drop in throughput. We greatly appreciate the massive computing they've provided us and hope to get their hosts crunching again for us in the future if possible.
ID: 80604 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 1960
Credit: 38,076,311
RAC: 6,958
Message 80607 - Posted: 9 Sep 2016, 1:42:21 UTC - in response to Message 80604.  

Our project has been quite busy lately, mainly due to Charity Engine providing thousands of new hosts each day. This has been going on for quite some time, and our database finally reached its space limit with the current project configuration. We are working on a temporary solution since our full upgrade will take some time, on the order of months I am told.

Ok, so this tells me we shouldn't be shy to mainly run other projects for a good while to give some relief here. We can re-prioritise in the medium term.

On a side note, I was told that Charity Engine is going to detach from our project soon due to commercial interests/projects. This obviously will help our servers but unfortunately we'll see a huge drop in throughput. We greatly appreciate the massive computing they've provided us and hope to get their hosts crunching again for us in the future if possible.

As long as the upgrade caters for CE's eventual return this will be a temporary hiccup in the grand scheme of things.

I'd also encourage you to take [VENETO] boboviz's advice and plan in the server software upgrade at the same time - there'll never be a better opportunity to get it done and it'll see off a whole rake of user complaints in one fell swoop.

The target time I'm going to assume is completion this side of Christmas.

On a side note, is CASP complete now? It seems to me you can't have any reasonable expectation of timely or meaningful results now.
ID: 80607 · Rating: 0
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 80608 - Posted: 9 Sep 2016, 2:49:56 UTC - in response to Message 80604.  
Last modified: 9 Sep 2016, 2:50:33 UTC

Here's an update, sorry for the delay.

Our database server is running out of disk space. We had to reconfigure it, which took a long time because it was over 140 GB; however, it is operating at a very sluggish pace. Our project has been quite busy lately, mainly due to Charity Engine providing thousands of new hosts each day. This has been going on for quite some time, and our database finally reached its space limit with the current project configuration. We are working on a temporary solution since our full upgrade will take some time, on the order of months I am told.

At least it's good to know none of this could be foreseen. I mean, really?

It will take some time to settle as there are a lot of jobs (millions) that need to be processed. We plan to have another long period of down time when we transition to the temporary upgrade for the database server. Keith, Darwin, and Patrick, our sys admins, are working on getting it set up now.

So in the near future expect intermittent down time. The project status page may be incorrect, and data dumps and credit granting for failed jobs may also be delayed. Expect this to improve as we catch up on things.

On a side note, I was told that Charity Engine is going to detach from our project soon due to commercial interests/projects. This obviously will help our servers but unfortunately we'll see a huge drop in throughput. We greatly appreciate the massive computing they've provided us and hope to get their hosts crunching again for us in the future if possible.


Hope that was worthwhile for all the regular users that you lose who gave up and moved elsewhere due to no more work from the project.

This is a shame, I think. Lots of volunteers basically getting the heave-ho due to whatever this 'Charity Engine' thing is. Good luck.
ID: 80608 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 1960
Credit: 38,076,311
RAC: 6,958
Message 80609 - Posted: 9 Sep 2016, 5:03:46 UTC - in response to Message 80608.  

Hope that was worthwhile for all the regular users that you lose who gave up and moved elsewhere due to no more work from the project.

This is a shame, I think. Lots of volunteers basically getting the heave-ho due to whatever this 'Charity Engine' thing is. Good luck.

In the period (2+ years) since CE connected up, users increased from 400k to 1.2m and credits increased from a static 10m/day to 50-70m/day, adding 50% to the lifetime credit of the project since 2004, so I think we can say it was worth it, especially seeing as the project has maxed out for the last several months.

Individuals being bitter about their lack of personal contribution massively misses the point at this time.
ID: 80609 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80610 - Posted: 9 Sep 2016, 5:08:45 UTC - in response to Message 80608.  


Hope that was worthwhile for all the regular users that you lose who gave up and moved elsewhere due to no more work from the project.

This is a shame, I think. Lots of volunteers basically getting the heave-ho due to whatever this 'Charity Engine' thing is. Good luck.


There's plenty of work now but we just reached a breaking point with our database server.
ID: 80610 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,922,931
RAC: 10,814
Message 80611 - Posted: 9 Sep 2016, 8:41:27 UTC - in response to Message 80610.  

There's plenty of work now but we just reached a breaking point with our database server.


I repeat myself: if you have some "urgent" work to do, use Ralph.

We are working on a temporary solution since our full upgrade will take some time, on the order of months I am told.

Keith, Darwin, and Patrick, our sys admins, are working on getting it set up now


So, you do not have the disks yet and they will arrive in the future (near??).
After that you have to install the new OS and configure it.
See you at Christmas.... :-P

ID: 80611 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80612 - Posted: 9 Sep 2016, 15:18:29 UTC - in response to Message 80609.  

Hope that was worthwhile for all the regular users that you lose who gave up and moved elsewhere due to no more work from the project.

This is a shame, I think. Lots of volunteers basically getting the heave-ho due to whatever this 'Charity Engine' thing is. Good luck.

In the period (2+ years) since CE connected up, users increased from 400k to 1.2m and credits increased from a static 10m/day to 50-70m/day, adding 50% to the lifetime credit of the project since 2004, so I think we can say it was worth it, especially seeing as the project has maxed out for the last several months.

Individuals being bitter about their lack of personal contribution massively misses the point at this time.


It should be pointed out that "Charity Engine" is not something that R@h created, asked for, or planned on. It is another project that started on its own and decided to divert surplus computing power to R@h.
Rosetta Moderator: Mod.Sense
ID: 80612 · Rating: 0
krypton
Volunteer moderator
Project developer
Project scientist

Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 80613 - Posted: 9 Sep 2016, 17:15:39 UTC - in response to Message 80611.  
Last modified: 9 Sep 2016, 17:15:48 UTC

Thanks boboviz,

I have some urgent stuff we need for a paper. I'll be submitting those to Ralph!

There's plenty of work now but we just reached a breaking point with our database server.


I repeat myself: if you have some "urgent" work to do, use Ralph.

We are working on a temporary solution since our full upgrade will take some time, on the order of months I am told.

Keith, Darwin, and Patrick, our sys admins, are working on getting it set up now


So, you do not have the disks yet and they will arrive in the future (near??).
After that you have to install the new OS and configure it.
See you at Christmas.... :-P
ID: 80613 · Rating: 0
No.15

Joined: 30 Dec 15
Posts: 7
Credit: 7,621,315
RAC: 0
Message 80615 - Posted: 10 Sep 2016, 0:25:55 UTC
Last modified: 10 Sep 2016, 0:27:07 UTC

I am glad to see this action and I will definitely move back to Rosetta once the upgrades are done. FWIW I never had a problem with CE crunching, the more work that gets done the better.

(Methinks this post went in the wrong thread. Sorry.)
ID: 80615 · Rating: 0
Timo

Joined: 9 Jan 12
Posts: 185
Credit: 45,639,916
RAC: 24
Message 80616 - Posted: 10 Sep 2016, 3:09:54 UTC

Thanks for the updates, David and Serge. Looking forward to seeing how this new hardware improves things next time CE decides to throw a wall of compute at us.

I echo others' chants that this is a great opportunity to update the server software, since there will be extended downtime anyway (even if it's just updating the BOINC server software; step-by-step guide to BOINC server upgrades here: https://boinc.berkeley.edu/trac/wiki/ToolUpgrade). The aforementioned guide seems to imply that the upgrade can be done rather painlessly and is not destructive to existing data. Of course, it would be best to test on Ralph first :)

Good luck! Thanks again for keeping us all in the loop. Exciting times :)
**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
ID: 80616 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,922,931
RAC: 10,814
Message 80618 - Posted: 10 Sep 2016, 9:21:59 UTC - in response to Message 80613.  

I have some urgent stuff we need for a paper. I'll be submitting those to Ralph!


I see, I'm crunching on it. Hope this helps.
ID: 80618 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1843
Credit: 7,922,931
RAC: 10,814
Message 80619 - Posted: 10 Sep 2016, 9:31:57 UTC - in response to Message 80616.  

(even if it's just updating the BOINC server software; step-by-step guide to BOINC server upgrades here: https://boinc.berkeley.edu/trac/wiki/ToolUpgrade). The aforementioned guide seems to imply that the upgrade can be done rather painlessly and is not destructive to existing data. Of course, it would be best to test on Ralph first :)


I've posted this link 3 or 4 times and it's a great link if... the BOINC server is regularly updated. If the BOINC server is updated only every 5 years, for example, it may be a problem, as the MilkyWay@Home admins said:
Also I am still trying to get the BOINC libraries cross compiling for windows. There were a lot of changes in the BOINC libraries over the last few years and getting the kinks worked out with our build system has been a bumpy road.



Another request to the admins: please close this thread (over 1000 posts) and open a "parallel" thread with a similar name (Problems and tech issues 2), because this one is very difficult to open.
ID: 80619 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80620 - Posted: 10 Sep 2016, 18:11:11 UTC

I'll do a full BOINC server upgrade when we get our hardware.
ID: 80620 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80623 - Posted: 10 Sep 2016, 21:48:55 UTC
Last modified: 10 Sep 2016, 21:49:11 UTC

Link to next active technical issues thread.
Rosetta Moderator: Mod.Sense
ID: 80623 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org