Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102855 - Posted: 30 Sep 2021, 19:19:02 UTC - in response to Message 102852.  

Bryn Mawr - added the half_life, will sit back and see what happens.
Current WCG is dying in credits, guess I will have to pump that one up higher in %
Or just let things be until they have a chance to settle down. With 8 active projects, even with the changed half life value, I'd expect you're looking at a couple of weeks. One week bare minimum.
Then adjust Resource share as necessary.



Ok..will do.
It's 5 active.
I thought I had 2 GPU projects, but it seems just one at the moment.
So it's 3-4 CPU projects.


I recently (6 weeks ago) added a 5th project (6 if you include Ralph which very rarely has work) because 3 of the projects were out of work / broken at the same time.

One of my crunchers is now back to running smoothly whilst the other still has the occasional lump or bump as one project or another grabs a bit extra but is almost there.



I had more than a lump and a bump before I tried dividing up the computer.
Like now, WCG is really, really down, close to dead, and now that I've opened things back up it is still down, but the results I checked are pending. So there is hope.

Bryn Mawr
Joined: 26 Dec 18
Posts: 400
Credit: 12,294,748
RAC: 6,222
Message 102858 - Posted: 1 Oct 2021, 3:26:17 UTC - in response to Message 102855.  

That’s the project, not your machine. I’ve just had two days of low WCG credits and the shortfall turned up this morning - c’est la vie.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102865 - Posted: 2 Oct 2021, 7:42:12 UTC - in response to Message 102858.  

I gave it 200% and now it's climbing like a jet plane. Just have to get LHC back up after WCG and then I think everything can go back to 100%.
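
For readers following the Resource share numbers: BOINC treats resource share as a relative weight, so a project's slice is its share divided by the sum of all shares. The sketch below is only an illustration. The project names other than WCG and LHC are placeholders, and the assumption that the other four projects sit at 100 is mine, not something stated in the post.

```python
def share_fractions(shares):
    """Return each project's fraction of the total resource share."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Hypothetical setup: WCG raised to 200, four other projects left at 100.
shares = {
    "WCG": 200,
    "LHC": 100,
    "Rosetta": 100,
    "Project D": 100,  # placeholder name
    "Project E": 100,  # placeholder name
}

fractions = share_fractions(shares)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under those assumptions, doubling one project's share out of five gives it a third of the machine rather than double its previous output, and every other project's slice shrinks accordingly.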

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 102871 - Posted: 2 Oct 2021, 9:50:42 UTC - in response to Message 102865.  
Last modified: 2 Oct 2021, 9:57:04 UTC

And then it will drop again.
So you'll change it, and it will rise again. So you'll change it and it will fall again. So you'll change it, and it will rise again. So you'll change it and it will fall again. etc, etc.
Most (if not all) of that rapid increase is not a result of your changes but for the reason Bryn posted- the Project had a delay in granting Credit, now it's all coming through. Hence the surge in Credit.


RAC rises slowly, and falls quickly.
The half_life change Bryn suggested should allow things to settle down sooner rather than later, but with the number of projects you have we're still talking weeks, not days. And each time you change things and then change them back again, it just extends the time it will take for things to settle to whatever Resource share you finally leave in place for an extended period (i.e. over a few weeks).
Grant
Darwin NT
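
A rough way to see why "weeks, not days" is plausible: if an average is modelled as a simple exponential decay with a fixed half-life, the gap to its new steady value halves once per half-life. The sketch below assumes that simplified model, and the half-life values are illustrative only. It is not the BOINC client's actual scheduling code, and it does not try to reproduce the "rises slowly, falls quickly" behaviour.

```python
import math


def remaining_gap(days, half_life_days):
    """Fraction of the original gap still left after `days`."""
    return 0.5 ** (days / half_life_days)


def days_to_settle(half_life_days, tolerance=0.05):
    """Days until only `tolerance` of the original gap remains."""
    return half_life_days * math.log2(1 / tolerance)


# Illustrative half-lives in days; none of these is taken from the thread.
for half_life in (1, 7, 10):
    days = days_to_settle(half_life)
    print(f"half-life {half_life:>2} d: within 5% after about {days:.1f} days")
```

In this model, every change to the settings restarts the clock, which is the point being made above about leaving Resource share alone for a few weeks.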

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 102872 - Posted: 2 Oct 2021, 16:55:04 UTC - in response to Message 102871.  

Yeah, I know it drops. So I'm just ramming it through to get it up, and later, when I go back to work, I'll drop it.
Half life was changed last week.

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,538,222
RAC: 10,691
Message 103032 - Posted: 27 Oct 2021, 20:13:19 UTC

Project was down a little earlier, apparently to do a quick filesystem switch, but it got delayed and they didn't start it back up, so people would've seen

Server error: feeder not running
Project requested delay of 3600 seconds

Quickly fixed after a nudge. Looks fine now.

You didn't imagine it
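
For anyone scanning their own event log for this, a minimal sketch of pulling the back-off out of the message quoted above. The regular expression is matched against the exact wording in this post. Whether other client versions word the message differently is not something this thread confirms.

```python
import re
from datetime import timedelta

# Pattern based on the message text quoted in the post above.
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def requested_delay(message):
    """Return the requested back-off as a timedelta, or None if absent."""
    match = DELAY_RE.search(message)
    if match is None:
        return None
    return timedelta(seconds=int(match.group(1)))


delay = requested_delay("Project requested delay of 3600 seconds")
print(delay)  # 1:00:00
```

So the 3600 seconds in the message is a one-hour wait before the client asks the project for work again.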

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103039 - Posted: 28 Oct 2021, 9:32:01 UTC

Quite a backlog of Validations now.
Given that there is no longer any work for minirosetta, they could probably shut down all of the minirosetta processes, and make use of the freed up resources for a few more Rosetta Assimilators and Validators.

From the Server Status page-
rah_assimilator_rosetta1 (rosetta)
rah_assimilator_rosetta2 (rosetta)
rah_assimilator_rosetta3 (rosetta)
rah_assimilator_rosetta4 (rosetta)
rah_assimilator_rosetta5 (rosetta)
rah_assimilator_mini1 (minirosetta)
rah_assimilator_mini2 (minirosetta)
rah_assimilator_mini3 (minirosetta)
rah_assimilator_mini4 (minirosetta)
rah_assimilator_mini5 (minirosetta)
rah_validator_rosetta1 (rosetta)
rah_validator_rosetta2 (rosetta)
rah_validator_mini1 (minirosetta)
rah_validator_mini2 (minirosetta)

Grant
Darwin NT
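
To put numbers on the suggestion above, here is a small tally of the daemon list exactly as posted. It only counts lines. Whether a minirosetta process could actually be repurposed as a Rosetta validator is the poster's suggestion, not something the code can show.

```python
from collections import Counter

# Daemon list copied from the Server Status excerpt in the post above.
STATUS_LINES = """\
rah_assimilator_rosetta1 (rosetta)
rah_assimilator_rosetta2 (rosetta)
rah_assimilator_rosetta3 (rosetta)
rah_assimilator_rosetta4 (rosetta)
rah_assimilator_rosetta5 (rosetta)
rah_assimilator_mini1 (minirosetta)
rah_assimilator_mini2 (minirosetta)
rah_assimilator_mini3 (minirosetta)
rah_assimilator_mini4 (minirosetta)
rah_assimilator_mini5 (minirosetta)
rah_validator_rosetta1 (rosetta)
rah_validator_rosetta2 (rosetta)
rah_validator_mini1 (minirosetta)
rah_validator_mini2 (minirosetta)
""".splitlines()


def tally(lines):
    """Count daemons by (role, application)."""
    counts = Counter()
    for line in lines:
        name, app = line.split()
        role = name.split("_")[1]  # "assimilator" or "validator"
        counts[(role, app.strip("()"))] += 1
    return counts


counts = tally(STATUS_LINES)
mini_total = sum(n for (_, app), n in counts.items() if app == "minirosetta")
for (role, app), n in sorted(counts.items()):
    print(f"{role:<12} {app:<12} {n}")
print(f"minirosetta processes in total: {mini_total}")
```

By this count, half of the listed processes serve minirosetta, while Rosetta itself has only two validators.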

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103044 - Posted: 28 Oct 2021, 20:36:08 UTC

Validation backlog appears to be growing- now over 104,000
The Server Status for the Validators might be showing green, but they don't appear to be actually doing anything at present.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103045 - Posted: 28 Oct 2021, 22:15:38 UTC - in response to Message 103044.  

Now over 114,000.
Yep- it's broken.
Grant
Darwin NT

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 103046 - Posted: 29 Oct 2021, 0:57:57 UTC
Last modified: 29 Oct 2021, 1:06:30 UTC

A task running MUCH longer than the expected 8 hours:

aaab_nNMALA_pp-SAR_pp-mPPS-BGLY_pp_2_2245795_6_1

https://boinc.bakerlab.org/rosetta/result.php?resultid=1441862159

2 days, 8 hours, 32 minutes so far

rosetta python 1.03 vbox64

This is elapsed time, not the much shorter CPU time.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103047 - Posted: 29 Oct 2021, 3:41:39 UTC - in response to Message 103045.  

Validation backlog appears to be growing- now over 104,000
The Server Status for the Validators might be showing green, but they don't appear to be actually doing anything at present.
Now over 114,000.
Yep- it's broken.
Now over 138k.
Grant
Darwin NT

Bryn Mawr
Joined: 26 Dec 18
Posts: 400
Credit: 12,294,748
RAC: 6,222
Message 103048 - Posted: 29 Oct 2021, 9:14:09 UTC - in response to Message 103047.  

And now over 176k but some must be getting through.

Yesterday I dropped to 3k credits for the day as everything was pending but today I have 11k :-)

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,538,222
RAC: 10,691
Message 103051 - Posted: 29 Oct 2021, 22:45:18 UTC - in response to Message 103048.  
Last modified: 29 Oct 2021, 22:50:44 UTC

Now up to 237k backlog, but I don't have any pending dated 28th Oct so some are going through, just nowhere near enough to keep up, let alone catch up.
I sent a message about 11hrs ago and got a reply about 8hrs ago that it'd be looked at when they got in, which I'm guessing would be ~6hrs ago.
That it's not fully fixed yet indicates it's not as straightforward as the feeder issue a few days before. I've heard nothing more since.

It's been reported and acknowledged. That's all I can say.

PS: On top of my being away from home from yesterday until Sunday week (apart from 1.5 days), my email provider has had a major outage which looks like it'll take 2-3 days to fix, making matters worse.
I will be able to check in here for 6 of the 9 days I'm away and I am using a backup email account if anything new comes up. Hopefully I won't have to.
When it rains it pours...

Edit: When I started typing my credits were 300 less than what were showing here, so I did a manual update and my credits were 400 more than are showing here. Lots from 29th October updated, but in quite a funny order. Maybe things are moving much more rapidly right now? Fingers crossed

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103052 - Posted: 30 Oct 2021, 0:15:52 UTC

Just checked my Tasks and a few from the 29th have come through, but the number of Pendings is still almost triple the number of Valids.
Hopefully the life signs will continue to improve as the day goes on.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,538,222
RAC: 10,691
Message 103053 - Posted: 30 Oct 2021, 0:36:04 UTC - in response to Message 103052.  

Yeah, another look and I'm not buying my idea either tbh. Updated to 243k backlog - higher still, not lower.
A watched pot never boils - I'll look again tomorrow
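
For a rough sense of how fast the backlog has been growing, the sketch below uses the figures reported in this thread, with each post's timestamp (UTC) standing in for the time of the observation. Both are approximations: the posts say "over N", so the counts are lower bounds, and the posting time is only a stand-in for when the Server Status page was actually read.

```python
from datetime import datetime

# (post time in UTC, reported validation backlog) taken from this thread.
observations = [
    ("2021-10-28 20:36:08", 104_000),
    ("2021-10-28 22:15:38", 114_000),
    ("2021-10-29 03:41:39", 138_000),
    ("2021-10-29 09:14:09", 176_000),
    ("2021-10-29 22:45:18", 237_000),
    ("2021-10-30 00:36:04", 243_000),
]


def growth_rates(obs):
    """Backlog growth in results per hour between consecutive reports."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), n) for ts, n in obs]
    rates = []
    for (t0, n0), (t1, n1) in zip(parsed, parsed[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        rates.append((n1 - n0) / hours)
    return rates


rates = growth_rates(observations)
for (ts, _), rate in zip(observations[1:], rates):
    print(f"up to {ts}: about {rate:,.0f} results/hour")
```

On these numbers the backlog grew in every interval, which fits the impression that some results are getting validated but nowhere near enough to keep up.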

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,408,362
RAC: 20,061
Message 103055 - Posted: 30 Oct 2021, 9:15:19 UTC

Luckily the Rosetta graphs also show the Validation numbers.
It looks like the Validators have been having issues for a while now. Generally they've been averaging a backlog of around 600 or so. But since Wednesday of last week, there have been periods where they've been falling behind, then catching up, with the amount they fall behind getting larger each time, until they came good for a couple of days from late Sunday.
Then they started falling behind again, more and more each time, until the present huge backlog.



Compare that to over the last year.


Grant
Darwin NT

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 103057 - Posted: 30 Oct 2021, 12:38:13 UTC - in response to Message 103046.  

A task running MUCH longer than the expected 8 hours:

aaab_nNMALA_pp-SAR_pp-mPPS-BGLY_pp_2_2245795_6_1

https://boinc.bakerlab.org/rosetta/result.php?resultid=1441862159

2 days, 8 hours, 32 minutes so far

rosetta python 1.03 vbox64

This is elapsed time, not the much shorter CPU time.

Now aborted after 3 days and 20 hours elapsed, less than 10 minutes CPU time.

The python tasks need a major improvement in how they detect tasks taking too long to run.

Could the current validator be written in Python, and having this same problem?
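
As an illustration of the kind of check being asked for, here is a minimal sketch that flags a task whose CPU time is a tiny fraction of its elapsed time. The function name and both thresholds are hypothetical choices for this example. This is not how BOINC or the VirtualBox-based tasks detect stalls, which this thread does not describe.

```python
from datetime import timedelta


def looks_stalled(elapsed, cpu_time,
                  min_elapsed=timedelta(hours=1),
                  min_cpu_fraction=0.05):
    """Flag a task that has run a while but used almost no CPU.

    Both thresholds are illustrative assumptions, not project settings.
    """
    if elapsed < min_elapsed:
        return False  # too early to judge
    return cpu_time / elapsed < min_cpu_fraction


# Figures from the post above: 3 days 20 hours elapsed, under 10 minutes CPU.
elapsed = timedelta(days=3, hours=20)
cpu_time = timedelta(minutes=10)
print(f"CPU fraction: {cpu_time / elapsed:.2%}")
print("stalled" if looks_stalled(elapsed, cpu_time) else "ok")
```

A check along these lines would have flagged the task long before the 8-hour target runtime had passed, let alone several days.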

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 103064 - Posted: 30 Oct 2021, 23:00:13 UTC

Rosetta@Home has a problem with how you recover after losing your password.

The line where it asks you to enter your email address will not allow you to enter anything unless you first click in the right half of the line, which makes the box appear that you need to put the email address inside.

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,538,222
RAC: 10,691
Message 103065 - Posted: 31 Oct 2021, 3:41:56 UTC - in response to Message 103053.  

Just checked my Tasks and a few from the 29th have come through, but the number of Pendings is still almost triple the number of Valids.
Hopefully the life signs will continue to improve as the day goes on.

Yeah, another look and I'm not buying my idea either tbh. Updated to 243k backlog - higher still, not lower.
A watched pot never boils - I'll look again tomorrow

Not getting any better - in fact much worse.
I've sent another nudge with a request for a timescale.

Combined with my entire email provider being down for 3 consecutive days, this is not what I want to see...

TSD
Joined: 10 Oct 08
Posts: 7
Credit: 2,189,714
RAC: 0
Message 103067 - Posted: 31 Oct 2021, 17:14:53 UTC

As usual there is no information about what is happening. I don't know what I am doing here.

I'm running Folding@Home now.

©2024 University of Washington
https://www.bakerlab.org