Credit granted pending stack

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80482 - Posted: 4 Aug 2016, 16:33:23 UTC

DK, from what people have reported over time, it sounds like the server side must be hitting a section of special code for the case when the project is "out of work" (i.e. frantically creating work, but still unable to keep up with immediate demand). In that case, it seems the server sends the client a 24hr backoff. If you search your server source code for "86400" (the number of seconds in a day), you may find it.
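For what it's worth, here is a minimal sketch of that search in Python. Everything in it is an assumption for illustration: the function name `find_constant`, the list of file extensions, and the idea that the delay is written as the literal 86400 at all. It could just as easily be spelled 24*3600 or hidden behind a named constant, in which case this finds nothing.

```python
import os
import re
import sys

# Match 86400 only as a standalone number, so 186400 or 864000 are not reported.
PATTERN = re.compile(r"(?<!\d)86400(?!\d)")


def find_constant(root, extensions=(".cpp", ".h", ".inc", ".php")):
    """Return (path, line_number, line_text) for each source line containing 86400.

    The extension list is a guess at what a server source tree might contain.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if PATTERN.search(line):
                        hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_86400.py /path/to/server/source
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, number, text in find_constant(root):
        print(f"{path}:{number}: {text}")
```

Each hit still has to be read in context, since a day's worth of seconds can show up for reasons that have nothing to do with backoff.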

I would think this was server self-preservation early on in the R@h project. But now that new work is available seconds or minutes later, it has outlasted its initial purpose.
Rosetta Moderator: Mod.Sense
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80483 - Posted: 4 Aug 2016, 19:26:00 UTC - in response to Message 80482.  

DK, from what people have reported over time, it sounds like the server side must be hitting a section of special code for the case when the project is "out of work" (i.e. frantically creating work, but still unable to keep up with immediate demand). In that case, it seems the server sends the client a 24hr backoff. If you search your server source code for "86400" (the number of seconds in a day), you may find it.

I would think this was server self-preservation early on in the R@h project. But now that new work is available seconds or minutes later, it has outlasted its initial purpose.



The 24-hour backoff seems fine to me as a way to reduce the load on our servers in those special situations, which hopefully should not happen that often. I can look into the scheduler code, though. I think we are low on jobs at the moment, but the demand fluctuates depending on what the specific research projects need.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 80487 - Posted: 5 Aug 2016, 2:12:39 UTC

Right, it would be ok if all it affected was getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff.

So it has actually gotten in the way of completing 48hr deadline work for CASP.
Rosetta Moderator: Mod.Sense
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 80492 - Posted: 5 Aug 2016, 18:08:00 UTC - in response to Message 80487.  

Right, it would be ok if all it affected was getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff.

So it has actually gotten in the way of completing 48hr deadline work for CASP.



That's not good. The tight deadline is tough, particularly when the system is having issues. Hopefully we can prevent issues from arising in the future, but there are a lot of moving parts.
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 80493 - Posted: 6 Aug 2016, 4:23:20 UTC - in response to Message 80487.  

Right, it would be ok if all it affected was getting new work to the host, but it also affects reporting completed work, and I think it may even halt uploads once it hits that threshold. It only seems to take 2 or 3 "no work" indications in 10 minutes to receive the 24hr backoff.

So it has actually gotten in the way of completing 48hr deadline work for CASP.

That's not quite right. It goes to 24hr back-off on the very first occasion, not after 2 or 3. Observed many times. And this affects polling of any type, so uploads are blocked too, in a period where CASP tasks need to come back before a 2-day deadline (including processing time, which I discovered by accident today now defaults to 8 hours rather than 6).
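To put rough numbers on that, here is a back-of-envelope sketch. The function name `slack_hours` is made up, and the model is a simplifying assumption: the task starts the moment it is downloaded, runs without interruption, and each backoff lands after it finishes and blocks reporting for the full 24 hours.

```python
def slack_hours(deadline_h=48.0, runtime_h=8.0, backoff_h=24.0, backoffs=1):
    """Hours left before the deadline after computing and waiting out backoffs.

    Assumes the task starts at download time and runs continuously, and that
    every backoff delays reporting by its full length (a simplified model).
    """
    return deadline_h - runtime_h - backoffs * backoff_h


if __name__ == "__main__":
    for n in range(3):
        print(f"{n} backoff(s): {slack_hours(backoffs=n):+.0f} h of slack")
```

Under those assumptions, one backoff cuts the margin from 40 hours to 16, and a second one in the same window leaves it 8 hours past the deadline. Real hosts rarely start a task immediately, so the actual margin is likely smaller.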

Unless users are micro-managing to ensure everything's going ok, this won't be spotted. I suspect 'normal' people just let things run so won't know it's happened.

As I mentioned earlier, I haven't noticed any 24hr backoffs for some weeks, so the urgency has gone out of this issue, but the principle stands.
No.15

Joined: 30 Dec 15
Posts: 7
Credit: 7,621,315
RAC: 0
Message 80494 - Posted: 7 Aug 2016, 0:50:28 UTC

Right, it is not 2 or 3 tries, it is 1. If there is one issue, you get a 24-hour backoff and none of your current work gets uploaded. You either have to watch the project constantly, be willing to lose credit, or move on. It might even be ok if Rosetta's work was due 2 or 3 days out, but it seems their deadlines are always tight. Like I said, I want to crunch Rosetta, but I am not willing to babysit it. I moved my stuff over to WCG, but I will be back if this ever gets fixed.
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 80525 - Posted: 10 Aug 2016, 2:16:59 UTC

As a lot of tasks have probably been returned recently, validation is pending for something like 3+ hours.

I also spotted a 24hr backoff for the first time in weeks. It was just once, but the problem hasn't gone away.



©2024 University of Washington
https://www.bakerlab.org