Not polling for 22+ hours?

Message boards : Number crunching : Not polling for 22+ hours?

Sid
Joined: 12 Jun 07
Posts: 9
Credit: 3,563,238
RAC: 160
Message 80187 - Posted: 16 Jun 2016, 12:44:17 UTC


. . . caused one of my i7's to run dry for about 6 hours.

Anyone else have this issue?
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,644,940
RAC: 233
Message 80188 - Posted: 16 Jun 2016, 13:18:15 UTC

This happens from time to time when the servers get really loaded (usually due to huge spikes of new hosts coming online all at once). It has happened a few times lately as CE brings more and more hosts over to R@H.

It's sort of a good thing, though a bit of an annoyance. Hopefully a server upgrade will eventually fix this.
Sid
Joined: 12 Jun 07
Posts: 9
Credit: 3,563,238
RAC: 160
Message 80189 - Posted: 16 Jun 2016, 14:31:54 UTC - in response to Message 80188.  

This happens from time to time when the servers get really loaded (usually due to huge spikes of new hosts coming online all at once).




Thanks for the response, I'm glad that the Pentathlon hasn't smoked Rosetta's servers. . . .

Repaxan
Joined: 28 Jun 08
Posts: 1
Credit: 532,897
RAC: 0
Message 80190 - Posted: 16 Jun 2016, 17:32:24 UTC

Scheduler updates have been failing for me for the past ~48 hours.
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1866
Credit: 8,186,159
RAC: 6,319
Message 80193 - Posted: 17 Jun 2016, 12:58:39 UTC - in response to Message 80188.  

Hopefully a server upgrade will eventually fix this.


While the grass grows.....
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 80205 - Posted: 20 Jun 2016, 15:31:19 UTC

Increase the minimum reserve. That's what I did.
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,644,940
RAC: 233
Message 80211 - Posted: 21 Jun 2016, 13:24:21 UTC - in response to Message 80205.  

Increase the minimum reserve. That's what I did.


Same. I now run with a minimum of 0.7 days and a max of 0.8 days, and this seems to be enough to keep the beasts fed while still keeping my average turnaround time nice and low. That supports CASP efforts, which need quick turnaround to meet their deadlines and still give the scientists time to do their analysis on the results.
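The "minimum reserve" and the 0.7/0.8-day settings are BOINC's work-buffer preferences. As a minimal sketch, assuming the preference tags `work_buf_min_days` and `work_buf_additional_days` inside a `global_prefs_override.xml` file (check these against the documentation for your BOINC version), the snippet below builds that override with Python's standard library. It also assumes "max of 0.8 days" means 0.7 days minimum plus 0.1 days additional; that is an interpretation, not something stated in the post.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(min_days: float, additional_days: float) -> str:
    """Return a global_prefs_override.xml body for the two work-buffer settings."""
    root = ET.Element("global_preferences")
    # "Store at least N days of work"
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    # "Store up to an additional N days of work"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


# Timo's "min 0.7, max 0.8", read as 0.7 + 0.1 additional (assumption).
# round() hides the floating-point noise in 0.8 - 0.7.
xml_text = build_prefs_override(0.7, round(0.8 - 0.7, 2))
print(xml_text)
```

Saving the output as the override file in the BOINC data directory and running `boinccmd --read_global_prefs_override` should make a running client pick it up; setting the same values in BOINC Manager's computing preferences is the simpler route for most people.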


**38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,584,443
RAC: 17,403
Message 80213 - Posted: 21 Jun 2016, 18:18:55 UTC - in response to Message 80211.  

Increase the minimum reserve. That's what I did.

Same. I now run with a minimum of 0.7 days and a max of 0.8 days, and this seems to be enough to keep the beasts fed while still keeping my average turnaround time nice and low. That supports CASP efforts, which need quick turnaround to meet their deadlines and still give the scientists time to do their analysis on the results.

Great if it works for you. With a potential unexpected 24hr delay, I'm working on the basis that a buffer of 1 day plus run time will cover all eventualities while staying within deadlines. I use 1.5 days, down from my previous 2.0 days, with an 8hr runtime, to allow for variations.
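The rule of thumb above (buffer >= outage + one task's run time) is simple arithmetic. A minimal sketch, applying that rule exactly as stated in the post and ignoring per-core scheduling details; the helper names are hypothetical and not part of BOINC.

```python
HOURS_PER_DAY = 24.0


def buffer_covers_outage(buffer_days: float, outage_hours: float, runtime_hours: float) -> bool:
    """True if the buffer lasts through the outage plus one task's run time."""
    needed_hours = outage_hours + runtime_hours
    return buffer_days * HOURS_PER_DAY >= needed_hours


def minimum_buffer_days(outage_hours: float, runtime_hours: float) -> float:
    """Smallest buffer, in days, that satisfies the same rule."""
    return (outage_hours + runtime_hours) / HOURS_PER_DAY


# Figures from the thread: 24 h scheduler outage, 8 h target run time.
print(round(minimum_buffer_days(24, 8), 2))  # 1.33
print(buffer_covers_outage(0.7, 24, 8))      # False: 16.8 h of work < 32 h needed
print(buffer_covers_outage(1.5, 24, 8))      # True: 36 h of work >= 32 h needed
```

Under this rule a 0.7-day minimum would not ride out a full 24-hour outage, while 1.5 days leaves about 4 hours of slack for run-time variation. The cost is a longer average turnaround, which is the trade-off against the short CASP deadlines.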
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,584,443
RAC: 17,403
Message 80216 - Posted: 22 Jun 2016, 2:45:11 UTC

OK, I think I've worked out what's happening with something I've seen a few times now. This one's typical:

rb_06_17_66429_110474__t000__ab_robetta_IGNORE_THE_REST_381779_795_1

The task was returned well within its deadline, but validation reports "Task was reported too late to validate".

Checking the workunit details, I see it had been issued before but missed its deadline, so it was reissued to me. A few hours later the original task was returned, after its deadline but before I got mine back. It was credited to them, quite rightly, and by the time I got mine back the workunit had already been shut down.

If that user suffered one of those 24hr delays after failing to pick up tasks when polling, they miss their deadline and the task gets reissued; even if the reissue comes back within its own deadline, it will then fail in the way described above.
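The sequence described above can be modelled as a toy simulation. This is a simplified sketch, not BOINC's actual server logic: it assumes a workunit needs a single valid result (quorum of one) and is closed as soon as one arrives, after which any later result is marked "too late to validate". The class name, the hours and the deadlines are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy workunit with a quorum of one result (assumption)."""
    closed: bool = False
    log: list = field(default_factory=list)

    def report(self, host: str, reported_at: float, deadline: float) -> str:
        on_time = reported_at <= deadline
        if self.closed:
            # An earlier result already finished the workunit, so this one
            # cannot be validated even though it may have met its own deadline.
            status = "too late to validate"
        else:
            # The first result back is accepted and closes the workunit, late or not.
            status = "valid"
            self.closed = True
        self.log.append((host, reported_at, on_time, status))
        return status


# Hypothetical timeline in hours: the original copy is due at t=72, its host
# loses a day to the scheduler problem, and the server reissues at t=72.
wu = Workunit()
original = wu.report("original_host", reported_at=80, deadline=72)  # late, but back first
reissue = wu.report("reissue_host", reported_at=90, deadline=144)   # on time, but second
print(original, "|", reissue)  # valid | too late to validate
```

The model makes the ordering problem explicit: the reissued copy met its own deadline and still failed, purely because the late original closed the workunit first.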

I'm not yet sure whether the overnight jobs find these instances and retrospectively issue credit, or whether it's lost altogether. In any case, it doesn't seem to do anyone any good, user or project.

Is there some way to escalate this for a resolution? There seem to be problematic knock-on effects all round.
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 80225 - Posted: 23 Jun 2016, 4:49:41 UTC - in response to Message 80187.  


. . . caused one of my i7's to run dry for about 6 hours.

Anyone else have this issue?


Not certain it's the same problem, but Rosetta@home is definitely broken again, and not just the Mac client this week. According to the service status page, everything is working fine, just fine and dandy.

The same spokesman will probably pop up again and say they really do care, but their actions and inactions speak much more loudly than any words. They don't know what's going on, they don't know when it's working and when it isn't, they can't even provide accurate status information, but perhaps most importantly, there's very little evidence (that I can detect) that the problems are being tackled in anything resembling a systematic way.

I've already noted that such carelessness is quite likely to shadow any results they publish. Even that didn't seem to motivate them towards improvements.

Good thing they've cured me from caring too much.

Bad thing that I can't stop myself from wasting the keystrokes with another suggestion. Maybe the Rosetta@home people need to figure out how many clients they can actually handle and route extra clients to other projects.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)

©2024 University of Washington
https://www.bakerlab.org