Problems and Technical Issues with Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 71703 - Posted: 2 Dec 2011, 2:16:22 UTC - in response to Message 71687.  

But, as you say, there appears to be nothing about it which I can actively solve, so I must just 'put up with it', unless I become a programmer on the Rosetta team, and that ain't about to happen anytime soon!

You'd have to become a BOINC programmer ;) - BOINC just runs the Rosetta app when it deems fit (as far as I'm aware anyway!)...

The manual work-around is to hit 'no new tasks' for the project and then suspend all but the tasks you want to run. Then re-enable new tasks once they've completed. I wouldn't recommend it though - you'll still get credited for almost all work submitted after its deadline up to a point with Rosetta. And it's very easy to forget to re-enable new work!

Yup, that's what I'd do just to tidy up the waiting tasks. If there's only a few percentage points of runtime left, it ought to go through quite quickly.

BOINC (not Rosetta) has always had this kind of scheduling problem. It's actually got better with the latest public release, but it looks like there are still issues. Very annoying if you run more than one project.
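
For anyone who prefers the command line, the manual work-around mentioned above ('no new tasks', suspend the other tasks, then re-enable new work) could be sketched roughly like this. It only builds `boinccmd` argument lists and prints them by default. The project URL and the task names are assumptions: use whatever URL and names your own BOINC client lists. The helper function names are made up for illustration.

```python
# Rough sketch only. Assumes boinccmd is installed and on PATH.
# PROJECT_URL and all task names are placeholders (assumptions), and the
# helper names below are hypothetical, not part of BOINC.
import subprocess

PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"  # assumed; check your client


def workaround_commands(project_url, tasks_to_suspend):
    """Commands for 'no new tasks' plus suspending the tasks you don't want to run."""
    commands = [["boinccmd", "--project", project_url, "nomorework"]]
    for name in tasks_to_suspend:
        commands.append(["boinccmd", "--task", project_url, name, "suspend"])
    return commands


def reenable_commands(project_url, suspended_tasks):
    """Commands to resume the suspended tasks and allow new work again."""
    commands = [
        ["boinccmd", "--task", project_url, name, "resume"]
        for name in suspended_tasks
    ]
    commands.append(["boinccmd", "--project", project_url, "allowmorework"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder task names; a dry run just prints the commands.
    run_all(workaround_commands(PROJECT_URL, ["example_task_1", "example_task_2"]))
```

Keeping the re-enable step in the same script is one way to avoid forgetting to switch new work back on.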

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 71704 - Posted: 2 Dec 2011, 2:27:01 UTC

The tasks that came down a few days ago (Monday) after the outage all seem to have run very short: 1-3 hours instead of my preferred 8-hour setting. I guess that's why we ran out again so quickly.

Also, the validator seems to be way behind as well as giving out some errors.

See here: My results
Note: the requested runtime is 28800 seconds, but 3000-14000 seconds is a typical outcome
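
To put those numbers in hours, here is a quick conversion. It assumes the figures are in seconds and that the low end of the typical range means about 3000 seconds; both are readings of the note above, not confirmed values.

```python
# Quick arithmetic sketch; the 3000 and 14000 figures are an assumed
# reading of the typical range quoted in the note.
def seconds_to_hours(seconds):
    """Convert a runtime in seconds to hours."""
    return seconds / 3600


requested = 28800          # requested runtime in seconds
typical = (3000, 14000)    # assumed typical range in seconds

print(f"requested: {seconds_to_hours(requested):.1f} h")
for value in typical:
    share = value / requested
    print(f"{value} s = {seconds_to_hours(value):.2f} h ({share:.0%} of requested)")
```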

Also, it's worth the Rosetta guys looking to be well-stocked up with tasks ahead of the Christmas period and rebooting the server in the last week to ensure the holiday period goes as smoothly as possible. EVERY year there's a problem, so anything that can be done to pre-empt issues would be appreciated and would cut down on the regular whining that comes as a result.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71732 - Posted: 3 Dec 2011, 15:22:23 UTC - in response to Message 71704.  

The tasks that came down a few days ago (Monday) after the outage all seem to have run very short: 1-3 hours instead of my preferred 8-hour setting. I guess that's why we ran out again so quickly.

Also, the validator seems to be way behind as well as giving out some errors.

See here: My results
Note: the requested runtime is 28800 seconds, but 3000-14000 seconds is a typical outcome

Also, it's worth the Rosetta guys looking to be well-stocked up with tasks ahead of the Christmas period and rebooting the server in the last week to ensure the holiday period goes as smoothly as possible. EVERY year there's a problem, so anything that can be done to pre-empt issues would be appreciated and would cut down on the regular whining that comes as a result.



My response to your Christmas idea is "HAH": it won't happen.
Never has, never will.
But we can always hope.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71733 - Posted: 3 Dec 2011, 15:28:16 UTC

After a review of my tasks I found one that came up with: Maximum elapsed time exceeded. However, there was 0 CPU time and no debugger report.

The name of the task was: ab_11_19__opt_T6041_opt_cst_pred_wt_14_03_09_35570_28_0
result page: https://boinc.bakerlab.org/rosetta/result.php?resultid=465870741

Funny thing is, my wingman completed the task with no trouble on a 64-bit Vista machine. He has a slightly newer quad core than me.

brilor
Joined: 31 Mar 08
Posts: 9
Credit: 124,013
RAC: 0
Message 71735 - Posted: 4 Dec 2011, 0:10:03 UTC

My results show 7 completed tasks with "granted credit" equal to "pending" since 29-Nov-2011. All programs show as running on the server status page, but I noticed both a "validator_mini" and a "validator_beta". Presumably this is a technical issue with the server(s), but any hints to the contrary would be welcome.

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,518,559
RAC: 10,612
Message 71741 - Posted: 4 Dec 2011, 6:49:02 UTC - in response to Message 71732.  
Last modified: 4 Dec 2011, 6:50:39 UTC

My response to your Christmas idea is "HAH": it won't happen.

Thank you for your contribution.

The short runtimes seem to have been solved with the jobs that were sent from 2 Dec onwards. Thanks.

Alan J Rodger
Joined: 16 Oct 05
Posts: 7
Credit: 32,282
RAC: 0
Message 71743 - Posted: 4 Dec 2011, 15:38:27 UTC

Two more work units continuing to run with elapsed time and time to go increasing simultaneously. This has been going on too long - it doesn't seem as if Rosetta can manage their system - I quit! There are many more systems that run without problems!

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,860,059
RAC: 7,494
Message 71744 - Posted: 4 Dec 2011, 19:44:21 UTC - in response to Message 71743.  

Two more work units continuing to run with elapsed time and time to go increasing simultaneously. This has been going on too long - it doesn't seem as if Rosetta can manage their system - I quit! There are many more systems that run without problems!

Alan

That isn't strictly a bug in Rosetta. It always happens with this project because BOINC struggles to calculate how much time is left, due to the way Rosetta packs multiple models into a task to meet the user's run-time preference. The time to complete increases steadily until a checkpoint and is then recalculated, at which point it drops before beginning to climb again.
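
A toy sketch of that sawtooth, for illustration only. This is not BOINC's actual estimator: it assumes progress is only reported when a model finishes, and that remaining time is simply extrapolated from elapsed time and the last reported fraction done. The function name and parameters are made up.

```python
# Toy model only (assumptions, not BOINC's real algorithm):
#  - fraction done only advances when a whole model finishes (a "checkpoint")
#  - remaining time is extrapolated as elapsed * (1 - fraction) / fraction
def remaining_estimates(model_seconds, models_total, step):
    """Return a list of (elapsed_seconds, estimated_remaining_seconds) samples."""
    total = model_seconds * models_total
    samples = []
    for elapsed in range(step, total + 1, step):
        models_done = elapsed // model_seconds
        if models_done == 0:
            continue  # nothing reported yet, so no estimate can be made
        fraction = models_done / models_total
        samples.append((elapsed, elapsed * (1 - fraction) / fraction))
    return samples


if __name__ == "__main__":
    # Hypothetical task: 4 models of 1 hour each, sampled every 20 minutes.
    for elapsed, remaining in remaining_estimates(3600, 4, 1200):
        print(f"elapsed {elapsed / 3600:.2f} h -> estimated remaining {remaining / 3600:.2f} h")
```

Between model completions the fraction done stays fixed while elapsed time grows, so the estimate climbs; when a model finishes, the fraction jumps and the estimate drops.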

Danny

Bob Stryk
Joined: 16 Jan 06
Posts: 1
Credit: 484,245
RAC: 0
Message 71746 - Posted: 4 Dec 2011, 22:44:40 UTC

Since re-installing Windows XP, most Rosetta@home work units end with a computation error. The notation in the Event Log is of the form "Output file ... for task ... absent". Is this the work unit, or is something not quite right with the computer?

Leland Kornhaus
Joined: 16 Jul 06
Posts: 2
Credit: 1,793,841
RAC: 0
Message 71747 - Posted: 4 Dec 2011, 23:16:51 UTC - in response to Message 71743.  

Two more work units continuing to run with elapsed time and time to go increasing simultaneously. This has been going on too long - it doesn't seem as if Rosetta can manage their system - I quit! There are many more systems that run without problems!


I've got the same issue. First, I had a lot of 3-hour tasks that would take 7+ hours of processor time. Here's a nice example that was granted less than 1/6th of the claimed credit:
https://boinc.bakerlab.org/rosetta/result.php?resultid=465189829
... but at least it was granted credit!

Since December 1st I believe I have 179 work units with about 20,600 in claimed credits that are still pending. If the work units aren’t even acknowledged, are they being used? I’d hate this much electricity & processing to be wasted on random work that isn’t valued.

Jesse Viviano
Joined: 14 Jan 10
Posts: 42
Credit: 2,700,472
RAC: 0
Message 71748 - Posted: 5 Dec 2011, 3:34:28 UTC - in response to Message 71747.  

Two more work units continuing to run with elapsed time and time to go increasing simultaneously. This has been going on too long - it doesn't seem as if Rosetta can manage their system - I quit! There are many more systems that run without problems!


I've got the same issue. First, I had a lot of 3-hour tasks that would take 7+ hours of processor time. Here's a nice example that was granted less than 1/6th of the claimed credit:
https://boinc.bakerlab.org/rosetta/result.php?resultid=465189829
... but at least it was granted credit!

Since December 1st I believe I have 179 work units with about 20,600 in claimed credits that are still pending. If the work units aren’t even acknowledged, are they being used? I’d hate this much electricity & processing to be wasted on random work that isn’t valued.


Normally the servers are able to keep up with the validation. However, there was a crash this week so there is a big post-crash backlog to handle. I do not know if the crash was a server crash or a networking hardware failure, but the results are the same either way. This project apparently has very little margin left and probably needs to upgrade its servers.

Leland Kornhaus
Joined: 16 Jul 06
Posts: 2
Credit: 1,793,841
RAC: 0
Message 71749 - Posted: 5 Dec 2011, 3:56:09 UTC - in response to Message 71748.  

Thank you for the explanation.


> Normally the servers are able to keep up with the validation. However, there
> was a crash this week so there is a big post-crash backlog to handle. I do
> not know if the crash was a server crash or a networking hardware failure,
> but the results are the same either way. This project apparently has very
> little margin left and probably needs to upgrade its servers.

Michael Kingsford Gray
Joined: 28 Nov 11
Posts: 3
Credit: 4,593,564
RAC: 0
Message 71750 - Posted: 5 Dec 2011, 7:08:36 UTC

Thanks for all of the very helpful suggestions.

I may have an explanation for at least a part of the recent drought of jobs:
My new multiprocessor workstations that I am progressively enabling might be gobbling them up. (More to come, as well!)

Rosetta appears to have partially resolved my scheduling issue, at least to the extent where one of the 99%+ jobs has decided to complete at last!
But its deadline was another 4 days away anyway (as is the case for all of the stalled jobs), so perhaps I am worrying about nothing?

Certainly "GPUGrid" is far better behaved than Rosetta.
Things must only get better. I assume that the Rosetta programmers are either volunteers, or academics of some sort?
It may well be that I am demanding commercial performance from those who are effectively unpaid amateurs?
Philosophy is Bunk - Richard P. Feynman

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 71755 - Posted: 6 Dec 2011, 3:05:50 UTC

Hi.

Any chance of someone having a look at the validator? There are tasks that have been sitting waiting for over a week now.

Thanks.

luisr
Joined: 11 Sep 06
Posts: 3
Credit: 1,039,929
RAC: 0
Message 71756 - Posted: 6 Dec 2011, 4:23:32 UTC

Same here.
There are ~4200 credits claimed and awaiting validation from 2 December until today... 3 complete days of computing...

[PUGLIA] kidkidkid3
Joined: 14 Sep 10
Posts: 11
Credit: 2,348,063
RAC: 0
Message 71757 - Posted: 6 Dec 2011, 6:24:57 UTC

Same problem for all of us ... since the first of December
the validator hasn't worked correctly ... we are waiting for
an administrator's help ... please tell us what's happening.
Thanks in advance.
I'm an old Italian programmer (do you know cards?). Now I recycle and repair my friends' old PCs, and they are revived for research.
A long trip begins with a little step ...

Terianne929
Joined: 5 Oct 11
Posts: 1
Credit: 11,531
RAC: 0
Message 71758 - Posted: 6 Dec 2011, 12:34:59 UTC

I have 3 jobs that are not validating; all three names have a similar beginning:
Name: T0569... Task ID: 466779186, WU ID: 425851216, completion time given as 03:02:48 but only ran 00:50:33.
Name: T0540... Task ID: 467186228, WU ID: 426233861
Name: T0541... Task ID: 467186269, WU ID: 426233928

Also, I have had several jobs that ADDED time to the completion time (as much as 15 minutes) while the elapsed time was still increasing.

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,860,059
RAC: 7,494
Message 71760 - Posted: 6 Dec 2011, 22:17:16 UTC - in response to Message 71758.  

Also, I have had several jobs that ADDED time to the completion time (as much as 15 minutes) while the elapsed time was still increasing.

Hi Terianne, that's normal for Rosetta. BOINC can't calculate time remaining very accurately so it tends to increase slowly and then drop more significantly and then repeat.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71763 - Posted: 7 Dec 2011, 18:14:30 UTC - in response to Message 71757.  

Nobody from the team really reads these posts anymore.
I sent a message to someone I had a conversation with once before, asking them to relay this issue to someone on the team.


Same problem for all of us ... since the first of December
the validator hasn't worked correctly ... we are waiting for
an administrator's help ... please tell us what's happening.
Thanks in advance.

Cutchet Salvador
Joined: 1 Feb 10
Posts: 17
Credit: 10,690,439
RAC: 0
Message 71764 - Posted: 7 Dec 2011, 20:03:08 UTC - in response to Message 71763.  
Last modified: 7 Dec 2011, 20:04:05 UTC

Greg, I am sorry to have to agree with you, but it is true.
I sent a personal mail to the Manager and he has not answered me either.
The R@H team really does ignore its volunteer collaborators. There is only news from Mr. Baker and nothing more.
It is painful and sad.
Meanwhile people stop working with R@H, and the team carries on unconcerned.
Between the credits validated at 0 and the pending ones, we are being made to look like fools.
Greetings and patience.

©2024 University of Washington
https://www.bakerlab.org