Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home



Ed

Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70926 - Posted: 5 Aug 2011, 21:19:39 UTC - in response to Message 70925.  


Did you set the "While processor usage is less than X %" to 0%?


Yes it is set at zero.


And I shifted my allocation: SETI and Rosetta are now 50/50 on a 2-core system, so essentially they each have one core.

I figured fighting disease is at least as important as finding ET.
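If anyone wants to sanity-check those two settings outside the BOINC Manager GUI, here's a rough Python sketch. It assumes a typical Linux data directory, that the CPU-usage threshold is stored as suspend_cpu_usage in global_prefs.xml, and that the per-project split shows up as resource_share in the account files; paths and element names vary between BOINC versions, so treat it as illustrative only.

    # Rough sketch: read the two settings discussed above straight from the
    # BOINC client's data files. Paths and element names are assumptions and
    # may differ between BOINC versions and operating systems.
    import xml.etree.ElementTree as ET
    from pathlib import Path

    data_dir = Path("/var/lib/boinc-client")   # adjust for your installation

    # "While processor usage is less than X %" -> assumed <suspend_cpu_usage>
    prefs = ET.parse(data_dir / "global_prefs.xml").getroot()
    print("CPU-usage threshold:", prefs.findtext(".//suspend_cpu_usage"), "(0 = no restriction)")

    # Per-project resource share (the 50/50 split) -> assumed <resource_share>
    for acct in sorted(data_dir.glob("account_*.xml")):
        share = ET.parse(acct).getroot().findtext(".//resource_share")
        print(acct.name, "resource share:", share)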
ID: 70926
Joanee
Joined: 5 Jun 11
Posts: 1
Credit: 0
RAC: 0
Message 70962 - Posted: 7 Aug 2011, 23:36:51 UTC - in response to Message 70834.  

Oh yeah, once again someone said a while back that they would be monitoring the boards for discussions like this one. Once again the system fails and no one sees or says anything about it.


It's not really an issue of the system failing - it's simply that we don't (currently) have jobs that are ready to run right this minute. Running the jobs on R@h is only one step in the process - it takes a while to figure out what sorts of jobs will give usable scientific results, to set up the jobs, to test them to make sure they won't cause a huge failure rate, and then, at the end of the runs, to process the results and figure out what the next round should do. Usually we have enough things going on that the computational lull in one project is covered by the compute phase of a different one. We just happen to have hit a point where none of the currently active projects is in an active compute phase. (And it doesn't help that we're maximally distant from both the previous and the next CASP - as you've probably noticed, activity ramps up before [the mad rush to finalize improvements], during, and after [the post-analysis] CASP.)

We're aware that the queue is empty - a message has been sent out on the appropriate internal mailing list. While we want to provide you with work units, we don't want to waste your time with scientifically pointless make-work. - It's somewhat trivial to re-run old jobs, but is that worth doing if no one is going to look at the results?

I hesitate to say this, as I don't want it to sound like we're chasing you away(*), but I'd agree with the implicit recommendation stated above to crunch other projects while we have this momentary lull. You can increase your stats on other projects secure in the knowledge that no one will gain on you with Rosetta@home. With any luck, we'll have new jobs for you early next week. (e.g. "We apologize for the inconvenience - Regular service should resume shortly.")

*) We really do appreciate your efforts. Having access to the computational resources of R@h allows us to do things we couldn't do otherwise. Frankly speaking, I was surprised how quickly and easily R@h handled my recent jobs. I would have monopolized our local computational resources, but R@h crunched through it like it was nothing. - It's prompted me to think about possible process improvement experiments that I probably wouldn't have otherwise considered due to the computational cost. (Unfortunately, it's in the very preliminary stages and nowhere near the point where I could actually launch any jobs.)

ID: 70962
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,142,074
RAC: 2,093
Message 70965 - Posted: 8 Aug 2011, 1:09:01 UTC - in response to Message 70962.  

And what you're saying is?

Ralf
ID: 70965
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70969 - Posted: 8 Aug 2011, 5:35:51 UTC - in response to Message 70962.  

It's not really an issue of the system failing - it's simply that we don't (currently) have jobs that are ready to run right this minute. [snip]

So why was this not posted on the front page where everyone can read it, instead of being buried deep in this topic?
ID: 70969
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,142,074
RAC: 2,093
Message 70971 - Posted: 8 Aug 2011, 6:05:44 UTC - in response to Message 70969.  

We're aware that the queue is empty - a message has been sent out on the appropriate internal mailing list.
So why was this not posted on the front page where everyone can read it, instead of being buried deep in this topic?
Looks like nobody who cares got the memo... :-(

Ralf
ID: 70971
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,780,807
RAC: 5,492
Message 70974 - Posted: 8 Aug 2011, 12:47:38 UTC - in response to Message 70969.  


So why was this not posted on the front page where everyone can read it, instead of being buried deep in this topic?


+1
ID: 70974
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 70977 - Posted: 8 Aug 2011, 17:04:18 UTC - in response to Message 70969.  

It's not really an issue of the system failing - it's simply that we don't (currently) have jobs that are ready to run right this minute. [snip]

So why was this not posted on the front page where everyone can read it, instead of being buried deep in this topic?


+2

I mean... seriously... All it takes is editing the home page to add this... and EVERYONE would be like, "God, this project is serious - it's analyzing our work and doesn't send useless WUs just to keep us busy like SETI does. I'm going to add a similar project (like POEM@Home!) to further help this field of science".

So little can do so much.
ID: 70977
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 70982 - Posted: 9 Aug 2011, 2:36:26 UTC - in response to Message 70919.  

When was the last time we had a post from the project about these problems? ...maybe everyone took a vacation at the same time

Nothing since we were told there'd be no work until early next week.

So, if everyone let the WUs run longer, more work would be done with fewer WUs and we would not run out of Rosetta tasks as often. This would make the best use of the admin's time and the server resources, and probably give the scientists more bang for our efforts.

It would seem that people would want to set their run times longer than 3 hours, making every WU really count.

Yes, that's exactly right. I usually have my preferences set to 8 hours, but once it became clear there'd be a delay in new WUs I maxed my remaining WUs to 24hrs. Some came to an end before that time, but most reached that kind of runtime.

I ran out for a couple of hours on one core out of four on my desktop last week, so BOINC grabbed some WCG WUs just as Rosetta WUs reappeared. The same happened tonight, just as another batch seems to be coming through again, so I've had (almost) no downtime and only a few hours (maybe a day in total) of running my backup project.

On my laptop I had no downtime at all, but again a few WCG WUs came down to fill a buffer (maybe half a day's worth) both last week and tonight.

So I keep just a 2-day buffer, and despite only very intermittent resupply over the last 12 days, each machine has spent less than a day not running my preferred project, with almost no downtime to speak of.

I only sneak a look at my status once a day (if that), and even without any advice from the project team or a mod it's pretty straightforward to guess what's happening and manage it with minimal intervention. It's not especially clever, so to be honest I'm surprised there's so much whinging from the people this kind of thing matters to. If people refuse to do even this small amount for themselves, I don't see what there is to complain about, especially as there's no permanent guarantee of continuous supply.

In case you missed my mentioning it, new WUs seem to be available now.
ID: 70982
Ed

Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70987 - Posted: 9 Aug 2011, 9:05:34 UTC - in response to Message 70919.  
Last modified: 9 Aug 2011, 9:07:19 UTC


So, if everyone let the WUs run longer, more work would be done with fewer WUs and we would not run out of Rosetta tasks as often. This would make the best use of the admin's time and the server resources, and probably give the scientists more bang for our efforts.

It would seem that people would want to set their run times longer than 3 hours, making every WU really count.


Follow-up to my earlier comment. For anyone who may not have seen it, the post linked below provides some interesting insights into WU run time and the actual work that gets done. The net result seems to be that setting a longer run time has a positive impact on both the project and the infrastructure.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4489&nowrap=true#67551
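As a rough back-of-the-envelope sketch of why this helps (the per-WU overhead figure below is an invented placeholder, not a measured project number): longer target run times mean fewer WUs per day, and therefore fewer downloads, uploads and scheduler requests, for the same amount of crunching.

    # Sketch: the same 24 core-hours of crunching per day, varying the target run time.
    # The 2-minute per-WU overhead (download + upload + scheduler contact) is a
    # made-up illustrative figure, not a measured value.
    CRUNCH_HOURS_PER_DAY = 24.0
    PER_WU_OVERHEAD_MIN = 2.0

    for target_hours in (1, 3, 6, 12, 24):
        wus_per_day = CRUNCH_HOURS_PER_DAY / target_hours
        overhead_min = wus_per_day * PER_WU_OVERHEAD_MIN
        print(f"target {target_hours:>2} h -> {wus_per_day:4.1f} WUs/day/core, "
              f"~{overhead_min:4.1f} min/day of per-WU overhead")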
ID: 70987
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 70992 - Posted: 9 Aug 2011, 17:51:59 UTC

I'm letting my computer run for 12 more hours and then I'll decide if I should disconnect until some answers can be found.
ID: 70992
Sid Celery

Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 70993 - Posted: 9 Aug 2011, 18:10:31 UTC - in response to Message 70992.  

I'm letting my computer run for 12 more hours and then I'll decide if I should disconnect until some answers can be found.

Have you reported a problem somewhere I missed? WUs have been coming through all day - was there something else?
ID: 70993
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 70994 - Posted: 9 Aug 2011, 19:08:43 UTC

Sid - just did a quick scan through my logs for today and I'm not seeing any problems with the number of available work units or with their successful completion.

I also took a quick look at his task log and saw that, as recently as two days ago, he was successfully processing tasks. Since then, all I noticed was a bunch of "abort before start" results (and not on the flex design WUs).

I did not see a post describing a problem so I can't offer any suggestions.
ID: 70994
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 71005 - Posted: 10 Aug 2011, 18:54:19 UTC - in response to Message 70993.  

I'm letting my computer run for 12 more hours and then I'll decide if I should disconnect until some answers can be found.

Have you reported a problem somewhere I missed? WUs have been coming through all day - was there something else?


The units are not completing any more.
ID: 71005
Ed

Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 71021 - Posted: 11 Aug 2011, 3:09:17 UTC

I have my preference set to 6 hours. I have run maybe 4 WUs in the last 24 hours. They seem to be running to some kind of completion. Some are showing more than 6 hours on the elapsed time line.
ID: 71021
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 71024 - Posted: 11 Aug 2011, 4:41:16 UTC

Rochester - the tasks which are not completing - are they actually pulling cycles, or have they stalled? What does your Performance Monitor (or whatever Windows calls it) currently show?
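One rough way to answer that outside of Performance Monitor is to check whether the Rosetta worker processes are actually accumulating CPU time, for example with the Python sketch below (it assumes the worker binary's name contains "rosetta", which may not hold for every application version):

    # Sketch: are the Rosetta worker processes actually pulling cycles?
    # Assumes the process name contains "rosetta"; adjust if your client
    # shows a different binary name.
    import time
    import psutil

    workers = [p for p in psutil.process_iter(attrs=["name"])
               if "rosetta" in (p.info["name"] or "").lower()]

    for p in workers:
        p.cpu_percent(None)          # prime the per-process counter
    time.sleep(5)                    # sample over a few seconds

    if not workers:
        print("No Rosetta worker processes found - the tasks may not be starting at all.")
    for p in workers:
        print(p.pid, p.info["name"], f"{p.cpu_percent(None):5.1f}% CPU")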

I also note you are running Windows 7 (unlike most of the Windows users here who have moldy copies of XP) - did you put on any maintenance this past weekend before the issue of non-completing tasks started?

And finally, was whoever it was who gave you the moniker "New York" upset at you and seeking to punish you for something?
ID: 71024
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 71025 - Posted: 11 Aug 2011, 4:45:48 UTC

Ed - that's about right. Remember, the system will run past the target time if it is in the middle of a model (which, for some unknown reason, they call a decoy).

If it does run past the target time, it will terminate either when the model it is currently working on completes, or when the "watchdog" wakes up and kills it, which happens once you reach a point four hours past the target time.
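Schematically, the rules described above look something like the sketch below (this only illustrates the behaviour as stated here, not the project's actual watchdog code):

    # Schematic illustration of the run-time rules described above; not the
    # actual Rosetta@home watchdog implementation.
    def should_stop(elapsed_h, target_h, model_in_progress, grace_h=4.0):
        if elapsed_h >= target_h + grace_h:
            return True      # watchdog kills the task regardless of progress
        if elapsed_h >= target_h and not model_in_progress:
            return True      # target reached and the current model has finished
        return False         # otherwise keep crunching the current model

    # Example with a 6-hour target: still mid-model at 9 h -> keep going;
    # at 10 h the watchdog stops it.
    print(should_stop(9.0, 6.0, model_in_progress=True))    # False
    print(should_stop(10.0, 6.0, model_in_progress=True))   # True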


ID: 71025
Ed

Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 71027 - Posted: 11 Aug 2011, 11:21:07 UTC - in response to Message 71025.  

Ed - that's about right. Remember, the system will run past the target time if it is in the middle of a model (which, for some unknown reason, they call a decoy).

If it does run past the target time, it will terminate either when the model it is currently working on completes, or when the "watchdog" wakes up and kills it, which happens once you reach a point four hours past the target time.




Thanks for confirming my understanding of the process and the time factors. That was why I went to 6 hours: that should allow a task to run to perhaps 10 hours. If one is going longer than that, something may be wrong.

Happy cruncher.
ID: 71027
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 71029 - Posted: 11 Aug 2011, 11:36:25 UTC - in response to Message 71024.  
Last modified: 11 Aug 2011, 11:39:12 UTC

Rochester - the tasks which are not completing - are they actually pulling cycles, or have they stalled? What does your Performance Monitor (or whatever Windows calls it) currently show?

I also note you are running Windows 7 (unlike most of the Windows users here who have moldy copies of XP) - did you put on any maintenance this past weekend before the issue of non-completing tasks started?

And finally, was whoever it was who gave you the moniker "New York" upset at you and seeking to punish you for something?


I don't think the tasks are even starting. The only maintenance might have been an auto-defrag I run once at the end of every month.

I'd need more info on that last New York thing.
https://boinc.bakerlab.org/rosetta/results.php?hostid=1423271&offset=60
ID: 71029
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 71034 - Posted: 12 Aug 2011, 2:59:11 UTC

Keep an eye out for the new T0xxxxxxxxx tasks.
Another person and I each just had one die on us.
He got a validate error and mine crashed and burned 50% of the way through.
ID: 71034
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,860,059
RAC: 7,494
Message 71039 - Posted: 12 Aug 2011, 18:03:57 UTC - in response to Message 71034.  

Keep an eye out for the new T0xxxxxxxxx tasks.
Another person and I each just had one die on us.
He got a validate error and mine crashed and burned 50% of the way through.

Same here - I have one 50.940% of the way through. The BOINC elapsed time was increasing, but it was using no CPU time. I've just suspended and resumed it with no effect. The Time Remaining for it isn't shown in BOINC Manager, and the graphics window closes pretty quickly after opening without displaying anything...
ID: 71039



