Problems and Technical Issues with Rosetta@home

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70864 - Posted: 2 Aug 2011, 21:20:23 UTC
Last modified: 2 Aug 2011, 21:31:42 UTC

Hi everyone. I joined today. I added Rosetta to my Seti workload.

Joined around noon. 5 hours later, no work units. Not an issue as I have plenty of Seti WU to keep the computer crunching, but was wondering if it was just me.

I see from the notes below that I am not the only one.

Curious, how long does a typical Rosetta work unit run? I realize it varies by computer but give me a WAG. Seti WU vary from about 1 hour to 15 hours. Then there are the Astropulse WU that run 50 to 90 hours estimated run time.


I will keep crunching Seti until some Rosetta stuff shows up. Seti has plenty of work. I have a 3-4 day batch of WU at all times. This way if they have a day or two outage/gap, I just keep working.


If your systems are idle, add Seti@home to your project list. If Rosetta is your primary interest then give it the lion's share of your resources. But if you run out of Rosetta work, the other projects will simply get more time on your CPUs. :-D

I have mine set for 66% for Seti and 33% for Rosetta. But right now Seti is running 100% of the time because that is all I have to work on.
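
A rough sketch of how that split plays out, assuming resource shares act as relative weights among the projects that currently have work. The function name and the simple proportional rule are illustrative assumptions, not the actual BOINC scheduler logic.

```python
def cpu_fractions(shares, has_work):
    """Split CPU time in proportion to resource share among projects that have work.

    Toy model only: the real BOINC client also tracks debts, deadlines, etc.
    """
    active = {name: share for name, share in shares.items() if has_work.get(name, False)}
    total = sum(active.values())
    if total == 0:
        # Nothing to run anywhere: every project gets zero.
        return {name: 0.0 for name in shares}
    return {name: active.get(name, 0) / total for name in shares}


# Shares from the post above: 66 for Seti, 33 for Rosetta.
shares = {"SETI@home": 66, "Rosetta@home": 33}

normal = cpu_fractions(shares, {"SETI@home": True, "Rosetta@home": True})
shortage = cpu_fractions(shares, {"SETI@home": True, "Rosetta@home": False})

print(normal)    # Seti gets about two thirds, Rosetta about one third
print(shortage)  # Seti gets everything while Rosetta has no work
```

Because the shares are weights rather than percentages, 66 and 33 do not need to add up to 100.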

Don't worry, be happy! ;-)


Ed
ID: 70864 · Rating: 0

Henry Bundy
Joined: 22 Apr 11
Posts: 1
Credit: 2,939,132
RAC: 0
Message 70867 - Posted: 2 Aug 2011, 22:36:02 UTC

Obviously a major hack of all Boinc projects. No work available from anybody. Track this person or persons down and shoot them in the head.
ID: 70867 · Rating: 0

HiFiTubeGuy
Joined: 12 Jan 10
Posts: 22
Credit: 6,291,999
RAC: 0
Message 70868 - Posted: 2 Aug 2011, 22:49:02 UTC - in response to Message 70864.  
Last modified: 2 Aug 2011, 22:58:07 UTC


Curious, how long does a typical Rosetta work unit run? I realize it varies by computer but give me a WAG. Seti WU vary from about 1 hour to 15 hours. Then there are the Astropulse WU that run 50 to 90 hours estimated run time.

Ed



Welcome to R@H!

Most of mine (Rosetta WUs) take 2.5 to 3 hours on a Core i7 860 OC'd @ 3.61GHz.
ID: 70868 · Rating: 0

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,598,186
RAC: 60,148
Message 70869 - Posted: 2 Aug 2011, 23:32:35 UTC - in response to Message 70868.  


Curious, how long does a typical Rosetta work unit run? I realize it varies by computer but give me a WAG. Seti WU vary from about 1 hour to 15 hours. Then there are the Astropulse WU that run 50 to 90 hours estimated run time.

Ed



Welcome to R@H!

Most of mine (Rosetta WUs) take 2.5 to 3 hours on a Core i7 860 OC'd @ 3.61GHz.


Even on much slower computers they might take a similar amount of time, but a faster computer will pack more models into a work unit in the same amount of time (and will therefore get more credits for that work). You can adjust that duration in the Rosetta Preferences but it isn't usually necessary ;)
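
To put numbers on that, here is a small sketch assuming a work unit simply keeps producing models until the target runtime is used up. The per-model times are made-up values for illustration, and the function name is hypothetical.

```python
def models_completed(target_runtime_hours, seconds_per_model):
    """Count the whole models that fit into a fixed target runtime."""
    target_seconds = target_runtime_hours * 3600
    return int(target_seconds // seconds_per_model)


# Same 3-hour work unit on two hypothetical machines.
slow = models_completed(3, 1800)  # slower host: 30 minutes per model
fast = models_completed(3, 600)   # faster host: 10 minutes per model

print(slow, fast)
```

Both hosts report after roughly the same wall-clock time, but the faster one returns three times as many models in this example.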
ID: 70869 · Rating: 0

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70871 - Posted: 3 Aug 2011, 1:49:33 UTC
Last modified: 3 Aug 2011, 1:51:23 UTC

Thanks for the feedback on WU time.

Has Rosetta been down for a while?

Don't you guys maintain several days of WU on your systems to cover short term outages?

I have the preferences set to hold 3 days of WU on my system. That is usually enough.
ID: 70871 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,997
RAC: 2,245
Message 70873 - Posted: 3 Aug 2011, 3:09:49 UTC - in response to Message 70862.  

I do not understand what is causing these errors


4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 0.44 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 0 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 0.48 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Downloading 0 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 0 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Downloading 0 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 0.36 0 ---
4.4E+08 4.01E+08 2 Aug 2011 12:03:34 UTC 2 Aug 2011 17:34:27 UTC Over Client error Downloading 0 0 ---
4.4E+08 4.01E+08 2 Aug 2011 11:44:35 UTC 2 Aug 2011 17:34:27 UTC Over Client error Compute error 8.28 0.03 ---
4.39E+08 4.01E+08 2 Aug 2011 5:19:23 UTC 2 Aug 2011 7:07:14 UTC Over Client error Compute error 3.96 0.01

Do you have any suggestions?



You have a lot of errors on that one machine.
I would suggest you abort all R@H tasks if you have any still running (there are some that are "downloading"). Remove the project (via the Projects tab in BOINC Manager) and then add R@H back. Also go to the data section of the BOINC folder and find and delete all R@H files. Then you will have a clean start.
Also, if you're OC'd you might want to lower your speed a little bit. The error listed is hard to find a specific answer for. Maybe the others have some better ideas.
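
For anyone who prefers the command line to BOINC Manager, the abort / remove / re-add steps could be sketched with `boinccmd` roughly as below. This only builds and prints the commands; the project URL, the account key placeholder, and the task name are assumptions to replace with your own values, and the manual file cleanup mentioned above is not covered.

```python
# Assumed values: substitute your own project URL, account key, and task names.
PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"
ACCOUNT_KEY = "YOUR_ACCOUNT_KEY"  # placeholder, not a real key


def clean_start_commands(project_url, account_key, task_names):
    """Build boinccmd invocations: abort each task, detach, then re-attach."""
    commands = [
        ["boinccmd", "--task", project_url, name, "abort"] for name in task_names
    ]
    commands.append(["boinccmd", "--project", project_url, "detach"])
    commands.append(["boinccmd", "--project_attach", project_url, account_key])
    return commands


# "example_task_0" is a hypothetical task name.
commands = clean_start_commands(PROJECT_URL, ACCOUNT_KEY, ["example_task_0"])
for cmd in commands:
    print(" ".join(cmd))
```

Printing first and running by hand keeps the destructive steps (abort, detach) under your control.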
ID: 70873 · Rating: 0

HiFiTubeGuy
Joined: 12 Jan 10
Posts: 22
Credit: 6,291,999
RAC: 0
Message 70874 - Posted: 3 Aug 2011, 3:29:37 UTC - in response to Message 70871.  

Thanks for the feedback on WU time.

Has Rosetta been down for a while?

Don't you guys maintain several days of WU on your systems to cover short term outages?

I have the preferences set to hold 3 days of WU on my system. That is usually enough.


It's not that Rosetta has actually been 'down' really, there's just been a temporary shortage of work: see https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5701&nowrap=true#70834 .

Yep, I set my work buffer at 4 days. Maybe I'm just lucky, as I've managed to get just enough work each day through this shortage (since July 26th for me) to keep my computers busy, and apparently some people have gotten no work at all at times.
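
As a back-of-the-envelope check on what a buffer like that means, here is a sketch assuming the client tries to keep every CPU thread busy for the whole buffer period. The thread count and per-task runtime are illustrative assumptions, and the real client's estimate will differ.

```python
def buffered_tasks(buffer_days, cpu_threads, hours_per_task, share_fraction=1.0):
    """Estimate how many tasks a work buffer holds for one project."""
    cpu_hours = buffer_days * 24 * cpu_threads * share_fraction
    return cpu_hours / hours_per_task


# 4-day buffer, 8 threads (assumed), about 3 hours per task.
estimate = buffered_tasks(4, 8, 3)
print(estimate)
```

With those assumed inputs the buffer works out to a few hundred tasks, which is why a multi-day shortage can still drain it.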
ID: 70874 · Rating: 0

TPCBF
Joined: 29 Nov 10
Posts: 109
Credit: 4,684,817
RAC: 2,607
Message 70875 - Posted: 3 Aug 2011, 3:29:57 UTC - in response to Message 70864.  

Hi everyone. I joined today. I added Rosetta to my Seti workload.

Joined around noon. 5 hours later, no work units. Not an issue as I have plenty of Seti WU to keep the computer crunching, but was wondering if it was just me.

No, it's not just you; this has been going on since Friday.
Apparently, according to a post by Rocco, they knew they would run out of WUs and didn't bother to post an official notification. And it took some complaints to even get that post from Rocco two days later. A few WUs seem to drop in once in a while now, but things are far from back to normal.
The powers that be at R@H don't seem to think it's necessary to tell anyone what's going on and for how long...

Ralf
ID: 70875 · Rating: 0

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70878 - Posted: 3 Aug 2011, 13:47:58 UTC
Last modified: 3 Aug 2011, 13:48:46 UTC

Thanks Ralf. Seems a good reason to have several projects running. Chances are there will always be work that way, for those who want to keep their systems busy.

I can wait. I have a couple hundred hours of Seti and Astropulse work units to crunch.
ID: 70878 · Rating: 0

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 70879 - Posted: 3 Aug 2011, 17:03:24 UTC

Wanders in, pokes the Bakerlab people in the hope of getting an official comment on this, and wanders out again.

I'm hit as hard as others, maybe worse. I reset Rosetta on three of my systems in response to a comment on the BOINC console that recommended doing so. Something to do with an error that I now don't remember, but it explicitly said to reset if the error happens a lot. Which it was.

So I've got three machines that are dead in the water atm.

I know that the BOINC devs themselves (i.e. the folks at Berkeley) don't read these boards, but you know what I'd like to see in the project scheduling / priority section, is an option like this.

Get WU for project X 100% of the time, unless project X has no work,
in which case get WU for project Y, but only until Project X comes back.

That way I could set Rosetta to be there 100% of the time when it's up, but fall back gracefully to something else if I have to cover an outage.


I used to try this by setting Rosetta to 99% and something else to 1%, but that wide disparity in work percentages appeared to cook the mind of the scheduler, and it really didn't work the way I wanted.
ID: 70879 · Rating: 0

AtHomer
Joined: 26 Jan 10
Posts: 13
Credit: 7,145,229
RAC: 0
Message 70880 - Posted: 3 Aug 2011, 18:22:16 UTC - in response to Message 70879.  
Last modified: 3 Aug 2011, 18:24:47 UTC

Wanders in, pokes the Bakerlab people in the hope of getting an official comment on this, and wanders out again.

I'm hit as hard as others, maybe worse. I reset Rosetta on three of my systems in response to a comment on the BOINC console that recommended doing so. Something to do with an error that I now don't remember, but it explicitly said to reset if the error happens a lot. Which it was.

So I've got three machines that are dead in the water atm.

I know that the BOINC devs themselves (i.e. the folks at Berkeley) don't read these boards, but you know what I'd like to see in the project scheduling / priority section, is an option like this.

Get WU for project X 100% of the time, unless project X has no work,
in which case get WU for project Y, but only until Project X comes back.

That way I could set Rosetta to be there 100% of the time when it's up, but fall back gracefully to something else if I have to cover an outage.


I used to try this by setting Rosetta to 99% and something else to 1%, but that wide disparity in work percentages appeared to cook the mind of the scheduler, and it really didn't work the way I wanted.

I set Rosetta's resource share to 100, and another project's to 0. This works perfectly!

The scheduler tries to get new WUs from Rosetta, but if it does not succeed it will keep trying until there is no work left to do, and only then will it go and get work from project "0".
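
The behaviour described here can be modelled in a few lines. This is a toy sketch of the rule as stated in this post (zero-share projects are only asked once the local queue is empty), not the real BOINC work-fetch code; the function and project names are made up.

```python
def projects_to_ask(shares, queued_tasks):
    """Return the projects to request work from, in order.

    Toy rule: projects with a positive share are always asked; projects with
    a share of 0 act as backups and are asked only when nothing is queued.
    """
    primary = [name for name, share in shares.items() if share > 0]
    backup = [name for name, share in shares.items() if share == 0]
    if queued_tasks > 0:
        return primary
    return primary + backup


shares = {"Rosetta@home": 100, "BackupProject": 0}

print(projects_to_ask(shares, queued_tasks=5))  # ['Rosetta@home']
print(projects_to_ask(shares, queued_tasks=0))  # ['Rosetta@home', 'BackupProject']
```

This is the graceful fallback asked for above: Rosetta stays first in line, and the backup only fills the gap.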
ID: 70880 · Rating: 0

dango
Joined: 22 Dec 08
Posts: 3
Credit: 75,820
RAC: 0
Message 70881 - Posted: 3 Aug 2011, 20:21:04 UTC

WUs available again!!

just received first 33

Ready to send 39,774
In progress 501,766
ID: 70881 · Rating: 0

Ed
Joined: 2 Aug 11
Posts: 31
Credit: 662,563
RAC: 0
Message 70882 - Posted: 3 Aug 2011, 21:38:58 UTC - in response to Message 70878.  

Guys, I just got my first Rosetta WU, so I am now officially part of the project. :-)
ID: 70882 · Rating: 0

rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 70884 - Posted: 3 Aug 2011, 21:43:04 UTC - in response to Message 70882.  

Guys, I just got my first Rosetta WU, so I am now officially part of the project. :-)


Welcome .....thanks for joining
ID: 70884 · Rating: 0

LMacNeill
Joined: 17 Sep 07
Posts: 3
Credit: 14,773,560
RAC: 0
Message 70886 - Posted: 3 Aug 2011, 23:20:44 UTC - in response to Message 70881.  

WUs available again!!


I have yet to receive any... I assume it's just a matter of time? I don't need to take any steps like resetting the project or anything, do I?

Thanks,
Laurence MacNeill
Mableton, Georgia, USA
ID: 70886 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,704,997
RAC: 2,245
Message 70889 - Posted: 4 Aug 2011, 2:39:34 UTC

The project is out of work again.
Just checked the server status.
Only 1 task ready to go.
ID: 70889 · Rating: 0

TPCBF
Joined: 29 Nov 10
Posts: 109
Credit: 4,684,817
RAC: 2,607
Message 70891 - Posted: 4 Aug 2011, 4:12:34 UTC - in response to Message 70889.  

The project is out of work again.
Just checked the server status.
Only 1 task ready to go.

There hasn't really been any "work" since Friday, for almost 6 days now. I got maybe (half) a dozen WUs since then, on 6 or 7 hosts running R@H, probably some cleanup jobs...

This picture from the accumulated WUs at the "BOINC Combined Stats" site shows how much this has dropped since...

Again, my complaint is not that there are currently barely any WUs available, but that the administrators for R@H don't have the guts/decency to tell us (upfront) what the deal is/was.
It looked at first like a typical "Friday afternoon server barf with no sysadmin around" until Rocco posted two days later that they knew they would run out of jobs in the current project(s). Other projects like WCG announce that kind of thing weeks in advance...

And the fact that critical posts now even get censored in here shows that there isn't much hope of anything changing anytime soon... :(

Ralf
ID: 70891 · Rating: 0

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 70893 - Posted: 4 Aug 2011, 5:23:30 UTC

@TCPBE -

Right up front I will say that during the year and a half that I have participated in Rosetta, the developers and SysAdmins have set the bar for communications pretty low, and then consistently missed the mark.

Now I take a different position than my friends in the “Bangers and Mash” crowd who espouse the viewpoint that we, as volunteers, are due nothing, and we really don't have a leg to stand on when it comes to complaining.

It is my position that each and every one of us who regularly contributes cycles to the effort is a full partner in the project. Unpaid partners to be sure, but partners nonetheless.

While there may be some among us who look at the collection of BOINC credits as a sport, I believe that the vast majority of those who participate view themselves as members of a team working towards a common goal. And as members of the team we are due certain things:

1. We should have the expectation that the resources we contribute are used in a responsible manner and are expended towards meeting the stated goals of the project. I have no doubt that the researchers at the Rosetta project are sterling in this area.

2. As partners in this endeavor we are entitled to be kept in the loop when it comes to issues that affect us, such as server failures and interruptions in the expected flow of work units. Unfortunately Rosetta's track record in this area is abysmal to say the least.

However, there is no expectation that Rosetta is a “full employment” program for our systems. It is expected that from time to time there will be a pause in the work flow. Just keep us informed.

This lack of communication actually caused me to withdraw from the project for a period of time. I believed then, as I believe now, that failing to keep us volunteer partners informed in a timely manner was not an option.

However, I have ** NEVER ** seen a post censored unless it was blatantly vulgar or contained a personal attack. And there were moments when I was pretty blunt with the moderator Mod.Sense (who at moments I referred to as Non.Sense).

If you really think that you had a critical post “censored” it is likely you crossed the line when it came to decorum or personal attacks.






ID: 70893 · Rating: 0

TPCBF
Joined: 29 Nov 10
Posts: 109
Credit: 4,684,817
RAC: 2,607
Message 70894 - Posted: 4 Aug 2011, 7:23:54 UTC - in response to Message 70893.  

@TCPBE -

"F" ;-)

Right up front I will say that during the year and a half that I have participated in Rosetta, the developers and SysAdmins have set the bar for communications pretty low, and then consistently missed the mark.

Now I take a different position than my friends in the “Bangers and Mash” crowd who espouse the viewpoint that we, as volunteers, are due nothing, and we really don't have a leg to stand on when it comes to complaining.

It is my position that each and every one of us who regularly contributes cycles to the effort is a full partner in the project. Unpaid partners to be sure, but partners nonetheless.

That's more or less my POV as well. And I don't think that it is too outlandish to ask for better communication from the "project organizers", especially when such an "outage" was apparently known beforehand, as Rocco alluded to in his post two days into "the (non)event"...

However, there is no expectation that Rosetta is a “full employment” program for our systems. It is expected that from time to time there will be a pause in the work flow. Just keep us informed.

Eggsacktly...

This lack of communication actually caused me to withdraw from the project for a period of time. I believed then, as I believe now, that failing to keep us volunteer partners informed in a timely manner was not an option.

Well, for the time being, I have held off from disconnecting from the project. I don't run any GPUs and don't see a point in acting as one of those "credit mongers"; I just see a possibility to participate in some potentially useful research by providing whatever spare processing power I can with my hosts...
So while R@H is in hibernation in the midst of summer, my hosts are still busy working on WCG projects with 100% of their available resources instead of the usual 50/50 split...

However, I have ** NEVER ** seen a post censored unless it was blatantly vulgar or contained a personal attack. And there were moments when I was pretty blunt with the moderator Mod.Sense (who at moments I referred to as Non.Sense)
...
If you really think that you had a critical post “censored” it is likely you crossed the line when it came to decorum or personal attacks.

Certainly not...

I am a forum admin in an Open Source software forum myself, so I think I know where "the line" is. And I certainly did not personally attack anyone; I just expressed my frustration with the lackluster response (or rather the complete lack thereof) from the project admins...
The post ID is 70835, and it seemed to me rather that this was a misguided attempt to quell some opposition to the preferred "head in the sand" attitude...

Ralf
ID: 70894 · Rating: 0

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 70895 - Posted: 4 Aug 2011, 7:39:57 UTC

Alright ladies, it seems the servers have WUs again, so now less arguing and more crunching, eh?
ID: 70895 · Rating: 0