Problems and Technical Issues with Rosetta@home

Author	Message
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75109 - Posted: 17 Feb 2013, 22:26:25 UTC - in response to Message 75107. Last modified: 17 Feb 2013, 22:26:37 UTC
Link, actually, it does depend on the project -- going 'passive mode' with a number of projects tends to be suboptimal. SETI is notorious about this -- one has to 'encourage' downloads to complete, otherwise they will time out and simply not happen. For them it is an unresolved I/O and scheduler handling issue which they've had for months. Typically (aside from SETI), my involvement is somewhat responsive to the project situation. For example, with Rosetta right now, rather than waste time 'pushing buttons' -- I simply suspend the project and have other projects get the cycles. (As an aside, it never makes sense to me to have only a single CPU or a single GPU project active on a workstation.) Once Rosetta comes back online (and clears its backlogs), which I would guess would be by Tuesday, I simply take it off suspend and let life go on. I use that approach with other projects when they are dealing with relatively extended outages (say 12 hours or more). SETI takes a bit of additional handling due to the combination of I/O and scheduler issues, PLUS its weekly Tuesday 4-hour outage and backlog, plus the very high user volumes involved. So in addition to hand-holding new work downloads, often, after I have gotten a batch of new work, I go to No New Work there -- that keeps the scheduler happy for handling my uploads and reports. I guess I'm responding to that 'let BOINC do its thing' position as there were folks over on SETI who (over a year ago) insisted that this was the only 'right thing' to do. As SETI is one of the worst projects for doing that, I found that position, shall we say, a bit counterintuitive.
I'm letting them count down to Project Backoff, and if it fails, too bad. But if it reverts to Retry again and doesn't work, I'm going to Abort Transfer.
That's pretty much the most stupid thing you can do. The best thing to do is to let BOINC do its job without clicking any buttons; it's designed to take care of such things by itself. Once the servers work as they should, BOINC will upload all those tasks and eventually request new ones. If your cache is getting empty and you want to do something clever, switch on your backup project(s).

BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,845,917 RAC: 0	Message 75110 - Posted: 17 Feb 2013, 22:29:04 UTC - in response to Message 75106. Last modified: 17 Feb 2013, 22:29:12 UTC
The thing here is Rosetta uses different servers for their web pages -- which is why we can post here. It is also quite possible that the Rosetta servers are OK, and that the UW IT folks have made one of their periodic IP address changes over the weekend without prior notification to Baker Labs folks. Server page is all green, however... I'll bet I've seen that 'server status' page actually show something as 'down' maybe once over a few years. AFAIK it's rarely if ever accurate or updated. Oh well.

GALAXY-VOYAGER Joined: 25 Oct 12 Posts: 15 Credit: 47,437 RAC: 0	Message 75114 - Posted: 18 Feb 2013, 6:00:37 UTC - in response to Message 75107.
I'm letting them count down to Project Backoff, and if it fails, too bad. But if it reverts to Retry again and doesn't work, I'm going to Abort Transfer.
That's pretty much the most stupid thing you can do. The best thing to do is to let BOINC do its job without clicking any buttons; it's designed to take care of such things by itself. Once the servers work as they should, BOINC will upload all those tasks and eventually request new ones. If your cache is getting empty and you want to do something clever, switch on your backup project(s).
Yeah, I've had second thoughts. It's just that I'm fed up with things like this happening (especially with R@H). I've decided to let it run its course. It seems to be working (albeit ever so slowly, because it's still doing the same thing). When you suggest that I turn on my backup projects, I assume you mean any other projects I run. Actually, I run 5 other projects besides R@H (excluding Orbit@Home, which has never been active since I joined it last year), and I run them all at once. I've temporarily suspended ALL other projects for 12-24 hours to see if that helps (including E@H, but I allowed one single task in E@H to keep running because it has about 39 hours of estimated remaining time): all other tasks in each project are not due until March 3rd or later. So, I'll give R@H the run of the mill and see what happens.

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 75121 - Posted: 18 Feb 2013, 12:20:40 UTC - in response to Message 75109. Last modified: 18 Feb 2013, 12:23:18 UTC
Link, actually, it does depend on the project -- going 'passive mode' with a number of projects tends to be suboptimal. SETI is notorious about this -- one has to 'encourage' downloads to complete, otherwise they will time out and simply not happen. For them it is an unresolved I/O and scheduler handling issue which they've had for months. (...) I guess I'm responding to that 'let BOINC do its thing' position as there were folks over on SETI who (over a year ago) insisted that this was the only 'right thing' to do. As SETI is one of the worst projects for doing that, I found that position, shall we say, a bit counterintuitive.
SETI is rather a special case; they simply get much more computing power than they can use with their current internet connection. Sure, for a particular cruncher pushing buttons might help, but overall it does not (and that is ignoring the fact that the more of us push buttons and try to push SETI's internet connection harder, the more packet loss occurs). If I download a WU, somebody else doesn't. From the project's view it doesn't matter which machine sits idle; they send out as many WUs as they can (well, actually they try to send out a bit more than they can, hence the download issues). That's why I have now moved two of my 3 CPUs to Rosetta: they can apparently use them, and for SETI it does not matter, since the WUs I'd crunch will be done by another machine, which would otherwise sit idle if I got those WUs. So yes, not pushing any buttons is counterintuitive, but only as long as you look at it from your end. From the project's point of view it either does not matter or makes things worse.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75122 - Posted: 18 Feb 2013, 12:23:59 UTC - in response to Message 75114.
I'm letting them count down to Project Backoff, and if it fails, too bad. But if it reverts to Retry again and doesn't work, I'm going to Abort Transfer.
That's pretty much the most stupid thing you can do. The best thing to do is to let BOINC do its job without clicking any buttons; it's designed to take care of such things by itself. Once the servers work as they should, BOINC will upload all those tasks and eventually request new ones. If your cache is getting empty and you want to do something clever, switch on your backup project(s).
Yeah, I've had second thoughts. It's just that I'm fed up with things like this happening (especially with R@H). I've decided to let it run its course. It seems to be working (albeit ever so slowly, because it's still doing the same thing). When you suggest that I turn on my backup projects, I assume you mean any other projects I run. Actually, I run 5 other projects besides R@H (excluding Orbit@Home, which has never been active since I joined it last year), and I run them all at once. I've temporarily suspended ALL other projects for 12-24 hours to see if that helps (including E@H, but I allowed one single task in E@H to keep running because it has about 39 hours of estimated remaining time): all other tasks in each project are not due until March 3rd or later. So, I'll give R@H the run of the mill and see what happens.
Rosetta folks take their weekends seriously: THEY DO NOT WORK ON WEEKENDS. That means until they show up for work today, Monday, they have NO CLUE we are having any problems! As long as the problem isn't broken or missing hardware, it should be back up and running pretty quickly. Obviously broken or missing hardware could take a bit longer. I run Rosie full time on my PCs, no other CPU projects at all.
BUT I DO have one project set up at zero percent for instances just like this; when my Rosie work cache ran dry, my zero percent project picked right up and I am happily crunching for them. When Rosie is fixed it will go back to standby mode and Rosie will be back to its 100% mode.

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 75124 - Posted: 18 Feb 2013, 12:36:56 UTC - in response to Message 75114.
When you suggest that I turn on my backup projects, I assume you mean any other projects I run. Actually, I run 5 other projects besides R@H (excluding Orbit@Home, which has never been active since I joined it last year), and I run them all at once.
Well, I don't know what and how you run, but in your previous post you sounded like those people over at SETI who run only SETI and insist that SETI should have availability like Google's, because they don't accept that their CPUs or GPUs sit idle. If you run 5 projects all at once, you should actually not need to do anything; once the Rosetta servers are back up, BOINC will upload and report completed tasks and request new ones as if nothing had happened.

amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0	Message 75125 - Posted: 18 Feb 2013, 14:25:43 UTC - in response to Message 75106.
Server page is all green however... I'll bet I've seen that 'server status' page actually show something as 'down' maybe once over a few years. AFAIK it's rarely if ever accurate or updated. Oh well.

amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0	Message 75126 - Posted: 18 Feb 2013, 14:27:45 UTC - in response to Message 75106.
Server page is all green however... I'll bet I've seen that 'server status' page actually show something as 'down' maybe once over a few years. AFAIK it's rarely if ever accurate or updated. Oh well.
Looks like the server status pages have been updated to 'disabled'. Someone is paying attention. I'm sure the servers will be back up soon.

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75127 - Posted: 18 Feb 2013, 14:44:56 UTC - in response to Message 75122. Last modified: 18 Feb 2013, 14:46:01 UTC
Rosetta folks take their weekends seriously: THEY DO NOT WORK ON WEEKENDS. That means until they show up for work today, Monday, they have NO CLUE we are having any problems! As long as the problem isn't broken or missing hardware, it should be back up and running pretty quickly. Obviously broken or missing hardware could take a bit longer.
They know; I sent a PM yesterday morning (roughly 36 hours ago) to the mail address provided by Ethan (to use when there are issues). Greetings, TJ.

Greg_BE Joined: 30 May 06 Posts: 5774 Credit: 6,139,760 RAC: 0	Message 75129 - Posted: 18 Feb 2013, 18:49:25 UTC - in response to Message 75127.
Rosetta folks take their weekends seriously: THEY DO NOT WORK ON WEEKENDS. That means until they show up for work today, Monday, they have NO CLUE we are having any problems! As long as the problem isn't broken or missing hardware, it should be back up and running pretty quickly. Obviously broken or missing hardware could take a bit longer.
They know; I sent a PM yesterday morning (roughly 36 hours ago) to the mail address provided by Ethan (to use when there are issues).
And yet here we are with a broken something-or-other and no one bothering to do anything about it. This is so typical of Baker Labs. I think I will lower my % on Rosie and put it into Poem or Einstein. I have been with this project for a long time now and it is always the same story: no communication from anyone.

TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0	Message 75131 - Posted: 18 Feb 2013, 19:03:44 UTC
It is working again! The server page shows all "running". Uploading and reporting are working, but with thousands of WUs the network connections will have "problems". Happy crunching. Greetings, TJ.

Warped Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0	Message 75132 - Posted: 18 Feb 2013, 19:03:53 UTC - in response to Message 75127. Last modified: 18 Feb 2013, 19:09:03 UTC
Rosetta folks take their weekends seriously: THEY DO NOT WORK ON WEEKENDS. That means until they show up for work today, Monday, they have NO CLUE we are having any problems! As long as the problem isn't broken or missing hardware, it should be back up and running pretty quickly. Obviously broken or missing hardware could take a bit longer.
They know; I sent a PM yesterday morning (roughly 36 hours ago) to the mail address provided by Ethan (to use when there are issues).
That's wishful thinking:
1. Monday is a public holiday in the USA.
2. The e-mail address is likely only looked at when Ethan is at work.
I'm expecting at least a further 24 hours without any life from the project.
Edit: I am pleasantly surprised. Looks like the router's power supply was switched back on!

John_Waters Joined: 23 Jun 11 Posts: 15 Credit: 6,838,499 RAC: 0	Message 75133 - Posted: 18 Feb 2013, 19:12:38 UTC
"Rosetta folks take their weekends seriously"
We're well into the third day of the outage, and apparently they're not working today either. I run six, sometimes seven, 8-core machines 24/7 in support of R@H. Electricity alone runs about $200 a month. That's another way of saying my wife's NOT happy. And she doesn't realize that I'm spending about the same amount on average on hardware, software, cables, etc. It's just mind-boggling to me that they would have such a cavalier attitude towards the contributors that no one would be on standby for just this occurrence. I have 108 WUs ready to turn in. Then I'll think about whether I should continue spending all that money.

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 75134 - Posted: 18 Feb 2013, 19:46:31 UTC - in response to Message 75133.
It's just mind-boggling to me that they would have such a cavalier attitude towards the contributors that no one would be on standby for just this occurrence. I have 108 WUs ready to turn in. Then I'll think about whether I should continue spending all that money.
I always wonder why there are so many people who don't realize that:
1. BOINC is designed to handle such outages of your favourite project(s) -> see the WU cache and the backup project feature.
2. BOINC was made to use spare cycles of computers that would be running anyway. Nobody ever asked the volunteers to build additional computers just for crunching.
3. BOINC is intended for projects with little money, projects that can't buy the computing power they need. Many of them also can't pay anybody to sit and watch the servers all the time. Considering #1, that would be wasted money anyway; we don't need 24/7 availability.

GALAXY-VOYAGER Joined: 25 Oct 12 Posts: 15 Credit: 47,437 RAC: 0	Message 75138 - Posted: 19 Feb 2013, 4:18:28 UTC - in response to Message 75122. Last modified: 19 Feb 2013, 4:24:04 UTC
I'm letting them count down to Project Backoff, and if it fails, too bad. But if it reverts to Retry again and doesn't work, I'm going to Abort Transfer.
That's pretty much the most stupid thing you can do. The best thing to do is to let BOINC do its job without clicking any buttons; it's designed to take care of such things by itself. Once the servers work as they should, BOINC will upload all those tasks and eventually request new ones. If your cache is getting empty and you want to do something clever, switch on your backup project(s).
Yeah, I've had second thoughts. It's just that I'm fed up with things like this happening (especially with R@H). I've decided to let it run its course. It seems to be working (albeit ever so slowly, because it's still doing the same thing). When you suggest that I turn on my backup projects, I assume you mean any other projects I run. Actually, I run 5 other projects besides R@H (excluding Orbit@Home, which has never been active since I joined it last year), and I run them all at once. I've temporarily suspended ALL other projects for 12-24 hours to see if that helps (including E@H, but I allowed one single task in E@H to keep running because it has about 39 hours of estimated remaining time): all other tasks in each project are not due until March 3rd or later. So, I'll give R@H the run of the mill and see what happens.
Rosetta folks take their weekends seriously: THEY DO NOT WORK ON WEEKENDS. That means until they show up for work today, Monday, they have NO CLUE we are having any problems! As long as the problem isn't broken or missing hardware, it should be back up and running pretty quickly. Obviously broken or missing hardware could take a bit longer. I run Rosie full time on my PCs, no other CPU projects at all.
BUT I DO have one project set up at zero percent for instances just like this; when my Rosie work cache ran dry, my zero percent project picked right up and I am happily crunching for them. When Rosie is fixed it will go back to standby mode and Rosie will be back to its 100% mode.
Okay, thanks for the encouragement and information. I didn't realise that it was ever totally unattended: I thought that there were admins/mods on the job all the time, on a roster system. Anyway, my latest idea (mentioned above) seems to have worked. All completed R@H tasks have uploaded. Seems like that could be the trick: temporarily suspend other projects if it is practical to do so, and let Rosie run the flower store ...LOL :-) Happy crunching, (ps .. I have now resumed all projects)

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 75141 - Posted: 19 Feb 2013, 10:04:00 UTC
Are there any cricket graphs available for the Rosetta servers like the ones we have at SETI? They have proven to be very useful when someone wants to know what the servers are doing, even if the server status page isn't working.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75142 - Posted: 19 Feb 2013, 12:07:21 UTC - in response to Message 75126.
Server page is all green however... I'll bet I've seen that 'server status' page actually show something as 'down' maybe once over a few years. AFAIK it's rarely if ever accurate or updated. Oh well. Looks like the server status pages have been updated to 'disabled'. Someone is paying attention. I'm sure the servers will be back up soon.
One thing to remember is that almost ALL webpages are cached and only refreshed periodically: some by the minute, some by the hour and some by the day or week. There is no way to tell unless you check and notice a change, or they come out and actually tell us.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75143 - Posted: 19 Feb 2013, 12:14:36 UTC - in response to Message 75133.
"Rosetta folks take their weekends seriously"
We're well into the third day of the outage, and apparently they're not working today either. I run six, sometimes seven, 8-core machines 24/7 in support of R@H. Electricity alone runs about $200 a month. That's another way of saying my wife's NOT happy. And she doesn't realize that I'm spending about the same amount on average on hardware, software, cables, etc. It's just mind-boggling to me that they would have such a cavalier attitude towards the contributors that no one would be on standby for just this occurrence. I have 108 WUs ready to turn in. Then I'll think about whether I should continue spending all that money.
One thing to remember, like in real estate...location, location, location! Rosetta is run out of the University of Washington, as in Washington STATE!! That means West Coast time; when I posted it was 7AM East Coast time and they were probably still sawing logs! They didn't even get up and think about work until I was getting ready for my LUNCH!! When I was active at SETI (which is based in California), MANY years ago, the EARLIEST folks wouldn't even come to work before 9am, and for MOST of them it was more like NOON!!! I was getting ready to go home and they were just coming to work!! So remember location, location, location; it is VERY important when thinking of WHEN things will be done!

Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0	Message 75145 - Posted: 19 Feb 2013, 12:37:47 UTC - in response to Message 75142. Last modified: 19 Feb 2013, 12:44:17 UTC
One thing to remember is that almost ALL webpages are cached and only refreshed periodically: some by the minute, some by the hour and some by the day or week. There is no way to tell unless you check and notice a change, or they come out and actually tell us.
The server status page states when it was last updated.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 0	Message 75148 - Posted: 20 Feb 2013, 12:12:25 UTC - in response to Message 75145.
One thing to remember is that almost ALL webpages are cached and only refreshed periodically: some by the minute, some by the hour and some by the day or week. There is no way to tell unless you check and notice a change, or they come out and actually tell us.
The server status page states when it was last updated.
Those WORDS were in the way; THANKS, now I can SEE!!!