Questions and Answers : Web site : Website status report incorrect
Author | Message |
---|---|
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
According to the website status page at https://boinc.bakerlab.org/rosetta/rah_status.php, everything is hunky dory, but it's quite clear the server is not accepting completed work (from some hours ago). Perhaps this is part of some less focused strangeness that has been going on over the last few days, but if so, then there should be some kind of announcement about the problem, and I can't find that anywhere, either. On the top page, the latest news is more than a month old, and the last-listed tweet is about 4 months old. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Well, the system is still clearly out of order, and the website is still clearly incorrect in its status reports. No finished work being accepted by the Baker Lab side, and no fresh work units coming down. At least 12 hours since my original report here, and it was already several hours of brokenness at that time... Allo? Anyone there? P.S. Even the Twitter account appears to be dead as of March? |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
You may want to keep an eye on the Problems and Technical Issues with Rosetta@home thread. Several users have reported the same problem there and one of the project team members has replied to say they are investigating. krypton wrote: Thanks for the reports!! We are looking at this now. |
shanen Send message Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Okay, now about that website Server Status Page... It is still showing that the project is all green and that is clearly all wrong. Whatever is wrong should be detected and indicated there. It's also nice if you include an estimated repair time. Communication is good, eh? I have to repeat my point, even including the bad joke: If you don't communicate more effectively, you are liable to cause all sorts of rumors to appear. The bad joke was my proposed rumor about an NSA trainee bollixing your website. This is NOT really a funny idea, because a rogue task (AKA work unit) could do ALL sorts of bad things, probably including hijacking your computer's camera to take embarrassing photos. The people running these BOINC projects need to take some responsibility for providing accurate and timely information about the status of their projects. My own observations actually suggest that something has been going south from around the 22nd of July, but it clearly fell off the edge of the earth yesterday (or maybe the day before that). For what it is worth, I suspected the change around the 22nd may have involved some attempt to fix the "Computation Error" tasks. My theory was that they nipped some of the sub-projects that were causing those errors, and the change was causing a significant drop in the statistics. However, because of how poorly the project managers communicate, I wasn't really expecting any clarification from them. Hello, people? We know you aren't NSA professionals, but still... P.S. Perhaps I should feel some personal culpability here, insofar as I may have been an inadvertent contributor to the design of BOINC. However, if I had been asked MUCH more politely to contribute more, then I hope that proper resource security of the client would have been one of my major concerns. Therefore, I disclaim and proclaim "It ain't my fault!" P.P.S. If you delete the joke again, then maybe I'll stop taking the rumor as a joke. 
We seem to be back again to the need for improved communications skills, eh? P.P.P.S. I'm not really Canadian, though it was an increasingly attractive optional rumor during the Dubya years (of the big dick Cheney). |
Jim J Send message Joined: 21 Feb 14 Posts: 4 Credit: 429,101 RAC: 0 |
I have 19 uploads pending like this: ... Upload: retry in 01:10:53 (project backoff 00:26:40) As shanen says, the status page still reports what appears to be misinformation. The Problems and Technical Issues page reports issues from 2011. Any idea yet on what is going on today? |
Ralph Send message Joined: 10 Jan 12 Posts: 2 Credit: 397,191 RAC: 0 |
I've had the same problem since July 28. Thanks for any help. |
Jim Mowbray Send message Joined: 22 Oct 06 Posts: 1 Credit: 13,740,583 RAC: 0 |
Same problem for me also. No work units being uploaded and no units received since July 28. |
ThrowerGB Send message Joined: 4 Dec 05 Posts: 3 Credit: 12,259,708 RAC: 0 |
Jim J wrote: "I have 19 uploads pending like this: ... Upload: retry in 01:10:53 (project backoff 00:26:40)" I have the same problem. It's been going on for at least a week now. I have 24 uploads pending and no downloads in my queue. |
Jim J Send message Joined: 21 Feb 14 Posts: 4 Credit: 429,101 RAC: 0 |
Jim J wrote: "I have 19 uploads pending like this:" Today 13 downloads arrived for me. Rosetta worked through them and now it is idle again. I have 32 uploads pending. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Jim J wrote: "The Problems and Technical Issues page reports issues from 2011." Check the link I gave earlier. It is a sticky thread from another part of the forum, so it has information from different time periods. The early posts are from 2011 while the latest posts are about this issue. This Q&A section of the forum isn't visited much, so you are unlikely to get answers in the short term. Most of the discussion is taking place in the Number Crunching section. |
Warped Send message Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0 |
Strange as it may seem, the Server Status Page is actually correct. If your computer were inside the firewall at the University of Washington, you would not be aware of any issue. The problem is that the internet connection from the campus is being throttled to the point where our uploads and downloads are timing out. This is not monitored on the Server Status Page. Based on what I see in the Number Crunching section of the Message Boards, I do not expect any resolution until Monday, since the Rosetta staff have been away and I expect the UW IT staff will only be back on Monday. In addition, that's Pacific Time, so it will likely be about 15h00 UTC before resolution can be expected. On top of this, when resolved, the routers and switches will get hammered with data. Warped |
Jim J Send message Joined: 21 Feb 14 Posts: 4 Credit: 429,101 RAC: 0 |
Murasaki wrote: "...The early posts are from 2011 while the latest posts are about this issue." I later saw that - thanks! |
Jim J Send message Joined: 21 Feb 14 Posts: 4 Credit: 429,101 RAC: 0 |
Well my CPU rate shot up a while ago and I found Rosetta was working through 19 new downloads. The 32 completed tasks had been uploaded! Maybe things are settling down... |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Problem now diagnosed: krypton wrote: Some good news and bad news: David E K wrote: Yep, I'm currently optimizing the number of connections on all our servers. Looks like they can keep up without too much load/memory usage so far. These servers are pretty old and I'm sure we'll upgrade soon hopefully. Polian wrote: Looks like a DoS attack to me, to be honest: David E K wrote: Hmm, I also suspected this but the IPs from the logs were coming from various places. Maybe I'll have to disable new users for now until we figure things out. David E K wrote: I was told by Matthew Blumberg at Gridrepublic that the new users are real crunchers and that they "started a new marketing campaign via charityengine.com." So I re-enabled the account creation for these users. Our servers may get sluggish again but hopefully things will settle down as the new user rates decrease. And hopefully optimizing the connections on our servers will help. In the future, we hope to get more servers. |
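
For anyone wondering how a status page can be "all green" while uploads time out: as Warped describes above, the page only sees the servers from inside the campus network. A rough sketch of pairing that internal view with a probe run from outside is below. This is not how the Rosetta@home status page works; `fetch_ok`, `combined_status`, `check`, and the status labels are made-up names for illustration, and the only thing taken from the thread is the status page URL. A small page fetch could also still succeed while large uploads time out, so treat it as a starting point only.

```python
import urllib.request
from typing import Callable

# URL taken from the first post in this thread.
STATUS_URL = "https://boinc.bakerlab.org/rosetta/rah_status.php"


def fetch_ok(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout_s` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # any of them counts as "unreachable from here".
        return False


def combined_status(internal_ok: bool, external_ok: bool) -> str:
    """Merge the server-side view with an outside probe (hypothetical labels)."""
    if internal_ok and external_ok:
        return "green"
    if internal_ok:
        return "servers up, unreachable from outside"
    return "server problem"


def check(probe: Callable[[str], bool] = fetch_ok, internal_ok: bool = True) -> str:
    """Run the outside probe and report the combined status."""
    return combined_status(internal_ok, probe(STATUS_URL))
```

Run from a machine outside the campus network, `check()` would report "servers up, unreachable from outside" in the situation described in this thread, assuming the probe itself times out.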
©2024 University of Washington
https://www.bakerlab.org