Message boards : Number crunching : SERVER PROBLEMS - 2.
Author | Message |
---|---|
Send message Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Jochen - A hex on you - The Great State of Washington, not California. You'd better watch your back after a faux pas like that. Despite the fact that they are the home of Microsoft, the people of Washington have a lot of pride! |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
Jochen - Shame on me! But I live far far away in Germany. So it'll take some time before I'm hit by a wild stab in the dark. I hope. ;) But at least it's the same time zone (PST), isn't it? |
TimL Send message Joined: 16 Sep 06 Posts: 17 Credit: 15,509,973 RAC: 0 |
I'm afraid I am already crunching malariacontrol.net jobs. Hope you get well soon, Rosie. |
Send message Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Jochen said ... But at least it's the same time zone (PST), isn't it? Yep, you are right about the time zone. The big difference is that engineers, scientists and those with a good education tend to migrate North, while the more "unique and creative" free spirits of the world tend to settle in California. |
Warped Send message Joined: 15 Jan 06 Posts: 48 Credit: 1,788,185 RAC: 0 |
Is there still no news on when we can expect the servers to be back online? I am unable to upload completed work. Warped |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
Warped said ... Nope. Warped said ... Just like all of us. :( I guess the Rosetta crew is working hard to fix the problem. Just let them do their work. Once they have time, they'll probably post what happened. Chris said ... I see. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2251 Credit: 42,703,552 RAC: 22,709 |
Warped said ... All systems go from about 90 minutes ago. ULs and DLs here now after a bit of WCG crunching in the meantime. Validation seems to be backed up but I'm sure that'll clear up soon. Panic over. |
Send message Joined: 21 Apr 10 Posts: 19 Credit: 17,915,923 RAC: 0 |
Hi, My workstation managed to download some data and is crunching now, but the render farm was switched off and the Rosetta servers are down again, so there is no new data to work on for most of my computers. |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
Hi, I didn't even notice that outage. Anyway, all servers are running again. I don't have problems getting new work. Only the validator seems to be a bit behind. |
jesse1919 Send message Joined: 1 Jul 10 Posts: 8 Credit: 2,680,869 RAC: 0 |
Well, like everyone else I had some down time Friday when the server couldn't feed my machines. I thought I'd be smart and increase the desired work time and additional work queue both to 24 hours. That should give them more time to fix another server crash without downtime. Makes sense? BUT one of my 24 hour WUs gave a validate error so it was a waste. I never saw this before. Did anybody else have validate errors since Friday? I was also messing with overclocking this machine yesterday. I believe it's rock solid now and other WUs were fine. Is a validate error always a server problem or could it be client side? |
mikey Send message Joined: 5 Jan 06 Posts: 1896 Credit: 10,178,569 RAC: 7,216 |
Hi, This is where a backup project can come in handy: pick one and give it a very low percentage, and it will basically only crunch when Rosetta is down. |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
I thought I'd be smart and increase the desired work time and additional work queue both to 24 hours. That should give them more time to fix another server crash without downtime. Makes sense? Yes, but there could be server outages that last longer than 24 hours. ;) BUT one of my 24 hour WUs gave a validate error so it was a waste. This is usually just a server problem. From the scientific view it might be a waste, but the credits were granted. Just have a look at the task's details. cu Joe |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Is there a problem again? The FLOPS count has been dropping and I'm getting these messages now, plus not many tasks are lined up and the validator is slow as well. Ready to send__672 Wed 25 Aug 2010 16:32:41 EST|rosetta@home|Sending scheduler request: To fetch work. Requesting 13618 seconds of work, reporting 0 completed tasks Wed 25 Aug 2010 16:32:46 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks |
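The two client messages quoted in the post above follow a `timestamp|project|message` layout, so the gap between work requested and work received can be pulled out mechanically. The following is only a minimal sketch: the log lines are copied from the post as they appear there, while the function name `summarize` and the two regular expressions are assumptions made for illustration, not part of BOINC.

```python
import re

# The two messages from the post, copied as they appear there.
LOG_LINES = [
    "Wed 25 Aug 2010 16:32:41 EST|rosetta@home|Sending scheduler request: "
    "To fetch work. Requesting 13618 seconds of work, reporting 0 completed tasks",
    "Wed 25 Aug 2010 16:32:46 EST|rosetta@home|Scheduler request succeeded: "
    "got 0 new tasks",
]

# Hypothetical patterns, written only to match the wording quoted above.
REQUEST_RE = re.compile(r"Requesting (\d+) seconds of work")
REPLY_RE = re.compile(r"got (\d+) new tasks")


def summarize(lines):
    """Return (seconds_requested, tasks_received) summed over the given lines."""
    requested = 0
    received = 0
    for line in lines:
        # Split into timestamp, project and message text; skip malformed lines.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        match = REQUEST_RE.search(message)
        if match:
            requested += int(match.group(1))
        match = REPLY_RE.search(message)
        if match:
            received += int(match.group(1))
    return requested, received


requested, received = summarize(LOG_LINES)
print(f"requested {requested} s (about {requested / 3600:.1f} h), got {received} tasks")
```

A request for several hours of work that comes back with zero tasks, as here, points at the project having nothing to send rather than at a client-side setting.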
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
Yes, it looks like Rosetta ran out of work. I'm not getting any new work either. 2 hours ago there were 10 WU ready to send, now there are 700 WUs ready to send. Just not enough to feed us. ;) cu Jochen |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
Why can't I edit my last post? Is there a time limit? Anyway, I was able to get some WUs on two computers, but the third has been idling for hours now... The number of WUs ready to send is slowly increasing, so they have probably already fixed the problem. It'll just take a while to get back to normal operations, since the servers are probably flooded with requests. cu Joe [Edit] Too bad, available WUs have dropped to almost 0 again... |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, there is a time limit on editing your own posts; I believe it is one hour. No project will have work and servers and networks and file servers with 100% availability. This is part of why BOINC allows you to attach to multiple projects, configure a cache of work to keep on-hand, and now the newer releases allow you to configure a "backup project" by attaching to a project and setting a zero resource share. (I seek to educate here, not make excuses. I don't run the servers, so I'm not defensive about it. R@h is one project with very high up-time, but there is always room for improvements. On the other hand, how can you implement improvements without taking the servers down once in a while?) Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2251 Credit: 42,703,552 RAC: 22,709 |
Is there a problem again? The FLOPS count has been dropping and I'm getting these messages now, plus not many tasks are lined up and the validator is slow as well. Ditto. It's lucky I increased my cache a few weeks ago after CASP9 ended. A brief word from the admins on timescales would be reassuring. |
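To make the resource share idea concrete, here is a small sketch of the arithmetic, assuming that a project's long-term slice of processing time is its share divided by the sum of all shares. The share values and the name `backup_project` are hypothetical, and the zero-share case simply reflects the "backup project" behaviour described in the post above; this is not the client's actual scheduling code.

```python
def share_fractions(shares):
    """Map each project to the fraction of processing time its share implies."""
    total = sum(shares.values())
    if total == 0:
        # No project has a positive share; nothing to divide up.
        return {name: 0.0 for name in shares}
    return {name: share / total for name, share in shares.items()}


# Hypothetical setup: a low-percentage backup versus a zero-share backup.
low_share = share_fractions({"rosetta@home": 99, "backup_project": 1})
zero_share = share_fractions({"rosetta@home": 100, "backup_project": 0})

for label, result in (("low share", low_share), ("zero share", zero_share)):
    for name, fraction in result.items():
        print(f"{label}: {name} -> {fraction:.0%}")
```

With a low but non-zero share the backup still gets a small slice all the time, whereas a zero share leaves it with none unless the main project has no work to hand out.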
goraxan Send message Joined: 18 Jul 10 Posts: 6 Credit: 1,143,926 RAC: 0 |
I have 30 GB of my HD reserved for R@H but it never uses more than approx. 350 MB. So, I think there's no way to increase the tasks pool. |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
I have 30 GB of my HD reserved for R@H but it never uses more than approx. 350 MB. So, I think there's no way to increase the tasks pool. With your computers hidden, one needs to guess... What is your cache size? What is your preferred running time? @ModSense: It's too late to try to educate me. ;) I guess I will just increase my cache again. It was just comfortable with a small cache, since I quite frequently reinstall the OS due to hardware changes. Just let BOINC run out of work overnight... cu Joe |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I have 30 GB of my HD reserved for R@H but it never uses more than approx. 350 MB. So, I think there's no way to increase the tasks pool. The amount of work you keep on-hand is configured in the network preferences of the BOINC Manager (or via the project website, and then you can update to the project and use the same settings on several machines). Also, the amount of free space on your hard drive is not really relevant. What is important is the amount that BOINC is allowed to use. This is configured in the disk and memory tab of the preferences. Rosetta Moderator: Mod.Sense |
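As a rough illustration of why the work buffer and the preferred run time, not disk space, decide how many tasks sit on a machine, the sketch below estimates the task count needed to cover a buffer. It assumes one task per core at a time and that every task runs for the full preferred run time; the function name and the core count are hypothetical, and the real client requests work in seconds rather than whole tasks.

```python
import math


def tasks_to_fill_buffer(buffer_hours, run_time_hours, cores):
    """Estimate how many tasks cover the buffer, one task per core at a time."""
    if buffer_hours < 0 or run_time_hours <= 0 or cores <= 0:
        raise ValueError("buffer must be >= 0; run time and cores must be > 0")
    # Each core needs enough back-to-back tasks to span the buffer period.
    per_core = math.ceil(buffer_hours / run_time_hours)
    return per_core * cores


# A 24 hour buffer with 24 hour tasks, as mentioned earlier in the thread,
# on a hypothetical 4-core machine.
print(tasks_to_fill_buffer(24, 24, 4))
# The same buffer with a hypothetical 3 hour run time.
print(tasks_to_fill_buffer(24, 3, 4))
```

Longer run times mean fewer, larger tasks for the same buffer, which is one reason a generous disk allowance can stay mostly unused.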
©2025 University of Washington
https://www.bakerlab.org