SERVER PROBLEMS - 2.

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67194 - Posted: 13 Aug 2010, 12:24:24 UTC

Jochen -

A hex on you - The Great State of Washington, not California. You'd better watch your back after a faux pas like that. Despite the fact that they are the home of Microsoft, the people of Washington have a lot of pride!
ID: 67194 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67196 - Posted: 13 Aug 2010, 12:33:22 UTC - in response to Message 67194.  
Last modified: 13 Aug 2010, 12:34:04 UTC

Chris Holvenstot said ...

Jochen -

A hex on you - The Great State of Washington, not California. You'd better watch your back after a faux pas like that. Despite the fact that they are the home of Microsoft, the people of Washington have a lot of pride!


Shame on me! But I live far, far away in Germany. So it'll take some time before I'm hit by a wild stab in the dark. I hope. ;)

But at least it's the same time zone (PST), isn't it?
ID: 67196 · Rating: 0
TimL
Joined: 16 Sep 06
Posts: 17
Credit: 15,509,973
RAC: 0
Message 67197 - Posted: 13 Aug 2010, 12:37:06 UTC

I'm afraid I am already crunching malariacontrol.net jobs

Hope you get well soon Rosie.
ID: 67197 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67198 - Posted: 13 Aug 2010, 12:48:06 UTC

Jochen said ...

But at least it's the same time zone (PST), isn't it?


Yep, you are right about the time zone. The big difference is that engineers, scientists and those with a good education tend to migrate North, while the more "unique and creative" free spirits of the world tend to settle in California.

ID: 67198 · Rating: 0
Warped
Joined: 15 Jan 06
Posts: 48
Credit: 1,788,185
RAC: 0
Message 67200 - Posted: 13 Aug 2010, 14:44:03 UTC

Is there still no news on when we can expect the servers to be back on line?

I am unable to upload completed work.
Warped

ID: 67200 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67202 - Posted: 13 Aug 2010, 17:56:19 UTC

Warped said ...

Is there still no news on when we can expect the servers to be back on line?

Nope.

Warped said ...

I am unable to upload completed work.

Just like all of us. :(


I guess the Rosetta crew is working hard to fix the problem. Just let them do their work. Once they have time, they'll probably post what happened.



Chris said ...

The big difference is that engineers, scientists and those with a good education tend to migrate North, while the more "unique and creative" free spirits of the world tend to settle in California.


I see.




ID: 67202 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2473
Credit: 46,499,576
RAC: 3,223
Message 67207 - Posted: 13 Aug 2010, 23:28:59 UTC - in response to Message 67202.  
Last modified: 13 Aug 2010, 23:29:26 UTC

Jochen said ...

Warped said ...
I am unable to upload completed work.

Just like all of us. :(

I guess the Rosetta crew is working hard to fix the problem. Just let them do their work. Once they have time, they'll probably post what happened.

All systems go from about 90 minutes ago. ULs and DLs are working here now, after a bit of WCG crunching in the meantime. Validation seems to be backed up, but I'm sure that'll clear up soon.

Panic over.
ID: 67207 · Rating: 0
GarageFarm.net
Joined: 21 Apr 10
Posts: 19
Credit: 17,915,923
RAC: 0
Message 67209 - Posted: 14 Aug 2010, 8:03:49 UTC - in response to Message 67207.  

Hi,
My workstation managed to download some data and is crunching now, but the renderfarm was switched off and the Rosetta servers are down again, so there is no new data to work on for most of my computers.
ID: 67209 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67210 - Posted: 14 Aug 2010, 8:47:48 UTC - in response to Message 67209.  

GarageFarm.net said ...

Hi,
My workstation managed to download some data and is crunching now, but the renderfarm was switched off and the Rosetta servers are down again, so there is no new data to work on for most of my computers.


I didn't even notice that outage. Anyway, all servers are running again. I don't have problems getting new work. Only the validator seems to be a bit behind.
ID: 67210 · Rating: 0
jesse1919
Joined: 1 Jul 10
Posts: 8
Credit: 2,680,869
RAC: 0
Message 67243 - Posted: 17 Aug 2010, 4:44:34 UTC

Well, like everyone else I had some down time Friday when the server couldn't feed my machines. I thought I'd be smart and increase the desired work time and additional work queue both to 24 hours. That should give them more time to fix another server crash without downtime. Makes sense?

BUT one of my 24 hour WUs gave a validate error so it was a waste. I never saw this before. Did anybody else have validate errors since Friday? I was also messing with overclocking this machine yesterday. I believe it's rock solid now and other WUs were fine. Is a validate error always a server problem or could it be client side?
ID: 67243 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1898
Credit: 12,723,752
RAC: 682
Message 67245 - Posted: 17 Aug 2010, 9:26:04 UTC - in response to Message 67209.  

GarageFarm.net said ...

Hi,
My workstation managed to download some data and is crunching now, but the renderfarm was switched off and the Rosetta servers are down again, so there is no new data to work on for most of my computers.


This is where a backup project can come in handy: pick one and give it a very low resource share, and it will basically only crunch when Rosetta is down.
ID: 67245 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67247 - Posted: 17 Aug 2010, 10:03:38 UTC - in response to Message 67243.  

jesse1919 said ...

I thought I'd be smart and increase the desired work time and additional work queue both to 24 hours. That should give them more time to fix another server crash without downtime. Makes sense?

Yes, but there could be server outages that last longer than 24 hours. ;)

jesse1919 said ...

BUT one of my 24 hour WUs gave a validate error so it was a waste.

This is usually just a server problem. From the scientific view it might be a waste, but the credits were granted. Just have a look at the task's details.

cu Joe

ID: 67247 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67324 - Posted: 25 Aug 2010, 6:45:24 UTC
Last modified: 25 Aug 2010, 6:46:40 UTC

Hi.

Is there a problem again? The FLOPS count has been dropping and I'm getting these messages now, plus there are not many tasks lined up and the validator is slow as well.

Ready to send: 672

Wed 25 Aug 2010 16:32:41 EST|rosetta@home|Sending scheduler request: To fetch work. Requesting 13618 seconds of work, reporting 0 completed tasks
Wed 25 Aug 2010 16:32:46 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks
ID: 67324 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67325 - Posted: 25 Aug 2010, 8:43:50 UTC

Yes, it looks like Rosetta ran out of work. I'm not getting any new work either.

Two hours ago there were 10 WUs ready to send; now there are 700 WUs ready to send. Just not enough to feed us. ;)

cu

Jochen
ID: 67325 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67329 - Posted: 25 Aug 2010, 10:13:33 UTC
Last modified: 25 Aug 2010, 10:40:54 UTC

Why can't I edit my last post? Is there a time limit?

Anyway, I was able to get some WUs on two computers, but the third is idling for hours now...

The number of WUs ready to send is slowly increasing, so they have probably already fixed the problem. It'll just take a while to get back to normal operations, since the servers are probably flooded with requests.

cu

Joe

[Edit]
Too bad, available WUs have dropped to almost 0 again...
ID: 67329 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67333 - Posted: 25 Aug 2010, 13:24:30 UTC

Yes there is a time limit, I believe it is one hour, on editing your own posts.

No project will have work and servers and networks and file servers with 100% availability. This is part of why BOINC allows you to attach to multiple projects, configure a cache of work to keep on-hand, and now the newer releases allow you to configure a "backup project" by attaching to a project and setting a zero resource share.

(I seek to educate here, not make excuses. I don't run the servers, so I'm not defensive about it. R@h is one project with very high up-time, but there is always room for improvement. On the other hand, how can you implement improvements without taking the servers down once in a while?)
Rosetta Moderator: Mod.Sense
ID: 67333 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2473
Credit: 46,499,576
RAC: 3,223
Message 67335 - Posted: 25 Aug 2010, 14:00:01 UTC - in response to Message 67324.  

P . P . L . said ...

Is there a problem again? The FLOPS count has been dropping and I'm getting these messages now, plus there are not many tasks lined up and the validator is slow as well.

Ditto. It's lucky I increased my cache a few weeks ago after CASP9 ended.

A brief word from the admins on timescales would be reassuring.
ID: 67335 · Rating: 0
goraxan
Joined: 18 Jul 10
Posts: 6
Credit: 1,143,926
RAC: 0
Message 67336 - Posted: 25 Aug 2010, 14:04:09 UTC

I have 30GB of my HD reserved for R@H, but it never uses more than approx. 350MB. So, I think there's no way to increase the tasks pool.
ID: 67336 · Rating: 0
Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67343 - Posted: 25 Aug 2010, 18:53:09 UTC - in response to Message 67336.  

goraxan said ...

I have 30GB of my HD reserved for R@H, but it never uses more than approx. 350MB. So, I think there's no way to increase the tasks pool.

With your computers hidden, one needs to guess...
What is your cache size? What is your preferred running time?

@ModSense: It's too late to try to educate me. ;)
I guess I will just increase my cache again. A small cache was just convenient, since I quite frequently reinstall the OS due to hardware changes: I could just let BOINC run out of work overnight...

cu Joe



ID: 67343 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67344 - Posted: 25 Aug 2010, 18:57:13 UTC - in response to Message 67336.  

goraxan said ...

I have 30GB of my HD reserved for R@H, but it never uses more than approx. 350MB. So, I think there's no way to increase the tasks pool.


The amount of work you keep on-hand is configured in the network preferences of the BOINC Manager (or via the project website, and then you can update to the project and use the same settings on several machines).

Also, the amount of free space on your hard drive is not really relevant. What is important is the amount that BOINC is allowed to use. This is configured in the disk and memory tab of the preferences.
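
As a rough illustration of how the work cache setting translates into queued tasks, here is a minimal Python sketch. It assumes a simplified model in which every task runs for exactly the preferred run time and every core is busy around the clock. The function name and its parameters are made up for this example and are not part of BOINC, whose real work-fetch logic relies on its own runtime estimates.

```python
import math


def tasks_to_fill_cache(cache_days, cores, task_runtime_hours):
    """Estimate how many tasks keep `cores` busy for `cache_days`.

    Hypothetical helper for illustration only. Assumes each task takes
    exactly `task_runtime_hours` and that all cores crunch 24 hours a day.
    """
    if cache_days < 0 or cores < 1 or task_runtime_hours <= 0:
        raise ValueError("need cache_days >= 0, cores >= 1, runtime > 0")
    # Total core-hours the cache has to cover.
    core_hours = cache_days * 24 * cores
    # Round up: a partially needed task still has to be downloaded whole.
    return math.ceil(core_hours / task_runtime_hours)


if __name__ == "__main__":
    # A 4-core machine with a 1-day cache and a 3-hour preferred run time.
    print(tasks_to_fill_cache(1, 4, 3))
    # The same machine with a 24-hour preferred run time.
    print(tasks_to_fill_cache(1, 4, 24))
```

Under these assumptions, a one-day cache on four cores with a 24-hour run time is only four tasks, so an outage longer than a day would still leave the machine idle unless the cache is larger or a backup project is attached.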
Rosetta Moderator: Mod.Sense
ID: 67344 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org