Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 3,531 |
I just disabled new users from charityengine until our servers can catch up with download demand. Thanks. Server just responded correctly. |
Miklos M Joined: 8 Dec 13 Posts: 29 Credit: 5,277,251 RAC: 0 |
Can you post a log? Computation errors here too. |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Miklos M wrote: Computation errors here too. Your errors are mainly listed as tasks aborted by the user. Did you notice any unusual behaviour? Of the ones I checked, the reassigned tasks are either still in progress or have been completed successfully by the next user. I noticed one error for a pd1 graftsheet task that had been reassigned to you, but that batch of tasks was failing for almost everyone. |
krypton Volunteer moderator Project developer Project scientist Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
We increased the number of allowed concurrent users, but max_connections (for the MySQL database) remains at 800. Looking at the processlist, I noticed that most of the connections are in "sleep" status, waiting out the default 8-hour wait_timeout before being killed. I set wait_timeout to 30 minutes (which still seems rather high, but may be required for BOINC Manager?). If anyone sees any database errors in the BOINC Manager logs, please alert me! |
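A rough sketch of the kind of check described in the post above: counting connections that sit in "Sleep" state and flagging those idle longer than a given wait_timeout. The rows are invented sample data shaped like MySQL's `SHOW PROCESSLIST` output (`Id`, `Command`, `Time` columns); the function name is hypothetical and nothing here talks to a real server.

```python
from collections import Counter

# Hypothetical snapshot rows, shaped like SHOW PROCESSLIST output.
# "Time" is seconds spent in the current state. All values are invented.
SAMPLE_PROCESSLIST = [
    {"Id": 1, "Command": "Sleep", "Time": 12},
    {"Id": 2, "Command": "Query", "Time": 0},
    {"Id": 3, "Command": "Sleep", "Time": 2400},
    {"Id": 4, "Command": "Sleep", "Time": 27000},
]


def summarize_connections(rows, wait_timeout_s):
    """Count connections per command and list sleepers idle past the timeout."""
    by_command = Counter(row["Command"] for row in rows)
    stale_ids = [
        row["Id"]
        for row in rows
        if row["Command"] == "Sleep" and row["Time"] >= wait_timeout_s
    ]
    return by_command, stale_ids


# 30 minutes, the value mentioned in the post above.
by_command, stale_ids = summarize_connections(SAMPLE_PROCESSLIST, 30 * 60)
print(dict(by_command))
print(stale_ids)
```

With the 8-hour default (`8 * 3600` seconds) none of the sample sleepers would count as stale, which illustrates why idle connections can pile up against a fixed max_connections limit.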
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 3,531 |
8/4/2014 7:50:26 PM | rosetta@home | Started upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0
8/4/2014 7:50:49 PM | rosetta@home | Temporarily failed upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0: connect() failed
8/4/2014 7:50:49 PM | rosetta@home | Backing off 01:25:31 on upload of rb_05_25_47349_92839__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_167016_2278_0_0
8/4/2014 7:50:52 PM | | Project communication failed: attempting access to reference site
8/4/2014 7:50:54 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 7:57:51 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 7:57:51 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 7:58:14 PM | rosetta@home | Scheduler request failed: Couldn't connect to server
8/4/2014 7:58:17 PM | | Project communication failed: attempting access to reference site
8/4/2014 7:58:19 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 7:59:59 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 7:59:59 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 8:00:21 PM | rosetta@home | Scheduler request failed: Couldn't connect to server
8/4/2014 8:00:24 PM | | Project communication failed: attempting access to reference site
8/4/2014 8:00:25 PM | | Internet access OK - project servers may be temporarily down.
8/4/2014 8:02:41 PM | rosetta@home | Sending scheduler request: To fetch work.
8/4/2014 8:02:41 PM | rosetta@home | Requesting new tasks for CPU and NVIDIA
8/4/2014 8:03:04 PM | rosetta@home | Scheduler request completed: got 0 new tasks
8/4/2014 8:03:04 PM | rosetta@home | Server can't open database |
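For anyone wanting to pull the failures out of a log like the one above before posting it, here is a minimal sketch. It assumes the `timestamp | project | message` layout seen in that log and a month/day/year date order; the helper names and the list of failure markers are made up for illustration, not part of BOINC.

```python
from datetime import datetime

# A few lines copied from the log above.
LOG_LINES = [
    "8/4/2014 7:57:51 PM | rosetta@home | Sending scheduler request: To fetch work.",
    "8/4/2014 7:58:14 PM | rosetta@home | Scheduler request failed: Couldn't connect to server",
    "8/4/2014 7:58:17 PM | | Project communication failed: attempting access to reference site",
    "8/4/2014 8:03:04 PM | rosetta@home | Scheduler request completed: got 0 new tasks",
    "8/4/2014 8:03:04 PM | rosetta@home | Server can't open database",
]

# Hypothetical substrings treated as signs of trouble.
FAILURE_MARKERS = ("failed", "can't open database")


def parse_line(line):
    """Split one log line into (datetime, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumes month/day/year with a 12-hour clock, as in the log above.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def find_failures(lines):
    """Return parsed entries whose message contains a failure marker."""
    entries = [parse_line(line) for line in lines]
    return [
        entry
        for entry in entries
        if any(marker in entry[2].lower() for marker in FAILURE_MARKERS)
    ]


for when, project, message in find_failures(LOG_LINES):
    print(when.isoformat(), project or "(client)", message)
```

Lines with an empty project column come from the client itself rather than from a specific project, which is why the sketch prints a placeholder for them.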
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
With 30,000 ADDITIONAL new users (after 15,000 the day before), I've been trying to leave the server alone. But I just ran a scheduler request for more work and got this: 8/4/2014 7:46:25 PM | rosetta@home | Scheduler request failed: Failure when receiving data from the peer Rosetta Moderator: Mod.Sense |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 3,531 |
All is well at my end, for now. Finally finished an upload and got new tasks. |
krypton Volunteer moderator Project developer Project scientist Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
Thanks googloo and Mod.Sense! All is well at my end, for now. Finally finished an upload and got new tasks. |
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,107,661 RAC: 56 |
Since 3 August I have not had any problems. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
All working well over here now. I did two things to hopefully help. 1) I set the 'Target CPU run time' in the online Rosetta Preferences page to 6 hours (double the default of 3 hours) which should mean my machines will bug the server for work less often. 2) I also set my local clients to cache a bit more work in case things go crazy again. I encourage others to take similar actions to help ration server resources. |
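The arithmetic behind the first suggestion can be sketched as follows. This is a back-of-the-envelope model only: it assumes every core runs one task at a time, around the clock, for exactly the target runtime, and the 4-core host is a made-up example. Real clients batch their scheduler requests, so the number of server contacts will not match the task count one-to-one, but the number of tasks to download and upload scales the same way.

```python
def tasks_per_day(cores, target_runtime_hours):
    """Tasks a host completes per day under the simplifying assumptions above."""
    return cores * 24 / target_runtime_hours


# Hypothetical 4-core host at the 3-hour default, then 6 and 12 hours.
for hours in (3, 6, 12):
    print(f"{hours:>2} h target runtime: {tasks_per_day(4, hours):.0f} tasks/day")
```

Doubling the target runtime halves the number of tasks the host has to fetch and report each day.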
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
All working well over here now. I did two things to hopefully help. I would encourage it as well. Just be aware that BOINC Manager will take some time to get used to the new target runtime. I generally suggest that you start with only a small cache of work, and bump the target runtime only a notch or two each day. I'd also suggest going to 12hrs or beyond if it suits the way you use your machine. Gradual change in runtime preference helps avoid BOINC Manager downloading more work than you can complete. The change WILL affect tasks that you've already downloaded, once the client completes an update to the project with the new preference setting. Hence the suggestion to start when your cache of existing work is low. Do not increase your number of days of work to request until AFTER you have run at your final runtime setting for a day or so. Rosetta Moderator: Mod.Sense |
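To see why a large cache and a sudden runtime jump combine badly, here is a small sketch. It assumes the machine crunches 24 hours a day with one task per core; the cache size, core count, and 72-hour deadline are invented numbers for illustration, not project values.

```python
def hours_to_clear_cache(cached_tasks, target_runtime_hours, cores):
    """Wall-clock hours needed to finish every cached task."""
    return cached_tasks * target_runtime_hours / cores


def fits_before_deadline(cached_tasks, target_runtime_hours, cores, hours_until_deadline):
    """True if the whole cache can finish before the (hypothetical) deadline."""
    needed = hours_to_clear_cache(cached_tasks, target_runtime_hours, cores)
    return needed <= hours_until_deadline


# Hypothetical host: 48 cached tasks, 4 cores, 72 hours until tasks are due.
for hours in (3, 12):
    needed = hours_to_clear_cache(48, hours, 4)
    ok = fits_before_deadline(48, hours, 4, 72)
    print(f"{hours:>2} h runtime: {needed:.0f} h of work, fits: {ok}")
```

In this made-up case the same 48 cached tasks go from 36 hours of work to 144 hours once the 12-hour preference takes effect, which is the situation the advice to start with a small cache is meant to avoid.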
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 3,531 |
Not getting work, even though BOINC Manager requests tasks. Server status shows only 32 ready to send. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Right, the available work seems to be getting consumed about as quickly as it is being generated. The project is still adjusting to all of the new hosts that have come at once. Which is a great problem to have! But I've seen on the server status page that the actual number of tasks ready to send has been swinging rapidly as new work is generated and then assigned to hungry hosts. The BOINC Manager will retry for work and pull some down when work units are available. Rosetta Moderator: Mod.Sense |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 3,531 |
Right the available work seems to be getting consumed about as quickly as it is being generated. The project is still adjusting to all of the new hosts that have all come at once. Which is a great problem to have! But I've seen on the server status page the actual number of tasks ready to send has been swinging rapidly as new work is generated, and then assigned to hungry hosts. Yes, it's been alternating between "no work sent" and actually getting new tasks. |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
I've been out of the country for 3 days and things seem to have sorted themselves out in that time - a watched pot never boils - save for some connectivity issues at my end, now resolved. I've just snuck a few tasks for 3 of my 4 machines. The last one should pinch some tomorrow, then I'm back to normal. I doubt WCG will see many further calls for work over the next month with the priorities I've got set. |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,142,074 RAC: 2,093 |
Just keep getting "no work sent"... :-( |
krypton Volunteer moderator Project developer Project scientist Joined: 16 Nov 11 Posts: 108 Credit: 2,164,309 RAC: 0 |
I've noticed the same message on my machine. Looking into it. But it's the same pattern as googloo reported. It seems to alternate... Just keep getting "no work sent"... :-( |
Sid Celery Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
Arghh!! Just as I get home to clear a network issue holding up the upload of 48 tasks, the scheduler's taken offline and I'm full up with 55 WCG tasks instead to plough through. Not having any luck right now :( |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Just checking to see if there is any useful information about the latest problems. Didn't expect to find any, but I could have been surprised. Perhaps they could just reissue old tasks for double-checking? Overall it seems to be another example of the supply of people who want to be helpful being larger than the supply of helpful things for them to do. I feel like starting a list... |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Now what is the problem? Everything is disabled! And again, no news. |
©2024 University of Washington
https://www.bakerlab.org