Message boards : Number crunching : Something wrong with Server-Side-Scheduler
Author | Message |
---|---|
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
Hi, there must be something wrong with the server-side scheduler. I reactivated an old client and attached it to Rosetta, but it didn't get work. While I was wondering what the problem could be, the box contacted Predictor@Home, and, ah, there I got the info: not enough free disk space on the client. I think the message should also have come from Rosetta ... Below are the relevant entries from the Messages tab: 12/12/2005 10:29:36|rosetta@home|Successfully attached to rosetta@home 12/12/2005 10:30:28||request_reschedule_cpus: project op 12/12/2005 10:30:31|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi 12/12/2005 10:30:31|LHC@home|Reason: To fetch work ... 12/12/2005 10:30:36|LHC@home|No work from project 12/12/2005 10:30:49||request_reschedule_cpus: project op 12/12/2005 10:30:51||request_reschedule_cpus: project op 12/12/2005 10:30:54||request_reschedule_cpus: project op 12/12/2005 10:31:20||request_reschedule_cpus: project op 12/12/2005 10:31:22|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 10:31:22|rosetta@home|Reason: Requested by user 12/12/2005 10:31:22|rosetta@home|Requesting 86400 seconds of new work 12/12/2005 10:31:27|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 10:34:05||request_reschedule_cpus: project op 12/12/2005 10:34:07|ProteinPredictorAtHome|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi 12/12/2005 10:34:07|ProteinPredictorAtHome|Reason: To fetch work 12/12/2005 10:34:07|ProteinPredictorAtHome|Requesting 86400 seconds of new work 12/12/2005 10:34:17|ProteinPredictorAtHome|Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded 12/12/2005 10:34:17|Predictor @ Home|Message from server: No work sent 12/12/2005 10:34:17|Predictor @ Home|Message from server: (there was work but you don't have enough disk space allocated) 12/12/2005 10:34:17|Predictor @ Home|Message from server: No disk space (YOU must free 323.8 MB before BOINC gets space). Review preferences for minimum disk free space allowed. 12/12/2005 10:34:17|Predictor @ Home|No work from project 12/12/2005 10:35:33|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 10:35:33|rosetta@home|Reason: To fetch work 12/12/2005 10:35:33|rosetta@home|Requesting 86400 seconds of new work 12/12/2005 10:35:38|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 10:35:38|rosetta@home|No work from project 12/12/2005 10:39:44|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 10:39:44|rosetta@home|Reason: To fetch work 12/12/2005 10:39:44|rosetta@home|Requesting 86400 seconds of new work 12/12/2005 10:39:54|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 10:39:55|rosetta@home|Started download of rosetta_4.80_windows_intelx86.exe ... Supporting BOINC, a great concept ! |
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
Follow-up: I attached one more box, and again it didn't get work! This box has enough free disk space; the resource share is configured so that Rosetta gets nearly 49%, Predictor@Home only 1%, and LHC@Home nearly 49%. 12/12/2005 11:17:01|rosetta@home|Successfully attached to rosetta@home 12/12/2005 11:17:19||request_reschedule_cpus: project op 12/12/2005 11:17:22|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 11:17:22|rosetta@home|Reason: Requested by user 12/12/2005 11:17:22|rosetta@home|Requesting 172800 seconds of new work 12/12/2005 11:17:27|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 11:18:45||request_reschedule_cpus: project op 12/12/2005 11:18:47|LHC@home|Sending scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi 12/12/2005 11:18:47|LHC@home|Reason: To fetch work 12/12/2005 11:18:47|LHC@home|Requesting 172800 seconds of new work 12/12/2005 11:18:52|LHC@home|Scheduler request to http://lhcathome-sched1.cern.ch/scheduler/cgi succeeded 12/12/2005 11:18:52|LHC@home|No work from project 12/12/2005 11:19:07||request_reschedule_cpus: project op 12/12/2005 11:19:07|ProteinPredictorAtHome|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi 12/12/2005 11:19:07|ProteinPredictorAtHome|Reason: To fetch work 12/12/2005 11:19:07|ProteinPredictorAtHome|Requesting 172800 seconds of new work 12/12/2005 11:19:17|ProteinPredictorAtHome|Scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi succeeded 12/12/2005 11:19:18|Predictor @ Home|Started download of bprion_4_68243.ini ... Supporting BOINC, a great concept ! |
Webmaster Yoda Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
[quote]This box has enough free disk space; the resource share is configured so that Rosetta gets nearly 49%, Predictor@Home only 1%, and LHC@Home nearly 49%.[/quote] And what are the "Disk and memory usage" settings? If they are lower than what the projects need (taking into account what you might already have allocated), BOINC won't download more work, as it thinks there won't be enough space. As it says in one of the message logs you posted: "Message from server: No disk space (YOU must free 323.8 MB before BOINC gets space). Review preferences for minimum disk free space allowed." You may have (for example) 30 GB free, but if the BOINC settings are to use no more than 1% of space, 323.8 MB isn't going to fit in the space "reserved" for BOINC. *** Join BOINC@Australia today *** |
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
[quote]This box has enough free disk space; the resource share is configured so that Rosetta gets nearly 49%, Predictor@Home only 1%, and LHC@Home nearly 49%.[/quote] The box has roughly 50 GB free; it immediately downloaded WUs from Predictor, and meanwhile it also has work from Rosetta. [Edit]The logs are from two different boxes; one really had problems with disk space, the other did not![/Edit] Supporting BOINC, a great concept ! |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 13 |
The newer V5 versions of BOINC calculate disk space a bit differently, and are more 'strict' than the V4's. Generally the problem is in the "leave x% free" setting - IF there really is a problem. Sometimes simply going to that part of the preferences on the website and changing "something", and changing it right back, will cause BOINC to start working - the actual defaults when no change has been made apparently don't match the defaults displayed. It's a BOINC-level problem, it hits all the projects equally. So... my advice is to go to the preferences page and increase the % allowed by a bit, and if that doesn't solve it, increase the GB allowed by a fraction and try again. It's asking for a bit over 300MB, but I doubt it'll actually use anywhere near that. Looking at the log, I think it _is_ for Predictor that you'll need to do this, not Rosetta. The message of "no work from project" from Rosetta implies none is being sent out at present, not that you don't have room for it; this generally is temporary, lasting only a few minutes, I've seen it myself, regardless of what the server status page shows as ready to send. |
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
[quote]The newer V5 versions of BOINC calculate disk space a bit differently, and are more 'strict' than the V4's. Generally the problem is in the "leave x% free" setting - IF there really is a problem. Sometimes simply going to that part of the preferences on the website and changing "something", and changing it right back, will cause BOINC to start working - the actual defaults when no change has been made apparently don't match the defaults displayed. It's a BOINC-level problem, it hits all the projects equally.[/quote] You are right, sometimes there are problems like this, but that can't be the problem here, because work from Predictor downloaded successfully while nothing came from Rosetta (at first). [quote]The message of "no work from project" from Rosetta implies none is being sent out at present, not that you don't have room for it; this generally is temporary, lasting only a few minutes, I've seen it myself, regardless of what the server status page shows as ready to send.[/quote] I don't have a real problem with getting no work from Rosetta; I have enough projects attached, so my clients always have something to do. But at the end of this week, when FAD finally closes and a lot of people come over, there will be a problem if they are only attached to Rosetta ... Supporting BOINC, a great concept ! |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 13 |
[quote]...this generally is temporary, lasting only a few minutes...[/quote] BOINC will retry if it doesn't get work, and if it's attached to only one project, it will retry quite often. I keep my cache setting at 0.25 to 0.5 on my systems, much lower than many do, and I have _never_ run out of Rosetta work. The only reason I know I've ever gotten the message at all is that I tend to look at my Messages tab a lot, and I've seen "no work from project" followed just a few minutes later by it getting a couple more results... but there were already at least a couple queued up before then, running or ready to run, so it has never come close to missing a CPU second on my machines. When _initially_ connecting, as you were, it's noticeable. I'm not sure what the problem is, possibly something in the feeder, and someone should probably look into it, but I don't think it's anything critical. |
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
|
ralic Joined: 22 Sep 05 Posts: 16 Credit: 46,481 RAC: 0 |
Perhaps related to this thread. Server Status shows: Ready to send 180,708 12/12/2005 16:11:25|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 16:11:25|rosetta@home|Requesting 863 seconds of work, returning 0 results 12/12/2005 16:11:32|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 16:11:32|rosetta@home|No work from project 12/12/2005 16:11:33|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds . [snip more of the same] . 12/12/2005 16:40:12|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 12/12/2005 16:40:12|rosetta@home|Requesting 10126 seconds of work, returning 0 results 12/12/2005 16:40:15|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 12/12/2005 16:40:15|rosetta@home|No work from project 12/12/2005 16:40:16|rosetta@home|Deferring communication with project for 34 minutes and 32 seconds Seems strange that there are many to send but no work available... [edit] Maybe all those ready to send's are for an application version other than rosetta 4.80 ? [/edit] |
STE\/E Joined: 17 Sep 05 Posts: 125 Credit: 4,101,065 RAC: 144 |
There does seem to be something wrong; I haven't been able to download any new WUs to any of my PCs all morning long. All I keep getting is the "No work from project" message ... ??? |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Yup, started to get the No work in the past hour...... |
Andrew Joined: 19 Sep 05 Posts: 162 Credit: 105,512 RAC: 0 |
There is a thread specific to the possible "No work from project" issue here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=648 We don't want to hijack this thread with another issue :) |
AnRM Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
There is a problem.......we are getting intermittent 'no work' messages on all machines. The work queue seems to be in good shape.....server load problems caused a similar symptom about a month ago. The recent load increase must be very dramatic and will only get worse as FAD closes down. |
Tern Joined: 25 Oct 05 Posts: 576 Credit: 4,695,450 RAC: 13 |
I just got new work... odd... |
AnRM Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
[quote]I just got new work... odd...[/quote] Bill, new work is getting through OK.....I don't want to create the impression that we are not getting enough to process (at least for us anyway). The point is that the 'no work from project' error messages used to be very unusual and are now occurring on a regular basis....Cheers, Rog. |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Just got one through also..... I have one machine that just got one after 3 hours of "No Work" and the other machine still has not got one after 2 hours solid of "No Work".... |
Yeti Joined: 2 Nov 05 Posts: 45 Credit: 14,945,062 RAC: 0 |
I just checked my 15 boxes; most of them show "no work from project" when they contact Rosetta :-( I hope that somebody on the project team notices this soon; it would be the worst case if a lot of crunchers come over from FAD (and old Seti?) and then get no work. Supporting BOINC, a great concept ! |
AnRM Joined: 18 Sep 05 Posts: 123 Credit: 1,355,486 RAC: 0 |
It's definitely getting worse......major database purge again?? IP probs??..:( The beauty of BOINC will make E@H happy..... |
Scribe Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Copied from the Homepage Technical News - December 12, 2005 Our work unit feeder is having a tough time keeping up with all the client requests for work. A short term fix (as has been done before), is to optimize the database tables. We will be doing this later today at 3pm and also backing up the database. As stated before, we are going to expand our servers soon to deal with this issue. |
John Price Joined: 4 Dec 05 Posts: 4 Credit: 6,142 RAC: 0 |
Follow-up: I too am unable to get work from Rosetta. Slightly off-topic, a question for you, Yeti: it is clear that you are able to do work for Predictor. For some 3 months I have been unable to get through to the Predictor server. I can't even get through to www.scripps.edu. What's the secret? Is there a new URL or something? As I can't get through to the website, I can't ask on the Predictor message board. |
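
Webmaster Yoda's point about the "Disk and memory usage" preferences (plenty of free disk, yet 323.8 MB still does not fit) can be illustrated with a small calculation. The sketch below is only an approximation under stated assumptions: the function and parameter names are hypothetical, the three limits are modelled on the preference types mentioned in the thread (a size cap, a percentage cap, and a minimum free space), and it is not BOINC's actual code, which, as Tern notes, differs between V4 and V5.

```python
def boinc_headroom_mb(total_mb, free_mb, boinc_used_mb,
                      max_used_mb, max_used_pct, min_free_mb):
    """Return how many more MB BOINC may use (hypothetical model).

    A negative result means that much space would have to be freed.
    """
    # Limit 1: absolute cap on BOINC's own disk usage.
    by_size_cap = max_used_mb - boinc_used_mb
    # Limit 2: BOINC may use at most a percentage of the total disk.
    by_percent_cap = total_mb * max_used_pct / 100 - boinc_used_mb
    # Limit 3: a minimum amount of disk must always stay free.
    by_min_free = free_mb - min_free_mb
    # The tightest limit decides.
    return min(by_size_cap, by_percent_cap, by_min_free)


def fits(request_mb, headroom_mb):
    """True if a download of request_mb fits into the remaining headroom."""
    return request_mb <= headroom_mb


# Illustrative figures only: a 32 GB disk with 30 GB free and a 1% cap.
headroom = boinc_headroom_mb(total_mb=32_000, free_mb=30_000,
                             boinc_used_mb=0, max_used_mb=100_000,
                             max_used_pct=1, min_free_mb=100)
print(headroom)               # 320.0
print(fits(323.8, headroom))  # False
```

With these made-up numbers the percentage cap is the binding limit, so raising it from 1% to 2% would already leave room for the 323.8 MB request. That is the kind of small preference change Tern suggests trying.
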
©2024 University of Washington
https://www.bakerlab.org