Message boards : Number crunching : Why no R@h work downloading? I also run SETI
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Read this wiki thread on work buffer size. From near the bottom comes this:

"Recommendation for Multiple Project Systems: If you are running several BOINC Powered Projects on a computer we suggest a maximum value of 2 to 4 days divided by the number of Attached Projects if you use a dial-up connection. If you have an always on Internet connection we suggest a setting of 0.1 days (2.4 hours)."

There is a chart above this. There is also the issue of debt to SETI in addition to buffer and connect times, but that's a whole different topic. John McLeod VII wrote some time ago on the BOINC fora: |
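The wiki recommendation quoted above boils down to a one-line calculation. A minimal sketch in Python, assuming a hypothetical helper `recommended_buffer_days` (the name and parameters are illustrative only and not part of BOINC):

```python
def recommended_buffer_days(attached_projects: int,
                            always_on: bool,
                            dialup_max_days: float = 4.0) -> float:
    """Suggested work-buffer setting in days, per the quoted wiki text.

    dialup_max_days is the "2 to 4 days" figure from the quote; the default
    of 4.0 is this sketch's choice, not a BOINC default.
    """
    if attached_projects < 1:
        raise ValueError("need at least one attached project")
    if not 2.0 <= dialup_max_days <= 4.0:
        raise ValueError("the quoted range is 2 to 4 days")
    if always_on:
        return 0.1  # 0.1 days, i.e. 2.4 hours
    return dialup_max_days / attached_projects


print(recommended_buffer_days(3, always_on=True))                         # 0.1
print(recommended_buffer_days(4, always_on=False))                        # 1.0
print(recommended_buffer_days(4, always_on=False, dialup_max_days=2.0))   # 0.5
```

So a dial-up host attached to four projects would land somewhere between 0.5 and 1.0 days, while any always-on host stays at 0.1 days regardless of project count.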
Ingleside Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"Since I/we are not alone, I would suggest that RAH put this problem on the News page and state a resolution will be available within ___ days and have a hyperlink to the solution once it is available."

Short answer: since it is not a bug, no bug-fix will be made.

Longer answer... With the re-design of work-request and cpu-scheduling for v5.8.xx and later v5.10.xx, some changes were made to meet the following requirements:
A; If at all possible, return all work by the deadline.
B; Keep all cpu's busy.
C; Keep cache filled with enough work.

Requirement A is the most important one, both from most BOINC-projects' point of view and for the user, since in most BOINC-projects returning after the deadline means no credit and the result is not used for anything.

To meet the A-requirement, some of the scheduling-rules being used are:
1; If "connect every N days" > "deadline", don't assign wu at all.
2; If "expected run-time" > "deadline", don't assign wu at all.
3; If "connect every N days" > 1/2 * "time to deadline", block work-request to project.
4; If "expected run-time" > 0.9 * "time to deadline", block work-request to project.
5; For 3 & 4, add an additional "switch between projects every N minutes" safety-margin. For most users, this means 1 hour.

If multi-project, one project being blocked normally doesn't mean other projects can't continue downloading work, but it is a possibility in some instances. Blocking work-request due to #3 or #4 can be due to a single wu, or due to the sum of many wu's.

Rules #1 & #2 are checked and enforced by the Scheduling-server, and are only overridden if the computer doesn't have a single wu for the particular project, in which case 1 but only 1 wu is assigned. Rules #3, #4 & #5 are checked and enforced by the Client, and are only overridden if there is an idle cpu.

Why the rules?

#1 should be self-evident: if the client says "I won't connect again for 2 days", there's no hope that a wu with a 1-day deadline can be returned by the deadline.
#2 should also be fairly self-explanatory: if the client reports "runs 50% of the time" and a wu needs 1.5 days cpu-time and has a 2-day deadline, the client again has no hope of returning it by the deadline, since the client will use 3 days run-time to get 1.5 days cpu-time.

#3 on the other hand is not immediately apparent to everyone. But, let's say a wu has 13 hours left to run, 23 hours to deadline, and "connect every 12 hours". Based on run-time and time to deadline, there are no problems. But "connect every 12 hours" means the client will have its next connection in 12 hours, and by this time 12 of the 13 hours crunch-time are used. The 2nd connection will also be 12 hours away, and even though only 1 hour is then left to crunch, the client will not connect again until after the deadline... Also remember, it can be due to a sum of wu's. If, for example, a host has 10 wu's each taking 6 hours to crunch and 80 hours until deadline, with a "connect every 48 hours" you'll still not connect often enough to manage reporting all wu's before the deadline.

#4: since most wu's don't use exactly the expected run-time, and usage-patterns can vary, a safety-margin of 10% was added, so wu's can take 10% longer than expected but still meet the deadline.

#5 is mostly for internal client-scheduling reasons, but normally has little influence on anything.

So, what are the effects of these rules for someone only running Rosetta@home, knowing Rosetta@home has a 10-day deadline?

#1 and #2 have little effect, even with a 10-day cache. Possibly you'll get 1 wu less than a full 10 days, but little effect.

#3 on the other hand: the moment "connect every N days" > 4.71 days, you will only ask for work when you have an idle cpu. You'll be assigned many wu's at once, but will not re-ask before idle cpu again. With v5.10.xx, anyone having a permanent connection is recommended to use zero here, or possibly 0.01 days or something, and instead use the "additional days"-setting for cache-size.
#4 will give the most "interesting" pattern. With a 10-day cache-setting, you'll on day 0 fill up with up to 10 days. After crunching 1 day, your cached work has dropped to only 9 days, less than the cache-setting, so why not re-fill? Answer: your 9 days of cached work has 9 days until deadline, meaning you don't have the 10% "safety-margin". This will continue: after 2 days crunching you've got 8 days cached with 8 days until deadline, and so on until you hit idle cpu... So, if "cache additional days" + "connect every N days" > 8.9 days, you'll fill up to a full cache, but only re-fill when empty or idle cpu...

For anyone running v5.10.14 or later, if they take a look at the Task-tab and see one of the running tasks marked "High Priority", it means "this project is currently blocked from asking for more work". If multi-project, one project being blocked doesn't normally mean other projects can't still give you more work, keeping your cache full. But long-term-debt, resource-share and so on, and deadlines in other projects, can still influence things and be the reason for no work-request.

But in any case, to fix "High Priority" in Rosetta@home:
Decrease "connect every N days" to less than 4.7 days. If permanently connected, use 0.1 days or less.
Decrease "cache additional days" until "additional days" + "connect every N days" is less than 8.9 days.

Now, if you're currently breaking rule #4, changing your preferences won't have any immediate effect. But, hopefully, when the cache finally hits empty and you re-fill again, you won't hit the limits a 2nd time.

And, just to be on the safe side, my recommendation would be max 4 days for "connect every N days", but only if you really need it due to being infrequently connected, and max total cache 8 days. Or, run multiple projects. :)

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
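Ingleside's five rules and worked examples can be checked with a few lines of Python. This is only a sketch of the rules as stated in the post, not the actual BOINC client or scheduler code: the function names are hypothetical, all times are in hours, and the way rule #5 is applied (adding the switch interval to the left-hand side of rules #3 and #4) is an assumption, since the post does not spell out the exact formula.

```python
from typing import List


def server_refuses(connect_every_h: float, expected_run_h: float,
                   deadline_h: float) -> bool:
    """Rules #1 and #2 (server side): don't assign the wu at all."""
    return connect_every_h > deadline_h or expected_run_h > deadline_h


def client_blocks(connect_every_h: float, expected_run_h: float,
                  time_to_deadline_h: float, switch_h: float = 1.0) -> bool:
    """Rules #3 and #4 (client side), with rule #5 as an added margin.

    ASSUMPTION: the "switch between projects" interval is simply added to
    the left-hand side of both comparisons.
    """
    rule3 = connect_every_h + switch_h > 0.5 * time_to_deadline_h
    rule4 = expected_run_h + switch_h > 0.9 * time_to_deadline_h
    return rule3 or rule4


def blocked_days_with_full_cache(cache_days: int) -> List[int]:
    """Days on which a host that filled its cache on day 0 is still blocked.

    Models the post's pattern: after k days, (cache_days - k) days of work
    remain with (cache_days - k) days left until the deadline.
    """
    blocked = []
    for day in range(1, cache_days):
        remaining_h = (cache_days - day) * 24.0
        if client_blocks(0.0, remaining_h, remaining_h):
            blocked.append(day)
    return blocked


# Rule #1: "connect every 2 days" against a 1-day deadline.
print(server_refuses(48.0, 1.0, 24.0))    # True
# Rule #2: 1.5 days cpu-time at 50% uptime is 3 days run-time, deadline 2 days.
print(server_refuses(0.0, 72.0, 48.0))    # True
# Rule #3: 13 h left, 23 h to deadline, connect every 12 h.
print(client_blocks(12.0, 13.0, 23.0))    # True
# Rule #3 on a sum of wu's: 10 x 6 h, 80 h to deadline, connect every 48 h.
print(client_blocks(48.0, 60.0, 80.0))    # True
# A 10-day cache stays blocked on every day until it runs dry.
print(blocked_days_with_full_cache(10))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The last line reproduces the "interesting" pattern from rule #4: the remaining work always equals the time to deadline, so the 10% margin is never available and no refill happens until the cpu goes idle.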
BorgVampire Joined: 5 Jan 06 Posts: 3 Credit: 584,439 RAC: 0 |
If BOINC is not downloading work units for Rosetta because you already have a lot of work units for another project, then you can force Rosetta to download more work units by suspending the other project for a few seconds. Rosetta will then download more work units, but your computer may not be able to complete both projects' workload on time. To solve that you could leave your computer on for longer periods; however, if you already leave your computer on 24/7 that will obviously be impossible, and you may have unfinished work units that go beyond their deadlines for completion. |
Keck_Komputers Joined: 17 Sep 05 Posts: 211 Credit: 4,246,150 RAC: 0 |
While BorgVampire's suggestion will work, it is not really good practice. When you force BOINC to download tasks like that, it takes that much longer to settle into a more predictable rhythm. Ingleside gave a very good explanation. One thing to remember is that when you change your queue size it can take up to double the longer setting for things to settle down. BOINC WIKI BOINCing since 2002/12/8 |
BorgVampire Joined: 5 Jan 06 Posts: 3 Credit: 584,439 RAC: 0 |
What I would love to see is BOINC deliberately slowing faster projects down to equalize the work done, to the point that all projects get an equal workload finished. Also, if a project goes down for a period, when it comes back up I would like BOINC to slow the other project down to allow it to catch up. That way both projects would stay on an equal footing throughout the years. Obviously there would need to be a setting in your preferences for which projects you would like this to happen with, but I think it would be fairly easy to program that in just using the results tables. At the moment I try to do this by setting Rosetta at 53% and SETI at 47%, but they are never equal for long. |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
It's the same old thing. Users want and expect the projects to crunch according to resource share. If I have 3 projects with equal shares, I want each one to crunch 8 hours a day. With task switching set to 1 hour, it should dang-well stop the one it's doing every hour and switch to the next project. Simple, easy to understand - none of this complicated horse-pucky. If a deadline don't get met, so what? There's plenty of other users wanting to download work, and it'll git done. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
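The share arithmetic behind Angus's "8 hours a day" and BorgVampire's 53/47 split can be written out directly. A small Python sketch, assuming a hypothetical helper `hours_per_day` and placeholder project names; it shows only the long-run target implied by resource shares, not how the BOINC client actually schedules hour by hour:

```python
from typing import Dict


def hours_per_day(shares: Dict[str, float]) -> Dict[str, float]:
    """Split 24 hours of crunching in proportion to each project's share."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("resource shares must sum to a positive number")
    return {name: 24.0 * share / total for name, share in shares.items()}


# Three projects with equal shares (project names are placeholders).
print(hours_per_day({"A": 100, "B": 100, "C": 100}))

# BorgVampire's split, rounded for display.
split = hours_per_day({"Rosetta": 53, "SETI": 47})
print({name: round(hours, 2) for name, hours in split.items()})
```

With equal shares each project gets 8 hours; the 53/47 split works out to roughly 12.7 and 11.3 hours. As Ingleside's post explains, deadline rules can override this target in the short term.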
Aux10 Joined: 16 Aug 08 Posts: 1 Credit: 137,736 RAC: 0 |
I just added R@H through BAM account manager. I have one computer running SETI, Prime Grid, and Einstein and another just running Prime Grid. I just added a new computer to run R@H. The project was successfully attached to the computer through BAM but it is not receiving any work. The other computers are receiving and running all tasks just fine. Why not the new one? |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Aux10 wrote: "I just added R@H through BAM account manager. I have one computer running SETI, Prime Grid, and Einstein and another just running Prime Grid. I just added a new computer to run R@H. The project was successfully attached to the computer through BAM but it is not receiving any work. The other computers are receiving and running all tasks just fine. Why not the new one?"

Hello Aux10, welcome to Rosetta@home. Just like many other R@h crunchers, you can't download workunits because the server isn't producing any WUs; watch the homepage, upper right-hand corner:

Server Status as of 16 Aug 2008 20:45:33 UTC [ Scheduler running ] Queued: 0

I don't know why no WUs are being produced; hopefully the techs will find out soon. Have a nice day, Path7. |
DaBrat and DaBear Joined: 9 Aug 08 Posts: 16 Credit: 213,180 RAC: 0 |
Learn something new every day. I was actually browsing looking for the server issues and instead found out why I got that big 'no work' message from the server after one of my comps crunched nothing but Rosie for a few days during the SETI outage... thanks so much. |
RamonS Joined: 19 Jun 08 Posts: 3 Credit: 13,195,479 RAC: 0 |
Hi! Seems that the queue is still not filling up as of 8:37PM EDT on August 17 2008. I also joined other projects and some, too, have nothing in the queue (Lattice) while others work fine. Makes me believe that it is not a client issue. Happy to have my little servers run for this cause, nobody wants to look at my website anyway. ;) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Ramon, all the posts from August 16-18 time frame were caused by a server problem. Others have been referred to this thread over time to explain more about how the BOINC client works in scheduling things. So, the problems over the weekend were not caused at the client. But, at other times, they can be. Rosetta Moderator: Mod.Sense |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Mod.Sense wrote: "Ramon, all the posts from August 16-18 time frame were caused by a server problem. Others have been referred to this thread over time to explain more about how the BOINC client works in scheduling things. So, the problems over the weekend were not caused at the client. But, at other times, they can be."

Mod, the page showing the server status showed all green, yet the homepage showed 0 queued and the server page showed something like only 1000 or 10000 ready to send. If it was a server problem, then why the all green? It seems that at times the status page does not reflect issues that we see when trying to download work. Was the server that had the trouble not part of the list shown on the status page? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Greg, I am not in a position to know about the details behind the scenes to address your specific question. I know there have been problems in the past where the servers were all active, but the file server behind them was having congestion and performance issues. And in such cases new work was being produced, but not fast enough to keep up with demand. Rosetta Moderator: Mod.Sense |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Mod.Sense wrote: "Greg, I am not in a position to know about the details behind the scenes to address your specific question. I know there have been problems in the past where the servers were all active, but the file server behind them was having congestion and performance issues. And in such cases new work was being produced, but not fast enough to keep up with demand."

OK, thanks for that. Is there a 'duty watch' person as such for weekends? Since this last problem happened on a weekend, maybe there should be such a person who watches the system or the boards for such problems? |
WBT112 Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
Hello everybody, I guess my problem has something to do with the server problems discussed here, since it appeared at the weekend after upgrading BOINC, so I'll post it here: I still get no work from the project. Two other machines with the same internet connection now run fine again, but my quad core is just getting work from WCG, not from Rosetta or R@lph (it did get work before). The machine is just fetching the scheduler list (whatever this is) for Rosetta and R@lph over and over again with no result or error message. I lowered the share for WCG, and I've even detached it and downgraded to BOINC 5, with no result. Any ideas or suggestions? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
WBT, please see the description in the original post of this thread. Is your host requesting new work from Rosetta? It would be helpful if you could copy a few of the messages here. Rosetta Moderator: Mod.Sense |
WBT112 Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
That's it, and this happens over and over again after the 24:00 hours (or earlier if I click the button). I guess it may be a BOINC or connection problem, but why are WCG and the other machines running without problems? I'll do a clean BOINC install tomorrow and find out. Hope my description isn't too confusing.

20.08.2008 16:37:50||Starting BOINC client version 5.10.45 for windows_x86_64
20.08.2008 16:37:50||log flags: task, file_xfer, sched_ops
20.08.2008 16:37:52||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
20.08.2008 16:37:52||Data directory: C:\Program Files\BOINC
20.08.2008 16:37:53||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11]
20.08.2008 16:37:53||Processor features: fpu tsc pae nx sse sse2 pni
20.08.2008 16:37:53||OS: Microsoft Windows Vista: , Service Pack 1, (06.00.6001.00)
20.08.2008 16:37:53||Memory: 4.00 GB physical, 8.17 GB virtual
20.08.2008 16:37:53||Disk: 111.79 GB total, 71.31 GB free
20.08.2008 16:37:54||Local time is UTC +2 hours
20.08.2008 16:37:55|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: not assigned yet; location: (none); project prefs: default
20.08.2008 16:37:55|ralph@home|URL: http://ralph.bakerlab.org/; Computer ID: not assigned yet; location: (none); project prefs: default
20.08.2008 16:37:56||No general preferences found - using BOINC defaults
20.08.2008 16:37:56||Reading preferences override file
20.08.2008 16:37:56||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:37:56||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:37:56||Preferences limit disk usage to 9.31GB
20.08.2008 16:39:47|rosetta@home|Fetching scheduler list
20.08.2008 16:42:26||Fetching configuration file from http://boincstats.com/bam/get_project_config.php
20.08.2008 16:47:09||Fetching configuration file from http://bam.boincstats.com/get_project_config.php
20.08.2008 16:47:50||Contacting account manager at http://bam.boincstats.com/
20.08.2008 16:49:42|rosetta@home|Resetting project
20.08.2008 16:49:42|rosetta@home|Detaching from project
20.08.2008 16:49:46|ralph@home|Resetting project
20.08.2008 16:49:46|ralph@home|Detaching from project
20.08.2008 16:50:17||Account manager: BAM Host-ID: 118955
20.08.2008 16:50:17||Account manager contact succeeded
20.08.2008 16:50:17||Attaching to https://boinc.bakerlab.org/rosetta/
20.08.2008 16:50:17||Attaching to http://www.worldcommunitygrid.org/
20.08.2008 16:50:17||General prefs: from http://bam.boincstats.com/ (last modified 20-Aug-2008 16:50:41)
20.08.2008 16:50:17||Host location: none
20.08.2008 16:50:17||General prefs: using your defaults
20.08.2008 16:50:17||Reading preferences override file
20.08.2008 16:50:17||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:50:17||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:50:17||Preferences limit disk usage to 9.31GB
20.08.2008 16:50:42|http://www.worldcommunitygrid.org/|Master file download succeeded
20.08.2008 16:50:47|http://www.worldcommunitygrid.org/|Sending scheduler request: Project initialization. Requesting 1 seconds of work, reporting 0 completed tasks
20.08.2008 16:50:57|World Community Grid|Scheduler request succeeded: got 1 new tasks
20.08.2008 16:50:57||General prefs: from World Community Grid (last modified 17-Aug-2008 11:27:09)
20.08.2008 16:50:57||Host location: none
20.08.2008 16:50:57||General prefs: using your defaults
20.08.2008 16:50:57||Reading preferences override file
20.08.2008 16:50:57||Preferences limit memory usage when active to 2046.79MB
20.08.2008 16:50:57||Preferences limit memory usage when idle to 3684.22MB
20.08.2008 16:50:57||Preferences limit disk usage to 9.31GB
20.08.2008 16:50:59|World Community Grid|Started download of wcg_hcc1_img_6.06_windows_intelx86

...later:

20.08.2008 20:07:34|https://boinc.bakerlab.org/rosetta/|Fetching scheduler list

and at the same time on the 'project page' of BOINC: Waiting scheduler request, Project initialization, Communication delayed for 24:00 h (I translated this line into English, so don't take every word of the error message too seriously) |
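When a message log like the one above is pasted into a post, it helps to pull out just the lines for one project. A minimal Python sketch, assuming the `timestamp|project|message` layout visible in WBT112's log (dotted day.month.year dates); the names `LogEntry`, `parse_log` and `SAMPLE` are hypothetical, and this is not a BOINC tool:

```python
import re
from datetime import datetime
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str   # empty string for client-wide messages
    message: str


# Zero-width split point in front of every "DD.MM.YYYY HH:MM:SS|" stamp,
# so it also works on a log that lost its line breaks.
_SPLIT = re.compile(r"(?=\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\|)")
_STAMP = re.compile(r"\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\|")


def parse_log(text: str) -> List[LogEntry]:
    entries = []
    for chunk in _SPLIT.split(text):
        chunk = chunk.strip()
        if not _STAMP.match(chunk):
            continue  # skip empty chunks and any text before the first stamp
        stamp, project, message = chunk.split("|", 2)
        when = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
        # Collapse whitespace so wrapped continuation text joins its entry.
        entries.append(LogEntry(when, project, " ".join(message.split())))
    return entries


SAMPLE = (
    "20.08.2008 16:39:47|rosetta@home|Fetching scheduler list "
    "20.08.2008 16:50:47|http://www.worldcommunitygrid.org/|Sending scheduler "
    "request: Project initialization. "
    "Requesting 1 seconds of work, reporting 0 completed tasks "
    "20.08.2008 16:50:57|World Community Grid|Scheduler request succeeded: "
    "got 1 new tasks"
)

for entry in parse_log(SAMPLE):
    if "rosetta" in entry.project.lower():
        print(entry.when.isoformat(), entry.message)
```

On the sample this prints only the Rosetta line, which makes the repeated "Fetching scheduler list" with no following scheduler request easy to spot.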
WBT112 Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
There was no edit button, so please excuse my double posting. Just for your interest: reinstalling BOINC didn't work. I'll keep Rosetta in my project list and crunch for WCG with this host. Maybe it is Vista that doesn't like Rosetta, or the Vista firewall is blocking Rosetta only on this host (shared connection). Whatever it is, I don't like it. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,496 |
WBT112 wrote: "there was no edit button so excuse my double posting please."

I'm running Rosetta under BOINC 5.10.45 also, under Vista SP1, but a 32-bit version. After a recent problem with Rosetta not generating workunits as fast as they were downloaded, I haven't seen any such problems. |
WBT112 Joined: 11 Dec 05 Posts: 11 Credit: 1,382,693 RAC: 0 |
Just in case anyone is interested in what happened in the meantime: I crunched for WCG for about a week on this machine, and just 5 minutes ago the master file download finally succeeded (I wasn't even here)! I still don't have any idea why, but I'll donate some computing power of this computer to Rosetta again. Bye :) |
©2024 University of Washington
https://www.bakerlab.org