Message boards : Number crunching : Discussion on increasing the default run time
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
> The runtime limitations make things more than silly, to my very limited mind. Running Seti@home presented no such problems: if it took longer than expected, it was fine, but such short termism is driving me nuts. I may well join the great exodus if things are not managed better for us humble crunchers.

If you could explain what you mean by "runtime limitations" and "short termism", we might be able to help further. The runtime is a flexibility you have available, not a limitation. Is "short termism" a reference to the 3-day deadlines? I see your 6-CPU machine got a big pile of work all at once. Is that what you are talking about? You aren't alone in that boat. Corrective steps have been taken to avoid such problems in the future. Looks like the same might be said of your other Windows 10 machine that is not the "pro" version. If deadlines are missed because your machine was sent too much work, don't let it drive you nuts. The BOINC Manager will take care of it in several ways. One is that it will learn how long your machines take to complete the new v4.20 work units, which lets it better size its future requests for work. It will also abort tasks that pass their deadlines for you. Also, the project scheduler has been changed to basically put a cap on how much work is sent to a machine, which should avoid the overload in the first place. So I believe the issues are addressed once your current work runs its course. Please let me know if I've missed the question. Rosetta Moderator: Mod.Sense
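A rough way to picture the learning Mod.Sense describes: the client starts with a poor runtime estimate, over-requests work, and then converges as completed tasks report their real duration. This is a simplified model under stated assumptions (an exponential moving average with a hypothetical weight of 0.3, a 6-slot host, a 2-day cache, tasks that really take about 8 hours); it is not BOINC's actual estimation code.

```python
# Simplified model, NOT BOINC's real algorithm: the runtime estimate moves
# toward each observed runtime via an exponential moving average.

def update_estimate(estimate_h: float, observed_h: float, weight: float = 0.3) -> float:
    """Blend one observed runtime into the estimate (weight is hypothetical)."""
    return (1 - weight) * estimate_h + weight * observed_h


def tasks_to_request(buffer_days: float, slots: int, estimate_h: float) -> int:
    """Whole tasks needed to fill `buffer_days` of work on `slots` CPUs."""
    return int(buffer_days * 24 * slots // estimate_h)


estimate_h = 1.0   # hypothetical starting guess: 1 hour per task
actual_h = 8.0     # what the tasks really take
for done in range(5):
    print(f"after {done} tasks: estimate {estimate_h:.2f} h, "
          f"requests {tasks_to_request(2.0, 6, estimate_h)} tasks")
    estimate_h = update_estimate(estimate_h, actual_h)
```

Under these assumptions the request size falls steeply from 288 tasks toward the 36 tasks that 2 days of 8-hour work on 6 slots actually represents, which is the "sorting itself out" effect described in this thread.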
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,179,786 RAC: 10,068 |
> The runtime limitations make things more than silly, to my very limited mind. Running Seti@home presented no such problems: if it took longer than expected, it was fine, but such short termism is driving me nuts. I may well join the great exodus if things are not managed better for us humble crunchers.

A few things: Your i3s seem to be running fine, but your i5 seems to have something set differently in your Computing preferences. Is something set in the "When to suspend" section? All those fields should ideally be unselected. Have you just upgraded Rosetta? It looks like you're a victim of Boinc's problem of grabbing too many tasks until it works out your runtime properly. It'll unravel itself soon and bring down a more appropriate number of tasks over a few days. Also, Rosetta isn't Seti. The requirements of Rosetta are very different from what worked for you before, so you should expect to have to adapt sometimes. But let everything settle down over a few more days and reassess then.
Tom M Joined: 20 Jun 17 Posts: 87 Credit: 15,043,441 RAC: 49,035 |
If I am not confused, is the main reason to "increase the default run time" to decrease the download/upload pressure on the servers? I am a newbie on how R@H goes about things, so I may not be understanding it right. If the servers are "still" getting hammered too much, how about changing the "minimum" to, say, 8 hours? And enforcing it so nobody is "grandfathered" in. All this presumes reliable checkpointing at the client end so part-time crunchers are not impacted. Tom M Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,772,069 RAC: 22,920 |
You do realise this thread is from 2008? Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,179,786 RAC: 10,068 |
> If I am not confused is the main reason to "increase the default run time" to decrease the download/upload pressure on the servers?

The subject has come up repeatedly since 2008 as the project's grown. It's actually interesting to me that the 2008 request was to increase the 3hr default/1hr minimum to a 6hr default/3hr minimum, i.e. the minimum increased to the old default, which is kind of what Tom's suggesting we should do now. Iirc, back then no changes were made, but the default became 4hrs for some reason, then 6hrs, then 8hrs when a surge of new users arrived, until the servers got upgraded and that killed it as an issue until last month. With COVID19 coinciding with the massive Seti crowd looking for a new home, bringing their inappropriate (for Rosetta) settings with them, the servers became overloaded again and we had those adjustments last month: deadlines reduced to 3 days project-wide (and caches downloaded in excess of 3 days automatically aborted), runtime increased to a maximum of 36hrs (Rosetta used to allow 48hrs at one time, but the upload files became too large, so 48hrs was removed and 24hrs became the max), and the 1hr minimum increased to 2hrs. An attempt to increase the default runtime to 16hrs was withdrawn within a day. And finally, the 7-second backoff for updates was increased to 31 seconds to reduce the hit of repeated manual update requests. All of that primarily reduced server hits, but it also allowed more (all?) of the new users to get tasks to run (and contribute) by reducing user hoarding of tasks, thereby massively improving average turnaround time for the project too. Everyone won, both users and project. Nobody has complained about a lack of tasks, there's been no server downtime, and tasks are getting back to the project quicker and in massively higher volume, all at the same time. Neat trick.
While everything seems to be ticking over, I guess there's no current need to change anything, but if the servers came under pressure again I kind of agree that minimum runtimes should be next on the list. Changing the minimum from 2hrs to 3hrs would be my choice as, for some reason, it seems to be what I'd call the power-users who insist on the shortest runtime on huge multi-core machines that are in a constant state of DLing, running, rapidly completing and re-uploading multiple times per hour, in some kind of weird race to smooth and maximise their credits at whatever cost to the project. As a massive generalisation, the people posting in the forums complaining the most about some perceived shortcoming were the same people causing the problem they were complaining about in the first place. So constraining those excesses from the server side is the best solution in my view. And when I pointed that out, it went down least well with them too. Bygones...
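The server-load argument for raising the minimum runtime is essentially arithmetic: a host's upload and scheduler traffic scales with its core count divided by the target runtime. A minimal sketch, assuming one task per core and one upload per finished task; the 64-core host is a hypothetical example.

```python
def tasks_per_hour(cores: int, runtime_h: float) -> float:
    """Finished tasks (and so uploads) per hour, assuming one task per core."""
    return cores / runtime_h


for runtime_h in (2, 3, 8):   # current minimum, Sid's proposal, Tom's proposal
    print(f"64 cores at {runtime_h} h: {tasks_per_hour(64, runtime_h):.1f} tasks/hour")
```

Under these assumptions, moving the minimum from 2 to 3 hours cuts such a host's traffic by a third, and an 8-hour minimum would cut it by three quarters.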
AlanG7Mc Joined: 26 Jun 06 Posts: 4 Credit: 884,099 RAC: 0 |
Increase Work Unit Deadlines, Please. At >least< add a day. Please. I'm getting Work Units with an estimated One Day+ of CPU time, but the deadline is only 3 days or so later. Other tasks on my computer chew up a lot of time, so Rosetta gets only about 10% of the CPU time (one WU per core). I Hibernate/Sleep my computer at night & when I'm not using it (because of energy costs), and this makes the WU deadlines even tighter. Result: WUs risk going over deadline, Rosetta loses my results, and several days of CPU time on my end are wasted. So, please Extend Deadlines by >At Least< one day. Then you get results, and my Cycles are not wasted. Thank You. --- PS: I've been crunching for Rosetta since 2006, for World Community Grid (WCG) since 2009, and for BOINC since 2004.
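Whether a task can meet its deadline depends on how much CPU time it actually receives per calendar day. A rough sketch, assuming the "One Day+" estimate means 24 hours of CPU time, reading "about 10% of the CPU time" as 10% of a core, and a hypothetical machine that is awake half of each day:

```python
def wall_clock_days(cpu_hours: float, cpu_share: float, awake_fraction: float) -> float:
    """Calendar days needed to accumulate `cpu_hours` of CPU time.

    cpu_share: fraction of a core the task gets while the machine is awake.
    awake_fraction: fraction of each day the machine is not asleep/hibernated.
    """
    return cpu_hours / (cpu_share * awake_fraction * 24)


days = wall_clock_days(cpu_hours=24, cpu_share=0.10, awake_fraction=0.5)
print(f"about {days:.0f} days of calendar time")
```

Under those assumptions the task needs roughly 20 days, so a 3-day deadline plus one extra day would still fall far short; the gap comes from the CPU share and sleep time rather than from the deadline itself.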
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,772,069 RAC: 22,920 |
> Increase Work Unit Deadlines, Please. At >least< add a day. Please

The problem isn't the deadline; the problem is the size of your cache and that your system is using the default benchmark values. If you're running only one project, and it spends lots of time offline, then you might need a multi-day cache to keep it busy. But if you're running more than one project, then there is no need for a cache at all. Your account, Preferences, Computing Preferences, General: Store at least 0.1 days of work; Store up to an additional 0.01 days of work. You also need to run the benchmarks. That system has only recently been added to Rosetta, and its benchmarks are still the default values; since here at Rosetta they are used for Credit & to determine the processing time of the Tasks, all of those values are way out. Tools, Run CPU benchmarks. Run the benchmarks, reduce your cache size, and then when Rosetta has some new work again, you won't have any issues returning it in time. Grant Darwin NT
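The two cache settings are commonly described as a low-water mark ("Store at least") plus extra headroom ("Store up to an additional"): when queued work drops below the first, the client asks for enough to reach the sum. The sketch below models only that rule, assuming a hypothetical 4-slot host and 8-hour tasks; the real work-fetch logic takes many more inputs. It also shows why an understated runtime estimate (for example from default benchmarks) inflates the request.

```python
import math


def tasks_to_fetch(queued_days: float, min_days: float, extra_days: float,
                   slots: int, est_task_h: float) -> int:
    """Tasks requested under a simple low-water/high-water rule."""
    if queued_days >= min_days:
        return 0  # enough work queued; no request
    shortfall_h = (min_days + extra_days - queued_days) * 24 * slots
    return math.ceil(shortfall_h / est_task_h)


print(tasks_to_fetch(0.0, 0.1, 0.01, slots=4, est_task_h=8.0))  # Grant's suggestion
print(tasks_to_fetch(0.0, 0.5, 1.5, slots=4, est_task_h=8.0))   # a 2-day cache
print(tasks_to_fetch(0.0, 0.5, 1.5, slots=4, est_task_h=2.0))   # same, estimate too low
```

Under these assumptions the three calls request 2, 24 and 96 tasks respectively: the small cache tops up a couple of tasks at a time, while a large cache combined with a bad estimate pulls down far more than the host can finish.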
AlanG7Mc Joined: 26 Jun 06 Posts: 4 Credit: 884,099 RAC: 0 |
{Thanks Grant, Darwin NT :( } TERMINATION NOTICE: With the stupidly 'soon' deadlines that >You Choose< ... With the way that >I Choose< to run my computer (including >My Choosing< hours-per-day to run it) ... I find that the >Very Last< Unit I will ever try for You cannot complete within deadline. With Your Right and privilege to choose too-soon deadlines ... With >My Right and Privilege< to run my computer My Way, and to >Volunteer< My Way ... Result: YOU have >Chosen< to Steal 30+ hours of CPU time from another project that actually wants results. And you have just thrown away a Volunteer (Me) who has been here for Seventeen Years.
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,772,069 RAC: 22,920 |
Excellent externalising there; blaming everyone else for issues that you choose to create, that aren't really an issue. Well done. Grant Darwin NT |
Jean-David Beyer Joined: 2 Nov 05 Posts: 188 Credit: 6,408,957 RAC: 5,473 |
> Increase Work Unit Deadlines, Please. At >least< add a day. Please

Me, too. I run my (Linux) machine 24/7 and app_config.xml is set to run three Rosetta tasks at a time if they are available. The machine has 16 cores, but Boinc gets only 12. The normal tasks are 8 hours and the Beta 6.00 tasks were 6 hours each. Then I was sent a big batch of Beta 6.00 tasks that were set to 8 hours each, with a deadline of about 2 days. There is no way I can complete them all in time. I have been giving the processor three at a time, but I will not make it. Leave tasks in memory is set. Checkpointing interval is very large. Task switching is set to only about 12 hours.
Bryn Mawr Joined: 26 Dec 18 Posts: 391 Credit: 12,078,545 RAC: 4,472 |
> Increase Work Unit Deadlines, Please. At >least< add a day. Please

So you can get through 18 tasks in 2 days; how big was the batch you received and how big is your cache? If the batch was bigger than 18, then the solution would be to fix the bug this exposes in Boinc rather than to increase the deadline and get even more tasks.
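The figure of 18 follows from three concurrent 8-hour tasks over a 2-day (48-hour) deadline. A minimal capacity check, assuming the tasks run back to back in those three slots with nothing else competing for them:

```python
def max_tasks_by_deadline(concurrent: int, task_hours: float, deadline_hours: float) -> int:
    """Whole tasks that finish in time when `concurrent` run side by side."""
    rounds = int(deadline_hours // task_hours)  # complete back-to-back runs per slot
    return rounds * concurrent


capacity = max_tasks_by_deadline(concurrent=3, task_hours=8, deadline_hours=48)
print(capacity)  # 18
```

Any batch larger than that capacity is bound to miss the deadline, which is why the size of the batch and of the cache is the real question here.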
Jean-David Beyer Joined: 2 Nov 05 Posts: 188 Credit: 6,408,957 RAC: 5,473 |
> So you can get through 18 tasks in 2 days, how big was the batch you received and how big is your cache?

I normally have 5 projects running, but CPDN has sent no work in a very long time. DENIS has sent no work in the last few days. So neither of those has anything to do these days. What does run is WCG, four at a time; Rosetta, three at a time; and Einstein, one at a time. I have lately downloaded some MilkyWay and Universe tasks that run up to one each at a time to fill in the gaps. Even so, I could run two more tasks at a time all the time lately. All running tasks get at least 98% of a CPU, mostly over 99%. Here are some recent tasks and how long they took:

Task | Work unit | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
1530770705 | 1362734763 | 23 Apr 2023, 19:23:10 UTC | 24 Apr 2023, 21:13:34 UTC | Completed and validated | 29,282.59 | 28,825.76 | 562.49 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530770708 | 1362734767 | 23 Apr 2023, 19:23:10 UTC | 25 Apr 2023, 3:36:21 UTC | Completed and validated | 29,043.76 | 28,744.24 | 540.16 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530770709 | 1362734769 | 23 Apr 2023, 19:23:10 UTC | 25 Apr 2023, 5:16:52 UTC | Completed and validated | 28,998.19 | 28,758.61 | 545.44 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530770624 | 1362734681 | 23 Apr 2023, 19:23:10 UTC | 25 Apr 2023, 1:12:56 UTC | Completed and validated | 29,110.59 | 28,763.46 | 535.53 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530756556 | 1362671609 | 23 Apr 2023, 4:48:27 UTC | 24 Apr 2023, 13:05:26 UTC | Completed and validated | 28,809.55 | 28,570.11 | 530.23 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530756558 | 1362671801 | 23 Apr 2023, 4:48:27 UTC | 24 Apr 2023, 11:28:17 UTC | Completed and validated | 28,894.38 | 28,663.98 | 592.49 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530756559 | 1362671672 | 23 Apr 2023, 4:48:27 UTC | 24 Apr 2023, 9:00:31 UTC | Completed and validated | 29,041.22 | 28,775.46 | 586.73 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530756560 | 1362671674 | 23 Apr 2023, 4:48:27 UTC | 24 Apr 2023, 19:32:18 UTC | Completed and validated | 29,035.93 | 28,622.18 | 596.30 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |
1530756561 | 1362671764 | 23 Apr 2023, 4:48:27 UTC | 24 Apr 2023, 17:07:46 UTC | Completed and validated | 29,234.33 | 28,856.15 | 618.57 | Rosetta Beta v6.00 x86_64-pc-linux-gnu |

Whenever my machine finishes a Rosetta task, it reports it and then requests more work. Sometimes it requests more tasks even when it has not just finished a task. Usually it gets nothing, but sometimes it gets one or two new tasks. My event log has room for 5000 entries, so it does not go back three days. Right now there are 13 in progress, with three actually running. The ones that have not started running have about three days to complete from when they were sent. Store at least 0.50 days of work. Store up to an additional 1.5 days of work. (This is not large enough to coast over the gaps.) Switch tasks every 1187 minutes. Request tasks to checkpoint at most every 1801 seconds.
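The run times in the task list can be checked directly. This sketch averages them and estimates how long the 13 in-progress tasks would take on three slots, assuming Rosetta keeps all three slots and ignoring that three of the tasks are already partly done:

```python
import math
import statistics

# Run times (seconds) of the nine Rosetta Beta v6.00 tasks in the list above.
run_times_s = [29282.59, 29043.76, 28998.19, 29110.59, 28809.55,
               28894.38, 29041.22, 29035.93, 29234.33]

mean_h = statistics.mean(run_times_s) / 3600
per_day = 3 * 24 / mean_h               # three Rosetta tasks at a time
drain_h = math.ceil(13 / 3) * mean_h    # 13 in progress: 5 rounds on 3 slots

print(f"mean run time {mean_h:.2f} h, about {per_day:.1f} tasks per day")
print(f"13 tasks need roughly {drain_h:.0f} hours of wall-clock time")
```

Under these assumptions that is roughly 8 hours per task, just under 9 tasks per day (about 18 in 2 days), and about 40 hours to clear the current queue.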
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
> Store up to an additional 1.5 days of work. (This is not large enough to coast over the gaps.)

This is the issue here, not the deadlines. Yes, they are low, and they could probably add a day or two without any issues on their side, but that's not the main issue you have. With Einstein, Milkyway and Universe as backup projects, it's completely unnecessary to have such a large cache. No idea about Universe, but at least Einstein and Milkyway should always have enough tasks.
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,772,069 RAC: 22,920 |
> Store up to an additional 1.5 days of work.

There's the problem. WTF would you stop project switching for almost a day? And if you want to have a cache, you use the Store at least value. Setting the Additional days value to anything over 0.05 results in odd behaviour (is the best way to put it); 0.01 additional days is best. And the only time you need a cache is if you are running a single project that is unreliable. Since you are running multiple projects, there is no need for a cache at all. If you choose excessive values for various settings, then of course you are going to run into issues with deadlines until the manager can sort things out. And the larger your cache, the longer between switching projects, and the more projects you do, the longer it will take for things to settle down & your Resource Share settings to be met. And then of course every time you change something, it has to start all over again. Set no cache (0 days & 0.01 additional days) & 60 min for switching between projects, and all of the problems you have created will go away.

> Whenever my machine finishes a Rosetta task, it reports it and then requests more work. Sometimes it requests more tasks even when it has not just finished a task.

It's doing that because that's what you are telling it to do with that 1.5 additional days value. At the very least, having additional days set to 0.01 will stop the requesting of extra Rosetta work until the present work is almost done (or your resource share settings have been met & it requests work from another project).

> (This is not large enough to coast over the gaps.)

What gaps? You're running several projects, so there is absolutely no point in having a cache of any sort. If one project doesn't have any work, then other projects will get extra work done. When the project without work gets more work again, then it will get more work done for a while to meet your Resource Share settings. Grant Darwin NT
Jean-David Beyer Joined: 2 Nov 05 Posts: 188 Credit: 6,408,957 RAC: 5,473 |
> (This is not large enough to coast over the gaps.)

I have gaps right now even though I am running five main projects. I need DENIS tasks right now, but I really need a week-long cache to handle that. And I really am using a total of only 2 days for the cache (0.5 + 1.5). Or I would need over a month of cache to manage the CPDN gaps. And I would have needed around a year to deal with the WCG fiasco. (I just changed that to 0.5+0.1 and will see if that makes any difference.) See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein. I would not run MilkyWay or Universe at all*. Percentages are for Average Work Done. _____ * They give out way too much credit for the work they do, even though they are set to 0.15% resource share.
Bryn Mawr Joined: 26 Dec 18 Posts: 391 Credit: 12,078,545 RAC: 4,472 |
> (This is not large enough to coast over the gaps.)

You cannot use your cache to cover periods where an individual project does not produce work. Even if you set a cache of a year, there would be times when you would run out (WCG and CPDN, just as two current examples). The cache is not intended for that purpose; it is to cover short-term problems such as internet outages. Set a low cache and let the resource share iron out any project flow problems.
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,772,069 RAC: 22,920 |
> See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein.

And there is the cause of all your "issues": you are trying to make the programme do something it was never meant to do. The whole reason for BOINC was to allow multiple projects to operate cooperatively. By setting the Resource share values, people are able to allow projects more or less work depending on their preferences. It is not about time. It is not about Credit. It is about work done. The fact that you are trying to have x Tasks for y projects running at any given time is in complete and total contravention of how BOINC was designed to work, and how it actually does work. The Resource Share settings are a ratio, not a percentage, and they are based on work done: not on processing time, not Credit, not the number of Tasks being run for a particular project, but the work actually done for each project. There is no problem with BOINC, nor with any Projects & their deadlines and their work or lack of it; the problem is you & how you are using BOINC. Grant Darwin NT
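The point that Resource Share values are ratios can be shown with a little arithmetic: each project's target fraction is its share divided by the sum of the shares of the projects that currently have work, so an idle project's portion is spread over the others. This is a sketch of the ratio arithmetic only; BOINC does the actual balancing over time from the work each project has done, and the share values below are hypothetical stand-ins for the 50/25/10/10/"trifle" wish.

```python
def share_fractions(shares: dict[str, float], has_work: set[str]) -> dict[str, float]:
    """Turn resource-share ratios into fractions over projects that have work."""
    active = {name: s for name, s in shares.items() if name in has_work}
    if not active:
        return {}  # nothing to run
    total = sum(active.values())
    return {name: s / total for name, s in active.items()}


# Hypothetical share values; only their ratios matter (100/50/20/20/10 gives the same result).
shares = {"CPDN": 50, "WCG": 25, "Rosetta": 10, "DENIS": 10, "Einstein": 5}

print(share_fractions(shares, set(shares)))                      # all five have work
print(share_fractions(shares, {"WCG", "Rosetta", "Einstein"}))   # CPDN and DENIS idle
```

With CPDN and DENIS out of work, WCG, Rosetta and Einstein would get 62.5%, 25% and 12.5% under this model; a fixed 50/25/10/10 split only holds while every project has work.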
Link Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
> I have gaps right now even though I am running five main projects.

How's that possible with Einstein and Milkyway in the mix? They always have work available. Of course, you must not mess with max_concurrent in the app_config just to match the exact number of cores you want for each project. That's not what it's for: it's to limit applications that cause issues if too many WUs of that app run at once, and some overlap is needed for things to work properly, in particular when running that many projects.
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,179,786 RAC: 10,068 |
> See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein. I would not run MilkyWay or Universe at all*. Percentages are for Average Work Done

At times like this, I'm not very friendly. Average Work done. Over what period? At all times? Over a day? A week? A month? A year? "Want" something else. Problem solved.
©2024 University of Washington
https://www.bakerlab.org