Message boards : Number crunching : Default workunit preferred runtime increases to 16 hours.
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There is a Rosetta-specific preference that can only be configured via the website. It is called "Target CPU run time". Previous values: default 8 hours, minimum 1 hour, maximum 24 hours. New values: default 16 hours, minimum 2 hours, maximum 36 hours. Work units will now be created with a 3 day delay_bound. This means that tasks will only be issued to machines that estimate they can complete the task in less than 3 days. It also means that if the task is not returned within 3 days, it will be reissued (and BOINC Manager will abort it on your machine). The Project Team prefers that you leave the runtime preference set to "not selected". This allows them flexibility to make changes in the future. These longer runtimes will help the project better run the new COVID work units, and make good use of the fantastic support from the user community. What this means for your machine: If you've never set a runtime preference, your work units will now target 16 hours of CPU time (which often takes 17 or 18 hours of actual time). Current actual runtime and raw CPU time can be seen in the workunit properties in the BOINC Manager. Until BOINC Manager gets used to the new runtimes, it will mis-estimate remaining time to run. But it will learn after processing a few. It will also mean that BOINC Manager will request too much work, because it doesn't really understand how long it will take to complete. This may result in Rosetta "taking over" your machine. Don't panic. If you can, just let things run normally and BOINC Manager will make the necessary adjustments to get back in balance with your resource share with other projects. If you were running with a 6 day cache of work that only comes from R@h, this will now take a total of 12 days to complete. If this is the case for you, go ahead and abort a few of the tasks. Watch the due dates; many work units will now have 3 day expirations. This tends to mean that if you run R@h in a mix of other projects, the Rosetta work units may grab "high priority". 
This is just the BOINC Manager working to complete the work before the due date. It will balance things out once it has a chance to see how these tasks run. I should point out that my comments here are NOT the formal project announcement about the v4.12 release. I am making some assumptions from user reports and observation of WUs. Please post questions and comments to this thread. Rosetta Moderator: Mod.Sense |
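The cache arithmetic in the post above can be sketched roughly as follows. This is only an illustration: it assumes a single core crunching around the clock, that every task runs for exactly its target time, and that all cached tasks carry the new 3 day deadline counted from now. The function names are made up for this example and are not part of BOINC.

```python
def days_to_clear_cache(cache_days, old_runtime_h, new_runtime_h):
    """Days needed to finish a cache that was sized using the old runtime."""
    return cache_days * new_runtime_h / old_runtime_h


def tasks_to_abort(cache_days, old_runtime_h, new_runtime_h, deadline_days):
    """Cached tasks (per core) that cannot finish before the deadline."""
    # How many tasks the client fetched, believing each takes old_runtime_h.
    tasks_cached = int(cache_days * 24 // old_runtime_h)
    # How many tasks actually fit before the deadline at the new runtime.
    tasks_that_fit = int(deadline_days * 24 // new_runtime_h)
    return max(0, tasks_cached - tasks_that_fit)


print(days_to_clear_cache(6, 8, 16))  # 12.0
print(tasks_to_abort(6, 8, 16, 3))    # 14
```

Under these assumptions, a 6 day cache fetched at 8 hours per task holds 18 tasks per core, of which only 4 fit inside 3 days at 16 hours each.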
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1680 Credit: 17,838,279 RAC: 22,988 |
So more than ever it looks like, if you run more than one BOINC project, it's best to set the cache to a very small setting, e.g. "Store at least 1 days of work" and "Store up to an additional 0.02 days of work". If running Rosetta only, it would probably be best to set "Store at least xx days of work" no larger than 2.5, ideally 2 or less, to allow for any Tasks that do run longer than they should. No point losing any Credit for all that CPU time when the project cancels a Task on your system because it took too long to return. NB- so far looking at my Task list (and the Tasks by application statuses on the Server Status page), it appears only Rosetta Mini has the longer default Runtimes. Grant Darwin NT |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
OK, so what I described was the case for somewhere between 12 to 24 hours. Since that time, they have reverted back to the normal 8 hour default runtimes. Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,217,838 RAC: 10,815 |
There is a Rosetta-specific preference that can only be configured via the website. It is called "Target CPU run time". Blimey! Longer min, default and max runtimes, but at the same time smaller buffers and quicker turnaround times. And shorter deadlines? Neat trick. That delay_bound thing is a doozy. Until BOINC Manager gets used to the new runtimes, it will mis-estimate remaining time to run. But it will learn after processing a few. It will also mean that BOINC Manager will request too much work, because it doesn't really understand how long it will take to complete. Important advice. Weird things will happen for a few days, but it'll sort itself out. Permission to abort excess tasks that now won't meet deadline |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I saw you coming :) If this is the case for you, go ahead and abort a few of the tasks. But now they've set the default back, so I'd guess if you were using the default and now did an update to project, it would reset back to 8 hours. This could be bad if you have tasks that have already run longer than 8 hours, because watchdog is trained to watch for such things. Rosetta Moderator: Mod.Sense |
Sabrina Tarson Send message Joined: 27 Jun 12 Posts: 20 Credit: 3,397,078 RAC: 0 |
So I'm confused, should we just disregard the first post in this thread, seeing as it's been rolled back to 8 hours? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,217,838 RAC: 10,815 |
OK, so what I described was the case for somewhere between 12 to 24 hours. Since that time, they have reverted back to the normal 8 hour default runtimes. Oh, good. That was the only thing that looked out of kilter to me. 16hrs would've been very ugly when it comes to credits. 1 lot of credit one day, 2 lots of credits the next - very up and down. I was toying with the idea of switching to 12hr tasks the other day, but shied away from it. 8hrs is nominally 3 lots of credits per day. 12hrs is 2 lots. With 10hrs (slightly longer by wall clock and/or allowing for watchdog interventions) being 2 lots of credit too, occasionally more. If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either) The one I like most is increasing the minimum runtime to 2hrs. I've had a bee in my bonnet about that for years. Drives me crackers when I see people using it. The hit on the servers must be huge - they've already said it's struggling with the masses of new users. Now some relief at no cost to anyone. |
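The "lots of credit per day" reasoning above can be checked with a small sketch. It assumes one core running tasks back to back, whole-hour wall-clock runtimes, and that a task finishing exactly at midnight counts for the day just ended. The function name is hypothetical.

```python
def completions_per_day(runtime_h, days):
    """Count tasks finishing in each 24-hour day on one core, back to back.

    runtime_h must be a whole number of wall-clock hours.
    """
    counts = [0] * days
    finish = runtime_h
    while finish <= days * 24:
        # (finish - 1) // 24 puts a task ending exactly at hour 24 into day 0.
        counts[(finish - 1) // 24] += 1
        finish += runtime_h
    return counts


print(completions_per_day(8, 4))   # [3, 3, 3, 3]
print(completions_per_day(12, 4))  # [2, 2, 2, 2]
print(completions_per_day(16, 4))  # [1, 2, 1, 2]
print(completions_per_day(10, 4))  # [2, 2, 3, 2]
```

The 16 hour case shows the one-lot, two-lot alternation, and the 10 hour case gives two lots on most days and occasionally three.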
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
For now, I'm leaving it here, because it did affect people throughout the day today. But, going forward, the runtime default will be 8 hours. Rosetta Moderator: Mod.Sense |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,217,838 RAC: 10,815 |
I saw you coming :) If the watchdog is set only at the start of the task it'll only affect those running tasks, won't it? A blip. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,217,838 RAC: 10,815 |
For now, I'm leaving it here, because it did affect people throughout the day today. But, going forward, the runtime default will be 8 hours. The min runtime will affect a lot of people still. Max runtime very few if any. But that delay_bound thing will have a huge effect on some people, doubly so for people who used to have 1hr runtimes and I assume move to 2hrs (and the sh*t will hit the fan if they move straight to 8hrs) |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either) You get too long of a target, and a machine that is only active 8 hours a day has trouble completing it in a 3 day deadline. I've actually thought it would be great for the Project Team and the servers, the bandwidth, the users... if they started using the intermediate results type of work units. You may have seen climate prediction do that, where they had models that took several weeks to run. But every day or two, each WU would cut an increment and send it in. I suppose it would change the processes they use as they go to interrogate the results though. Perhaps the assimilator or other could be used to mask the difference. Do you remember the days when 4 days was the max runtime preference? I guess the required speed of results is very different when you are doing pure research (in those days) and when you are doing applied science. Rosetta Moderator: Mod.Sense |
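The point about part-time machines can be put in numbers. A minimal sketch, assuming the machine gives the task its full active hours every day and ignoring any gap between CPU time and wall-clock time; the function name is illustrative only.

```python
def deadline_slack_hours(target_cpu_h, active_h_per_day, deadline_days):
    """Hours of spare crunching time left before the deadline (negative = miss)."""
    available_h = active_h_per_day * deadline_days
    return available_h - target_cpu_h


print(deadline_slack_hours(16, 8, 3))  # 8
print(deadline_slack_hours(24, 8, 3))  # 0
print(deadline_slack_hours(36, 8, 3))  # -12
```

With 8 active hours a day and a 3 day deadline there are only 24 hours to work with, so a 24 hour target leaves no margin at all and a 36 hour target cannot finish.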
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,217,838 RAC: 10,815 |
If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either) That's true. My work PC, now on lockdown and inaccessible, ran 12hr/day off 12hrs and suffered this sometimes. Especially from Saturday to Monday. I think Rosetta holds a %age for uptime and accounts for this to some extent. I've actually thought it would be great for the Project Team and the servers, the bandwidth, the users... if they started using the intermediate results type of work units. You may have seen climate prediction do that, where they had models that took several weeks to run. But every day or two, each WU would cut an increment and send it in. I suppose it would change the processes they use as they go to interrogate the results though. Perhaps the assimilator or other could be used to mask the difference. I have very limited experience (aka none) of other projects tbh. Sounds interesting though. The scale of turnaround time is the opposite of here though. Do you remember the days when 4 days was the max runtime preference? I guess the required speed of results is very different when you are doing pure research (in those days) and when you are doing applied science. 4, yes. Then 6. Then, when Charity Engine arrived unannounced and people couldn't comprehend the number of new accounts being auto-created, even wondering whether we were under a DDOS attack, with servers buckling under the weight of it all, 8hrs was implemented unannounced until all the servers got updated. And now here we are, nearly struggling again, but I think these changes are aimed at pre-empting bigger issues reappearing. I think they're headed in the right direction. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,975,602 RAC: 14,620 |
It is a very good idea. And it is actually relatively easy to implement, because almost all R@H WUs generate a lot (from a few to a few hundred) of models or "decoys" per WU. And such "decoys" are not intermediate results; they are complete final results (from the client-side point of view) which can be used independently of the rest of the results from the same WU (even if the WU fails later, the already generated decoys are useful). Or all of them (even after the WU is fully completed) can be regarded as intermediate results. Because scientists, for each studied target, need to get from several thousand to hundreds of thousands of such "decoys", but it does not matter how they were split into WUs or in what order and how they arrived at the server. The more you collect, the better and more reliable the final result will be, which is in any case obtained only after post-processing on the server, and not on the client computers of the volunteers. |
sinspin Send message Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
Strange things happen. I run 6 WUs at the same time on my machine (it's too hot here to run more), with a 24hr runtime to reduce the overhead of loading and unloading WUs and the internet traffic. It seems the deadline is now 3 days. To get all WUs finished on time I should get 12 WUs at most. But I got 64! What a waste of internet traffic and load on your servers, and what a waste of WUs if they get stuck on my machine. |
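The "12 WU max" figure can be reproduced with a quick sketch. The 26 hour wall-clock figure is an assumption, extrapolated from the first post's remark that 16 hours of CPU time often takes 17 or 18 hours of actual time; the function name is hypothetical.

```python
def max_tasks_before_deadline(concurrent_tasks, wall_hours_per_task, deadline_days):
    """Most tasks that can finish in time when run in fixed-size batches."""
    # Whole rounds of tasks each slot can complete before the deadline.
    rounds_per_slot = int((deadline_days * 24) // wall_hours_per_task)
    return concurrent_tasks * rounds_per_slot


# 6 tasks at once, 24 h CPU target taking roughly 26 h of wall-clock time.
print(max_tasks_before_deadline(6, 26, 3))  # 12
# If wall-clock time matched CPU time exactly, three rounds would just fit.
print(max_tasks_before_deadline(6, 24, 3))  # 18
```

Either way, 64 tasks is far more than this machine could return inside a 3 day deadline.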
©2024 University of Washington
https://www.bakerlab.org