Default workunit preferred runtime increases to 16 hours.

Message boards : Number crunching : Default workunit preferred runtime increases to 16 hours.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93082 - Posted: 2 Apr 2020, 15:33:18 UTC
Last modified: 2 Apr 2020, 16:27:34 UTC

There is a Rosetta-specific preference that can only be configured via the website. It is called "Target CPU run time".

Previous values:
Default 8hrs
Minimum 1 hour
Maximum 24 hrs

New values:
Default 16hrs
Minimum 2 hours
Maximum 36 hrs

Work units will now be created with a 3 day delay_bound. This means that tasks will only be issued to machines that estimate they can complete the task in less than 3 days. It also means that if the task is not returned within 3 days, it will be reissued (and BOINC Manager will abort it on your machine).
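
To make the delay_bound rule concrete, here is a minimal Python sketch. It is not BOINC code: the function names are made up for illustration, and the only number taken from the post is the 3 day bound.

```python
from datetime import datetime, timedelta, timezone

# 3 day delay_bound, as described in the post above.
DELAY_BOUND = timedelta(days=3)

def report_deadline(sent_at, delay_bound=DELAY_BOUND):
    """Deadline = time the task was issued plus the delay_bound (hypothetical helper)."""
    return sent_at + delay_bound

def likely_to_be_issued(estimated_wall_hours, delay_bound=DELAY_BOUND):
    """True if the machine's own estimate says the task finishes inside the bound."""
    return timedelta(hours=estimated_wall_hours) < delay_bound

sent = datetime(2020, 4, 2, 15, 33, tzinfo=timezone.utc)
print(report_deadline(sent))      # 3 days after the send time
print(likely_to_be_issued(18))    # a 16 hour target at ~18 hours wall clock
print(likely_to_be_issued(80))    # an estimate longer than 72 hours
```

With these assumptions an 18 hour estimate passes and an 80 hour estimate does not.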

The Project Team prefers that you leave the runtime preference set to "not selected". This allows them flexibility to make changes in the future.

These longer runtimes will help the project run the new COVID work units more efficiently, and make the most of the fantastic support from the user community.

What this means for your machine:
If you've never set a runtime preference, your work units will now target 16 hours of CPU time (which often takes 17 or 18 hours of actual time). Current actual runtime and raw CPU time can be seen in the workunit properties in the BOINC Manager.

Until BOINC Manager gets used to the new runtimes, it will mis-estimate remaining time to run. But it will learn after processing a few. It will also mean that BOINC Manager will request too much work, because it doesn't really understand how long it will take to complete.

This may result in Rosetta "taking over" your machine. Don't panic. If you can, just let things run normally and BOINC Manager will make the necessary adjustments to get back in balance with your resource share with other projects.

If you were running with a 6 day cache of work that only comes from R@h, this will now take a total of 12 days to complete. If this is the case for you, go ahead and abort a few of the tasks. Watch the due dates: many work units will now have 3 day expirations. If you run R@h in a mix with other projects, this tends to mean that the Rosetta work units may grab "high priority". This is just the BOINC Manager working to complete the work before the due date. It will balance things out once it has a chance to see how these tasks run.
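
The cache arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and it assumes the whole cache was downloaded at once and is worked through at a steady rate.

```python
def days_to_clear_cache(cache_days, assumed_runtime_h, actual_runtime_h):
    """Real days of work in a cache that was sized using the old runtime."""
    return cache_days * actual_runtime_h / assumed_runtime_h

def fraction_done_by_deadline(queued_days, deadline_days):
    """Share of the queued work that completes before the deadline (steady rate assumed)."""
    return min(1.0, deadline_days / queued_days)

queued = days_to_clear_cache(cache_days=6, assumed_runtime_h=8, actual_runtime_h=16)
print(queued)                                   # 12.0 days, matching the post
print(fraction_done_by_deadline(queued, 3))     # 0.25 with a 3 day deadline
```

Under those assumptions only about a quarter of a 6 day cache would finish inside a 3 day deadline, which is why aborting some of the excess tasks is suggested.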

I should point out that my comments here are NOT the formal project announcement about the v4.12 release. I am making some assumptions from user reports and observation of WUs. Please post questions and comments to this thread.
Rosetta Moderator: Mod.Sense
ID: 93082 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1633
Credit: 16,774,374
RAC: 13,069
Message 93101 - Posted: 2 Apr 2020, 18:34:59 UTC
Last modified: 2 Apr 2020, 18:41:12 UTC

So more than ever, it looks like if you run more than one BOINC project, it's best to set the cache to a very small value, e.g.

Store at least             1    days of work
Store up to an additional  0.02 days of work

If running Rosetta only, it would probably be best to set "Store at least xx days of work" no larger than 2.5, ideally 2 or less, to allow for any Tasks that do run longer than they should. No point losing any Credit for all that CPU time when the project cancels a Task on your system because it took too long to return.



NB- so far looking at my Task list (and the Tasks by application statuses on the Server Status page), it appears only Rosetta Mini has the longer default Runtimes.
Grant
Darwin NT
ID: 93101 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93121 - Posted: 2 Apr 2020, 23:05:17 UTC

OK, so what I described was the case for somewhere between 12 and 24 hours. Since that time, they have reverted back to the normal 8 hour default runtimes.
Rosetta Moderator: Mod.Sense
ID: 93121 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2074
Credit: 40,613,760
RAC: 5,140
Message 93124 - Posted: 2 Apr 2020, 23:25:49 UTC - in response to Message 93082.  

There is a Rosetta-specific preference that can only be configured via the website. It is called "Target CPU run time".

Previous values:
Default 8hrs
Minimum 1 hour
Maximum 24 hrs

New values:
Default 16hrs
Minimum 2 hours
Maximum 36 hrs

Work units will now be created with a 3 day delay_bound. This means that tasks will only be issued to machines that estimate they can complete the task in less than 3 days. It also means that if the task is not returned within 3 days, it will be reissued (and BOINC Manager will abort it on your machine).

Blimey! Longer min, default and max runtimes, but at the same time smaller buffers and quicker turnaround times. And shorter deadlines? Neat trick
That delay_bound thing is a doozy

Until BOINC Manager gets used to the new runtimes, it will mis-estimate remaining time to run. But it will learn after processing a few. It will also mean that BOINC Manager will request too much work, because it doesn't really understand how long it will take to complete.

This may result in Rosetta "taking over" your machine. Don't panic. If you can, just let things run normally and BOINC Manager will make the necessary adjustments to get back in balance with your resource share with other projects.

If you were running with a 6 day cache of work that only comes from R@h, this will now take a total of 12 days to complete. If this is the case for you, go ahead and abort a few of the tasks. Watch the due dates, many work units now will have 3 day expirations. This tends to mean that if you run R@h in a mix of other projects, that the Rosetta work units may grab "high priority". This is just the BOINC Manager working to complete the work before the due date. It will balance things out once it has a chance to see how these tasks run.

Important advice. Weird things will happen for a few days, but it'll sort itself out.
Permission to abort excess tasks that now won't meet deadline
ID: 93124 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93125 - Posted: 2 Apr 2020, 23:30:45 UTC
Last modified: 2 Apr 2020, 23:31:39 UTC

I saw you coming :)
If this is the case for you, go ahead and abort a few of the tasks.


But now they've set the default back, so I'd guess if you were using the default and now did an update to project, it would reset back to 8 hours. This could be bad if you have tasks that have already run longer than 8 hours, because watchdog is trained to watch for such things.
Rosetta Moderator: Mod.Sense
ID: 93125 · Rating: 0
Sabrina Tarson
Joined: 27 Jun 12
Posts: 20
Credit: 3,397,078
RAC: 0
Message 93127 - Posted: 2 Apr 2020, 23:38:21 UTC

So I'm confused, should we just disregard the first post in this thread, seeing as it's been rolled back to 8 hours?
ID: 93127 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2074
Credit: 40,613,760
RAC: 5,140
Message 93129 - Posted: 2 Apr 2020, 23:48:55 UTC - in response to Message 93121.  

OK, so what I described was the case for somewhere between 12 and 24 hours. Since that time, they have reverted back to the normal 8 hour default runtimes.

Oh, good. That was the only thing that looked out of kilter to me.

16hrs would've been very ugly when it comes to credits. 1 lot of credit one day, 2 lots of credits the next - very up and down.
I was toying with the idea of switching to 12hr tasks the other day, but shied away from it.
8hrs is nominally 3 lots of credits per day. 12hrs is 2 lots. With 10hrs (slightly longer by wall clock and/or allowing for watchdog interventions) being 2 lots of credit too, occasionally more.
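
The "lots of credit per day" pattern can be checked with a small Python sketch. It assumes a single core running tasks back to back from midnight, whole-hour runtimes, and no watchdog or wall clock overhead; the function name is made up for illustration.

```python
def completions_per_day(runtime_h, days):
    """Count tasks finishing on each calendar day for one core running nonstop."""
    counts = [0] * days
    finish = runtime_h
    while finish <= days * 24:
        # A task finishing exactly at midnight counts for the day just ended.
        counts[(finish - 1) // 24] += 1
        finish += runtime_h
    return counts

for runtime in (8, 10, 12, 16):
    print(runtime, completions_per_day(runtime, days=6))
# 8  -> [3, 3, 3, 3, 3, 3]
# 10 -> [2, 2, 3, 2, 3, 2]
# 12 -> [2, 2, 2, 2, 2, 2]
# 16 -> [1, 2, 1, 2, 1, 2]
```

Under those assumptions 16hrs alternates between one and two completions per day, while 8hrs and 12hrs are steady and 10hrs is mostly two with an occasional third.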

If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either)

The one I like most is increasing the minimum runtime to 2hrs. I've had a bee in my bonnet about that for years. Drives me crackers when I see people using it.
The hit on the servers must be huge - they've already said it's struggling with the masses of new users. Now some relief at no cost to anyone.
ID: 93129 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93130 - Posted: 2 Apr 2020, 23:50:25 UTC - in response to Message 93127.  

For now, I'm leaving it here, because it did affect people throughout the day today. But, going forward, the runtime default will be 8 hours.
Rosetta Moderator: Mod.Sense
ID: 93130 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2074
Credit: 40,613,760
RAC: 5,140
Message 93131 - Posted: 2 Apr 2020, 23:54:54 UTC - in response to Message 93125.  

I saw you coming :)
If this is the case for you, go ahead and abort a few of the tasks.

But now they've set the default back, so I'd guess if you were using the default and now did an update to project, it would reset back to 8 hours. This could be bad if you have tasks that have already run longer than 8 hours, because watchdog is trained to watch for such things.

If the watchdog is set only at the start of the task it'll only affect those running tasks, won't it? A blip.
ID: 93131 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2074
Credit: 40,613,760
RAC: 5,140
Message 93132 - Posted: 3 Apr 2020, 0:00:42 UTC - in response to Message 93130.  

For now, I'm leaving it here, because it did affect people throughout the day today. But, going forward, the runtime default will be 8 hours.

The min runtime will affect a lot of people still. Max runtime very few if any.
But that delay_bound thing will have a huge effect on some people, doubly so for people who used to have 1hr runtimes and I assume move to 2hrs (and the sh*t will hit the fan if they move straight to 8hrs)
ID: 93132 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93133 - Posted: 3 Apr 2020, 0:02:35 UTC - in response to Message 93129.  
Last modified: 3 Apr 2020, 0:03:00 UTC

If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either)


You get too long of a target, and a machine that is only active 8 hours a day has trouble completing it in a 3 day deadline.
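
As a rough illustration of that point, here is a Python sketch. It assumes the task starts the moment it is downloaded, nothing else is queued, and wall clock time is about 10% more than CPU time (a guess based on the "16 hours takes 17 or 18" remark earlier in the thread). The function name and the overhead factor are assumptions, not project values.

```python
def fits_deadline(target_cpu_h, active_h_per_day, deadline_days=3, wall_overhead=1.1):
    """True if a task with this CPU target can finish on a part-time machine."""
    needed_h = target_cpu_h * wall_overhead          # estimated wall clock hours
    available_h = active_h_per_day * deadline_days   # hours the machine is on before the deadline
    return needed_h <= available_h

for target in (8, 16, 24):
    print(target, fits_deadline(target, active_h_per_day=8))
```

With a machine that is on 8 hours a day there are only 24 running hours before a 3 day deadline, so under these assumptions 8 and 16 hour targets fit but a 24 hour target does not.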

I've actually thought it would be great for the Project Team and the servers, the bandwidth, the users... if they started using the intermediate results type of work units. You may have seen climate prediction do that, where they had models that took several weeks to run. But every day or two, each WU would cut an increment and send it in. I suppose it would change the processes they use as they go to interrogate the results though. Perhaps the assimilator or other could be used to mask the difference.

Do you remember the days when 4 days was the max runtime preference? I guess the required speed of results is very different when you are doing pure research (in those days) and when you are doing applied science.
Rosetta Moderator: Mod.Sense
ID: 93133 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2074
Credit: 40,613,760
RAC: 5,140
Message 93141 - Posted: 3 Apr 2020, 0:54:02 UTC - in response to Message 93133.  

If it gets changed in future, I think 12hrs is a good compromise. Definitely not 16. May as well go to 24 if 16 is considered (not recommending either)

You get too long of a target, and a machine that is only active 8 hours a day has trouble completing it in a 3 day deadline.

That's true. My work PC, now on lockdown and inaccessible, ran 12hrs a day and was off for the other 12hrs, and suffered from this sometimes. Especially from Saturday to Monday.
I think Rosetta holds a %age for uptime and accounts for this to some extent

I've actually thought it would be great for the Project Team and the servers, the bandwidth, the users... if they started using the intermediate results type of work units. You may have seen climate prediction do that, where they had models that took several weeks to run. But every day or two, each WU would cut an increment and send it in. I suppose it would change the processes they use as they go to interrogate the results though. Perhaps the assimilator or other could be used to mask the difference.

I have very limited experience (aka none) of other projects tbh. Sounds interesting though. The scale of turnaround time is the opposite of here though.

Do you remember the days when 4 days was the max runtime preference? I guess the required speed of results is very different when you are doing pure research (in those days) and when you are doing applied science.

4, yes. Then 6. Then, when Charity Engine arrived unannounced and people couldn't comprehend the number of new accounts being auto-created, even wondering whether we were under a DDOS attack, with servers buckling under the weight of it all, 8hrs was implemented unannounced until all the servers got updated.
And now here we are, nearly struggling again, but I think these changes are aimed at pre-empting bigger issues reappearing. I think they're headed in the right direction.
ID: 93141 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,308,275
RAC: 16,303
Message 93149 - Posted: 3 Apr 2020, 2:43:02 UTC - in response to Message 93133.  


You get too long of a target, and a machine that is only active 8 hours a day has trouble completing it in a 3 day deadline.

I've actually thought it would be great for the Project Team and the servers, the bandwidth, the users... if they started using the intermediate results type of work units. You may have seen climate prediction do that, where they had models that took several weeks to run. But every day or two, each WU would cut an increment and send it in. I suppose it would change the processes they use as they go to interrogate the results though. Perhaps the assimilator or other could be used to mask the difference.

It is a very good idea. And it is actually relatively easy to implement, because almost all R@H WUs generate a lot of models, or "decoys" (from a few to a few hundred), per WU.
Such "decoys" are not intermediate results. From the client's point of view, each one is a complete, final result that can be used independently of the other results from the same WU (even if the WU fails later, the decoys already generated are still useful). Alternatively, all of them (even after the WU has fully completed) can be regarded as intermediate results.

For each studied target, scientists need anywhere from several thousand to hundreds of thousands of such "decoys", but it does not matter how they were split into WUs, or in what order and how they arrived at the server. The more that are collected, the better and more reliable the final result will be, and that result is in any case obtained only after post-processing on the server, not on the volunteers' client computers.
ID: 93149 · Rating: 0
sinspin
Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 93151 - Posted: 3 Apr 2020, 3:20:12 UTC

Strange things happen. I run 6 WUs at the same time on my machine (it's too hot here to run more), with a 24hr runtime to reduce the overhead of loading and unloading WUs and the internet traffic. It seems the deadline is now 3 days.
To get all WUs finished on time I should get 12 WUs at most. But I got 64! What a waste of internet traffic and load on your servers, and what a waste of WUs if they get stuck on my machine.
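
One way to arrive at a figure like 12 is sketched below in Python. It assumes the machine runs around the clock and that a 24hr CPU target takes about 10% longer by wall clock; the function name and the overhead factor are assumptions for illustration, not anything the BOINC scheduler is known to use.

```python
import math

def max_tasks_before_deadline(slots, target_cpu_h, deadline_h=72, wall_overhead=1.1):
    """Tasks that can finish in time when each slot runs one task after another."""
    wall_h = target_cpu_h * wall_overhead
    rounds_per_slot = math.floor(deadline_h / wall_h)
    return slots * rounds_per_slot

print(max_tasks_before_deadline(slots=6, target_cpu_h=24))                     # 12
print(max_tasks_before_deadline(slots=6, target_cpu_h=24, wall_overhead=1.0))  # 18
```

With any wall clock overhead at all, only two 24hr tasks fit per slot inside 72 hours, giving 12. Even with zero overhead the ceiling would be 18, far below the 64 that were sent.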
ID: 93151 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org