Discussion on increasing the default run time

Message boards : Number crunching : Discussion on increasing the default run time

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96141 - Posted: 5 May 2020, 22:39:45 UTC - in response to Message 96135.  

The runtime limitations make things more than silly, to my very limited mind. Running Seti@home presented no such problems: if it took longer than expected, it was fine, but such short termism is driving me nuts. I may well join the great exodus if things are not managed better for us humble crunchers.


If you could explain what you mean by "runtime limitations" and "short termism", we might be able to help further. The runtime is a flexibility you have available, not a limitation. Is "short termism" a reference to 3 day deadlines?

I see your 6 CPU machine got a big pile of work all at once. Is that what you are talking about? You aren't alone in that boat. Corrective steps have been taken to avoid such problems in the future. Looks like the same might be said of your other Windows 10 that is not the "pro" version.

If deadlines are missed because your machine was sent too much work, don't let it drive you nuts. The BOINC Manager will take care of it in several ways. One is that it will learn how long your machines are taking to complete the new v4.20 work units, and that lets it better size its requests for more work in the future. It will also abort tasks that pass their deadlines for you. Also, the project scheduler has been changed to basically put a cap on how much work is sent to a machine, which should avoid the overload in the first place.

So, I believe the issues are addressed, once your current work runs through its course. Please let me know if I've missed the question.
Rosetta Moderator: Mod.Sense

Sid Celery
Joined: 11 Feb 08
Posts: 1987
Credit: 38,509,582
RAC: 14,752
Message 96153 - Posted: 6 May 2020, 7:27:52 UTC - in response to Message 96135.  

The runtime limitations make things more than silly, to my very limited mind. Running Seti@home presented no such problems: if it took longer than expected, it was fine, but such short termism is driving me nuts. I may well join the great exodus if things are not managed better for us humble crunchers.

A few things:
Your i3's seem to be running fine, but your i5 seems to have something set differently in your Computing preferences. Is something set in the "When to suspend" section? All those fields should ideally be unselected.
Have you just upgraded Rosetta? It looks like you're a victim of Boinc's problem of grabbing too many tasks until it works out your runtime properly. It'll unravel itself soon and bring down a more appropriate number of tasks over a few days.

Also, Rosetta isn't Seti. The req'ts of Rosetta are very different from what worked for you before, so you should expect to have to adapt sometimes. But let everything settle down over a few more days and reassess then.

Tom M
Joined: 20 Jun 17
Posts: 85
Credit: 5,122,619
RAC: 42,588
Message 96215 - Posted: 7 May 2020, 11:20:26 UTC

If I am not confused, is the main reason to "increase the default run time" to decrease the download/upload pressure on the servers?

I am a newbie on how R@H goes about things so I may not be understanding it right.

If the servers are "still" getting hammered too much how about changing the "minimum" to say 8 hours?
And enforcing it so nobody is "grandfathered" in.

All this presumes reliable check pointing at the client end so part-time crunchers are not impacted.

Tom M
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,642,440
RAC: 13,135
Message 96217 - Posted: 7 May 2020, 11:39:01 UTC - in response to Message 96215.  

You do realise this thread is from 2008?
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 1987
Credit: 38,509,582
RAC: 14,752
Message 96236 - Posted: 7 May 2020, 14:56:27 UTC - in response to Message 96217.  

If I am not confused, is the main reason to "increase the default run time" to decrease the download/upload pressure on the servers?

I am a newbie on how R@H goes about things so I may not be understanding it right.

If the servers are "still" getting hammered too much how about changing the "minimum" to say 8 hours?
And enforcing it so nobody is "grandfathered" in.

All this presumes reliable check pointing at the client end so part-time crunchers are not impacted.

You do realise this thread is from 2008?

The subject has come up repeatedly since 2008 as the project's grown.
It's actually interesting to me that the 2008 request was to increase the 3hr default/1hr minimum to 6hr default/3hr minimum. ie minimum increased to the old default, which is kind of what Tom's suggesting we should do now.
Iirc, back then, no changes were made but the default became 4hrs for some reason, then 6hrs, then 8hrs when a surge of new users arrived until the servers got upgraded and killed it as an issue until last month.

With COVID19 coinciding with the massive Seti crowd looking for a new home, bringing their inappropriate (for Rosetta) settings with them, servers became overloaded again and we had those adjustments last month: deadlines reduced to 3 days project-wide (and caches downloaded in excess of 3-days automatically aborted), increased runtime to a maximum 36hrs (Rosetta used to allow 48hrs at one time, but the upload files became too large so 48hrs was removed and 24hrs became the max) and finally the 1hr minimum was increased to 2hrs. An attempt to increase the default runtime to 16hrs was withdrawn within a day. And finally the 7 second backoff for updates was increased to 31 seconds to reduce the hit of repeated manual update requests.

The result of all that primarily reduced server hits, but also allowed more (all?) of the new users to get tasks to run (and contribute) by reducing user hoarding of tasks and thereby massively improving average turnaround time for the project too. Everyone won, both users and project. Nobody has been complaining about a lack of tasks, there's been no server downtime, and tasks are getting back to the project quicker and in massively higher volume, all at the same time. Neat trick.

While everything seems to be ticking over, I guess there's no current need to change anything, but if servers came under pressure again I kind of agree that minimum runtimes should be next on the list.
Changing the minimum from 2hrs to 3hrs would be my choice as, for some reason, it seems to be what I'd call the power-users who insist on the shortest runtime on huge multi-core machines that are in some kind of constant state of DLing, running, rapidly-completing and re-uploading multiple times per hour in some kind of weird race to smooth and maximise their credits at whatever cost to the project.
As a massive generalisation, the people posting in the forums complaining the most about some perceived shortcoming were the same people causing the problem they were complaining about in the first place. So constraining those excesses from the server side is the best solution in my view. And when pointing that out, it went down least well with them too.
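
To put rough numbers on the point about runtimes and server hits: a minimal sketch, assuming each task costs one download and one upload and that every core runs tasks back to back around the clock. The function name and the 64-core host are made up for illustration; only the 2hr, 3hr, 8hr and 36hr runtimes come from the posts in this thread.

```python
def tasks_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks a host completes per day if every core runs back-to-back tasks.

    Assumption: each completed task means one download and one upload,
    so this is also a rough count of server round trips per day.
    """
    if cores <= 0 or runtime_hours <= 0:
        raise ValueError("cores and runtime_hours must be positive")
    return cores * 24.0 / runtime_hours


if __name__ == "__main__":
    cores = 64  # hypothetical "huge multi-core machine"
    for runtime in (2, 3, 8, 36):
        rate = tasks_per_day(cores, runtime)
        print(f"{runtime:>2}hr runtime: {rate:6.1f} tasks/day")
```

On those assumptions, moving the minimum from 2hrs to 3hrs cuts that host's round trips by a third.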

Bygones...

AlanG7Mc
Joined: 26 Jun 06
Posts: 4
Credit: 867,054
RAC: 199
Message 108282 - Posted: 4 Apr 2023, 14:34:37 UTC

Increase Work Unit Deadlines, Please. At >least< add a day. Please

I'm getting Work Units with an estimated One Day+ of CPU -- but the deadline is only 3 days or so later.

Other tasks on my computer chew up a lot of time, so Rosetta gets only about 10% of the CPU time (one WU per core).

I Hibernate/Sleep my computer at night & when I'm not using it (because of energy costs), and this makes the WU deadlines even tighter.
-
Result is, WU's risk going over deadline - Rosetta loses my results, and several days of CPU time on my end are wasted.

So, please Extend Deadlines by >At Least< one day. Then you get results, and my Cycles are not wasted.

Thank You.
---
PS: I've been crunching for Rosetta since 2006, for World Community Grid (WCG) since 2009, and for BOINC since 2004.
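
A rough worked example of the squeeze described in the post above. It is only a sketch, assuming "about 10% of the CPU time" means a task gains about 0.1 CPU-hours for every hour the machine is awake. The 24 CPU-hour estimate and the 3-day deadline are from the post; the 16 hours awake per day and the function name are made up.

```python
def wall_clock_days(cpu_hours: float, cpu_share: float,
                    awake_hours_per_day: float) -> float:
    """Calendar days needed to accumulate cpu_hours of CPU time."""
    if cpu_hours < 0:
        raise ValueError("cpu_hours must not be negative")
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    if not 0 < awake_hours_per_day <= 24:
        raise ValueError("awake_hours_per_day must be in (0, 24]")
    cpu_hours_gained_per_day = cpu_share * awake_hours_per_day
    return cpu_hours / cpu_hours_gained_per_day


if __name__ == "__main__":
    deadline_days = 3.0
    needed = wall_clock_days(cpu_hours=24.0, cpu_share=0.10,
                             awake_hours_per_day=16.0)  # 16 h is hypothetical
    print(f"needs about {needed:.1f} days, deadline is {deadline_days:.0f} days")
    print("fits" if needed <= deadline_days else "misses the deadline")
```

On these assumptions a one-day task needs around 15 calendar days, so one extra day of deadline would not by itself close the gap.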

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,642,440
RAC: 13,135
Message 108283 - Posted: 4 Apr 2023, 21:14:18 UTC - in response to Message 108282.  
Last modified: 4 Apr 2023, 21:16:43 UTC

Increase Work Unit Deadlines, Please. At >least< add a day. Please
The problem isn't the deadline, the problem is the size of your cache and that your system is using the default benchmark values.
If you're running only one project, and it spends lots of time off line, then you might need a multi day cache to keep it busy.
But if you're running more than one project, then there is no need for a cache at all.

Your account, Preferences, Computing Preferences, General
Store at least 0.1 days of work
Store up to an additional 0.01 days of work


You also need to run the benchmarks- that system has only recently been added to Rosetta, and the benchmarks are the default values- since here at Rosetta they are used for Credit & to determine processing time of the Tasks, all of those values are way out.

Tools, Run CPU benchmarks.

Run the benchmarks, reduce your cache size, and then when Rosetta has some new work again, you won't have any issues returning it in time.
Grant
Darwin NT

AlanG7Mc
Joined: 26 Jun 06
Posts: 4
Credit: 867,054
RAC: 199
Message 108293 - Posted: 10 Apr 2023, 18:36:14 UTC

{Thanks Grant, Darwin NT :( }

TERMINATION NOTICE:

With the stupidly 'soon' deadlines that >You Choose< ...

With the way that >I Choose< to run my computer (including >My Choosing< hours-per-day to run it) ...

I find that the >Very Last< Unit I will ever try for You, cannot complete within deadline.

With Your Right and privilege to choose too-soon dead-lines ...
With >My Right and Privilege< to run my computer My Way, and to >Volunteer< My Way ...

Result: YOU have >Chosen< to Steal 30+ hours of CPU time from another project who actually wants results.

And you have just thrown away a Volunteer (Me) who has been here for Seventeen Years.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,642,440
RAC: 13,135
Message 108295 - Posted: 11 Apr 2023, 7:11:14 UTC

Excellent externalising there; blaming everyone else for issues that you choose to create, that aren't really an issue.
Well done.
Grant
Darwin NT

Jean-David Beyer
Joined: 2 Nov 05
Posts: 172
Credit: 5,670,343
RAC: 3,329
Message 108332 - Posted: 24 Apr 2023, 21:02:15 UTC - in response to Message 108282.  

Increase Work Unit Deadlines, Please. At >least< add a day. Please


Me, too.

I run my (Linux) machine 24/7 and app_config.xml is set to run three Rosetta tasks at a time if they are available. Machine has 16 cores, but Boinc gets only 12.
The normal tasks are 8 hours and the Beta 6.00 tasks were 6 hours each.
Then I was sent a big batch of Beta 6.00 tasks that were set to 8 hours each, with a deadline of about 2 days. There is no way I can complete them all in time. I have been giving the processor three at a time, but I will not make it.

Leave tasks in memory is set.
Checkpointing interval is very large.
Task switching is set to only about 12 hours.

Bryn Mawr
Joined: 26 Dec 18
Posts: 375
Credit: 10,720,871
RAC: 4,872
Message 108333 - Posted: 25 Apr 2023, 5:03:31 UTC - in response to Message 108332.  

Increase Work Unit Deadlines, Please. At >least< add a day. Please


Me, too.

I run my (Linux) machine 24/7 and app_config.xml is set to run three Rosetta tasks at a time if they are available. Machine has 16 cores, but Boinc gets only 12.
The normal tasks are 8 hours and the Beta 6.00 tasks were 6 hours each.
Then I was sent a big batch of Beta 6.00 tasks that were set to 8 hours each, with a deadline of about 2 days. There is no way I can complete them all in time. I have been giving the processor three at a time, but I will not make it.

Leave tasks in memory is set.
Checkpointing interval is very large.
Task switching is set to only about 12 hours.


So you can get through 18 tasks in 2 days, how big was the batch you received and how big is your cache?

If the batch was bigger than 18 then the solution would be to fix the bug this exposes in Boinc rather than to increase the deadline and get even more tasks.
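
The 18 comes from simple arithmetic on the figures quoted: three tasks at a time, 8 hours each, roughly 2 days to the deadline. A minimal sketch, assuming tasks run back to back with no idle time and the whole batch can start as soon as it arrives (the function name is made up):

```python
def tasks_before_deadline(concurrent: int, task_hours: float,
                          deadline_hours: float) -> int:
    """Whole tasks that finish before the deadline with `concurrent` slots."""
    if concurrent < 0 or deadline_hours < 0:
        raise ValueError("concurrent and deadline_hours must not be negative")
    if task_hours <= 0:
        raise ValueError("task_hours must be positive")
    # Each slot completes this many tasks, one after another.
    rounds = int(deadline_hours // task_hours)
    return concurrent * rounds


if __name__ == "__main__":
    # Three Rosetta tasks at a time, 8 h each, deadline of about 2 days.
    print(tasks_before_deadline(concurrent=3, task_hours=8, deadline_hours=48))
```

With all 12 cores that Boinc is allowed given to Rosetta, the same arithmetic gives 72.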

Jean-David Beyer
Joined: 2 Nov 05
Posts: 172
Credit: 5,670,343
RAC: 3,329
Message 108334 - Posted: 25 Apr 2023, 7:53:55 UTC - in response to Message 108333.  

So you can get through 18 tasks in 2 days, how big was the batch you received and how big is your cache?

If the batch was bigger than 18 then the solution would be to fix the bug this exposes in Boinc rather than to increase the deadline and get even more tasks.


I normally have 5 projects running, but CPDN has sent no work in a very long time. DENIS has sent no work in the last few days. So neither of those has anything to do these days. What runs are WCG, which runs 4 at a time, Rosetta, which runs three at a time, and Einstein, which runs one at a time. I have lately downloaded some MilkyWay and Universe tasks that run up to one each at a time to fill in the gaps. Even so I could run two more tasks at a time all the time lately. All running tasks get at least 98% of a CPU, mostly over 99%.

Here are some recent tasks and how long they took:

Task 	Work unit 	Sent 	Time reported or deadline 	Status 	Run time (sec) 	CPU time (sec) 	Credit 	Application
1530770705 	1362734763 	23 Apr 2023, 19:23:10 UTC 	24 Apr 2023, 21:13:34 UTC 	Completed and validated 	29,282.59 	28,825.76 	562.49 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530770708 	1362734767 	23 Apr 2023, 19:23:10 UTC 	25 Apr 2023, 3:36:21 UTC 	Completed and validated 	29,043.76 	28,744.24 	540.16 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530770709 	1362734769 	23 Apr 2023, 19:23:10 UTC 	25 Apr 2023, 5:16:52 UTC 	Completed and validated 	28,998.19 	28,758.61 	545.44 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530770624 	1362734681 	23 Apr 2023, 19:23:10 UTC 	25 Apr 2023, 1:12:56 UTC 	Completed and validated 	29,110.59 	28,763.46 	535.53 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530756556 	1362671609 	23 Apr 2023, 4:48:27 UTC 	24 Apr 2023, 13:05:26 UTC 	Completed and validated 	28,809.55 	28,570.11 	530.23 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530756558 	1362671801 	23 Apr 2023, 4:48:27 UTC 	24 Apr 2023, 11:28:17 UTC 	Completed and validated 	28,894.38 	28,663.98 	592.49 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530756559 	1362671672 	23 Apr 2023, 4:48:27 UTC 	24 Apr 2023, 9:00:31 UTC 	Completed and validated 	29,041.22 	28,775.46 	586.73 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530756560 	1362671674 	23 Apr 2023, 4:48:27 UTC 	24 Apr 2023, 19:32:18 UTC 	Completed and validated 	29,035.93 	28,622.18 	596.30 	Rosetta Beta v6.00 x86_64-pc-linux-gnu
1530756561 	1362671764 	23 Apr 2023, 4:48:27 UTC 	24 Apr 2023, 17:07:46 UTC 	Completed and validated 	29,234.33 	28,856.15 	618.57 	Rosetta Beta v6.00 x86_64-pc-linux-gnu


Whenever my machine finishes a Rosetta task, it reports it and then requests more work. Sometimes it requests more tasks even when it has not just finished a task. Usually it gets nothing but sometimes it gets one or two new tasks. My event log has room for 5000 entries, so it does not go back three days. Right now there are 13 in progress, with three actually running. The ones that have not started running have about three days to complete from when they were sent.

Store at least 0.50 days of work.
Store up to an additional 1.5 days of work. (This is not large enough to coast over the gaps.)
Switch tasks every 1187 minutes
Request tasks to checkpoint at most every 1801 seconds
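
For a rough feel of whether the 13 tasks in progress fit a three-day deadline, here is a small sketch using the run times from the task list above. It assumes every task takes the mean run time, that all 13 start from scratch, and that three Rosetta slots run continuously. The function name is made up.

```python
import math

# Run times in seconds, copied from the task list above.
RUN_TIMES_SEC = [29282.59, 29043.76, 28998.19, 29110.59, 28809.55,
                 28894.38, 29041.22, 29035.93, 29234.33]


def hours_to_clear(queued: int, concurrent: int, task_hours: float) -> float:
    """Wall-clock hours to finish `queued` tasks, `concurrent` at a time.

    Simplification: every task takes the same time and the slots are
    never idle or pre-empted by other projects.
    """
    if queued < 0 or concurrent <= 0 or task_hours <= 0:
        raise ValueError("queued must be >= 0; concurrent and task_hours > 0")
    return math.ceil(queued / concurrent) * task_hours


if __name__ == "__main__":
    mean_hours = sum(RUN_TIMES_SEC) / len(RUN_TIMES_SEC) / 3600
    needed = hours_to_clear(queued=13, concurrent=3, task_hours=mean_hours)
    print(f"mean run time: {mean_hours:.2f} h")
    print(f"13 tasks, 3 at a time: about {needed:.0f} h of a 72 h deadline")
```

On those simplifications the queue needs about 40 hours against a 72 hour deadline. The sketch ignores anything that keeps the three slots from running Rosetta continuously.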

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 108335 - Posted: 25 Apr 2023, 10:02:08 UTC - in response to Message 108334.  
Last modified: 25 Apr 2023, 10:05:13 UTC

Store up to an additional 1.5 days of work. (This is not large enough to coast over the gaps.)
This is the issue, not the deadlines here. Yes, they are low and probably they could add a day or two without any issues on their side, but that's not the main issue you have. With Einstein, Milkyway and Universe as backup projects it is completely unnecessary to have such a large cache. No idea about Universe, but at least Einstein and Milkyway should always have enough tasks.
.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,642,440
RAC: 13,135
Message 108336 - Posted: 25 Apr 2023, 10:08:22 UTC - in response to Message 108334.  

Store up to an additional 1.5 days of work.
Switch tasks every 1187 minutes
There's the problem.
WTF would you stop project switching for almost a day?
And if you want to have a cache, you use the Store at least value. Setting the Additional days value to anything over 0.05 results in odd behaviour (is the best way to put it); 0.01 additional days is best.
And the only time you need a cache is if you are running a single project that is unreliable. Since you are running multiple projects there is no need for a cache at all.

If you choose excessive values for various settings, then of course you are going to run into issues with deadlines until the manager can sort things out. And the larger your cache, the longer between switching projects, and the more projects you do, the longer it will take for things to settle down & your Resource Share settings to be met.
And then of course every time you change something, it has to start all over again.

Set no cache (0 days & 0.01 additional days) & 60min for switching between projects and all of the problems you have created will go away.


Whenever my machine finishes a Rosetta task, it reports it and then requests more work. Sometimes it requests more tasks even when it has not just finished a task.
It's doing that because that's what you are telling it to do with that 1.5 additional days value.
At the very least- having additional days set to 0.01 will stop the requesting of extra Rosetta work until the present work is almost done (or your resource share settings have been met & it requests work from another project).





(This is not large enough to coast over the gaps.)
What gaps? You're running several projects, so there is absolutely no point in having a cache of any sort.
If one project doesn't have any work, then other projects will get extra work done. When the project without work gets more work again, then it will get more work done for a while to meet your Resource Share settings.
Grant
Darwin NT

Jean-David Beyer
Joined: 2 Nov 05
Posts: 172
Credit: 5,670,343
RAC: 3,329
Message 108337 - Posted: 25 Apr 2023, 12:24:48 UTC - in response to Message 108336.  

(This is not large enough to coast over the gaps.)

What gaps? You're running several projects, so there is absolutely no point in having a cache of any sort.
If one project doesn't have any work, then other projects will get extra work done. When the project without work gets more work again, then it will get more work done for a while to meet your Resource Share settings.


I have gaps right now even though I am running five main projects.

I need DENIS tasks right now but I really need a week-long cache to handle that. And I really am using a total of only 2 days for the cache (0.5 + 1.5). Or I would need over a month of cache to manage the CPDN gaps. And I would have needed around a year to deal with the WCG fiasco. (I just changed that to 0.5+0.1 and will see if that makes any difference.)

See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein. I would not run MilkyWay or Universe at all*. Percentages are for Average Work Done.

_____
* They give out way too much credit for the work they do, even though they are set to 0.15% resource share.

Bryn Mawr
Joined: 26 Dec 18
Posts: 375
Credit: 10,720,871
RAC: 4,872
Message 108338 - Posted: 26 Apr 2023, 0:45:03 UTC - in response to Message 108337.  

(This is not large enough to coast over the gaps.)

What gaps? You're running several projects, so there is absolutely no point in having a cache of any sort.
If one project doesn't have any work, then other projects will get extra work done. When the project without work gets more work again, then it will get more work done for a while to meet your Resource Share settings.


I have gaps right now even though I am running five main projects.

I need DENIS tasks right now but I really need a week-long cache to handle that. And I really am using a total of only 2 days for the cache (0.5 + 1.5). Or I would need over a month of cache to manage the CPDN gaps. And I would have needed around a year to deal with the WCG fiasco. (I just changed that to 0.5+0.1 and will see if that makes any difference.)

See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein. I would not run MilkyWay or Universe at all*. Percentages are for Average Work Done.

_____
* They give out way too much credit for the work they do, even though they are set to 0.15% resource share.


You cannot use your cache to cover periods where an individual project does not produce work.

Even if you set a cache of a year there would be times when you would run out (WCG, CPDN just as two current examples).

The cache is not intended for that purpose, it is to cover short term problems such as internet outages. Set a low cache and let the resource share iron out any project flow problems.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,642,440
RAC: 13,135
Message 108339 - Posted: 26 Apr 2023, 7:31:00 UTC - in response to Message 108337.  
Last modified: 26 Apr 2023, 7:36:17 UTC

See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein.
And there is the cause of all your "issues"- you are trying to make the programme do something it was never meant to do.

The whole reason for BOINC was to allow multiple projects to operate cooperatively.
By setting the Resource share values, people are able to allow projects more or less work depending on their preferences. It is not about time. It is not about Credit. It is about work done.

The fact that you are trying to have x Tasks for y projects running at any given time is in complete and total contravention of how BOINC was designed to work, and how it actually does work.
The Resource Share settings are a ratio, not a percentage, and they are based on work done- Not on processing time, not Credit, not the number of Tasks being run for a particular project, but the work actually done for each project.
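
To illustrate the "ratio, not a percentage" point: a minimal sketch that turns Resource Share values into each project's fraction of the total. The share values below are made up to match the 50/25/10/10 split requested, and the function is only an illustration of the arithmetic, not how the BOINC client is implemented.

```python
def share_fractions(resource_shares: dict[str, float]) -> dict[str, float]:
    """Turn Resource Share values into each project's fraction of the total."""
    if any(share < 0 for share in resource_shares.values()):
        raise ValueError("resource shares must not be negative")
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in resource_shares.items()}


if __name__ == "__main__":
    # Hypothetical share values; only the ratio between them matters.
    shares = {"CPDN": 100, "WCG": 50, "Rosetta": 20, "DENIS": 20, "Einstein": 10}
    for name, frac in share_fractions(shares).items():
        print(f"{name:<9}{frac:6.1%}")

    # Simplification: when CPDN and DENIS have no work to send, the
    # remaining projects split the machine in the same ratio as before.
    with_work = {k: v for k, v in shares.items() if k not in ("CPDN", "DENIS")}
    for name, frac in share_fractions(with_work).items():
        print(f"{name:<9}{frac:6.1%}")
```

Doubling every value gives exactly the same fractions, which is why the settings behave as a ratio of work done over time rather than as a fixed number of tasks per project at any given moment.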


There is no problem with BOINC, nor any Projects & their deadlines and their work or lack of it- the problem is you & how you are using BOINC.
Grant
Darwin NT

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 108340 - Posted: 26 Apr 2023, 10:39:38 UTC - in response to Message 108337.  

I have gaps right now even though I am running five main projects.
How's that possible with Einstein and Milkyway in the mix? They always have work available. Of course you must not mess with max_concurrent in the app_config just to meet the exact number of cores you want for each project. That's not what it is for: it is there to limit applications that cause issues when too many WUs for that app run at once. It also needs some overlap to work properly, in particular when running that many projects.
.

Sid Celery
Joined: 11 Feb 08
Posts: 1987
Credit: 38,509,582
RAC: 14,752
Message 108347 - Posted: 3 May 2023, 15:01:54 UTC - in response to Message 108337.  

See, what I really want is to run 50% for CPDN, 25% for WCG, 10% for Rosetta, 10% for DENIS, and a trifle for Einstein. I would not run MilkyWay or Universe at all*. Percentages are for Average Work Done

At times like this, I'm not very friendly.

Average Work done.

Over what period?
At all times?
Over a day? A week? A month? A year?

"Want" something else. Problem solved.



©2024 University of Washington
https://www.bakerlab.org