Initial Estimated completion time problem.

Message boards : Number crunching : Initial Estimated completion time problem.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95719 - Posted: 2 May 2020, 1:03:09 UTC

After running Rosetta for over a month, with a couple of Application updates during that time, my Estimated completion time for newly downloaded work was generally within 15 minutes of my Target CPU Runtime (default, 8hrs) with an Estimated Remaining time for unstarted work of 7hrs 45min.

With the new application (and so no processing history), the initial Estimated Remaining time is 4hr 30min, just over half of the actual Target CPU Runtime. This is going to result in a huge number of Tasks missing their deadlines & being re-issued, and many of them will be re-issued to other hosts that will also miss their deadlines, so those Tasks just go to waste.

The ideal fix would be for the Estimated completion time to always equal the Host's Target CPU Runtime. If a Task finishes sooner, or runs until the Watchdog timer ends it, that's not a problem. There is almost no chance of the Host missing deadlines* because the Estimated completion time will always start at the most frequent actual CPU Runtime, which is its Target CPU Runtime.
The next best option would be for the initial Estimated completion time for a new Application or new Host to be double the present Initial Estimated completion time. People will still get work, but there will be no chance of Tasks going to waste due to multiple missed deadlines because of the unrealistically short Estimated completion times.






* Those hosts with huge cache settings & large, variable differences between CPU time & Runtime will always tend to have issues with deadlines.
Grant
Darwin NT
yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 95720 - Posted: 2 May 2020, 1:10:06 UTC - in response to Message 95719.  

Estimated CPU Time isn't always accurate - how long are they actually running? (Try running one as a test, if you haven't run them.)
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 95721 - Posted: 2 May 2020, 1:11:03 UTC - in response to Message 95719.  
Last modified: 2 May 2020, 1:12:56 UTC

It's worse if you have a really long target runtime. The initial estimated remaining time seems to always be 4.5h. With a really long target runtime, it'll also take quite a while for the estimated remaining time to recalibrate itself. The fix you're suggesting seems really nice; I hope it's possible to implement.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95722 - Posted: 2 May 2020, 1:21:06 UTC - in response to Message 95720.  
Last modified: 2 May 2020, 1:29:03 UTC

Estimated CPU Time isn't always accurate - how long are they actually running? (try to run one as a test, if you haven't ran them)?
Ah, from my first sentence in my initial post-
After running Rosetta for over a month



Yes, some finish earlier. Some finish later. Some finish a lot later. But around 90% would be finished within 5-10min of the Target CPU Runtime.
Grant
Darwin NT
yoerik
Joined: 24 Mar 20
Posts: 128
Credit: 169,525
RAC: 0
Message 95730 - Posted: 2 May 2020, 2:05:04 UTC - in response to Message 95722.  

I'm referring to the WUs that BOINC Manager is saying will finish in 4 hours 30 mins - not overall. I hope that clarifies what I'm referring to. What is the difference between that estimated runtime and the actual runtime?
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95737 - Posted: 2 May 2020, 2:21:30 UTC - in response to Message 95730.  
Last modified: 2 May 2020, 2:26:13 UTC

What is the difference between that estimated runtime and the actual runtime?
I mentioned that also.
15min give or take 5-10min.

my Target CPU Runtime (default, 8hrs) with an Estimated Remaining time for unstarted work of 7hrs 45min.
....
But around 90% would be finished within 5-10min of the Target CPU Runtime.



And to cover the difference between CPU time & Runtime.
eg r4k_8381_fold_SAVE_ALL_OUT_922685_1105_0
Run time	7 hours 56 min 33 sec
CPU time	7 hours 56 min 1 sec

Grant
Darwin NT
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 95750 - Posted: 2 May 2020, 3:14:11 UTC
Last modified: 2 May 2020, 3:14:43 UTC

Do you people not have constant internet connections? Why are you all running massive caches? Just set it to .5 and be done with it. There's no reason to have more than 2x your core count in WU cache waiting to be crunched.
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 95751 - Posted: 2 May 2020, 3:15:54 UTC - in response to Message 95750.  
Last modified: 2 May 2020, 3:22:56 UTC

Do you people not have constant internet connections? Why are you all running massive caches? Just set it to .5 and be done with it. There's no reason to have more than 2x your core count in WU cache waiting to be crunched.

I've tried setting the cache to 0.5, it doesn't work if your target runtime is 36 hours and the initial estimated completion time for new apps is 4.5 hours. Every time a new version comes out, all the calibration to the estimated completion time gets reset, BOINC believes it takes 4.5 hours to complete a task, your computer gets flooded with tasks, and you are forced to either abort and/or select a saner target runtime. Setting the cache to 0.1 is the only solution for me.

If having a reasonable cache size was a good enough fix, this wouldn't be a problem at all.
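
To put rough numbers on that, here is a minimal sketch of the arithmetic. It is not how the BOINC client actually schedules work fetch: it assumes a simplified model in which the client keeps queuing tasks per core until their estimated runtime covers the cache, and always keeps at least one task per core. The function names are made up for illustration; the 36 hour target, 4.5 hour estimate, and the 0.5 and 0.1 day caches are the figures from this post.

```python
import math


def tasks_per_core(cache_days: float, estimate_hours: float) -> int:
    """Tasks queued per core under the simplified model described above.

    Assumption (not BOINC's real algorithm): fetch until the estimated
    runtime of the queued tasks covers the cache, minimum one task.
    """
    cache_hours = cache_days * 24
    return max(1, math.ceil(cache_hours / estimate_hours))


def real_queue_hours(cache_days: float, estimate_hours: float, target_hours: float) -> float:
    """Hours one core really needs to clear its queue when each task runs for target_hours."""
    return tasks_per_core(cache_days, estimate_hours) * target_hours


if __name__ == "__main__":
    for cache_days in (0.5, 0.1):
        queued = tasks_per_core(cache_days, estimate_hours=4.5)
        hours = real_queue_hours(cache_days, estimate_hours=4.5, target_hours=36)
        print(f"cache {cache_days} days: {queued} task(s) per core, {hours:.0f} h of real work per core")
```

Under those assumptions, a 0.5 day cache with a 4.5 hour estimate queues 3 tasks per core, which is 108 hours of real work per core at a 36 hour target, while a 0.1 day cache queues just one task per core.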
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95752 - Posted: 2 May 2020, 3:20:28 UTC

Or better yet, they fix it so the Estimated completion times equal the Target CPU Runtime.
Or at the very least make the initial Estimate higher than the default Target CPU Runtime, so that it gradually reduces down towards the actual time instead of having to increase from an extremely low initial Estimate.
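
As a rough illustration of why the starting point matters, the sketch below counts how many completed Tasks it takes for an estimate to get within 15 minutes of the real runtime. The update rule is an assumption for illustration only (a simple exponential moving average with a made-up smoothing factor `alpha`), not the algorithm BOINC or Rosetta actually uses. It compares a 4.5hr start with a start of double that (9hrs), against the default 8hr Target CPU Runtime.

```python
def tasks_until_close(initial_hours: float, actual_hours: float,
                      alpha: float = 0.1, tolerance_hours: float = 0.25) -> int:
    """Count completed tasks until the estimate is within tolerance of the actual runtime.

    Hypothetical update rule: after each task, move the estimate a
    fraction `alpha` of the way towards the observed runtime.
    """
    estimate = initial_hours
    completed = 0
    while abs(estimate - actual_hours) > tolerance_hours:
        estimate += alpha * (actual_hours - estimate)
        completed += 1
    return completed


if __name__ == "__main__":
    low_start = tasks_until_close(initial_hours=4.5, actual_hours=8.0)
    high_start = tasks_until_close(initial_hours=9.0, actual_hours=8.0)
    print(f"starting at 4.5 h: {low_start} tasks to get within 15 min")
    print(f"starting at 9.0 h: {high_start} tasks to get within 15 min")
```

With this rule the high start converges in fewer Tasks simply because it begins closer to 8hrs, and until it converges it errs on the side of fetching too little work rather than too much.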
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95754 - Posted: 2 May 2020, 3:43:24 UTC

And to make things even worse- as my existing Tasks have completed, while most are done within a few minutes of the Target CPU time, others do finish sooner. And this results in the Estimated times lowering- by about 2 minutes.
However for the Tasks yet to be processed by the new application the Estimated times have dropped from 4hrs 30min down to 52min 30sec.
This just shouldn't occur.
Grant
Darwin NT
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95769 - Posted: 2 May 2020, 5:34:23 UTC - in response to Message 95754.  

However for the Tasks yet to be processed by the new application the Estimated times have dropped from 4hrs 30min down to 52min 30sec.
Now up to 1hr 10min.
Grant
Darwin NT
Sid Celery

Joined: 11 Feb 08
Posts: 2121
Credit: 41,179,074
RAC: 11,480
Message 95795 - Posted: 2 May 2020, 12:56:46 UTC - in response to Message 95752.  

Or better yet, they fix it so the Estimated completion times equal the Target CPU Runtime

What madness is this?!

Where does this get set? A Boinc setting or a project setting? Hard-coded or user-definable somewhere?
It's amazing how long Boinc has existed and something as basic as this causes a riot with every program update.
And by amazing, of course I mean pathetic.
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 95818 - Posted: 2 May 2020, 15:54:34 UTC - in response to Message 95751.  

Then set the cache to 0. One in, one out. Over time the completion time will correct and you can add a cache back if you want.
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 95859 - Posted: 2 May 2020, 21:06:39 UTC - in response to Message 95818.  
Last modified: 2 May 2020, 21:08:38 UTC

Yup, my cache was set to 0.1 + 0 and I still managed to almost get swamped on Ralph (24 hour target run-time, 3 threads set for Ralph, and it downloaded 9 tasks; there is a high chance 3 tasks will barely miss the deadline if I don't intervene). That happened because the estimated run-times were 47:39 (HOW?!). It's not much better on Rosetta (I think it was 57 minutes?). Having a 0 cache seems to be the only band-aid solution at this point...
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1677
Credit: 17,759,444
RAC: 22,916
Message 95885 - Posted: 2 May 2020, 23:33:51 UTC - in response to Message 95795.  

Or better yet, they fix it so the Estimated completion times equal the Target CPU Runtime

What madness is this?!

Where does this get set? A Boinc setting or a project setting? Hard-coded or user-definable somewhere?
It's amazing how long Boinc has existed and something as basic as this causes a riot with every program update.
It is set by the BOINC server software, based on data supplied by the project, which in this case is Rosetta.
Grant
Darwin NT




©2024 University of Washington
https://www.bakerlab.org