BOINC - to completion time

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 6563 - Posted: 17 Dec 2005, 15:29:23 UTC
Last modified: 17 Dec 2005, 15:36:59 UTC

In a thread on another board, Dennis Burkel asked
> On the BOINC program under the "Work Tab" the "To Completion" time seems to always count upward. I understand that at the beginning of the workunit this may occur, but it seems to happen all the way through to the end of the workunit.


The mod said it was off-topic there, so I thought I'd answer it here.

Rosetta currently has difficulty estimating the % complete. Typically a result 'sticks' at 1%, 10%, 20%, and so on for long periods.

I hope this is something the programmers will be able to fix, as it leads to anxiety in newcomers and lack of reliable info even to old hands.

In the meantime, it also has the knock-on effect you have seen. For as long as the app keeps the % complete constant, the BOINC client will report a completion time proportional to the time so far.

This makes perfect sense once you know that the %complete figure comes from the app and that the time to completion is derived from it by the client.

With the current imperfect reporting of % complete, I'd expect to see the time to completion rising linearly most of the time, then dropping dramatically a dozen or so times during each result's run.
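As a rough sketch of what that means in practice (illustrative code only, not the actual BOINC source), a purely progress-based estimator behaves like this:

#include <cstdio>

// If the app reports a constant fraction_done, the remaining-time
// figure grows in direct proportion to the elapsed CPU time.
double remaining(double elapsed, double fraction_done) {
    // pure progress-based estimate: total = elapsed / fraction_done
    return elapsed / fraction_done - elapsed;
}

int main() {
    // fraction_done stuck at 10% while the task keeps crunching
    for (double elapsed = 600; elapsed <= 3600; elapsed += 600) {
        printf("elapsed %5.0f s -> %6.0f s to completion\n",
               elapsed, remaining(elapsed, 0.10));
    }
    // prints 5400, 10800, ... : "to completion" climbs linearly
    // even though the work is actually progressing
}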

River~~
(newcomer to Rosetta but nearly a year in BOINC)

edits: add links, improve spelling

ID: 6563
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 6566 - Posted: 17 Dec 2005, 15:49:27 UTC

Thanks, River. :-)

I'll add that the "to completion" time calculation has been changed three times in the last six months, trying to get it closer to "reality". V4.45 did _all_ the calculation, once the result started, based on the percent-complete, and jumped all over the place. V4.72 put a very heavy emphasis on the _estimated_ time, averaging in the "calculated" time based on % complete, but linearly. This resulted in silly things like "90% complete after 1 hour, 11 hours left to go", when the estimated time had been, say, 14 hours. V5.x does a much better job, with the estimated time "decaying" in the calculation so that the % complete becomes more important as the result progresses. (Note: version numbers are from memory, may not be correct.)
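As a rough sketch of the blending idea (hypothetical code with made-up names, not the historical client source; the real current function is quoted later in this thread), the weight given to the pre-run estimate can decay at different rates:

#include <cmath>

// e = elapsed CPU time, f = fraction done (0..1),
// E = the project's pre-run estimate of total CPU time.
double remaining(double e, double f, double E, int exponent) {
    double progress_based = e / f - e;       // extrapolated from progress
    double w = std::pow(1.0 - f, exponent);  // decaying weight on E
    // exponent 1 keeps E influential all the way through ("linear");
    // exponent 2 ("squared") hands over to the progress figure sooner
    return (1.0 - w) * progress_based + w * E;
}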

I personally believe the decay should be squared, making the calculation start out heavily biased towards the estimate, but almost totally % complete based by the end. I believed that to the point that when V5.1.x was being tested, I worked up spreadsheets with various algorithms, and posted the results of what I found to the developers. I don't know what was finally decided, and I haven't rerun my tests since then, but I do think somewhere around V5.2.8 the "decay" factor increased; don't know if it's squared...

Still, even V5.2.13 is going to be "confused" to some extent by a result that's "stuck" - but it won't go UP like Dennis described, not to that extent. So I checked, and verified that Dennis is running 4.45... :-)

ID: 6566
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 6597 - Posted: 17 Dec 2005, 21:27:17 UTC - in response to Message 6566.  
Last modified: 17 Dec 2005, 21:30:04 UTC

> Thanks, River. :-)
>
> I'll add that the "to completion" time calculation has been changed three times in the last six months, trying to get it closer to "reality".
>
> [...]
>
> Still, even V5.2.13 is going to be "confused" to some extent by a result that's "stuck" - but it won't go UP like Dennis described, not to that extent. So I checked, and verified that Dennis is running 4.45... :-)


So do I, on my Windows boxes, which is why I recognized what was happening!

However, in my opinion the place to fix the issue is in the app, not in the client.

OK, I accept that the Rosetta code has a random completion time, but it must surely be possible to do better than having only ten values for the % complete (in my observation these are 1%, 10%, 20% ... 90%).

If within each of the ten stages you have a counter K that goes up to a million, then you can report the completion as the stage's base percentage plus K/10,000,000, or whatever. OK, when you jump out of that phase after only 300,000 iterations the progress jumps from 13% to 20%, and that is where the clever accounting in the client is of great benefit.
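As a sketch of that idea (hypothetical names, not Rosetta's actual code):

const int NUM_STAGES = 10;
const int MAX_ITERS  = 1000000;   // doubles as the runaway-loop guard

// stage in [0, 10), k in [0, MAX_ITERS): each count adds 1/10,000,000
double fraction_done(int stage, int k) {
    return (stage + (double)k / MAX_ITERS) / NUM_STAGES;
}
// e.g. stage 1, k = 300,000 gives 0.13; leaving the stage early then
// jumps the figure to 0.20, which the client's blending smooths over.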

If you don't have such a counter, how do you guard against run-away loops running forever?

But having the progress grow gradually and steadily from 10% to 13% and then jump to 20% is better than sticking at 10% for a long while and then jumping to 20%.

It looks better for GUI users who watch the percentage rather than the completion time, and it looks better for command-line users, who may be happier looking at a fraction-complete value than at an estimated time to completion expressed in thousands of seconds.

On top of that, in rare cases the value of the % progress might even drop a hint to programmers doing some debugging.

So for all those reasons please consider improving the reporting of %complete.

> I personally believe the decay should be squared, making the calculation start out heavily biased towards the estimate, but almost totally % complete based by the end

Hmm: on projects with extremely linear iterations, the older time calculations were much better.

CPDN 'slab' runs, for example, consist of around 700,000 timesteps, and each checkpoint-set of 144 steps takes pretty much the same runtime. BOINC used to predict a fairly accurate duration for a two-month CPDN run after it had run for just a few hours; now it can't: the figure slowly morphs away from the pre-run prediction and only gets close to the right answer once the run is more than halfway through.

I've seen complaints on the CPDN boards about the v5 client having 'broken' the time calculations and how to calculate the duration manually...

It just goes to show how every feature of BOINC is -- quite rightly -- a compromise between the diverse needs of widely differing projects.

River~~
ID: 6597
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 6620 - Posted: 18 Dec 2005, 4:52:42 UTC - in response to Message 6566.  
Last modified: 18 Dec 2005, 4:53:53 UTC

> [...] I don't know what was finally decided, and I haven't rerun my tests since then, but I do think somewhere around V5.2.8 the "decay" factor increased; don't know if it's squared...

Bill,

Not sure if I found the exactly right place, but the code I found shows:

double ACTIVE_TASK::est_cpu_time_to_completion() {
    if (fraction_done >= 1) return 0;               // finished: nothing left
    double wu_est = result->estimated_cpu_time();   // project's a-priori estimate
    if (fraction_done <= 0) return wu_est;          // no progress yet: use estimate
    // time left extrapolated from progress so far
    double frac_est = (current_cpu_time / fraction_done) - current_cpu_time;
    double fraction_left = 1 - fraction_done;
    // blend: progress-based term weighted by fraction done, a-priori
    // estimate weighted by fraction left (squared, in effect)
    return fraction_done*frac_est + fraction_left*fraction_left*wu_est;
}


==== edit

Both the pre and code tags are still broken, so they mess up the indentation ...
ID: 6620
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 6623 - Posted: 18 Dec 2005, 5:08:06 UTC - in response to Message 6620.  

> return fraction_done*frac_est + fraction_left*fraction_left*wu_est;


Hmm... "fraction_left*fraction_left" looks like squared to me... it's definitely not the same as the formula that I worked out though, especially with (if my memory is working) the calculation of frac_est, but that may just be because of different intermediate variables. If I have the time, I'll have to dig up the spreadsheet and put this through it and see what the differences are. This is very elegantly done, I like it!

(Unless of course I can find a problem! :-) Still, the advantage of being a couple of % more accurate in the 'extreme cases' may not be worth doubling the complexity of the code.)

My memory is getting worse by the day. I can't remember the algorithm I came up with, or what I called that spreadsheet, or even who I emailed about this. What was my name again? Oh yeah, it's up there on the left...

ID: 6623
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 6628 - Posted: 18 Dec 2005, 6:17:58 UTC

This was extracted from "stable" so it should be in 5.2.13 ...

If needed I can look in the CVS notes and get a more precise time for the change ...
ID: 6628
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 6634 - Posted: 18 Dec 2005, 6:55:14 UTC - in response to Message 6628.  

> If needed I can look in the CVS notes and get a more precise time for the change ...


No, I'm sure the change date had little if any relationship to when the spreadsheet was done. I'll just have to take a few minutes to poke around and find it on my drive. Right now I'm running on 3 hrs sleep from last night, so it's going to wait until I'm coherent. Or at least awake...

ID: 6634
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 6637 - Posted: 18 Dec 2005, 7:29:29 UTC - in response to Message 6623.  

... "fraction_left*fraction_left" looks like squared to me...


fraction_left * wu_est is the time left based on the estimate,

so that value is combined in a linear way with the time left based on progress, making that term into fraction_left * (fraction_left * wu_est)

For example, at 1/3 of the way through, the result is 1/3 of the progress-based time left plus 2/3 of the estimate-based time left.

You can call it squared because the fraction comes in twice, but you can also call it a linear combination of the two estimates. One describes what you do, the other describes what you intend!
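To spell it out, with d = fraction_done, t = current_cpu_time and E = wu_est:

    remaining = d * (t/d - t) + (1 - d) * [ (1 - d) * E ]

The first term is fraction_done times the progress-based time left; the second is fraction_left times the estimate-based time left.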
ID: 6637
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 6647 - Posted: 18 Dec 2005, 11:10:08 UTC - in response to Message 6628.  

> This was extracted from "stable" so it should be in 5.2.13 ...
>
> If needed I can look in the CVS notes and get a more precise time for the change ...

From memory the squared value was introduced in 5.2.3. The other was used from 4.72 until then.
BOINC WIKI

BOINCing since 2002/12/8
ID: 6647
Tern
Joined: 25 Oct 05
Posts: 576
Credit: 4,695,359
RAC: 10
Message 6734 - Posted: 18 Dec 2005, 21:20:45 UTC - in response to Message 6647.  
Last modified: 18 Dec 2005, 21:21:22 UTC

> From memory the squared value was introduced in 5.2.3. The other was used from 4.72 until then.


Well, I found the spreadsheet, from 10/31. The current code is definitely a different algorithm from either the 'original' or the 'intermediate', and it's considerably simpler than what I had. Translating variable names, I had

(((fraction_left*wu_est/100)*fraction_left)+((fraction_left*frac_est/100)*fraction_done)*fraction_done) / (100+(fraction_done*fraction_done))

instead of fraction_done*frac_est + fraction_left*fraction_left*wu_est as in the current code.

There's a scaling problem because I used integers rather than doubles, but allowing for that, it worked plugged directly into my spreadsheet. I checked the results, assuming an estimated 600 minutes but an actual 100 minutes (extreme example I'd used in the spreadsheet) and the current algorithm is very good, but IMHO, it still weights the project estimate too much in the middle and falls off too fast. At 1% it gives 589 where I gave 583, at 50% it gives 150 where I gave 54, and at 90% it gives 6 where I gave 10. The "real" numbers for a perfectly-linear and perfectly-reporting app would be 99, 50, and 10. That's what 4.45 would have shown, and obviously (where this thread started...) it _needs_ to be weighted more to the estimate at the beginning; the 4.72-5.2.3 version however would have given 589, 356, and 15.
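For anyone who wants to replay this comparison, here is a minimal sketch that runs the current formula over the same 600-minute-estimate / 100-minute-actual scenario (it may not reproduce the spreadsheet figures above exactly, since the precise inputs and rounding used there aren't shown):

#include <cstdio>

// The current client formula quoted earlier in the thread:
//   remaining = f * frac_est + (1 - f)^2 * E
double est_to_completion(double e, double f, double E) {
    if (f >= 1) return 0;
    if (f <= 0) return E;
    double frac_est = e / f - e;   // time left extrapolated from progress
    double left = 1 - f;
    return f * frac_est + left * left * E;
}

int main() {
    const double E = 600;          // project's estimate, minutes
    const double actual = 100;     // true total runtime, minutes
    const double fs[] = {0.01, 0.50, 0.90};
    for (double f : fs) {
        double e = f * actual;     // elapsed, assuming perfectly linear work
        printf("%4.0f%% done: %6.1f min left (perfectly linear: %5.1f)\n",
               f * 100, est_to_completion(e, f, E), (1 - f) * actual);
    }
}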

So... I think the version "as is" is quite good, and definitely simple. The more-complicated one I think would give "better" results, but it may not be worth the effort.

ID: 6734
