Possible bug in the "Average Processing Rate" calculation.

Bryn Mawr

Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 95996 - Posted: 4 May 2020, 8:57:10 UTC - in response to Message 95942.  

The claimed credit is described here https://boinc.berkeley.edu/trac/wiki/CreditNew


If this is taking up too much time / has been done to death previously then please feel free to ignore me, but two things confuse me:

The reliance on wu.fpops.est, which, for Rosetta, appears to be fixed at 80,000 regardless of preferred run time.

The statement “Then the average credit per job should be the same for all hosts.” Surely in Rosetta, where the amount of work done by a host on a given WU is not fixed, this is not true.
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 95997 - Posted: 4 May 2020, 9:38:00 UTC - in response to Message 95996.  
Last modified: 4 May 2020, 9:38:36 UTC

The claimed credit is described here https://boinc.berkeley.edu/trac/wiki/CreditNew


If this is taking up too much time / has been done to death previously then please feel free to ignore me, but two things confuse me:

The reliance on wu.fpops.est, which, for Rosetta, appears to be fixed at 80,000 regardless of preferred run time.

The statement “Then the average credit per job should be the same for all hosts.” Surely in Rosetta, where the amount of work done by a host on a given WU is not fixed, this is not true.


Well, to be fair, that was the whole point behind this and another thread. This thread mainly deals with the issue that, since the estimated computation size for Rosetta tasks is fixed at 80,000 GFLOPs, the average processing rate appears to be inversely proportional to the target runtime.
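
As a rough sketch of why that happens (the 80,000 GFLOPs figure is the estimate reported in this thread; BOINC's actual APR averaging is more involved), each completed task contributes a rate sample of fpops_est divided by elapsed time, so stretching the runtime shrinks the apparent rate:

```python
# Rough sketch: one task's contribution to Average Processing Rate.
# Assumes the fixed work estimate discussed in this thread; BOINC
# averages samples like this over recent validated tasks.
FPOPS_EST = 80_000e9  # 80,000 GFLOPs of estimated work per task

def apr_sample(elapsed_seconds: float) -> float:
    """Apparent processing rate (FLOPS) for one completed task."""
    return FPOPS_EST / elapsed_seconds

# The same machine, only the target runtime changed:
for hours in (2, 8, 24):
    rate = apr_sample(hours * 3600) / 1e9
    print(f"{hours:>2} h runtime -> {rate:5.1f} GFLOPS")
# 2 h looks like ~11 GFLOPS, 24 h like ~0.9 GFLOPS: the rate is
# inversely proportional to runtime even though the hardware is identical.
```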

The other thread deals with the observation that very long-running tasks have a tendency to return absurdly low credits, probably linked to the thing you're talking about.

I hope the updated validator fixes the credit side of things.

At least now I can rest assured that the issue is not my main rig.
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 95998 - Posted: 4 May 2020, 9:44:02 UTC - in response to Message 95973.  
Last modified: 4 May 2020, 9:44:21 UTC

Sorry, that limit should be increased on Ralph@h for new jobs now. FYI, I updated both the scheduler and the validator on R@h. So hopefully these updates will address the job cache and the crediting issues that Tomcat has highlighted.


Ah, thanks a lot. All my machines are back online a-crunchin', now that I've changed the URL.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96021 - Posted: 4 May 2020, 14:15:10 UTC - in response to Message 95996.  

The statement “ Then the average credit per job should be the same for all hosts.”. Surely in Rosetta, where the amount of work done by a host on a given WU is not fixed, this is not true.


I believe the statement you were quoting is talking about a hypothetical, fixed-size WU with an exact number of FPOPS required to complete it. R@h uses these credit claims to build up an average of credit claimed PER MODEL, and then awards credit based on the number of completed models, not on a per-WU basis.

The idea behind both CreditNew and the R@h method of granting credit is that credit is NOT based on how long you run a work unit; rather, it is based on how much useful work (models) you produce. You might run the WU for half the time of another host because your CPU is twice as fast. If both hosts produce 15 models on the same batch of work, then both get the same credit.
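
A minimal sketch of that per-model scheme, with hypothetical names (the real R@h validator logic is more involved):

```python
# Hypothetical sketch of per-model crediting as described above.
def grant_credit(batch_claims: list[tuple[float, int]],
                 models_completed: int) -> float:
    """batch_claims: (claimed_credit, models_returned) pairs from recent
    validated results in the same batch of work."""
    total_claimed = sum(credit for credit, _ in batch_claims)
    total_models = sum(models for _, models in batch_claims)
    credit_per_model = total_claimed / total_models
    # Credit scales with useful work (models), not with runtime.
    return credit_per_model * models_completed

# A fast host finishing in half the time and a slow host both returning
# 15 models from the same batch receive identical credit:
claims = [(120.0, 10), (95.0, 8), (110.0, 9)]
print(grant_credit(claims, 15))  # same result for both hosts
```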
Rosetta Moderator: Mod.Sense
wolfman1360

Joined: 18 Feb 17
Posts: 72
Credit: 18,450,036
RAC: 0
Message 96082 - Posted: 5 May 2020, 1:48:54 UTC - in response to Message 96021.  

The statement “Then the average credit per job should be the same for all hosts.” Surely in Rosetta, where the amount of work done by a host on a given WU is not fixed, this is not true.


I believe the statement you were quoting is talking about a hypothetical, fixed-size WU with an exact number of FPOPS required to complete it. R@h uses these credit claims to build up an average of credit claimed PER MODEL, and then awards credit based on the number of completed models, not on a per-WU basis.

The idea behind both CreditNew and the R@h method of granting credit is that credit is NOT based on how long you run a work unit; rather, it is based on how much useful work (models) you produce. You might run the WU for half the time of another host because your CPU is twice as fast. If both hosts produce 15 models on the same batch of work, then both get the same credit.

Thank you! This puts it into language I can understand. My RAC is increasing daily, so I'm not going to complain. 12-hour runtimes seem to be doing just fine, though I'm still going to have aborted WUs. But the estimates seem to be slowly getting more accurate as time goes on.
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 96116 - Posted: 5 May 2020, 16:06:26 UTC
Last modified: 5 May 2020, 16:17:41 UTC

Oh look! Whatever was changed, the APR calculations seem much better now.
https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=3208606 Before and after the change. (My phone submitted that lone 4.20 task after the change)
https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=2263735 Slowly rising back up to sane values again.
https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=3754624 Same over here.

https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=3684192 In the case of hosts with short runtimes, it's dropping back to saner values. Wait a minute, why is it only slightly higher than my phone? Wait, no. It's my phone's value that seems way too high...
Whatever, it still appears to be a huge improvement.

Credit-wise, it seems much more stable, but we need a larger sample size.
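
That slow climb is consistent with APR being a running average over recent validated tasks. A minimal sketch, assuming an exponential moving average (a simplification of BOINC's actual recent-average bookkeeping):

```python
# Sketch: why APR recovers gradually rather than jumping to the new value.
def update_apr(apr: float, sample: float, weight: float = 0.1) -> float:
    """Blend a new per-task rate sample into the running average."""
    return apr + weight * (sample - apr)

apr = 2.0  # GFLOPS, dragged down by the earlier mis-estimates
for task in range(1, 11):
    apr = update_apr(apr, sample=11.0)  # post-fix tasks report ~11 GFLOPS
    print(f"after task {task}: {apr:.2f} GFLOPS")
# Each validated task only nudges the average toward the true rate,
# which is why the host pages show values slowly rising back up.
```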
Admin
Project administrator

Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 96148 - Posted: 6 May 2020, 5:36:22 UTC

Looks like my updates are helping, great.
Bryn Mawr

Joined: 26 Dec 18
Posts: 393
Credit: 12,110,248
RAC: 4,484
Message 96155 - Posted: 6 May 2020, 8:03:23 UTC - in response to Message 96148.  

Looks like my updates are helping, great.


Always :-)
Tomcat雄猫

Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 96273 - Posted: 8 May 2020, 18:37:16 UTC - in response to Message 96148.  
Last modified: 8 May 2020, 18:39:03 UTC

Looks like my updates are helping, great.


Just got another full broadside of tasks in; the credit per task is stable and APR has remained sane. I think this has been completely fixed. Thanks!