Message boards : Number crunching : code release and redundancy
Tern | Joined: 25 Oct 05 | Posts: 576 | Credit: 4,695,120 | RAC: 21
> In practice, we are more likely to implement true redundancy. BOINC already has strong support for it. And besides more trustworthy credit, we will get automatic validation of results.

I understand all the reasoning here, but since you didn't mention flops-counting, have you looked at that option? I don't know how difficult it would be to implement (obviously more difficult than just turning on redundancy), but it might be a way to at least eliminate the most obvious ways to cheat, without costing the project as much CPU horsepower as "standard" redundancy would.
ecafkid | Joined: 5 Oct 05 | Posts: 40 | Credit: 15,177,319 | RAC: 0
Hey, I've got an idea. Why not implement restrictor plates, like NASCAR? That would keep the credits fair. Did I just say that? LOL
Jack Schonbrun | Joined: 1 Nov 05 | Posts: 115 | Credit: 5,954 | RAC: 0
> I understand all the reasoning here, but since you didn't mention flops-counting, have you looked at that option? I don't know how difficult it would be to implement (obviously more difficult than just turning on redundancy), but it might be a way to at least eliminate the most obvious ways to cheat, without costing the project as much CPU horsepower as "standard" redundancy would.

Two questions about flops-counting, since I came in a bit late:

1) Is there a link to explain its implementation in BOINC? I'm unclear from the discussion whether there is limited support for it in BOINC already, or whether it's as hypothetical as any other method.

2) Would it be respected by the community? I.e., if we were to do the extra work to implement it instead of redundancy, is it the best method? Or should we spend that time on something else?
Tern | Joined: 25 Oct 05 | Posts: 576 | Credit: 4,695,120 | RAC: 21
> Two questions about flops-counting, since I came in a bit late:

The "SETI Enhanced" app (in beta) is using it, and support is built in for reporting the "flops" with BOINC v5.x. My (limited) understanding is that the "count" returned after each result is completed is extremely consistent across platforms, CPU speeds, etc. Basically, it replaces the estimate "benchmark * time" with an actual count of the work performed, which can then be converted into Cobblestones (credits) by a simple multiplier. There is some overhead in adding the code to do the counting, so each result will take some percentage longer to run; I believe this is _far_ less than the loss you would have using redundancy, however. My (semi-worthless) memory wants to say 4-10%. Ingleside is the one who would be able to answer any technical questions...

As far as the respect of the community goes, I think it would be accepted _if_ there were a "reference" WU available for Rosetta, so that people could run the current app against the reference WU and see what the credit figure is, then run the "flops-counting" app against it, and compare. Obviously those with inflated benchmarks would see a lower credit value, and those with understated benchmarks (such as Linux with the standard BOINC client) would see a higher credit value; however, when _every_ user came out with the _same_ credit figure for the reference WU, they would quickly understand the "fairness" of this method compared to the benchmarks.

Redundancy is ideal for some projects, but it is unnecessary for the science part of Rosetta, and would consume 1/2 to 3/4 of the project's available CPU power just for the sake of "fair" credits. I think most of us who care about the science would hate to see the waste. Although it is frustrating to think that people are "cheating", even a monthly SQL run by project staff to cull out the worst credit-mongers would probably help greatly. Flops-counting would be a step above that; "pseudo-redundancy" is equal in "real" value but probably better in "perceived" value (as most other projects use redundancy and it's relatively simple to explain); and Paul D. Buck's "calibrated host" method is the best of all. (Although I think a combination of Paul's approach with flops-counting would be the absolute best...)

Basically, ANY method of reducing perceived cheating would help. You and the other project developers and staff will have to decide which method is doable with the resources you have, and what "CPU power" costs you are willing to incur to get there, whether it be 4% for flops-counting, 50% or more for "true" redundancy, or 0% for "pseudo" redundancy.
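To make the "simple multiplier" concrete, here's a rough sketch of the two formulas side by side. The constants are my recollection of the BOINC cobblestone scaling (100 credits per benchmark-day), so treat the exact numbers as assumptions; the point is just that an inflated benchmark multiplies the first formula but leaves the second untouched:

    // Rough sketch, not project code; COBBLESTONE_FACTOR and the averaging
    // of the two benchmarks are from memory and may be off.
    #include <cstdio>

    const double SECONDS_PER_DAY = 86400.0;
    const double COBBLESTONE_FACTOR = 100.0;  // credits per benchmark-day (assumed)

    // Old way: claimed credit = benchmark * cpu-time.
    double credit_from_benchmarks(double whetstone_mflops, double dhrystone_mips,
                                  double cpu_seconds) {
        double avg_gops = (whetstone_mflops + dhrystone_mips) / 2.0 / 1000.0;
        return cpu_seconds * avg_gops * COBBLESTONE_FACTOR / SECONDS_PER_DAY;
    }

    // Flops-counting: claimed credit = actual operations performed.
    double credit_from_counted_flops(double counted_fpops) {
        return (counted_fpops / 1e9) * COBBLESTONE_FACTOR / SECONDS_PER_DAY;
    }

    int main() {
        // One WU that really performs ~4.32e12 floating point operations.
        printf("honest benchmarks : %.1f CS\n", credit_from_benchmarks(1000, 1000, 4320));
        printf("inflated (4x)     : %.1f CS\n", credit_from_benchmarks(4000, 4000, 4320));
        printf("counted flops     : %.1f CS\n", credit_from_counted_flops(4.32e12));
    }

An inflated benchmark quadruples the first claim (5.0 CS becomes 20.0 CS), while the counted-flops claim stays at 5.0 CS no matter what the daemon reports.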
Hoelder1in | Joined: 30 Sep 05 | Posts: 169 | Credit: 3,915,947 | RAC: 0
> If we had unlimited development time, I think the best system would be one along the lines of Hermann's pseudo-redundancy. One could take the median time of the Work Units that differ only by random number seed. I believe the Work Units are structured such that thousands are sent out at a time that fit this requirement. This would mean that very reliable statistics could be generated about the average cpu requirements of a Work Unit. This could then be used to assign credit. It would be far less noisy than 2-fold or 4-fold redundancy.

Jack, I don't understand why you say that pseudo-redundancy (as described in your post; you seem to refer to the first of my two 'pseudo-redundancy' proposals made in this thread) would require 'unlimited' development time. Isn't this just one database query per WU type (take the median of claimed credit) and then presumably one call into the BOINC server infrastructure (the one for issuing the granted credit)? Issuing fixed credits per WU type must be possible in BOINC fairly easily, since some projects are already doing this (climate and folding-beta). F@H, by the way, does not use (true) redundancy, and they have a large number of different WU types (same as Rosetta), each of which is assigned a different fixed credit. They seem to determine the credit per WU differently than proposed here, but at least for issuing the credit they might be an example. This whole procedure could perhaps even be run in a semi-automatic fashion (run the database query and issue credit once the first couple hundred WUs of one series have been returned) if complete integration into the BOINC server code is too tricky. I am, by the way, considering quitting Rosetta and going back to Folding@home if Rosetta moves to true redundancy.

So, anyway, I would have assumed that pseudo-redundancy requires only slightly more development effort than simply turning on true redundancy, and _considerably_ less than flop counting, where presumably some BOINC flop-counting calls have to be added to each and every routine of the Rosetta code, and even to some of the external library calls (like those from Numerical Recipes ;-).
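To illustrate just how small the median step itself is, here's a sketch (the claimed credits below, including the cheater's 150.0, are invented; the database query and the credit-granting server call would wrap around this):

    // Sketch of the median step of the pseudo-redundancy proposal; the
    // claims, including the inflated 150.0, are invented for illustration.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Median of the claimed credits returned so far for one WU type.
    double median_claimed_credit(std::vector<double> claims) {
        std::sort(claims.begin(), claims.end());
        size_t n = claims.size();
        return (n % 2) ? claims[n / 2]
                       : (claims[n / 2 - 1] + claims[n / 2]) / 2.0;
    }

    int main() {
        // Claimed credits for WUs that differ only by random seed.
        std::vector<double> claims = {26.9, 27.4, 27.1, 150.0, 27.6, 27.2};
        // Grant this to every result of the WU type; the inflated claim
        // has no effect on the median (prints 27.30).
        printf("granted credit: %.2f\n", median_claimed_credit(claims));
    }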
Ingleside | Joined: 25 Sep 05 | Posts: 107 | Credit: 1,514,472 | RAC: 0
> Two questions about flops-counting, since I came in a bit late:

See here:

    void boinc_ops_cumulative(double floating_point_ops, double integer_ops);

The function was recently changed to also include integer_ops; this is supported if you're running an up-to-date scheduling server, but it will AFAIK not be included in BOINC client v5.2.x. The flops part, on the other hand, should be supported by all scheduling servers that "talk" to v5 clients, and in BOINC client v5.2.6 and later. v4.46 and later also reports the flops count back to the scheduling server, but due to a bug the client does not "understand" the info from the science application, so in reality you need v5.2.6.

Seti_Enhanced (v4.11 is expected to be the last beta before release) is "counting flops", stores intermediate info about the flops count in state.sah until the result is finished, and reports just before finishing by using:

    #define boinc_fpops_cumulative(x) boinc_ops_cumulative(x,0)

    boinc_fpops_cumulative((double)(analysis_state.FLOP_counter*LOAD_STORE_ADJUSTMENT));
    boinc_finish(0);

LOAD_STORE_ADJUSTMENT is a constant; there's still a little tweaking left to do with this constant to make claimed credit from flops-counting comparable with benchmark*cpu-time.

> 2) Would it be respected by the community? I.e., if we were to do the extra work to implement it instead of redundancy, is it the best method? Or should we spend that time on something else?

There will always be some who will try to cheat, regardless of what method you're using. But while using the BOINC benchmark literally gives claimed credits "all over the place", the flops-counting in Seti_Enhanced shows very little variation between computers. At least in Seti_Enhanced it's not exactly the same flops count across computers, since on a stop and restart the application needs to redo the baseline smoothing and some small amount of the work is redone, giving a different flops count. Still, it's much smaller than the variation based on the BOINC benchmark. Anyway, after Seti_Enhanced is "soon" released, users will quickly see that in the majority of instances they get within 0.1% of their own claimed credit, so I would not expect many complaints about this...

As for redundancy: if you don't need it for validating the science, it's possible you can get away without using redundancy if you're "counting flops" and same-type WUs give maybe 5% or so variation in claimed credits, meaning you can easily add some checks in the database afterwards to catch cheaters. But if you decide to use redundancy, remember:

a; Since the BOINC benchmark is "all over the place", and some computers always claim very low or possibly even zero, you need min_quorum = 3.
b; By "counting flops", on the other hand, claimed credits will be much closer to each other, so you can get away with min_quorum = 2.
c; Since many users are very impatient about getting their credits, if the deadline is more than a couple of days you must also very likely use target_nresults = min_quorum + 1, to keep the complaints to a minimum...
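To illustrate the application side of this, a rough sketch (not Seti_Enhanced's actual code; do_one_chunk() and its operation count are made up):

    // Rough sketch of app-side flops accounting; do_one_chunk() and its
    // operation count are invented. The running count would be saved with
    // each checkpoint (as Seti_Enhanced does in state.sah) and reported
    // once at the end, so the claim does not depend on cpu-time at all.
    #include "boinc_api.h"

    static double flop_counter = 0;  // restore from the checkpoint on restart

    void do_one_chunk() {
        // ... science ...
        flop_counter += 1.8e7;  // ops this chunk is known to perform (assumed)
    }

    int main() {
        boinc_init();
        for (int i = 0; i < 100000; i++) {
            do_one_chunk();
            if (boinc_time_to_checkpoint()) {
                // write app state including flop_counter, then:
                boinc_checkpoint_completed();
            }
        }
        boinc_ops_cumulative(flop_counter, 0);  // report the actual work done
        boinc_finish(0);
    }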
Hoelder1in | Joined: 30 Sep 05 | Posts: 169 | Credit: 3,915,947 | RAC: 0
> See here

Just followed the above link. So if 'flop counting' can be implemented that easily (include a brief benchmark in the application + one call to boinc_ops_per_cpu_second), I can't really think of any reason not to go for it - particularly if the alternative were wasting 2/3 of the available computer resources.
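For comparison, that one-call variant would look roughly like this (a sketch; the benchmark numbers are invented, and as the next post explains, this variant still multiplies by measured cpu-time and so inherits its noise):

    // Sketch of the boinc_ops_per_cpu_second() variant; the benchmark
    // numbers are invented. The client multiplies this rate by measured
    // cpu-time, so unlike boinc_ops_cumulative() it still inherits any
    // cpu-time variation.
    #include "boinc_api.h"

    int main() {
        boinc_init();
        // Hypothetical brief in-app benchmark of this host running this code:
        double fpops_per_sec = 1.2e9;  // measured floating point ops per second
        double iops_per_sec  = 0.9e9;  // measured integer ops per second
        boinc_ops_per_cpu_second(fpops_per_sec, iops_per_sec);
        // ... do the science ...
        boinc_finish(0);
    }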
stephan_t | Joined: 20 Oct 05 | Posts: 129 | Credit: 35,464 | RAC: 0
> See here

I second that - I would write more here, but Bill Michael's post pretty much hit the nail on the head already.

Team CFVault.com http://www.cfvault.com
Paul D. Buck | Joined: 17 Sep 05 | Posts: 815 | Credit: 1,812,737 | RAC: 0
Jack,

Assuming you have time: I wrote up a rather long proposal that, as part of it, does an analysis of the whole credit mess. A core component is that it requires flop counting anyway, so that is one of the reasons it is not on my front burner. There is a little of the explanation of the participant's perspective in there. However, to summarize the thesis:

I do work for the project. As recognition, I am granted an award. If that award is capricious, or seems to be unfair, the project has just, ahem, shown that they do not value my contribution, because the award itself has no value. It is similar to me asking you to work for me for the promise of a $500 bill at the end of the work. You do the work, and I give you a bill from my Monopoly set. I have met the letter of the promise, but invalidated the spirit.

At the moment, Rosetta@Home is VERY vulnerable to an exploit, because I can use inflated benchmark numbers (generated by an optimized BOINC daemon) to increase my award. This is precisely the thing that happened in SETI@Home Classic and is the very thing the current system is supposed to prevent. On projects that do use redundancy, you can see a factor of 4 spread between the claims. The more common spread is about 2. And this kind of behavior makes it hard to defend the validity of the system.

All that being said: redundancy is the wrong choice for this project. Op counting would be the way to go. Long term, I think we are going to have to implement something like my proposal, because the cheaters will figure out how to inflate the numbers, and then they will start to do that too. Again, the projects with redundancy can use the averaging techniques to mitigate.

Disclaimer: I run the optimized Einstein@Home application and get an average of 20 CS more per work unit than claimed. I run the optimized SETI@Home application and get 5-10 CS more per work unit than claimed. Standard BOINC clients are used.
Paul D. Buck | Joined: 17 Sep 05 | Posts: 815 | Credit: 1,812,737 | RAC: 0
One more point: if we cannot get something as simple as the credit score right, how can we seriously expect people to take our results as having any validity? If you want, I have a whole other tirade on that topic with regard to the BOINC system... suffice to say that you would not want me on the peer review of a paper resulting from data derived from BOINC. Answering questions about the random number generator would be the LEAST of your worries... :)
Ingleside | Joined: 25 Sep 05 | Posts: 107 | Credit: 1,514,472 | RAC: 0
> Just followed the above link. So if 'flop counting' can be implemented that easily (include a brief benchmark in the application + one call to boinc_ops_per_cpu_second), I can't really think of any reason not to go for it - particularly if the alternative were wasting 2/3 of the available computer resources.

Using boinc_ops_per_cpu_second decreases the huge variation due to the BOINC benchmark, but as BOINC alpha has shown, even re-running the exact same WU on the same computer can give over 30% variation in cpu-time, and therefore also in claimed credit. Using boinc_ops_cumulative, on the other hand, does not depend on cpu-time; it also works on Win9x, which doesn't know anything about cpu-time, and even works on computers that have gotten too hot or have switched to battery mode and throttle their cpu speed.

Anyway, a couple of examples from Seti_Enhanced with v4.09 (please note the flops count in this version is low, so in reality it should give 3x-4x higher claimed credits):

Example 1:
a; 62.3110518089236 (optimized linux, p4-ht)
b; 500.593792491799 (p3, xp, BOINC-benchmark)
c; 62.3101685045521 (p4-ht, xp, no re-starts)
d; 62.3234750575174 (p4-ht, xp, 19 re-starts)

Example 2:
a; 62.0297723147535 (amdxp, win2k)
b; 136.902237957863 (amdxp, linux, BOINC-benchmark)
c; 62.3062227720868 (optimized linux, p4-ht)

Example 3:
a; 265.140543523198 (p-M, xp, BOINC-benchmark)
b; 62.3145864842569 (optimized linux, p4-ht)
c; 389.835561313532 (xeon-ht, win2k, BOINC-benchmark)

#1 gives a variation in claimed credit, when flops-counting is used, of 0.0214%, largely due to the 19 re-starts. An optimized linux application is only 0.0014% away from the standard application. #2 gives 0.4457% variation; this is higher, but still much better than any other method. All 3 examples have at least one claimed credit using flops-counting within 1% of the other WUs. The BOINC benchmark, on the other hand, varies from 136.9 CS to 500.6 CS; this is 3.66x.

Let's also show one example from BOINC alpha:

Run-1: 11658.78 s - 22.36 CS.
Run-2: 11609.56 s - 22.26 CS.
Run-3: 12156.45 s - 29.04 CS.
Run-4: 11727.23 s - 28.01 CS.
Run-5: 11779.64 s - 28.14 CS.

Here there is 4.7% variation in cpu-time, and 30.5% variation in claimed credit due to the BOINC benchmark. Looking at other alpha computers, there are multiple examples where cpu-time has over 30% variation on the same WU.

To sum up:
a; The BOINC benchmark can give over 30% variation even on the same computer, and over 3.6x variation across computers.
b; There is over 30% variation in cpu-times in BOINC alpha when the same computer has re-run the exact same WU; this also means using boinc_ops_per_cpu_second will give over 30% variation in claimed credits.
c; Using boinc_ops_cumulative in Seti_Enhanced, on the other hand, seems to give at most 1% variation in claimed credit, and in the majority of instances less than 0.1% variation across computers - even if one of the users is running an optimized science application.
Nothing But Idle Time | Joined: 28 Sep 05 | Posts: 209 | Credit: 139,545 | RAC: 0
> Jack,

Well said, Paul, and my sentiments exactly. I'm in it for the science and take pride in contributing... that is my "reward". That said, credits are a bit like eating the proverbial Lay's potato chip: you're just not satisfied with one. Anyway, I think DC contributors ought to "reward" the projects that do things right by contributing more resource share. I propose that if Rosetta takes the extra step toward flop counting and mitigates the cheating aspect, I will increase my resource share to Rosetta... that is all I can do.

(added:) Of course, if every project in which I participate adopts the same course of action, then my proposal... shall we say... "flops".
ecafkid | Joined: 5 Oct 05 | Posts: 40 | Credit: 15,177,319 | RAC: 0
So, all this talk about CREDITS. Can I spend my credits anywhere? Uh, NO!!! Do they add to my financial net worth? Uh, I don't think so. What do I get out of the project? Satisfaction, knowing that my spare CPU cycles aren't going to waste and are going towards a cause I find interesting, one that can help people I both know and don't know by finding cures for diseases that are impacting someone in most of our lives. We could always go back to the days of one WU = 1 credit. The only thing I look at credits for is to make sure my machine is crunching regularly.

I do like the graphics. I think Jack and the team have done a great job in a short period of time to make this a great project. Flops, Dhrystones, Whetstones, or any other method just doesn't matter much to me. The science is most important. Letting David and his team refine the algorithms to better predict folds is important. Redundancy, if needed for the science, OK. I am sure someone will take this personally and I will be flamed for it. OK, if you must. I just hope we can keep the project going forward, crunch as many WUs as our CPUs let us, and find cures. I look at this no differently than donating to a charity. Hey, maybe the government will give us a tax break for donating our CPU cycles.
Honza | Joined: 18 Sep 05 | Posts: 48 | Credit: 173,517 | RAC: 0
Come on, guys - wasn't this thread originally about the promised code release, not credit? Or is the credit system a constraint on releasing it? I understand that redundancy is connected with the issue, as there might be a problem of validity.

For me, it's not about credit... rather, as Jack pointed out, "the goal is to have as many productive cpu cycles as possible." Code release and optimization is the way to make CPUs more productive. Credit doesn't make a CPU more productive... but it may "motivate" some to migrate from one project to another. This is, IMO, not the best way to attract people. A well-optimized application, graphics (Rosetta has some of the best), active science team members on the forum, etc. - this is what charms many users. I mean - [some] credit is on every project, but some provide more...
Nothing But Idle Time | Joined: 28 Sep 05 | Posts: 209 | Credit: 139,545 | RAC: 0
> Come on, guys - wasn't this thread originally about the promised code release, not credit? Or is the credit system a constraint on releasing it?

So many threads start with a topic (pick one), and somehow the discussion frequently gets - eventually - to credits and cheating. Why do you suppose that is? I conclude it's because it's on everyone's mind and it's important to them. The topic of credits and leveling the playing field is NOT going away, no matter how much we extol the virtues of the science.
Plum Ugly | Joined: 3 Nov 05 | Posts: 24 | Credit: 2,005,763 | RAC: 0
I for one would like it to be released to those who can help the Rosetta program, but not to the general public. If they want the code and can help, then they could ask for it. I believe that would be the simplest way.

I put a lot of systems here, and I want them to do the best they can. Fine-tuning the program to do its best is of great importance to me. Once the program smooths out some, or I learn the best way to run it, I'll be adding many more. BOINC has been a very strange experience after running FAD. Most of the Rosetta science is beyond my comprehension, BUT one thing I do understand is that the more points I put out, the more work Rosetta is getting done. And as far as redundancy goes, I'll leave that to y'all to decide. If that's what's needed, then whip it on us.
Paul D. Buck | Joined: 17 Sep 05 | Posts: 815 | Credit: 1,812,737 | RAC: 0
> So, all this talk about CREDITS. Can I spend my credits anywhere? Uh, NO!!!

This is all correct. Credit does not have any actual market value. Yet it is part of the currency of the DC community. It is the metric by which contributions are measured. Groups have various drives to collect donations. They all have some sort of reward for various levels of donation. Why? It works, that's why. Not only can I say that I have the satisfaction of knowing I did something good, I can also measure how much.

Anyway, you are not motivated by it, but having it awarded to you does not hurt you. Not having it would de-motivate some people from contributing to a project, and we are trying to get everyone we can. Well, you are altruistic and do not need motivation. *I*, on the other hand, like to know how much I am doing daily. And so, I like my credit award. But I also want it to be Fairly earned, by me and everyone else.

Ingleside, thanks for that summarization.

Nothing But Idle Time, I agree... but I can't increase Rosetta@Home much more than the 30% they get now...
FluffyChicken | Joined: 1 Nov 05 | Posts: 1260 | Credit: 369,635 | RAC: 0
Although the idea of flop counting sounds like a very good idea, it is flawed by the fact that the BOINC code is open source, and therefore anyone capable can re-write that specific code to circumvent it and give a different flop count (or has this been thought of, other than via redundancy criteria?). As for the science apps going open source, I'm still not sure how a badly coded client would affect the science.

Is it possible to:

A) Digitally sign (or similar) the BOINC clients, so only specifically signed versions can be used? [may cause trouble with cross-project use?]

B) Do the same for the science app, although allowing unsigned clients for testing but not adding them to the stats 'credits'?

If only B is feasible, then can the counting not be taken into the science app itself, hence having an optimized benchmark inside the optimized app, all of which is digitally signed and therefore gives an equal credit per same WU across the platforms...

Team mauisun.org
Tern | Joined: 25 Oct 05 | Posts: 576 | Credit: 4,695,120 | RAC: 21
> Although the idea of flop counting sounds like a very good idea, it is flawed by the fact that the BOINC code is open source, and therefore anyone capable can re-write that specific code to circumvent it and give a different flop count (or has this been thought of, other than via redundancy criteria?)

The flops-counting takes place entirely in the science app, not in the BOINC client. However, at some point the numbers have to pass _through_ the client to get back to the project, and it is at that point that any "reasonably simple" cheating would be possible. However, even without redundancy, if "all the WUs in this block give 27.35 credits, +/- 0.5", then it becomes really easy to spot the guy who returned 100 of them, all asking for 150 credits each.

EDIT: "B" is certainly doable; "A" probably not.
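The database check that implies could be as simple as this sketch (the tolerance, host IDs, and claims are invented, and this is not BOINC server code):

    // Sketch of the server-side outlier check; thresholds and numbers are
    // invented for illustration. Flag any result whose claim sits far
    // outside the typical claim for its block of same-type WUs.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Result { int host_id; double claimed_credit; };

    // Flag claims more than `tolerance` credits away from the block median.
    std::vector<Result> find_outliers(std::vector<Result> results, double tolerance) {
        std::vector<double> claims;
        for (const Result& r : results) claims.push_back(r.claimed_credit);
        std::sort(claims.begin(), claims.end());
        double median = claims[claims.size() / 2];

        std::vector<Result> flagged;
        for (const Result& r : results)
            if (std::fabs(r.claimed_credit - median) > tolerance)
                flagged.push_back(r);
        return flagged;
    }

    int main() {
        std::vector<Result> block = {
            {101, 27.3}, {102, 27.5}, {103, 26.9}, {104, 150.0}, {105, 27.4}};
        for (const Result& r : find_outliers(block, 0.5))
            printf("host %d claimed %.1f - flag for review\n",
                   r.host_id, r.claimed_credit);
    }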
Jack Schonbrun | Joined: 1 Nov 05 | Posts: 115 | Credit: 5,954 | RAC: 0
I certainly understand the importance of accurate credit to the BOINC community. What I'm trying to understand is the most effective way for us to use our development time, so that we don't waste it implementing a system that still allows cheaters, or is perceived as not giving fair rewards. It would also be optimal from our point of view if two other conditions could be met: (1) cheating would be impossible even with open source, and (2) redundancy was unnecessary. Here are my thoughts on the two main proposals that I have seen in this thread.

Flop Counting

Unfortunately, it doesn't sound like flop counting would solve cheating in our app without redundancy. Work Units that differ only in their random number seed can have significantly different running times because of the nature of our algorithm. So we wouldn't be in the situation where "all the WUs in this block give 27.35 credits, +/- 0.5". The only way to be able to compare flop counts would be to have 100% identical jobs, so we would be back to redundancy. Also, it sounds like flop counting would be just as prone to code manipulation, and so wouldn't make people any more comfortable about releasing the code.

To summarize my understanding: flop counting provides more accurate feedback on how much work a host did. But because our app performs variable amounts of work even for very similar Work Units, we might not be able to catch cheaters simply by looking for outliers in small quora. This would mean that we are back to implementing true redundancy. If we do end up using redundancy, it sounds like it might be worth implementing flop counting with it.

Pseudo-Redundancy

@Hermann Brunner:

> Jack, I don't understand why you say that pseudo-redundancy (as described in your post; you seem to refer to the first of my two 'pseudo-redundancy' proposals made in this thread) would require 'unlimited' development time.

You are right that it wouldn't take unlimited development time. But it would take significantly more time than turning on redundancy. And if we wanted to include validation (which we get for free with redundancy), that will require yet more coding. Plus, we need to make sure that our implementation is solid enough to satisfy the BOINC community - or at least as solid as what we get from BOINC with redundancy.

That said, I still think pseudo-redundancy has a lot going for it. For all of the folks worrying about accuracy, the quorum sizes would be in the thousands. The issues about being paired with a Mac user, and the like, would disappear. You wouldn't be getting "average" credit any more than with any other quorum system: a credit amount would be assigned based on the median, but the rate at which you gained credit would depend on the precise speed of your boxes relative to this average. As another benefit, because this would require us to code up methods to examine the overall distribution of CPU performance for each Work Unit, it would be easier to deny credit to the outliers.

My main concern, other than the development time, is convincing the community that this is a satisfactory system. I am partly basing that on what I see in this thread, where it seems to have gained very little traction. I would be very interested to hear more critique of this method, because it seems like the best solution to me.