Message boards : Number crunching : Unintended consequences of the new credit system?
Author | Message |
---|---|
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
In the old thread on this topic, there was a long and complicated explanation. Thanks, but I still couldn't figure it out, and whatever rules you prepare and no matter how carefully you prepare them, people who want to cheat will make the effort to figure out those rules so they can break them. I see it as a natural extension of Gödel's Incompleteness Theorem. In that thread it said not to reply there, but linked to another thread where it appears that no replies are possible. Ancient thread anyway. Let me be clear that I mostly don't care at this point, and I am not interested in trying to rig the system. However, insofar as I think I'm supposed to try to earn points, there is a problem (Houston?). Most of the work units take a long time to run, whereas most of my computers are not running that long. This means that old work units queue up, and according to another post in one of those threads, late work units don't get credit. Therefore, the obvious algorithm to attempt to earn points calls for aborting older work units before they start and waste computation time on my computer. It only makes sense to start the work units with the longest deadlines so that they have the highest chance of being completed in time so they can earn points. Problem: The work units and their data were already downloaded to my computers, so aborting them because fresher work units are queued behind them means that the downloads were wasted. In some other thread I read that the project had bandwidth problems. I think there are a couple of possible solutions, but the simplest one is to get rid of deadlines and give credit for the honest efforts even for people running slower computers or who don't run their computers long enough at a time. (This is also related to checkpointing problems that affect certain work units.) An orthogonal problem is giant work units that cause computers to waste available cycles because memory is not available. 
Again, credit could be earned but is lost for no fault of the wannabe donor. This is especially annoying because it often appears that other small-memory work units may be available, but BOINC apparently can't figure that out, so it just leaves some of the CPUs idle. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
If your machines are downloading more work than they can complete, you should consider adjusting the work buffer in your network settings to a size that is more appropriate for your environment. It appears you've already reduced the setting for preferred runtime for this host from the 8 hours you had set previously. It appears your other hosts are not set with a runtime preference for longer running workunits, and so they are processing each task for less time. This results in fewer models per task, but greater granularity in completing and returning tasks before deadlines and getting more work. As for aborting tasks, you do what you need to do. But reducing the settings for work buffer size should help avoid getting more work than your machine can process before the deadline, and reduce the bandwidth used by a given host machine. The credit system has proven itself. The improvements over the original BOINC implementation have served the purpose they were intended to. The deadlines are serving the purpose they have as well, and allow tasks to be tagged with deadlines that reflect the timeliness of the results required to be of best use to the project. As for the "huge" workunits, I take it you are talking about memory more than CPU time. It looks like your machines all average about 1GB of memory per CPU. Yes, R@h will run better if more memory is available. The amount of memory a task needs to run is not readily changeable. So, I typically suggest that folks with systems like this mix in a 25-50% resource share with one or more projects that have lower memory requirements. I typically suggest WCG, which has a similar project objective and much lower memory requirements. This keeps all of your CPUs active, and still provides maximal benefit to R@h. You brought up a number of issues and have 6 hosts running two operating systems, so do let us know what other questions you have. 
If you can be specific about which host or tasks are involved, that often helps yield more specific responses and suggestions. Rosetta Moderator: Mod.Sense |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Look. I want to do my bit to help make the world better, but it isn't my full-time job. I don't want to spend lots of time trying to optimize things for your convenience, even though I know that you have to optimize things from your side because you are serving large numbers of clients. To a degree, running BOINC has already complicated my life quite a bit and caused me to try to adjust my computer usage patterns to simplify things. Those complications drove me away from several other projects that I was supporting. The main reason I'm still using Rosetta is because I just don't care much these days. I can definitely say that the main ongoing nuisance is the deadlines. What you describe as "the timeliness of the results required to be of best use to the project" sounds like rationalization to me, though I can imagine the real problem from your side might be storage for pending results. I think it would make more sense for you to fix that problem at your end so we don't have to care about tweaking our machine settings to optimize this and that. I don't know what the deadlines really mean at your end. Maybe you send out the data a couple of times and just give credit to all the work units that are completed before the deadlines? Or maybe you are configured to send the data only once and hold the data in a live store until the deadline passes, at which point you send it to another client. From my perspective, if I don't make the deadline, I don't actually care, but if I'm supposed to try to earn points, then it still seems I should avoid late units and the easiest algorithm to do that is just kill "stale" work units so fresher ones will have the best chance of succeeding. Oh yeah, one more thing. Shortening the work queue would be more appealing if your end could be relied upon. I suppose it's actually relatively rare, but the way memory works the downtime on your side is highly memorable. 
Small work buffers seem to result in machines frequently running completely out of work, which is even more annoying than the partial losses. Okay, a second thing. Not sure how I feel about having my personal details broadcast that publicly. So many machines might make me sound like a rich target, when the reality is quite different. Some are just multiple boots and most of them are old machines that I'm skilled at keeping alive for special purposes. I could make things look even worse if I let my VMs run BOINC... #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I can only work to help people understand how the system works and how to get what they want. It's not my full-time job either. I'm a volunteer here in my living room. I cannot change server code, upgrade databases, or remove deadlines. There is no "your end" under my control. It seems like you've invested a fair amount of time learning about how BOINC works. The BOINC Manager gives you everything you need to get the results you want. No one is asking you to spend lots of time at anything. Adding a second, highly reliable project, such as WCG, will resolve two of your concerns. One, that you might run out of work. And two, that you might have idle CPUs due to memory utilization of active tasks. Once you select and add a project with lower memory requirements, and reduce the size of your work buffer, you're done. The machines will run themselves very effectively with the time they have available. Rosetta Moderator: Mod.Sense |
noderaser Joined: 4 Oct 05 Posts: 16 Credit: 115,564 RAC: 1 |
There are other projects that are less stringent about deadlines; BURP is one example--the deadlines there are to trigger additional replications of tasks to get the rendering done for the artist faster. Everyone gets credit for completing the tasks regardless of whether they are completed after the deadline or not, though that doesn't really help out the artist. But, there is not a constant flow of work available since the project relies on the community to upload sessions to be rendered. CPDN has a few different applications running, but for many of them the deadline is a year--one which I use on some of my hosts that don't run very often. I'm sure there is a project out there that would have deadlines that work for your computing schedule, although I don't know of any place that hosts that information (like WUprop does for computation time, credit awarded, etc) in one place for easy reference. |
Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,649,459 RAC: 0 |
I don't blame R@H for having deadlines at all, in fact I keep a short work buffer (0.3 days on most boxes) and have WCG setup as a backup project with low priority so it will just kick in if R@H goes down. This gives everyone the best of both worlds, the average TAT for work on my machines is less than a day, and my boxes stay busy (with a fallback to WCG) even if R@H goes down. Quick work turnaround is important to me because I am thinking first and foremost about the researchers using this platform. I work with some high performance computing clusters in my day job and I can attest first hand that long turnaround times for queries/model runs lead directly to slower iteration and slower progress. 38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
This is entirely in your control. Go to Your Account > Rosetta@home preferences. You should have an option for, "Should Rosetta@home show your computers on its web site?" Currently I, and every other public user, can see your computers and work units, so the option must be set to "yes." Switch it to "no" if you would prefer for the data to be hidden from the public but visible to the project scientists. Feel free to choose the option you are comfortable with, but the drawback of hiding the data is that other contributors can't help you identify problems if you start getting errors. Though you could always switch it back on at that time. |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
There are other projects that are less stringent about deadlines; BURP is one example--the deadlines there are to trigger additional replications of tasks to get the rendering done for the artist faster. Everyone gets credit for completing the tasks regardless of whether they are completed after the deadline or not, though that doesn't really help out the artist. But, there is not a constant flow of work available since the project relies on the community to upload sessions to be rendered. After a couple more flaky weeks and idled computers, I'm counting this as the most constructive comment in the thread... Don't think I've ever tried BURP or CPDN. I'll check them out next time I use the currently idled machine... Maybe I have a job for it tomorrow morning? (I would have left it running yesterday if it hadn't run out of BOINC work.) Then again, mostly I don't care that much anymore. WCG is just more annoying because it's supposed to have some professional management and even corporate support, while Rosetta has always seemed like an amateur operation. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
I don't blame R@H for having deadlines at all, in fact I keep a short work buffer (0.3 days on most boxes) and have WCG setup as a backup project with low priority so it will just kick in if R@H goes down. This gives everyone the best of both worlds, the average TAT for work on my machines is less than a day, and my boxes stay busy (with a fallback to WCG) even if R@H goes down. This old reply is still grating on my nerves... Guess I think there's a fundamental misconception between having a dedicated supercomputer at your disposal and having an abundance of low priority cycles. If you can't afford to pay for the priority, then you should not expect a highly responsive system and the deadlines remain counterproductive and annoying to people who are just "playing the game by the rules". The idea of points is apparently to give a concrete metric of accomplishment to the volunteers. At this point my main "success" is that I'm approaching 3.5 million "units" of work done for r@h. My #2 is probably WCG with 1.6 million old units and #3 might be Leiden with less than 0.3 million old units. Maybe my ancient s@h score should be included? Pointless effort, but I was in the top 1% when I left it... I've tried a few other projects over the years, and I think the hassles of "project management" were almost always the reason I dropped that project. Near as I can tell, the main result is that I've mostly lost interest in the game. Minor loss? Maybe, but it also means I've stopped recruiting other people. I can't recommend joining BOINC. If that lack of enthusiasm is becoming widespread... Anyway, today's main annoyance is lots of 2-day deadlines. I think they are just make-work throwaway units that r@h is generating to attempt to prevent more idle time. 
They create a significant nuisance for people like me who have machines that don't run all the time and who don't even want to wonder about what happens to the work that missed its fake deadlines, especially when it's probably fake work. The two-day units always jump ahead of other units and then lots of units start missing their deadlines. I've already pulled two computers off of BOINC, and maybe I'll start pulling the others. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,621,941 RAC: 9,507 |
Near as I can tell, the main result is that I've mostly lost interest in the game. Minor loss? Maybe, but it also means I've stopped recruiting other people. I can't recommend joining BOINC. If that lack of enthusiasm is becoming widespread... Everyone has his own motivations. If you lost yours, I'm sorry for you. They create a significant nuisance for people like me who have machines that don't run all the time and who don't even want to wonder about what happens to the work that missed its fake deadlines, especially when it's probably fake work. The two-day units always jump ahead of other units and then lots of units start missing their deadlines. I've already pulled two computers off of BOINC, and maybe I'll start pulling the others. Again, it's your choice and, although I don't agree, you're free. But I think that if the admins set these deadlines they have their motivations, and that these are NOT fake WUs. I hope we look back |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The two day deadline jobs are high priority jobs that are for our public protein structure prediction server, Robetta (http://robetta.bakerlab.org). These mainly consist of CAMEO jobs which are weekly performance benchmarks that get submitted to our server on Friday and then must be finished by Monday. CAMEO (http://cameo3d.org) is a great resource that submits pre-release PDB sequences to registered structure prediction servers and then analyses the results when the PDB structures are finally released (Wednesdays) for a completely blind prediction benchmark. You can think of it as a continuous CASP experiment. CAMEO typically submits 20 targets and we have two servers that must process these in time. One server is the production server and the other is a beta-test server where we test the latest and hopefully greatest methods. What I can do is increase the high priority job deadline to 3 days and increase the standard jobs to a couple weeks which should help a bit. I don't think I can increase the high priority jobs more than 3 days without starting to waste results. But I'll see what I can do. Sorry for any inconvenience. Jobs that are past deadlines should still be granted some credit. If this is not the case, please let me know. |
Sid Celery Joined: 11 Feb 08 Posts: 2124 Credit: 41,226,850 RAC: 11,023 |
What I can do is increase the high priority job deadline to 3 days and increase the standard jobs to a couple weeks which should help a bit. I don't think I can increase the high priority jobs more than 3 days without starting to waste results. But I'll see what I can do. Sorry for any inconvenience. Jobs that are past deadlines should still be granted some credit. If this is not the case, please let me know. This would be useful to compensate for the automatic 24-hour delay when "Rosetta Mini for Android is not available for your type of computer" responses come from the server as a larger buffer can be held at our end. 3-days should be considered the minimum deadline outside of CASP tasks. If I see you've implemented 3-day deadlines I'll adjust my buffer accordingly. Thanks. |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
The two day deadline jobs are high priority jobs that are for our public protein structure prediction server, Robetta (http://robetta.bakerlab.org). These mainly consist of CAMEO jobs which are weekly performance benchmarks that get submitted to our server on Friday and then must be finished by Monday. CAMEO (http://cameo3d.org) is a great resource that submits pre-release PDB sequences to registered structure prediction servers and then analyses the results when the PDB structures are finally released (Wednesdays) for a completely blind prediction benchmark. You can think of it as a continuous CASP experiment. CAMEO typically submits 20 targets and we have two servers that must process these in time. One server is the production server and the other is a beta-test server where we test the latest and hopefully greatest methods. Thanks for the clarification and the efforts to improve the situation. I think you are sincere and all that jazz. I still think there should be some algorithmic solution that would make the problem disappear, at least from the perspective of the volunteers. However, it also needs to be something that doesn't create any headaches from your side. Can you perhaps tweak the credit system without breaking it? One idea would be to make deadlines less relevant to the users? Just create a more solid guarantee of some minimum score for completing the work, even if it's too late to be of use for your purposes. Then you can give bonus points for beating the deadlines? At least it looks like there are many cases when completed units get zero credit for no reason that I can figure out except that they apparently missed their deadline... (Now I'm wondering if there's a bug in the credit system?) I think the more comprehensive solution is just not feasible because it involves the uncertain future... That would require considering the usage pattern of each computer. 
As it applies to these high-priority "rb" jobs, the BOINC client shouldn't even download one of them unless the usage pattern indicated that the computer would run for at least 3 hours over the next two days (and considering the amount of queued work, too). (Most of my computers need 3 hours to finish one "rb" unit.) Too much hassle worrying about the future, but if there's a guarantee of partial credit then at least there would be less incentive to waste your bandwidth by aborting downloaded ("Ready to start") units when we know that they have no chance of being completed before their deadline is passed. Then again, if they really are high-priority tasks, then I have to say that BOINC is not the best tool. Yeah, I know it's the poor craftsman who blames his tools, but it's also the poor craftsman who doesn't recognize the differences among tools and who doesn't want to use the best tool for each purpose. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Overdue jobs should eventually get some credit. If this is not the case please let me know with some examples and I'll fix it. |
dcdc Joined: 3 Nov 05 Posts: 1831 Credit: 119,617,765 RAC: 11,361 |
Overdue jobs do get credit - might take a day or so. E.g. https://boinc.bakerlab.org/rosetta/result.php?resultid=898173929 |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
After a couple of weeks of watching the effects of the tweaks, I was thinking of offering some observations and further suggestions for possible improvements. Around that time the DNS problems began. *sigh* Doesn't seem like Rosetta@home is worth much thought or effort, does it? Anyway, I hope the DNS problems get fixed at some point (though it's already been several days for a correction task that should take a few minutes). My older concern with the wasted bandwidth is probably something that should be a BOINC-level problem. If the client doesn't know anything about the history of the machine's performance (by keeping track of the usage history), then obviously it can't help the projects schedule their work on a feasible basis. Downloading data that gets tossed is just the natural result. Apparently LOTS of data in the case of Rosetta@home. P.S. I got in by hacking my hosts file. Not recommended, and anyway I don't know all the servers required to make it work properly. I'm going to unhack it now, and I won't be back until the DNS gets fixed, whenever that is. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Sid Celery Joined: 11 Feb 08 Posts: 2124 Credit: 41,226,850 RAC: 11,023 |
After a couple of weeks of watching the effects of the tweaks, I was thinking of offering some observations and further suggestions for possible improvements. Around that time the DNS problems began. Looking at this thread, you said several things: Tasks returned after the deadline don't get credit. That turned out untrue. Tasks with short deadlines might just be issued for the hell of it. That turned out untrue. You don't like your computers being public. This turned out to be a user setting you still haven't changed. Now you're asking about usage history by machine. From the Boinc manager, select Rosetta and click "Your Computers" down the side. Select "Computer ID" for the machine you want to examine and go to the bottom of the page. An example of what shows on one of my machines below: % of time BOINC client is running 88.5404 % So this information is held for each machine. I may be mistaken, but I think this availability info is used when Boinc works out if you can complete work by deadlines or not (may be wrong as it's Rosetta info rather than Boinc Manager info). Aside from this, given we're now assured the short deadline tasks are appropriately important, user settings for our buffers should be adjusted accordingly. If you don't adjust them, you can hardly complain about missing deadlines or short deadline work getting prioritised over long deadline work. It's planning to fail, otherwise. And no downloaded work that needs to be aborted either. On the occasions work isn't available, that's what backup projects are for. Again, if you don't have one lined up, you're effectively accepting you don't want to run anything if your preferred project is down - something which we know we can't predict. Make these changes once and to a great extent you won't have to revisit them - no need to micro-manage on a regular basis. This recent DNS outage was highly annoying to me as Rosetta is my preferred project, but I never ran out of work, even on unattended machines. 
Just none of my preferred work. |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
There's an old saying "If you have nothing to say, then perhaps you should say nothing." At this point I think the first "nothing" needs to be modified, perhaps with "constructive" or even "intelligent". Time for the old joke about "I can't challenge you to a battle of wits because it's against my principles to attack an unarmed opponent." Now if you [not just the person to whom this reply is ostensibly addressed] actually have a constructive and intelligent solution to conserve the bandwidth of the project, then I strongly encourage you to share it with all of us. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Darrell Joined: 28 Sep 06 Posts: 25 Credit: 51,934,631 RAC: 0 |
@ shanen:
There is a group that addresses issues at the BOINC Manager level. They can be contacted at: To subscribe or unsubscribe via the World Wide Web, visit http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha or, via email, send a message with subject or body 'help' to boinc_alpha-request@ssl.berkeley.edu Best wishes. |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
@ shanen: Thanks for the reference and I may pursue it... Not sure because sometimes I also wonder if some of this is my own fault. Back in the pre-BOINC seti@home days I remember some 'sharp' discussions in the newsgroups where IIRC I was on the side of advocating for the approach of BOINC... At this point I basically feel like the people running the projects should be the people dealing with the BOINC people. In addition, I just read a comment from Mod.Sense [who seems to be one of those people] to the effect that Rosetta@home isn't much worried about the bandwidth these days. #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
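
The deadline arithmetic debated in this thread can be sketched in a few lines. This is only an illustration, not how the BOINC client or the Rosetta@home scheduler actually decides anything: the `Task` class, the `plan_queue` function, the single-core assumption, and the 240-hour deadline for the standard task are all hypothetical. The 3-hour "rb" runtime, the 2-day deadline, and the 88.5404 % uptime figure come from the posts above. The sketch walks the queue in earliest-deadline-first order and keeps a task only if the machine, running for the given fraction of wall-clock time, could finish it along with everything already kept.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_needed: float       # estimated compute time still required
    hours_to_deadline: float  # wall-clock hours left until the deadline


def plan_queue(tasks, on_fraction):
    """Split queued tasks into (keep, hopeless) for a single core.

    on_fraction is the share of wall-clock time the machine is actually
    crunching, e.g. 0.885404 for the "88.5404 %" quoted in the thread.
    This is a simple greedy check (an assumption of this sketch), not
    the real BOINC scheduling logic.
    """
    keep, hopeless = [], []
    compute_used = 0.0
    # Earliest deadline first: the tightest tasks get considered first.
    for task in sorted(tasks, key=lambda t: t.hours_to_deadline):
        # Compute hours the machine can deliver before this deadline.
        available = task.hours_to_deadline * on_fraction
        if compute_used + task.hours_needed <= available:
            keep.append(task)
            compute_used += task.hours_needed
        else:
            # Cannot finish in time; starting it would only waste cycles.
            hopeless.append(task)
    return keep, hopeless


if __name__ == "__main__":
    queue = [
        Task("rb_high_priority", hours_needed=3.0, hours_to_deadline=48.0),
        Task("standard_job", hours_needed=3.0, hours_to_deadline=240.0),
    ]
    for fraction in (0.05, 0.885404):
        keep, hopeless = plan_queue(queue, fraction)
        print(fraction, [t.name for t in keep], [t.name for t in hopeless])
```

With these numbers, a machine that is on about 5% of the time gets roughly 2.4 hours of compute inside a 2-day window, so the 3-hour "rb" task cannot finish and is the one shanen would end up aborting. A machine that is up 88.5% of the time has room for both tasks, which is the situation where a small work buffer avoids the wasted download in the first place.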
©2024 University of Washington
https://www.bakerlab.org