Message boards : Number crunching : aborting work units
Author | Message |
---|---|
Stephen Joined: 26 Apr 08 Posts: 32 Credit: 429,286 RAC: 0 |
Is it safe to abort work units on the older applications in order to focus time on the newer application versions? I mean, does aborting work units in bulk, just to ensure we're using the most up-to-date version, affect your results negatively? |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
"Is it safe to abort work units on the older applications in order to focus time on the newer application versions?"
Yes! Definitely. If you're running stuff with 1.46 or older, please abort it. It'll just get sent out again anyway. Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There are presently 4 versions of Rosetta that are "current". So, you want to check the list and make sure you don't abort a list of tasks just to download more of the same. Rosetta Moderator: Mod.Sense |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,337 RAC: 1 |
Is there a quick way to abort and resend tasks? My host was detached from this project while I had tasks still crunching. I apologize for my mistake. Have a crunching good day!! |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
"Is there a quick way to abort and resend tasks?"
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
No, Speedy. When you detached, the tasks were aborted. And even if that didn't come through, the project servers won't resend the tasks. You'll get new ones and the old ones will be reissued, either due to the receipt of the abort, or due to reaching the deadline with no result. Rosetta Moderator: Mod.Sense |
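The reissue rule described in this reply can be sketched in a few lines of Python. This is an illustrative model only, not the actual BOINC server code; the `Task` class, its fields, and `should_reissue` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Task:
    """Hypothetical record of one task sent to a host."""
    name: str
    deadline: datetime
    aborted: bool = False                        # server received an abort notice
    result_received: Optional[datetime] = None   # set when a result comes back


def should_reissue(task: Task, now: datetime) -> bool:
    """Reissue if the task was aborted, or the deadline passed with no result."""
    if task.result_received is not None:
        return False  # a result came back, nothing to reissue
    return task.aborted or now > task.deadline
```

Under this model a detached host's tasks are reissued either as soon as the abort is received or, failing that, once the deadline passes.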
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,337 RAC: 1 |
Thank you for your responses, Murasaki & Mod.Sense. I thought this was the case; I just wanted to make sure. Have a crunching good day!! |
Cesium_133* Joined: 1 Dec 08 Posts: 28 Credit: 225,332 RAC: 0 |
A couple of questions that have not been addressed anywhere in my research, to which answers would be helpful:
1A. I am able to get a lot of WU's by suspending the other projects I run (I am running 4... Rosetta, Climate Prediction, Hydrogen, and AI at 85%, 7.5%, 3.75%, and 3.75% devoted time respectively). I do this sometimes for each project in order; they'll suspend, and the one left running phones home for new WU's. Is this encouragable, acceptable, tolerable, neutral, or malevolent conduct? My intentions are good... it's just that...
1B. I just found I would run short of time on some Rosetta WU's, so I aborted them before they began running or went past due during a computation. What happens to those WU's? Please tell me they don't get 86'ed and/or cause the project to suffer... are they "recycled", re-distributed, recomputed, what?
1C. If I let the scheduler send me tasks, rather than baiting it, can I be confident that I will always have work if there exists work to be done?
2. Is the scheduler programming, the code that delivers and parcels out WU's, hopelessly obsolete or actually functional? Seems to be some disagreement and nescience on that...
3. I don't merit or want credit for aborted tasks... I trust I don't get any?
4. Why, exactly, are there hard and fast deadlines for WU's? Is there some inflexible point past which BOINC will not allow a WU to remain in the wild?
Thanks for your help. I request an expert, or his designee, help me out :) Best, John :D
The lovely lady you see isn't I, but Hayley Westenra, a classical crossover singer from Christchurch, NZ. There is no known voice as hers. Check her out- she's seraphic. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,217,610 RAC: 1,154 |
"A couple of questions that have not been addressed anywhere in my research, to which answers would be helpful:"
You are becoming a micro-manager of BOINC. It will work out, but it can also cause more problems, like missed deadlines. BOINC is designed to be left alone.
"1B. I just found I would run short of time on some Rosetta WU's, so I aborted them before they began running or went past due during a computation. What happens to those WU's? Please tell me they don't get 86'ed and/or cause the project to suffer... are they 'recycled', re-distributed, recomputed, what?"
They all get recycled to other people, and you just lose some of the units you can get from that project. When you abort some, the total number of units you can get from that project goes down for a short time, and then as you return units on time it goes back up again.
"1C. If I let the scheduler send me tasks, rather than baiting it, can I be confident that I will always have work if there exists work to be done?"
For the most part, yes. It is designed to just do its thing with little to no intervention on our part. It doesn't always work that way, but it is supposed to. One thing that will happen is that your 85% etc. settings will be followed over time, not in the short term. One project will have work while another may not at the exact moment BOINC asks for it, so another project will give you a bit more. Over time it will even out.
"2. Is the scheduler programming, the code that delivers and parcels out WU's, hopelessly obsolete or actually functional? Seems to be some disagreement and nescience on that..."
No, it works okay for the most part. Most people never touch it and it works just fine.
"3. I don't merit or want credit for aborted tasks... I trust I don't get any?"
No, you do not.
"4. Why, exactly, are there hard and fast deadlines for WU's? Is there some inflexible point past which BOINC will not allow a WU to remain in the wild?"
Because each project handles its own data in its own way. Some feel the need for shorter deadlines, some are okay with longer ones. It kind of depends on what their contract with whoever is paying them to do the work says. Some, like Malaria, try to use the data faster so they can get the drugs out quicker; some, like Seti, are not as concerned because the data has taken millions of years to get here anyway!
"Thanks for your help. I request an expert, or his designee, help me out :) Best, John :D"
I am neither, just a cruncher like yourself. |
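The point that resource shares are honored over time rather than in the short term can be illustrated with a toy simulation. This is a simplified sketch, not BOINC's real scheduling algorithm: it assumes a single CPU, one-hour time slices, and a rule that always runs whichever project is furthest behind its target. The function name `simulate_shares` is hypothetical; the percentages are the ones quoted in the thread.

```python
# Target shares taken from the thread (85%, 7.5%, 3.75%, 3.75%).
SHARES = {
    "Rosetta": 0.85,
    "Climate Prediction": 0.075,
    "Hydrogen": 0.0375,
    "AI": 0.0375,
}


def simulate_shares(shares, hours):
    """Each hour, run the project that is furthest behind its target share.

    Returns the fraction of total time each project actually received.
    """
    run = {name: 0 for name in shares}
    for elapsed in range(1, hours + 1):
        # Deficit = hours the project should have had by now minus hours it got.
        behind = max(shares, key=lambda p: shares[p] * elapsed - run[p])
        run[behind] += 1
    return {name: run[name] / hours for name in shares}


if __name__ == "__main__":
    print("after 10 hours:  ", simulate_shares(SHARES, 10))
    print("after 8000 hours:", simulate_shares(SHARES, 8000))
```

Over a handful of hours the small projects may receive no time at all, while over thousands of hours the observed fractions settle very close to the configured shares.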
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"A couple of questions that have not been addressed anywhere in my research, to which answers would be helpful:"
If you would state your intentions it might help people comment on alternative approaches to achieve your goals. If the net result is that your machine contacts the project fewer times during the week, it is a net good thing for the project servers. There is less overhead supporting your machine. On the other hand, you are one of tens of thousands of users. Don't sweat the small stuff.
"1B. I just found I would run short of time on some Rosetta WU's, so I aborted them before they began running or went past due during a computation. What happens to those WU's? Please tell me they don't get 86'ed and/or cause the project to suffer... are they 'recycled', re-distributed, recomputed, what?"
Perhaps running close to deadlines is due to point 1A above. They do get reissued when you abort them, or when they pass their deadline. Remember, the name of the game here is models completed. The specific models that will be done as any specific task are not special in any way; they just add to the total. I mean there is no gap in the data. If your task with models 1, 2 and 3 is not completed, the project might create a new task which will work on models 4, 5 and 6, and so long as 3 are completed, it all works out about the same.
"1C. If I let the scheduler send me tasks, rather than baiting it, can I be confident that I will always have work if there exists work to be done?"
You will always have work. But not always from all of your projects. Eventually BOINC is going to figure out that you are getting behind on your climate model, and it is going to devote the time it takes to try and complete that before its deadline. And so it may plan to run climate for several days (or until the estimated completion time is sufficiently reduced to convince the BOINC manager that it will be completed in time).
"2. Is the scheduler programming, the code that delivers and parcels out WU's, hopelessly obsolete or actually functional? Seems to be some disagreement and nescience on that..."
The scheduler works alright. The recent problems are due to the code that runs on the home computers. It doesn't always request work when it should, and sometimes requests much more work than it needs (misestimates how much work to request).
"3. I don't merit or want credit for aborted tasks... I trust I don't get any?"
Correct.
"4. Why, exactly, are there hard and fast deadlines for WU's? Is there some inflexible point past which BOINC will not allow a WU to remain in the wild?"
The scientific method is playing itself out on many simultaneous projects within BakerLab. Picture a single graduate student writing their thesis. They start with a hypothesis... "if we modify the energy function in this way, it should better direct the program to solving proteins with zinc" for example. Then you devise experiments to try and confirm or refute your hypothesis... a batch or a number of batches of work units. The next step is key to your question. Analyze the results of your experiments. When will you start this phase of your research?
The deadlines give a way to help your BOINC Manager schedule the work from various projects. And if you graphed it out, you would find that most tasks that go past 10 days without being reported back are never actually completed. The host has stopped running BOINC. The host has lost the work in progress. The host has lost its internet connection. Whatever it is, the odds of seeing them back are very low. So the deadlines basically give the researchers a stake in the sand where they can feel confident they've got about all the data they are going to get for this go around, and they can begin their detailed analysis of the results.
Now picture the above application of scientific method being done by several dozen scientists at the same time. This is the overview of the project you are helping with. Many subteams of researchers, exploring various ideas and techniques. Studying what works well and what does not. Over and over again.
"Thanks for your help. I request an expert, or his designee, help me out :) Best, John :D"
mikey's comment applies to me as well: I am neither, just a cruncher like yourself. ...but I've been around a while. Rosetta Moderator: Mod.Sense |
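The remark that BOINC eventually notices a project falling behind and devotes extra time to it can be pictured as a simple feasibility check. This is a rough sketch under stated assumptions, not the client's actual logic: the function names are hypothetical, the task sizes in the usage note are invented for illustration, and the real client relies on its own runtime estimates.

```python
def hours_available(hours_until_deadline: float, resource_share: float, cores: int = 1) -> float:
    """CPU hours a project would receive before the deadline at its normal share."""
    return hours_until_deadline * resource_share * cores


def at_risk(remaining_cpu_hours: float, hours_until_deadline: float,
            resource_share: float, cores: int = 1) -> bool:
    """True if the task cannot finish in time when run only at its normal share."""
    return remaining_cpu_hours > hours_available(hours_until_deadline, resource_share, cores)
```

For example, an invented climate task with 300 CPU hours left and 100 days (2400 hours) to its deadline would receive only about 180 hours at a 7.5% share, so `at_risk(300, 2400, 0.075)` is true and the task would need more than its usual share to finish on time.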
Cesium_133* Joined: 1 Dec 08 Posts: 28 Credit: 225,332 RAC: 0 |
"If you would state your intentions it might help people comment on alternative approaches to achieve your goals."
My intentions are to get as many WU's assigned to my machine as it can process by the required deadline of each WU. Also, I want such WU's to be compliant in quantity and required total crunching time over a given, long period (>= a matter of a few months, at least) with my set %ages for Rosetta et al. Nothing fishy or abusive; just good use of my resources. I know I would finish more discrete WU's with Hydrogen than Rosetta, but I want my PC to run Rosetta more, and it's doing so... thus, good there...
"If the net result is that your machine contacts the project fewer times during the week, it is a net good thing for the project servers."
Granted, though I have yet to figure out the exact algorithm BOINC uses for contacting home -for credit-. Every WU is allowing BOINC to upload its finished data forthwith on completion, as it should, but if a WU finishes closer than a certain amount of time before the deadline, BOINC will report it alone for credit. WU's can be running high-priority before this unknown time, though, finish up and report, and queue up on my machine to be either reported manually or called in per the 24-hour default time.
"Eventually BOINC is going to figure out that you are getting behind on your climate model, and it is going to devote the time it takes to try and complete that before its deadline."
As it should, yes... I have 3 WU's on climate... one due April 2010, the others November 2011. Must be big ones. Over time, I would expect to be assigned WU's finishable by their respective deadlines per my allocated resource time.
"...(t)he next step is key to your question. Analyze the results of your experiments. When will you start this phase of your research?"
Understood. What I'm getting at is the hour/minute/second precision of the deadlines. Why not just have a given day at 11:59:59 PM all the time? Do the individual projects work off a formula which relies on when they're compiled, or sent into the wild? It just seems illogical to have some WU's with a deadline 2 minutes later than some others, unless it's all automated. Even then, why not use the Easy Button and go to the end of the day? Maybe it's too good an idea :) I sit around thinking of this stuff, btw... I could use a gf... lol... such as the one I have in my pic and sig... :D
The lovely lady you see isn't I, but Hayley Westenra, a classical crossover singer from Christchurch, NZ. There is no known voice as hers. Check her out- she's seraphic. |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
"Understood. What I'm getting at is the hour/minute/second precision of the deadlines. Why not just have a given day at 11:59:59 PM all the time?"
There are a number of reasons that I can think of. I have listed a few of them below.
If all WUs are due back at a certain time, then all incomplete WUs will re-enter the job queue at the same time. On a good day the servers should be able to cope with the sudden jump in workload, but on a bad day (perhaps after the issuing of a bugged Rosetta version) a large number of incomplete tasks may need to be handled at once.
Most reissued tasks would be sent out soon after the deadline, meaning that crunchers operating in that timezone are more likely to pick up a re-issued task than users in other timezones. If there is a batch of WUs where an error is causing them to run slowly, then you will be issuing a higher proportion of bad WUs to the crunchers operating in that timezone. Hardly an encouragement for them to continue crunching.
If a batch of WUs is not running properly due to an error, the project team may not be able to see a pattern until a few WUs have timed out. Under the current system, the project team can monitor the situation in real time and cancel or amend a batch if a particular group is timing out. Under the fixed timescale method the project team won't know about the problem until the day after the failed batch passes the deadline (assuming that they set the deadline as midnight in their timezone as in your example). That means most if not all of the failed batch will have been reissued before the project team has a chance to pull the plug. |
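The first reason, that a shared end-of-day deadline makes all overdue tasks hit the servers at once, can be made concrete with a small simulation. This is only a sketch with invented numbers: a 10-day task lifetime, 24,000 tasks sent at random moments over a single day, and hypothetical helper names. It is not a model of the actual project servers.

```python
from collections import Counter
from datetime import datetime, timedelta
import random

DEADLINE = timedelta(days=10)  # assumed task lifetime, for illustration only


def rolling_deadline(sent: datetime) -> datetime:
    """Deadline is a fixed interval after the moment the task was sent."""
    return sent + DEADLINE


def end_of_day_deadline(sent: datetime) -> datetime:
    """Deadline is pushed to 23:59:59 on the day the rolling deadline falls."""
    due = sent + DEADLINE
    return due.replace(hour=23, minute=59, second=59, microsecond=0)


def peak_expiries_per_hour(deadlines) -> int:
    """Largest number of deadlines that fall within any single clock hour."""
    buckets = Counter(d.replace(minute=0, second=0, microsecond=0) for d in deadlines)
    return max(buckets.values())


def simulate(n_tasks: int = 24_000, seed: int = 1):
    """Return (peak with rolling deadlines, peak with end-of-day deadlines)."""
    rng = random.Random(seed)
    start = datetime(2009, 1, 1)
    # Tasks are sent at random moments across one day.
    sent = [start + timedelta(seconds=rng.randrange(86_400)) for _ in range(n_tasks)]
    rolling = peak_expiries_per_hour(rolling_deadline(s) for s in sent)
    fixed = peak_expiries_per_hour(end_of_day_deadline(s) for s in sent)
    return rolling, fixed


if __name__ == "__main__":
    rolling, fixed = simulate()
    print(f"peak expiries in one hour: rolling={rolling}, end-of-day={fixed}")
```

With rolling deadlines the expiries are spread over the whole day, roughly a thousand per hour in this setup, whereas with the end-of-day rule every task in the batch expires in the same hour.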
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
BOINC "phones home" for credit at least once every 24 hours if tasks have been completed. More often if it needs work, at which point it will also report completed work. So, it can SEEM random ... The randomness of the deadlines is a good thing in that it means that we all don't try to report at the same time. Traffic jam does not begin to describe the situation. You are correct though in the thought that sometimes BOINC is a little too anal in its handling of deadlines, and the client can panic and do inappropriate things in many more cases than we would like to imagine ... I have tried to get the developers to rethink some of the rules to avoid these situations, but to no avail. Interestingly enough, had they done so, the new problems seen at SaH recently would not be issues ... restarting suspended tasks not happening would not be an issue if BOINC almost never suspended tasks ... :) |
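The reporting behaviour described in this post can be summarised as a small decision function. This is a hedged sketch of the rule as stated in the thread, not the BOINC client's real code; `should_contact_server` and its parameters are hypothetical names, and the 24-hour interval is simply the figure quoted above.

```python
from datetime import datetime, timedelta

REPORT_INTERVAL = timedelta(hours=24)  # the 24-hour figure quoted in the thread


def should_contact_server(now: datetime, last_contact: datetime,
                          completed_tasks: int, needs_work: bool) -> bool:
    """Contact the project if work is needed, or if finished tasks have waited a day."""
    if needs_work:
        return True  # a work request also reports any completed tasks
    return completed_tasks > 0 and now - last_contact >= REPORT_INTERVAL
```

Because a work request can happen at any time and also reports finished tasks, the moment at which credit is reported can look random from the outside.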
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
"If you would state your intentions it might help people comment on alternative approaches to achieve your goals."
The best thing to do is to determine your long-term project percentages, set them up by adjusting the resource share of your chosen projects, set BOINC to keep about three to four days' work ahead, and just leave it alone. The BOINC scheduler is a fairly complex beast; it has to be because of the various projects it deals with. Different work unit durations, different deadlines, things like CPDN with work units that run for months at a time, etc. Keeping this in mind, it's really designed to be run in "set and forget" mode. It'll take a week or two to sort everything out, but if you leave it alone, it will respect your resource share choices, and more importantly, leaving it alone and not micromanaging reduces the risk of missed deadlines. Also, unless you have a very strong reason to do so (intermittent connection, e.g. dial-up), there is no benefit whatsoever in maintaining a large work buffer. I run with a three-day buffer, and simply don't have any problems. |
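As a back-of-the-envelope illustration of what a three-day buffer means per project, the arithmetic can be sketched as follows. It assumes, purely for illustration, that the buffer is divided in proportion to resource share on a single core; the real client works from its own runtime estimates, and `buffer_hours` is a hypothetical helper. The shares are the ones quoted earlier in the thread.

```python
def buffer_hours(buffer_days: float, shares: dict, cores: int = 1) -> dict:
    """Split a work buffer (in CPU hours) across projects by resource share."""
    total_share = sum(shares.values())
    total_hours = buffer_days * 24 * cores
    return {name: total_hours * share / total_share for name, share in shares.items()}


if __name__ == "__main__":
    shares = {"Rosetta": 85.0, "Climate Prediction": 7.5, "Hydrogen": 3.75, "AI": 3.75}
    for name, hours in buffer_hours(3, shares).items():
        print(f"{name}: {hours:.1f} CPU hours buffered")
```

With these numbers a three-day buffer on one core is 72 CPU hours in total, of which about 61 would be Rosetta work, which suggests why a much larger buffer adds little besides deadline pressure.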
©2024 University of Washington
https://www.bakerlab.org