Message boards : Number crunching : no work units
Author | Message |
---|---|
Jochen Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
_whistle_whistle_whistle_ Always look on the bright side of life... _whistle_whistle_whistle_ |
deesy58 Joined: 20 Apr 10 Posts: 75 Credit: 193,831 RAC: 0 |
Like an ostrich? Pollyanna? Through rose colored glasses? Just wondering ... :-) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
***CAUTION*** moderation of off-topic or confrontational posts ahead. Please take the advice of my last post and drop the mutual criticism. It serves no constructive purpose to continue the current divergence of the topic in this thread. Rosetta Moderator: Mod.Sense |
Bikermatt Joined: 12 Feb 10 Posts: 20 Credit: 10,552,445 RAC: 0 |
Um, I think there is a really good point being made here. If any other project I crunch on goes down for even five minutes it seems like someone who works on the project posts what happened and what they are doing about it. Rosetta has gone down several times since I started crunching here and I have never seen a post from anyone that works on the project about what is going on. Am I missing a thread somewhere? |
Mark Rush Joined: 6 Oct 05 Posts: 13 Credit: 52,623,408 RAC: 4,456 |
I truly do NOT understand why anyone gets angry when a BOINC project goes down. I "love" Rosetta. It is by far my favorite of all the BOINC projects. But if it goes down, so what? The beauty of BOINC is that I can run several projects simultaneously. When Rosetta fails, I simply crunch more WUs from several other (almost-as) worthy projects. Meanwhile I figure my BOINC program is acquiring a large Rosetta debt so when Rosetta is back at 100%, I'll get a boatload of Rosetta WUs. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Hey guys - I think that it's time to cool it a little bit - I'm sure the folks up in Washington are doing their best and unless you are an absolute credit whore what is this slow down really costing you? When you stop crunching rosetta, you stop using as much electricity. I don't see your point. No WUs left? Worried about electricity? Shut down your computer(s). Or better yet, attach POEM which is similar to rosetta so then no CPU cycles would be wasted! btw, I still have plenty of WUs in queue. |
deesy58 Joined: 20 Apr 10 Posts: 75 Credit: 193,831 RAC: 0 |
"When you stop crunching rosetta, you stop using as much electricity. I don't see your point." Such grand advice! Inane, but still grand. deesy |
Bob Merrill Joined: 3 Oct 09 Posts: 1 Credit: 2,511,208 RAC: 0 |
I haven't gotten any work for 3 days now. What's going on? I don't see anything on the home page about this. Good thing I have 2 other jobs running. |
Sparky66 Joined: 31 Dec 05 Posts: 2 Credit: 6,948,965 RAC: 118 |
Be careful with the trolling. Mod.Sense has the red pen out. BTW, anyone notice it's difficult to get work units? :P |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
It seems to be getting better. The servers have been keeping me supplied with enough to keep me going. I try and ensure a reserve of 12 work units with a 6 hour run time. |
Sid Celery Joined: 11 Feb 08 Posts: 2179 Credit: 41,716,934 RAC: 7,334 |
"I suspect the make_work servers need some attention. On the front page there are 1.8m jobs waiting to complete but only 10 ready to send out. Work is coming through but only at about half the rate it needs to to keep people supplied (86k completed last 24h equiv to 4.2m credits when it's usually nearer 10m credits)." "Sorry! Beg to differ. I checked the Server Status page numerous times (as did others), and all servers were reported by the page to be up and running. Yet, we were receiving no work to process." It can't be no work, just not enough and not at a rate that keeps everyone supplied at all times. That's a different thing. Certainly a problem that needs to be solved, I agree. There appears to be no status of "Running at xx% efficiency" - that's more like the problem I'm seeing. "Are you losing data or are important results you absolutely must have right now being delayed? I'm going to guess the answer to that one is no." I'm pretty sure I've had no downtime at all, amazingly. That's partly because the shortage in my buffer has been filled by some WCG tasks but also because most of my work calls have been to fill a buffer of tasks, not actual work to crunch. I even set WCG to no new tasks for a few days and still haven't run out. I know that may just be dumb luck on my part. "Maybe, I don't know - but I can tell you from what I have seen this whole thing amounts to nothing more than a minor slowdown - and I suspect that a lot of the credit for it being a slowdown instead of an outage goes to the Admins for holding things together." Probably fair comment. "When those who request our assistance in the search for solutions then fail to hold up their end of our partnership, do we not have a right to express our concern?" Certainly you do, but you seem to be demanding something more too. Look, we all know there's a problem. 
Taking a look at your main system, you seem to be receiving about 10 WUs a day (not 'no work' as you said) on a dual core machine with a 4-hour runtime, which is about what you'd need to maintain your work. You also seem to have 2 tasks running and a back-up of a further 8 tasks, which is about another full day. If you're as concerned as you appear, even though you're maintaining your buffer, on the reasonable assumption that the admins are well aware of the issue and doing what they can, why don't you increase your run-time from 4 to 8 or 12 hours? Then your buffer will probably be full, your demand for tasks will reduce, the ability of the servers to fill what's demanded of them will marginally increase, the admins here would get a little more time to fix things at their end and everyone's happy. Unless your aim is to run out of work I can't see the problem. It's the responsible thing to do. "The Project should communicate clearly and accurately to its contributors, regardless of responsibility and blame. Not doing so is, IMO, irresponsible and unprofessional. If the servers are down, report them as down. If they are up and running, then tell us why the flow of Work Units has been interrupted. Is that so difficult?" Sounds reasonable to me. I asked for the same a few days ago. Until that arrives, if it does, we should also do what we can, don't you agree? How much are you increasing your runtime on your current tasks? Me? 12 hours. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
"Such grand advice! Inane, but still grand." they see me trollin'... they hatin' edit: in more positive news, I just got a batch of WUs this morning. There have been worse shortages of WUs before. Apparently the supply is barely keeping up with the demand... let's look at it from the bright side, we have a lot more participants in this project than we did last year! |
deesy58 Joined: 20 Apr 10 Posts: 75 Credit: 193,831 RAC: 0 |
"Certainly you do, but you seem to be demanding something more too." I guess I really don't understand where you're coming from. I was not the OP on this thread. I am not the only contributor who asserted that we were receiving NO work units, and that the interruption of work lasted for more than a day (two days for me). Why do you believe that, just because YOU are not experiencing problems, then NOBODY could be experiencing problems? My machine was able to perform NO crunching for about two days or so. NO work units were sent to my machine. NOTHING ELSE was different! I hope that is clear. Why would anybody want to point a finger at a contributor when the evidence is overwhelming that the problem originated at the Rosetta Project? This is not, and should not be, some sort of flame war. If observations are accurate, the observer should not be criticized. To do so is a form of ad hominem attack. Convincing ourselves that there really are no problems certainly guarantees that problems will never be solved, doesn't it? My machine is now receiving and processing work on a steady basis. This discussion should be over (unless there are still other contributors who are not receiving work) ... deesy |
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
"This is not, and should not be, some sort of flame war. If observations are accurate, the observer should not be criticized. To do so is a form of ad hominem attack." You have made some valid points in this thread, but I think you should re-read some of your comments to me in light of your statement above. At the very least there is a slight trace of irony. "This discussion should be over (unless there are still other contributors who are not receiving work) ..." If you are happy to leave the discussion there then great. Happy crunching. ^_^ |
Michael Gould Joined: 3 Feb 10 Posts: 39 Credit: 15,559,392 RAC: 3,735 |
Well, I'm glad to hear that some of you are getting work units, but the TeraFlops estimate and the jobs in progress are still way down, and I still haven't been able to get any WU's. Fortunately malariacontrol.net has plenty of work. I wonder if this is a deliberate slowdown. Does the Bakerlab staff have a union? Maybe this is a post-CASP job action! Folders of the molecule unite! |
Sid Celery Joined: 11 Feb 08 Posts: 2179 Credit: 41,716,934 RAC: 7,334 |
"I guess I really don't understand where you're coming from. I was not the OP on this thread. I am not the only contributor who asserted that we were receiving NO work units, and that the interruption of work lasted for more than a day (two days for me). Why do you believe that, just because YOU are not experiencing problems, then NOBODY could be experiencing problems?" I haven't written a single one of those things, so it seems you not only don't understand where I'm coming from but also don't understand anything I wrote either. I thought I was quite supportive of you. For what it's worth, I got hundreds of error messages. I just didn't run out of work. "My machine was able to perform NO crunching for about two days or so. NO work units were sent to my machine. NOTHING ELSE was different! I hope that is clear. Why would anybody want to point a finger at a contributor when the evidence is overwhelming that the problem originated at the Rosetta Project?" <sigh> Last point first: you've been receiving plenty of tasks for 3 days, haven't you? On ad hominem attacks, look above after you started receiving sufficient work, and long after you knew there were issues. No-one is trying to convince anyone there are no problems - I asked for a brief comment about the issues 6 days ago - but the updown slowfast issue of the servers has been observed for some days, so just because they haven't succeeded in solving the problem, or commented, doesn't mean they aren't working on it. They clearly are. I notice earlier tonight all the servers were marked 'disabled' for about an hour. Hopefully that's done something, but a little more time will tell (not yet that I can see). So, what did you increase your runtime to? That's all you really needed to respond to. It's the only thing we can control and contribute from our end. |
Sid Celery Joined: 11 Feb 08 Posts: 2179 Credit: 41,716,934 RAC: 7,334 |
Servers disabled again Edit: ..and back up again... |
deesy58 Joined: 20 Apr 10 Posts: 75 Credit: 193,831 RAC: 0 |
"So, what did you increase your runtime to? That's all you really needed to respond to. It's the only thing we can control and contribute from our end." At your suggestion, I have increased my runtime from 4 hours to 12 hours. I increased my buffer from .5 days to 1.5 days. In the past, I had a 24-hour runtime, and had problems. I shortened it to 4 hours when I upgraded from Windows XP to Windows 7, and reinstalled BOINC. Is there an optimum? If so, what is it, and why? Yes, I have been receiving WU's for the past three days. When I originally posted on August 27, however, I had received none at all for about 36 hours. Is this not the proper place to report problems and ask questions? Don't you believe that it might be insulting to tell contributors that there really are no problems when they report that they are not receiving work? As I said, my problem has been resolved (by Rosetta and/or BOINC). What is being gained by continuing to assert that there were no interruptions in the assignment of Work Units when there clearly was an interruption, even though it might now have been rectified? I guess I don't see the point. deesy |
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
Michael Gould Joined: 3 Feb 10 Posts: 39 Credit: 15,559,392 RAC: 3,735 |
A quick question for you guys. With this relatively long (by Rosetta standards) server problem, does it make sense for users to increase their amount of WUs on hand, at least in general terms? I have always kept my buffers fairly small, a half day or so, on the possibly mistaken assumption that doing this gives more control to the project, in terms of which WUs are getting processed in any given period of time. I guess the real question is: would larger buffers be more beneficial to the project, or does it really not matter? For the individual user, having backup projects to run serves the same purpose, I suppose. Or just giving the processors a rest, when the project isn't feeding them with work. |
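
The runtime and buffer trade-off discussed in the thread (a dual-core host on 4-hour tasks, and the suggestion to move to 8 or 12 hours) comes down to simple arithmetic. Here is a minimal sketch, assuming every core runs tasks back to back and each task takes roughly its target runtime; the function names are made up for illustration and are not part of BOINC or Rosetta@home.

```python
def tasks_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks a host finishes per day if every core runs tasks back to back.

    Assumption: each task takes about `runtime_hours` of wall-clock time.
    """
    return cores * 24 / runtime_hours


def buffer_tasks(cores: int, runtime_hours: float, buffer_days: float) -> float:
    """Tasks that must be on hand to cover `buffer_days` of crunching."""
    return tasks_per_day(cores, runtime_hours) * buffer_days


if __name__ == "__main__":
    # Compare the runtimes mentioned in the thread on a dual-core host.
    for runtime in (4, 8, 12):
        daily = tasks_per_day(2, runtime)
        print(f"{runtime}h runtime: {daily:.0f} tasks requested per day")
```

Under these assumptions, a dual-core host asks the servers for about 12 tasks a day at a 4-hour runtime but only 4 a day at 12 hours. A 0.5-day buffer at 4 hours and a 1.5-day buffer at 12 hours both come to 6 tasks on hand, so the longer runtime holds three times as much work in the same number of tasks.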