no work units

Jochen
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67473 - Posted: 30 Aug 2010, 22:48:14 UTC


_whistle_whistle_whistle_
Always look on the bright side of life...
_whistle_whistle_whistle_

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 67474 - Posted: 30 Aug 2010, 22:54:17 UTC - in response to Message 67473.  


_whistle_whistle_whistle_
Always look on the bright side of life...
_whistle_whistle_whistle_


Like an ostrich? Pollyanna? Through rose colored glasses?

Just wondering ... :-)

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67475 - Posted: 30 Aug 2010, 23:16:01 UTC

***CAUTION*** moderation of off-topic or confrontational posts ahead. Please take the advice of my last post and drop the mutual criticism. It serves no constructive purpose to continue the current divergence of the topic in this thread.
Rosetta Moderator: Mod.Sense

Bikermatt
Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 67478 - Posted: 31 Aug 2010, 0:07:04 UTC

Um, I think there is a really good point being made here. If any other project I crunch on goes down for even five minutes it seems like someone who works on the project posts what happened and what they are doing about it.

Rosetta has gone down several times since I started crunching here and I have never seen a post from anyone that works on the project about what is going on. Am I missing a thread somewhere?

Mark Rush
Joined: 6 Oct 05
Posts: 13
Credit: 50,739,296
RAC: 6,646
Message 67479 - Posted: 31 Aug 2010, 0:26:08 UTC - in response to Message 67478.  

I truly do NOT understand why anyone gets angry when a BOINC project goes down. I "love" Rosetta. It is by far my favorite of all the BOINC projects. But if it goes down, so what? The beauty of BOINC is that I can run several projects simultaneously. When Rosetta fails, I simply crunch more WUs from several other (almost-as) worthy projects. Meanwhile I figure my BOINC program is acquiring a large Rosetta debt so when Rosetta is back at 100%, I'll get a boatload of Rosetta WUs.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 67480 - Posted: 31 Aug 2010, 1:53:30 UTC - in response to Message 67472.  
Last modified: 31 Aug 2010, 1:57:10 UTC

Hey guys - I think that it's time to cool it a little bit - I'm sure the folks up in Washington are doing their best and unless you are an absolute credit whore what is this slow down really costing you?

Are you losing data or are important results you absolutely must have right now being delayed?

I'm going to guess the answer to that one is no.

The only folks being hurt by the slow down are the researchers and the credit whores - and to be honest the slow down has not been all that bad - except for on the 13th I have been running within 10% of my normal daily average.

If I did things right you should be able to see a graph of that below - fresh off the free-dc site.

I'm not doing anything special - my systems are set for a six hour run time and a half day work queue - I am not doing anything to hoard work units in an attempt to bridge the gap. I'm just an average guy with systems set up to crunch, just like everyone else.

Like everyone else I have had a few cores idle during the past week but is it worth the stress some of you seem to be dealing with?

I don't think so.

About the only thing I can think of that might make my systems a little different than those run by others is that I don't have a backup project - I am 100% dedicated to Rosetta@home - I do not know the internals of BOINC that well but could it be that if I were busy crunching on a backup project I might not notice it right away when Rosetta work units did become available?

Maybe, I don't know - but I can tell you from what I have seen this whole thing amounts to nothing more than a minor slowdown - and I suspect that a lot of the credit for it being a slowdown instead of an outage goes to the Admins for holding things together.

I'll climb down off my soapbox now, thank you.


So let's see, then. You're saying that, because you didn't see an interruption in WU processing, then nobody did! That makes perfect sense . . . in some other universe. Calling people "credit whores" and becoming an apologist for a system that appears to need attention doesn't accomplish much.

We who are trying to help in the search for the causes and cures for deadly diseases do a lot of things to increase our levels of contribution. Some of us purchase more powerful hardware than we might otherwise need. Many of us leave our computers on and running 24/7, paying for the electrical energy required to operate the machines, and the electricity needed to remove the heat byproduct from our homes and places of business. When those who request our assistance in the search for solutions then fail to hold up their end of our partnership, do we not have a right to express our concern?

To attempt to stifle any criticism of the appearance of questionable performance is simply arrogant. The Project should communicate clearly and accurately to its contributors, regardless of responsibility and blame. Not doing so is, IMO, irresponsible and unprofessional. If the servers are down, report them as down. If they are up and running, then tell us why the flow of Work Units has been interrupted. Is that so difficult?

deesy


When you stop crunching rosetta, you stop using as much electricity. I don't see your point.

No WUs left?
Worried about electricity?

Shut down your computer(s). Or better yet, attach POEM which is similar to rosetta so then no CPU cycles would be wasted!

btw, I still have plenty of WUs in queue.

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 67484 - Posted: 31 Aug 2010, 6:59:31 UTC

When you stop crunching rosetta, you stop using as much electricity. I don't see your point.

No WUs left?
Worried about electricity?

Shut down your computer(s). Or better yet, attach POEM which is similar to rosetta so then no CPU cycles would be wasted!

btw, I still have plenty of WUs in queue.


Such grand advice! Inane, but still grand.

deesy

Bob Merrill
Joined: 3 Oct 09
Posts: 1
Credit: 2,511,208
RAC: 0
Message 67487 - Posted: 31 Aug 2010, 10:03:19 UTC

I haven't gotten any work for 3 days now. What's going on? I don't see anything on the home page about this. Good thing I have 2 other jobs running.

Sparky66
Joined: 31 Dec 05
Posts: 2
Credit: 6,770,528
RAC: 2,046
Message 67488 - Posted: 31 Aug 2010, 10:17:32 UTC - in response to Message 67484.  


Such grand advice! Inane, but still grand.


Be careful with the trolling. Mod.Sense has the red pen out.


BTW, anyone notice it's difficult to get work units? :P


Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 67490 - Posted: 31 Aug 2010, 13:35:31 UTC - in response to Message 67488.  


Such grand advice! Inane, but still grand.


Be careful with the trolling. Mod.Sense has the red pen out.


BTW, anyone notice it's difficult to get work units? :P


It seems to be getting better. The servers have been keeping me supplied with enough to keep me going. I try and ensure a reserve of 12 work units with a 6 hour run time.

Sid Celery
Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 67491 - Posted: 31 Aug 2010, 14:02:51 UTC

I suspect the make_work servers need some attention.

How do you know that?

On the front page there are 1.8m jobs waiting to complete but only 10 ready to send out. Work is coming through but only at about half the rate it needs to to keep people supplied (86k completed last 24h equiv to 4.2m credits when it's usually nearer 10m credits).
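As a rough sanity check on those front-page figures, here is a small Python sketch. The numbers are the ones quoted above; the variable names are made up for illustration, and the "usual" 10m figure is only an approximation.

```python
# Figures quoted in the post above (front page, last 24 hours).
completed_tasks_24h = 86_000      # tasks completed in the last 24h
credits_24h = 4_200_000           # credits those tasks were worth
usual_credits_24h = 10_000_000    # "usually nearer 10m" (approximate)

# Fraction of the usual daily throughput currently being delivered.
throughput_fraction = credits_24h / usual_credits_24h

# Average credit per completed task, derived from the same figures.
credits_per_task = credits_24h / completed_tasks_24h

print(f"Throughput vs. usual: {throughput_fraction:.0%}")
print(f"Average credits per task: {credits_per_task:.1f}")
```

On these figures the throughput comes out at roughly 42% of normal, which fits the "about half the rate" estimate.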

Sorry! Beg to differ. I checked the Server Status page numerous times (as did others), and all servers were reported by the page to be up and running. Yet, we were receiving no work to process.

It can't be no work, just not enough and not at a rate that keeps everyone supplied at all times. That's a different thing. Certainly a problem that needs to be solved, I agree. There appears to be no status of "Running at xx% efficiency" - that's more like the problem I'm seeing.

Are you losing data or are important results you absolutely must have right now being delayed? I'm going to guess the answer to that one is no.

The only folks being hurt by the slow down are the researchers and the credit whores - and to be honest the slow down has not been all that bad - except for on the 13th I have been running within 10% of my normal daily average.

I'm pretty sure I've had no downtime at all, amazingly. That's partly because the shortage in my buffer has been filled by some WCG tasks but also because most of my work calls have been to fill a buffer of tasks, not actual work to crunch. I even set WCG to no new tasks for a few days and still haven't run out. I know that may just be dumb luck on my part.

Maybe, I don't know - but I can tell you from what I have seen this whole thing amounts to nothing more than a minor slowdown - and I suspect that a lot of the credit for it being a slowdown instead of an outage goes to the Admins for holding things together.

Probably fair comment.

When those who request our assistance in the search for solutions then fail to hold up their end of our partnership, do we not have a right to express our concern?

Certainly you do, but you seem to be demanding something more too.

Look, we all know there's a problem. Taking a look at your main system, you seem to be receiving about 10 WUs a day (not 'no work' as you said) on a dual core machine with a 4-hour runtime, which is about what you'd need to maintain your work. You also seem to have 2 tasks running and a back-up of a further 8 tasks, which is about another full day.

If you're as concerned as you appear, even though you're maintaining your buffer, on the reasonable assumption that the admins are well aware of the issue and doing what they can, why don't you increase your run-time from 4 to 8 or 12 hours? Then your buffer will probably be full, your demand for tasks will reduce, the ability of the servers to fill what's demanded of them will marginally increase, the admins here would get a little more time to fix things at their end and everyone's happy.
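To put rough numbers on that suggestion, here is a minimal Python sketch. It assumes the machine crunches 24/7 and that every task runs for exactly the target runtime, which is a simplification; `tasks_needed_per_day` is a hypothetical helper, not anything from BOINC itself.

```python
def tasks_needed_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks per day needed to keep every core busy around the clock."""
    if cores <= 0 or runtime_hours <= 0:
        raise ValueError("cores and runtime_hours must be positive")
    return cores * 24 / runtime_hours


# Dual-core machine, as described above, at the three runtimes mentioned.
for runtime in (4, 8, 12):
    needed = tasks_needed_per_day(2, runtime)
    print(f"{runtime:>2}h runtime: {needed:.0f} tasks/day")
```

Under those assumptions a dual-core machine needs 12 tasks a day at a 4-hour runtime but only 4 a day at 12 hours, so the longer runtime cuts the number of requests the servers have to fill to a third.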

Unless your aim is to run out of work I can't see the problem. It's the responsible thing to do.

The Project should communicate clearly and accurately to its contributors, regardless of responsibility and blame. Not doing so is, IMO, irresponsible and unprofessional. If the servers are down, report them as down. If they are up and running, then tell us why the flow of Work Units has been interrupted. Is that so difficult?

Sounds reasonable to me. I asked for the same a few days ago. Until that arrives, if it does, we should also do what we can, don't you agree? How much are you increasing your runtime on your current tasks?

Me? 12 hours.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 67495 - Posted: 31 Aug 2010, 19:02:53 UTC - in response to Message 67484.  
Last modified: 31 Aug 2010, 19:09:08 UTC

Such grand advice! Inane, but still grand.

deesy


they see me trollin'... they haitin'

edit: in more positive news, I just got a batch of WUs this morning. There have been worse shortages of WUs before. Apparently the supply is barely keeping up with the demand... let's look at it from the bright side, we have a lot more participants in this project than we did last year!

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 67496 - Posted: 31 Aug 2010, 19:17:07 UTC

Certainly you do, but you seem to be demanding something more too.

Look, we all know there's a problem. Taking a look at your main system, you seem to be receiving about 10 WUs a day (not 'no work' as you said) on a dual core machine with a 4-hour runtime, which is about what you'd need to maintain your work. You also seem to have 2 tasks running and a back-up of a further 8 tasks, which is about another full day.

If you're as concerned as you appear, even though you're maintaining your buffer, on the reasonable assumption that the admins are well aware of the issue and doing what they can, why don't you increase your run-time from 4 to 8 or 12 hours? Then your buffer will probably be full, your demand for tasks will reduce, the ability of the servers to fill what's demanded of them will marginally increase, the admins here would get a little more time to fix things at their end and everyone's happy.

Unless your aim is to run out of work I can't see the problem. It's the responsible thing to do.


I guess I really don't understand where you're coming from. I was not the OP on this thread. I am not the only contributor who asserted that we were receiving NO work units, and that the interruption of work lasted for more than a day (two days for me). Why do you believe that, just because YOU are not experiencing problems, then NOBODY could be experiencing problems? My machine was able to perform NO crunching for about two days or so. NO work units were sent to my machine. NOTHING ELSE was different! I hope that is clear. Why would anybody want to point a finger at a contributor when the evidence is overwhelming that the problem originated at the Rosetta Project?

This is not, and should not be, some sort of flame war. If observations are accurate, the observer should not be criticized. To do so is a form of ad hominem attack. Convincing ourselves that there really are no problems certainly guarantees that problems will never be solved, doesn't it? My machine is now receiving and processing work on a steady basis. This discussion should be over (unless there are still other contributors who are not receiving work) ...

deesy

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 67497 - Posted: 31 Aug 2010, 20:03:57 UTC - in response to Message 67496.  

This is not, and should not be, some sort of flame war. If observations are accurate, the observer should not be criticized. To do so is a form of ad hominem attack.


You have made some valid points in this thread, but I think you should re-read some of your comments to me in light of your statement above. At the very least there is a slight trace of irony.

This discussion should be over (unless there are still other contributors who are not receiving work) ...

deesy


If you are happy to leave the discussion there then great.

Happy crunching. ^_^

Michael Gould
Joined: 3 Feb 10
Posts: 39
Credit: 14,744,852
RAC: 3,750
Message 67499 - Posted: 31 Aug 2010, 21:08:30 UTC

Well, I'm glad to hear that some of you are getting work units, but the TeraFlops estimate and the jobs in progress are still way down, and I still haven't been able to get any WU's. Fortunately malariacontrol.net has plenty of work.

I wonder if this is a deliberate slowdown. Does the Bakerlab staff have a union? Maybe this is a post-CASP job action! Folders of the molecule unite!

Sid Celery
Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 67500 - Posted: 31 Aug 2010, 21:34:07 UTC - in response to Message 67496.  

I guess I really don't understand where you're coming from. I was not the OP on this thread. I am not the only contributor who asserted that we were receiving NO work units, and that the interruption of work lasted for more than a day (two days for me). Why do you believe that, just because YOU are not experiencing problems, then NOBODY could be experiencing problems?

I haven't written a single one of those things, so it seems you not only don't understand where I'm coming from but also don't understand anything I wrote either. I thought I was quite supportive of you. For what it's worth, I got hundreds of error messages. I just didn't run out of work.

My machine was able to perform NO crunching for about two days or so. NO work units were sent to my machine. NOTHING ELSE was different! I hope that is clear. Why would anybody want to point a finger at a contributor when the evidence is overwhelming that the problem originated at the Rosetta Project?

This is not, and should not be, some sort of flame war. If observations are accurate, the observer should not be criticized. To do so is a form of ad hominem attack. Convincing ourselves that there really are no problems certainly guarantees that problems will never be solved, doesn't it? My machine is now receiving and processing work on a steady basis. This discussion should be over (unless there are still other contributors who are not receiving work)...

<sigh>

Last point first: you've been receiving plenty of tasks for 3 days, haven't you? On ad hominem attacks, look above after you started receiving sufficient work, and long after you knew there were issues.

No-one is trying to convince anyone there are no problems - I asked for a brief comment about the issues 6 days ago - but the updown slowfast issue of the servers has been observed for some days, so just because they haven't succeeded in solving the problem, or commented, doesn't mean they aren't working on it. They clearly are. I notice earlier tonight all the servers were marked 'disabled' for about an hour. Hopefully that's done something, but a little more time will tell (not yet that I can see).

So, what did you increase your runtime to? That's all you really needed to respond to. It's the only thing we can control and contribute from our end.

Sid Celery
Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 67501 - Posted: 31 Aug 2010, 22:37:23 UTC
Last modified: 31 Aug 2010, 23:02:47 UTC

Servers disabled again

Edit: ..and back up again...

deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 67503 - Posted: 1 Sep 2010, 0:17:28 UTC

So, what did you increase your runtime to? That's all you really needed to respond to. It's the only thing we can control and contribute from our end.


At your suggestion, I have increased my runtime from 4 hours to 12 hours. I increased my buffer from .5 days to 1.5 days. In the past, I had a 24-hour runtime, and had problems. I shortened it to 4 hours when I upgraded from Windows XP to Windows 7, and reinstalled BOINC. Is there an optimum? If so, what is it, and why?
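For what it's worth, here is a rough Python sketch of what those two settings mean in terms of tasks held on the machine. It assumes the dual-core machine described earlier in the thread and treats the buffer as simply "cores × buffer time ÷ runtime"; BOINC's real work-fetch logic is more involved, and `buffered_tasks` is a hypothetical helper name.

```python
def buffered_tasks(cores: int, buffer_days: float, runtime_hours: float) -> float:
    """Approximate number of tasks needed to fill a work buffer."""
    if cores <= 0 or buffer_days < 0 or runtime_hours <= 0:
        raise ValueError("cores and runtime_hours must be positive, buffer_days non-negative")
    return cores * buffer_days * 24 / runtime_hours


# Old settings: 0.5-day buffer with a 4-hour runtime.
before = buffered_tasks(cores=2, buffer_days=0.5, runtime_hours=4)

# New settings: 1.5-day buffer with a 12-hour runtime.
after = buffered_tasks(cores=2, buffer_days=1.5, runtime_hours=12)

print(f"Tasks buffered before: {before:.0f}, after: {after:.0f}")
```

Under these assumptions both settings hold about six tasks, but the new ones cover three times as many hours of crunching, so the machine can ride out a longer gap without asking the servers for any more tasks at a time.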

Yes, I have been receiving WU's for the past three days. When I originally posted on August 27, however, I had received none at all for about 36 hours. Is this not the proper place to report problems and ask questions? Don't you believe that it might be insulting to tell contributors that there really are no problems when they report that they are not receiving work?

As I said, my problem has been resolved (by Rosetta and/or BOINC). What is being gained by continuing to assert that there were no interruptions in the assignment of Work Units when there clearly was an interruption, even though it might now have been rectified? I guess I don't see the point.

deesy

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 67504 - Posted: 1 Sep 2010, 0:33:33 UTC

I see that there are over 4k units in the queue now as of writing this, and all my PCs are full of work now. Yay!

Michael Gould
Joined: 3 Feb 10
Posts: 39
Credit: 14,744,852
RAC: 3,750
Message 67506 - Posted: 1 Sep 2010, 4:53:43 UTC

A quick question for you guys. With this relatively long (by Rosetta standards) server problem, does it make sense for users to increase their amount of wu's on hand, at least in general terms? I have always kept my buffers fairly small, a half day or so, on the possibly mistaken assumption that doing this gives more control to the project, in terms of which wu's are getting processed in any given period of time.

I guess the real question is would larger buffers be more beneficial to the project, or does it really not matter? For the individual user, having backup projects to run serves the same purpose, I suppose. Or just giving the processors a rest, when the project isn't feeding them with work.



©2024 University of Washington
https://www.bakerlab.org