No Work Units

Message boards : Number crunching : No Work Units

Shawn H. Hall

Joined: 10 Sep 06
Posts: 6
Credit: 412,088
RAC: 0
Message 58547 - Posted: 5 Jan 2009, 22:32:07 UTC

Hi, All.

It looks as if some people are getting WUs, but none are queued for me. I have been having this issue for some time now, and there does not appear to be any particular reason for it. I currently have three computers (two PCs and one Mac) splitting time between SETI@home and Rosetta@home, and to be quite honest, SETI@home is nearly flawless, while Rosetta@home has mostly been a time-intensive nightmare.

I spend more time monitoring whether I get new WUs for Rosetta on my three computers than I spend doing anything else with BOINC. What is most puzzling about this whole situation is that the WU shortage seems to rotate through all three computers: sometimes all three are completely out of Rosetta WUs, and sometimes there is a seemingly random pattern of LOTS of WUs for one or two computers and none for the others.

I have tried everything that I can think of to solve this problem on my end, such as detaching and reattaching, resetting, reinstalling, etc., without much luck. I MIGHT get a few WUs with the detach/reattach attempts, but this is quite annoying and doesn't always work.

What is most frustrating is that this situation is getting worse, with all of the computers running out of WUs more frequently and for longer periods of time. I am on broadband, so the Internet is always available for downloading or uploading on my end. Is there something that I should know or do?

Any help would be appreciated, as I am thinking about going back to Predictor@home if this situation doesn't clear up, and soon.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58548 - Posted: 5 Jan 2009, 22:38:00 UTC

John, not to be "smart", but... have you tried just leaving it alone?

When you run multiple projects, there are many scenarios where only having tasks from one of the projects would be normal and expected.

Please review this thread for more details on how this can happen.
Rosetta Moderator: Mod.Sense
FalconFly
Joined: 11 Jan 08
Posts: 23
Credit: 2,163,056
RAC: 0
Message 58549 - Posted: 6 Jan 2009, 0:00:43 UTC
Last modified: 6 Jan 2009, 0:09:33 UTC

In five years of BOINC, I have witnessed only four typical reasons for normally running systems to run "dry" despite a good Internet connection and a working project scheduler server:

1) Long-term debt (LTD) of work-starved projects reaches 2,000,000 s or more on a host

After some discussion with the Berkeley devs, I assume they still deny this inherent client scheduler malfunction, although they've repeatedly implemented dirty workarounds that somewhat slow the rate at which LTD builds up. Looking at his project participation, though, I deem this unlikely, as all of his projects usually have work available.

2) Deferred communication

This mostly affected older 4.x clients, where the communication deferral could reach ~7 days within a relatively short time. As he's using BOINC 6.x, I don't think this is an issue either.

3) Packet loss or broken lines towards the project server

This is usually caused by routing errors due to a backbone failure or manual cuts between carriers (the latter has occurred several times in Europe), by a shaky Internet connection (pure packet loss), or by a provider filtering, blocking, or slowing the ports required for BOINC communication; basically, poor QoS in favor of classic HTTP/FTP/SMTP customer traffic.

This "should" be unlikely, but it can be checked by running a continuous ping and a traceroute from a console.

As I'm seeing srv4.bakerlab.org (140.142.20.112) here, the Windows console commands would be:
ping -t srv4.bakerlab.org
tracert srv4.bakerlab.org

4) Bad luck of the draw

After every outage of a larger or I/O-intensive project (considering the file sizes, I consider Rosetta I/O-intensive), some users get a comparatively quick restart, while others (even with a fleet of hosts) have to wait longer.
I'm not sure how to explain this from the technical side, but it's something I have witnessed over and over in several projects. Call it "Murphy's Law" if you wish. Apart from the hardcore Update/Reset Project methods (not desired or recommended), there is nothing a BOINC user can do about it except let the backup projects run and wait patiently.

5) Optional, and considered unlikely during normal operations

Human error ;) or Host failure :(

First, I'd check whether BOINC is really set to have network connectivity and whether the project is set to accept new work (restarting the system is sometimes also a suitable fix). In some cases, a ScanDisk after a few 'hard' shutdowns is also in order. As a last resort, a "reset project" has sometimes brought a quirky BOINC setup back to life; after all, it's just software that sometimes has the "ghost within". If a new router, a separate firewall, or new network device drivers have been installed, that is frequently a good place to look.

Otherwise, worms or viruses can corrupt Windows network settings while the rest of the system still appears to work normally. Deep scans with dedicated tools against spyware/adware/malware, viruses/worms/trojans, and backdoors/rootkits can reveal a hidden problem once in a while.
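For case 3, note that ping and tracert use ICMP, so they can look fine even when a provider filters the TCP ports BOINC actually talks over. A minimal Python sketch of a complementary check simply tries to open a TCP connection. It assumes the scheduler is reached over HTTP on port 80, which is an assumption rather than something stated in this thread, and the helper name `can_reach` is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Unlike ping (ICMP), this opens the same kind of TCP connection an HTTP
    client would, so port filtering along the path shows up as a failure.
    Port 80 as the default is an assumption, not a confirmed BOINC setting.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (gaierror), refusals, and timeouts are all OSError subclasses.
        return False
```

For example, if `can_reach("srv4.bakerlab.org")` returns False while the same machine can otherwise browse the web, that points toward the network path to that server (case 3) rather than toward the project's work supply.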
Shawn H. Hall

Joined: 10 Sep 06
Posts: 6
Credit: 412,088
RAC: 0
Message 58550 - Posted: 6 Jan 2009, 0:51:01 UTC - in response to Message 58548.  

Well, I have been around these projects for years, starting with SETI@Home in the '90s and continuing with my favorite projects almost since their inception. Consequently, even though you'd never know it from my posting activity, I am quite well versed in most aspects of these various projects and of BOINC itself.

While many problems do sort themselves out, and I have allowed them to do just that over the years, this new problem is distressing enough that I am running out of patience. My BOINC client is nearly out of WUs for everything, since SETI@Home and Rosetta@Home are set to split the resources 50/50. Apparently, SETI@Home is waiting for Rosetta@Home to fill the void, but it isn't.

What to do, what to do, what to do...
Shawn H. Hall

Joined: 10 Sep 06
Posts: 6
Credit: 412,088
RAC: 0
Message 58551 - Posted: 6 Jan 2009, 1:03:42 UTC - in response to Message 58549.  


Hey, FF.

Thanks for the input. I suspect that your #4 answer might be closer to the truth than any other, but, if so, my luck has been horrible for months now.

If I wait long enough, and usually in conjunction with a reattachment, I will get new WUs...USUALLY. Lately, though, there has been no magic bullet for getting Rosetta@Home to respond to my many pleas for WUs. I guess I will allow perhaps one more week of patient waiting before I do something drastic, but after that my frustration level will probably have reached its zenith, and on I will move.

We shall see...

Thanks again,

JLHunter


Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 58554 - Posted: 6 Jan 2009, 4:56:19 UTC

John, yes: once one project gets ahead of the other (i.e., debt builds up), BOINC will be optimistic and sort of keep some slack in the schedule for the project that is behind in CPU time compared to its resource share.

Since Rosetta has been out of work for roughly 4 days, no work has been available to fill that slack. Now that Rosetta work is available again, I suspect the result in your case will be Rosetta "taking over my machine": running virtually no SETI work and not switching between projects "on my scheduled 1 hr interval". This is BOINC working off the debt. It is normal and usually doesn't take more than a day or two. But if you were crunching only SETI for 3 or 4 days, then with a 50% share Rosetta is about two days behind. And by the time you crunch another 2 days of Rosetta work, 6 days have passed, of which Rosetta should have had 3, so it is still a day behind. So, if Rosetta was already short going into these 4 days, it could take as much as 4 or 5 days to get back to a normal balance, with each project having work and the two alternating.
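The catch-up arithmetic above can be checked with a simplified model. Assume the client just tries to make each project's cumulative CPU time match its resource share, and that Rosetta runs exclusively once work returns. This is an illustrative model, not BOINC's actual debt algorithm, and the function names are hypothetical. If Rosetta is starved for T days at share s, it is owed sT; running alone for d days restores balance when d = s(T + d), i.e. d = sT / (1 - s).

```python
def catch_up_days(starved_days: float, share: float) -> float:
    """Days of exclusive running needed for a starved project to reach its share.

    Simplified model (an assumption, not BOINC's real debt accounting):
    the project gets no CPU for `starved_days`, then runs alone until its
    cumulative CPU time equals `share` of all elapsed time.
    Solving d = share * (starved_days + d) gives d = share * T / (1 - share).
    """
    if not 0 < share < 1:
        raise ValueError("share must be strictly between 0 and 1")
    return share * starved_days / (1 - share)


def shortfall_after(starved_days: float, share: float, run_days: float) -> float:
    """Days still owed after running exclusively for `run_days` post-starvation."""
    elapsed = starved_days + run_days
    return share * elapsed - run_days


# The example from the post: 4 days with no Rosetta work, 50/50 split.
print(shortfall_after(4, 0.5, 0))  # 2.0 days owed at the start
print(shortfall_after(4, 0.5, 2))  # 1.0 day still owed after 2 Rosetta-only days
print(catch_up_days(4, 0.5))       # 4.0 days to get back to balance
```

With a 50% share the catch-up time equals the starvation time, so 4 days without Rosetta work means roughly 4 days of mostly Rosetta afterwards under this model, before adding any shortfall that existed beforehand.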

...and yes, I'm seeing on the BOINC distribution list that there are some serious scheduler problems in the current BOINC version. I guess the attempt at accounting for the new graphics co-processor capabilities has introduced some new challenges to scheduling.
Rosetta Moderator: Mod.Sense
Shawn H. Hall

Joined: 10 Sep 06
Posts: 6
Credit: 412,088
RAC: 0
Message 58556 - Posted: 6 Jan 2009, 8:03:03 UTC - in response to Message 58555.  
Last modified: 6 Jan 2009, 8:08:13 UTC

The Rosetta application itself is not responsible for the scheduling. That is the job of the manager, so I'm not entirely sure switching to another project will help. Am I correct in assuming one of your machines is a PPC Mac? The mini-rosetta application is not supported anymore for PPC. That means you will only see work for the "old" application on that machine, and that sort of work is not readily available.


Yes, transient, you are correct. As a matter of fact, after some research, I had already discovered this for myself. This link will take anyone with a similar problem to a thread that says it all: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4360. Namely, PPCs are toast (small, insignificant pun intended)...

What is a bit disappointing to me is that this problem is apparently so little known, judging by the responses that I received. I would expect an issue whose answer is that easy to find, when looked for in the right place, to be similarly well known and disseminated accordingly.

In any case, as hard as the Rosetta team works, and although I understand the manpower it takes to port an application to so many different platforms, it is still quite a disappointment to find out that the PPC computers are being phased out. From what I have read and seen in both the forums and the results, PPCs were, and to some extent still are, a potent and dedicated force in the BOINC community of distributed computing.

Thanks for everyone's help, but, ultimately, I am going to have to follow the lead of one of the contributors in this thread, https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4360, and move over to Einstein@Home where, apparently, and for the time being, PPCs are still fully supported.

Thanks again,

JLHunter
©2024 University of Washington
https://www.bakerlab.org