Message boards : Number crunching : Warning: Don't shut down BOINC Manager..!!
Author | Message |
---|---|
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
I'm not sure if this is a problem with the WU or if it's a problem due to my system crash. System crash ... :) That is a common error for things like device drivers. You can look it up in the Wiki off the Messages link in the front page. But, I would like to look at the log for the heck of it ... Zip up the *.TXT files in the boinc directory and send to p.d.buck@comcast.net. I never know if I am going to find something good or not ... You just can never tell ... |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
[B@H] Ray Send message Joined: 20 Sep 05 Posts: 118 Credit: 100,251 RAC: 0 |
CPDN has been having the same problem for about 1/3 of their users. Before rebooting, if you suspend CPDN and Rosetta and exit BOINC, this should not be a problem, they found. CPDN is worse; a lot of people have been reporting damaged models after losing power, so CPDN is recommending that their people back up the BOINC folder occasionally. Pizza@Home Rays Place Rays place Forums |
Desti Send message Joined: 16 Sep 05 Posts: 50 Credit: 3,018 RAC: 0 |
CPDN has been having the same problem for about 1/3 of their users. Before rebooting, if you suspend CPDN and Rosetta and exit BOINC, this should not be a problem, they found. Yeah, happened to me today after I accidentally turned off the wrong power switch. Luckily I had made a backup yesterday. :-) LUE |
[B@H] Ray Send message Joined: 20 Sep 05 Posts: 118 Credit: 100,251 RAC: 0 |
CPDN has been having the same problem for about 1/3 of their users. Before rebooting, if you suspend CPDN and Rosetta and exit BOINC, this should not be a problem, they found. You are lucky; as far as I know, only people running CPDN make backups of BOINC. If the WUs were larger here, this would be another project to back up for. Pizza@Home Rays Place Rays place Forums |
RDC Send message Joined: 16 Sep 05 Posts: 43 Credit: 101,644 RAC: 0 |
CPDN has been having the same problem for about 1/3 of their users. Before rebooting, if you suspend CPDN and Rosetta and exit BOINC, this should not be a problem, they found. Weird, I've had WUs crash on me for every project except CPDN. That's usually the WU that gets hit the hardest too since it usually is the project being crunched when some idiot decides to make love to a nearby telephone pole with their car ;) It's happened 3 times so far in the past 6 months... |
George Send message Joined: 27 Nov 05 Posts: 8 Credit: 634,319 RAC: 0 |
This is a bit off topic, but for some reason when Rosetta is running my CPU load is not 100% as I expect it to be because Seti@home uses all the CPU. Is this normal, or have I set something wrong in preference? I have both Seti and Rosetta attached. When Seti runs the CPU load is 100%; when Rosetta runs it's near 0%. Thanks for any help. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
This is a bit off topic, but for some reason when Rosetta is running my CPU load is not 100% as I expect it to be because Seti@home uses all the CPU. Is this normal, or have I set something wrong in preference? I have both Seti and Rosetta attached. When Seti runs the CPU load is 100%; when Rosetta runs it's near 0%. You have two computers attached - one seems to be working, with a couple of errors. The other is returning nothing but errors. Both are single-CPU systems, so BOINC should run _either_ SETI or Rosetta at any one time, never both. First question is which computer (by ID# not name please, we can't see the names) are you asking about? Do you see two "Running" status indications in the Work tab? How are you measuring CPU load, with Task Manager? As far as preference settings - you MUST set it to "leave application in memory = YES" in order for Rosetta to work properly. They're chasing that bug now, but it's not fixed yet. SETI doesn't care, but actually you'll lose a _little_ crunching time even on SETI if that option is set to "NO", because it won't checkpoint when switched out. |
George Send message Joined: 27 Nov 05 Posts: 8 Credit: 634,319 RAC: 0 |
Thanks for the help. Both systems have the symptom. 78132 is the fast one and I'm guessing it is the one with few errors. 76452 is an older slow system but hosts some servers and so is on all the time. The errors may come from me suspending and resuming work in attempts to get the CPU usage up. I do see 'Running' when the CPU usage is near zero in Task Manager. As soon as I suspend Rosetta and Seti starts, usage jumps to 100%, then I resume Seti and it drops back down. I did have 'leave app in memory' set to no in general prefs, just switched it to yes, we'll see if that's it. Thanks again for the pointer. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
I did have 'leave app in memory' set to no in general prefs, just switched it to yes, we'll see if that's it. 76452 still hasn't turned in a single valid result... the error reported each time _looks_ like the "leave in memory" problem. I didn't look any further at it. 78132 is successful most of the time, possibly because it can complete a result fast enough to not be switched out, or if it is switched out, not as many times. It has only a couple of possible memory-swap errors. The CPU time it is taking to complete results looks to be in-line with what I would expect from a computer with these benchmarks, so when it _does_ get the CPU, it's using it. I'm seeing, for example, one result I randomly picked, was downloaded to your computer, you spent 2 hours of CPU time on it, and returned it 4 hours later. No problem at all, and if SETI and Rosetta are 50/50 resource share, this is exactly what I'd expect - from the results page, I'm just not seeing a performance problem. I would expect from what you're describing to see a WU downloaded and completing in 2 hours CPU time - but taking 24 or 48 hours (or longer) to be returned. Your average turnaround on 78132 is only 6 hours. I would suggest now that you've changed the pref, just let both systems run for a couple of days. Then take a look at the results pages for both computers and see if you are still getting errors, and take a look at the RAC for each. 76452 isn't going to get much of a RAC, but it should do more than zero! I'm a Mac person, so I can't really tell you what to expect from 78132, but at a wild guess, I would say maybe 200 or 250 (again, if SETI and Rosetta are 50/50). It is already at 92.44 after just a few days, and it takes a week or so to "level out". If it has a similar RAC on SETI and Rosetta (it may take longer to level on SETI because of the need for a quorum on each result) and Rosetta is still showing 0% CPU use... well, you must have one of those CPUs that were altered by the aliens to help us fold proteins, and Rosetta work is being done in another dimension... :-) Regardless, PLEASE post back here in a few days and let us know how it's going, and we'll look at things again, see if there are any "tweaks" we can suggest, make sure this is solved. And of course, if you have any other questions or problems, feel free to post before then. |
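A rough sketch of the turnaround arithmetic in the post above (2 hours of CPU time returned 4 hours later on a 50/50 split). It assumes a single CPU shared strictly by resource share and ignores download, upload, and reporting delays; the function name is made up for illustration and is not part of BOINC.

```python
def expected_turnaround_hours(cpu_hours, resource_share):
    """Estimate wall-clock hours needed to accumulate `cpu_hours` of CPU time.

    Assumption: the project receives exactly `resource_share` (0 < share <= 1)
    of one CPU, and nothing else delays the result.
    """
    if not 0 < resource_share <= 1:
        raise ValueError("resource_share must be in the range (0, 1]")
    return cpu_hours / resource_share


if __name__ == "__main__":
    # 2 CPU hours at a 50/50 share between two projects
    print(expected_turnaround_hours(2, 0.5))
```

By this estimate, a result that needs 2 CPU hours but takes 24 or 48 hours to come back would point to the project getting far less CPU than its share.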
George Send message Joined: 27 Nov 05 Posts: 8 Credit: 634,319 RAC: 0 |
Wow. What a response. If only my cell phone provider and ISP could be a quarter as responsive! And I pay them big bucks. Sitting here at 78132 The CPU is showing 0% usage in Task Manager while Rosetta shows as 'Running' and Seti shows 'Preempted' in BOINC Manager. Darn those pesky trans-dimensional aliens! I don't care about the reported CPU usage as long as the WU is getting done, but if there is some weirdness perhaps we should know about it and publish it so others don't get similarly confused. If so we should start another thread with a more germane title. I do have "Do work while computer in use?" set to 'yes'. Thanks so much, and I'll keep an eye on it and report back. |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Wow. What a response. If only my cell phone provider and ISP could be a quarter as responsive! And I pay them big bucks. O_O I have just recently finished yet another battle with BOTH my cell phone provider, AND my ISP... plus my local grocery store... Sigh. Very few now have ever heard of a thing called "customer service". It's "we want your money, we don't care if we piss you off, you don't have very many choices and the others are just as bad so deal with it, it would cost us another $0.10 on the quarterly share price if we actually paid our employees enough to be able to hire somebody with a brain!" So needless to say, thank you. If I ever start providing service like a cell phone company or ISP, I'll just shut down the browser, 'cause it obviously isn't fun any more. :-) |
dgnuff Send message Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
Wow. What a response. If only my cell phone provider and ISP could be a quarter as responsive! And I pay them big bucks. Two more things to try. On the "work" tab in Boinc manager, can you see the CPU time increasing? Secondly, the processes tab of Task Manager will show you a breakdown of the CPU usage for each process. When you have a Rosetta unit "running" where does task manager say the most CPU cycles are going? i.e. what "image name" gets them? |
TPR_Mojo Send message Joined: 20 Sep 05 Posts: 4 Credit: 684,947 RAC: 0 |
This is a bit off topic, but for some reason when Rosetta is running my CPU load is not 100% as I expect it to be because Seti@home uses all the CPU. Is this normal, or have I set something wrong in preference? I have both Seti and Rosetta attached. When Seti runs the CPU load is 100%; when Rosetta runs it's near 0%. I have this occasionally too. I just aborted a unit which was sat at status "running" and CPU had been idling for at least an hour. Nothing of note in the output files to explain why, and no R@H tasks running. It was as if the task just disappeared but BOINC manager thought it was active. When/if it happens again I'll take better notes and copy files etc. |
George Send message Joined: 27 Nov 05 Posts: 8 Credit: 634,319 RAC: 0 |
:( Sorry to hear about the service provider woes. I feel your pain - I have them too. I'm thinking about moving to Russia for the more enlightened customer service due to competition... Right now with Rosetta running (as shown in BOINC Mgr) on 78132 Task Manager shows 0% CPU usage. In the Task Manager process list, sorted by CPU load, Rosetta is near the top while memory for the process is ~20MB, but System Idle process shows 90 - 100%. The next highest process for CPU time is task mgr and Konfabulator (have you tried it? It's great - konfabulator.com) at a few % each. The CPU time in BOINC Mgr for Rosetta has not changed in the 10 or so minutes I've been watching. It certainly appears no work is being done and cycles are being wasted. One Seti process is preempted while another shows "Ready to report". If I click "Show Graphics", nothing happens, but I know this has worked for me before because I've seen the Rosetta graphics while a WU was running - very impressive. I wonder if something is conflicting with Rosetta. It certainly seems to be stalled. |
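The check described here (a task shows "Running" but its CPU time has not moved in about 10 minutes) can be sketched as a small test on readings noted down by hand from the Work tab. This is only an illustration: the function name, the 10-minute window, and the 1-second threshold are assumptions, and nothing here reads data from BOINC itself.

```python
def is_stalled(samples, min_wall_seconds=600, min_cpu_gain=1.0):
    """Return True if a task reported as Running looks stalled.

    `samples` is a list of (wall_clock_seconds, cpu_seconds) readings,
    oldest first. The task is considered stalled when at least
    `min_wall_seconds` of wall-clock time have passed between the first
    and last reading but CPU time grew by less than `min_cpu_gain` seconds.
    """
    if len(samples) < 2:
        return False  # not enough readings to judge
    first_wall, first_cpu = samples[0]
    last_wall, last_cpu = samples[-1]
    if last_wall - first_wall < min_wall_seconds:
        return False  # observation window too short
    return (last_cpu - first_cpu) < min_cpu_gain


if __name__ == "__main__":
    # CPU time stuck at 7200 s over a 10-minute watch
    print(is_stalled([(0, 7200.0), (300, 7200.0), (600, 7200.0)]))
```

A healthy task on an otherwise idle single-CPU machine should gain close to one CPU second per wall-clock second, so a gain near zero over ten minutes is a strong hint that nothing is being crunched.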
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
If I click "Show Graphics", nothing happens Was the Rosetta result selected when you clicked it? This is really sounding like it's NOT working, which is confusing seeing results coming back... |
George Send message Joined: 27 Nov 05 Posts: 8 Credit: 634,319 RAC: 0 |
Okay, we may get to file this in the "shut up and reboot" Windows drawer. Just for kicks and giggles I restarted - hadn't done it for a couple of weeks, I think - and now everything seems fine. Rosetta's running, CPU is at 100%, graphics come up fine, everything looks peachy. I can see the CPU time for Rosetta (in BOINC Manager Work tab) counting up, but it seems to be counting up from the stalled number I saw before, implying that Rosetta really was making no progress while the CPU load was 0%. I'll continue to monitor it. If it's all fixed by the reboot, I should see my average WU/day rate rise for this computer. But it does seem as if Windows can get into a state that starves Rosetta but not Seti without any alarms going off. To answer your question, yes, I had the line that showed Rosetta running selected when I clicked "Show Graphics". Lowfield, what happens if you reboot (if you can)? |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
Okay, we may get to file this in the "shut up and reboot" Windows drawer. One of my favorite places (hey, I'm a Mac guy). But, but, but... what about the other computer? I'm so confused... Looking back at your results, I do _not_ see _anything_ returned since shortly before you posted the first time. The other computer has been silent for longer than that. So if the problem had just appeared the day you posted, all my blathering about the times on your results was, um, blather. I need more sleep... |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Some of the older versions of BOINC had a habit of doing this after running for a long time and getting no contacts with the scheduler. Something about using up handles. Basically a resource leak (reason enough to not like C++). I forget which version clears this up. But, it used to be common on my PowerMac ... |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 34 |
>>> (reason enough to not like C++) Leaking handles is not a language specific problem. You can leak handles in any language if you obtain them, and don't free them. It is a design issue, or sloppy coding. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
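To illustrate the point that a handle leak comes from obtaining a handle and never freeing it, whatever the language, here is a minimal sketch in Python. It is not BOINC code; the function names are invented, and the list `still_open` simply stands in for a program that keeps a reference around and forgets to close it.

```python
import os
import tempfile


def read_leaky(path, still_open):
    """Open a file and read it, but never close the handle (the leak)."""
    handle = open(path)
    still_open.append(handle)  # reference is kept, close() is never called
    return handle.read()


def read_clean(path):
    """Open a file and read it; the with-block releases the handle."""
    with open(path) as handle:
        return handle.read()


def demo():
    """Count handles left open by the leaky version, then clean up."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as out:
        out.write("result data")
    still_open = []
    try:
        for _ in range(3):
            read_leaky(path, still_open)
        leaked = sum(1 for h in still_open if not h.closed)
        return leaked, read_clean(path)
    finally:
        for h in still_open:
            h.close()  # release the leaked handles before deleting the file
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Each call to the leaky version leaves one more open handle behind; a long-running client that does this on every failed scheduler contact would eventually run out, which matches the symptom described above.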
©2024 University of Washington
https://www.bakerlab.org