Message boards : Number crunching : Losing Connection from Boinc Manager to Client
Author | Message |
---|---|
onemacguy Joined: 10 Nov 05 Posts: 12 Credit: 2,564,700 RAC: 0 |
I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost. I have Boinc running as a service under an Admin account. I lost 3 days on one of my top machines because of this. This is ridiculous! When are you gonna fix this? Between this and there being no timeout on stuck workunits, I lose thousands of points per day. I have to monitor my farm EVERY DAY to check and see if every machine is reporting. I should not have to do this. Do ALL of the Boinc projects have these problems? Folding@Home did not do this to me. When can I expect a stable client? I am almost ready to switch to a new project after finding my new Dual 3.8 Xeon workstation not working for two days. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost. I have Boinc running as a service under an Admin account. I lost 3 days on one of my top machines because of this. This is ridiculous! When are you gonna fix this? Between this and there being no timeout on stuck workunits, I lose thousands of points per day. I have to monitor my farm EVERY DAY to check and see if every machine is reporting. I should not have to do this. Rosetta does time out stuck workunits, at least if they are using the newer, fixed Rosetta 5.07. Of course older jobs may still use an older version until they are finished. The disconnection from BoincManager is a BOINC issue; Rosetta has little control over that. One factor that affects it is something else using the port BOINC uses. Often a Microsoft program does this (and Microsoft should not be using that port). Clients later than 5.2.13 have this fixed, as they now use a different port by default. But they are still classed as development versions (though the BBC use one of the development versions for their BBC-CCE project). Hopefully an official version should be released soon (a week or so). Neither of these should stop your Dual Xeon from getting new work, though. Maybe a firewall is blocking boinc.exe? I say this as BOINC Manager has no effect on your ability to get new tasks (work); it never actually needs to be running. 'boinc' (boinc.exe for Windows) does all the hard work. Team mauisun.org |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Since you've hidden your computers (which is a pain when trying to help) that's about all I can say at the moment. (The only reason I can assume for hiding them is so people don't see you're using the superbench (Mac) or Crunch3r/Trux (Win) optimised client to inflate benchmarks.... like pretty much everyone does ;-)) Team mauisun.org |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I am almost ready to switch to a new project after finding my new Dual 3.8 Xeon workstation not working for two days. Are you a Windows user? Which Boinc core client are you using? |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Yes, FluffyChicken, this is most likely a "boinc" issue. I've seen many reports of this since the recent MS update came out. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
http://boinc.berkeley.edu/download.php?min_version=5.0&dev=1 They've always seemed to work fine, so give one a go :-) Note, you will probably lose the ability to use Crunch3r's client and you'll need to use Truxoft. Crunch3r's client is based on the older (current recommended 5.2.13) version and is affected by the port problem. Team mauisun.org |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost. I see this as well. In my case, Windows, single user installation. The manager just loses contact with the running threads, and so no data shows in the tabs at all. And so you can't see the projects to do an update to get any work, etc. It seems to occur only after running for several days without incident, and so I tend to think there's a bug in the Windows TCP stack rather than something directly trying to use the port, which I know BOINC HAD been attached to when it was started originally. I basically have to reboot to get TCP working again. If I recall, once this occurs, my browser doesn't even work. Any suggestions for how to study it further when it fails again? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
You could open a command prompt and enter "netstat -a" or "netstat -ao" and see what's grabbed port 1043. It should be a loopback to some other random port and 127.0.0.1 on 1043. Unfortunately the MS product is grabbing the port at boot and this stops boinc. Installing as a service has helped some. A cold boot (not just a restart) has helped others. There has been a rash of these issues since the latest MS update. Note: this doesn't apply to Boinc V 5.3.31 and higher. On my 5.4.8 I see the following using netstat -ao: TCP myputername:1043 myputername:31416 Established 3768 TCP myputername:31416 myputername::0 Listening 3928 TCP myputername:31416 myputername:1043 Established 3928 See how it loops back between 31416 and 1043, and is listening on 31416? On the affected puters, this looping doesn't show up. |
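A rough sketch of the check described in the post above, for anyone who wants to script it. This is not part of BOINC: the function name `connections_on_port` is made up for illustration, and it assumes the five-column TCP layout of `netstat -ao` as quoted in the post (protocol, local address, remote address, state, PID). Other netstat variants may lay things out differently.

```python
def connections_on_port(netstat_output, port):
    """Return the TCP rows of `netstat -ao` output whose LOCAL port matches.

    Assumes five whitespace-separated columns per TCP row, as in the
    output quoted in the thread; anything else is skipped.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue  # header lines, UDP rows, blank lines
        _proto, local, remote, state, pid = fields
        # The port is whatever follows the last colon of the local address.
        local_port = local.rsplit(":", 1)[-1]
        if local_port == str(port) and pid.isdigit():
            matches.append(
                {"local": local, "remote": remote, "state": state, "pid": int(pid)}
            )
    return matches


if __name__ == "__main__":
    # Sample rows copied from the post; on a real machine, paste in your own output.
    sample = """\
TCP myputername:1043 myputername:31416 Established 3768
TCP myputername:31416 myputername::0 Listening 3928
TCP myputername:31416 myputername:1043 Established 3928
"""
    for row in connections_on_port(sample, 31416):
        print(row["state"], "PID", row["pid"])
```

The PIDs it returns can then be compared against the Task Manager to see which process actually holds the port.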
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
You could open a command prompt and enter "netstat -a" or "netstat -ao" and see what's grabbed port 1043. It should be a loopback to some other random port and 127.0.0.1 on 1043. Unfortunately the MS product is grabbing the port at boot and this stops boinc. Installing as a service has helped some. A cold boot (not just a restart) has helped others. There has been a rash of these issues since the latest MS update. You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly? (Mine does start properly.) And once BOINC does get started, I thought it was not possible to steal the port away from an application that's already bound to it. That is more what seems to be happening, if indeed another application is found using the port. I'll have to do the -ao next time it happens and check the process IDs against the task manager. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly? This I cannot explain, since I don't know how MS does its updates. It might do a refresh or something, I just don't know. What I do know is that of all the people I've helped with this issue, none see this loopback. By the way, when classic closed down I added more than 2 thousand posts at the Q&A board over there, so when I say "all" it's a considerable number of users. The boinc manager uses one port, the daemon the other, and this is the only way they speak to each other. Since it loops back, firewalls ask for permission for these, but this loopback communication never actually leaves your puter. |
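To illustrate the loopback point above, here is a minimal sketch that probes whether anything on the local machine accepts TCP connections on a given port. It is only a reachability check and does not speak BOINC's own protocol. The helper name `port_accepts_connections` is hypothetical; ports 1043 and 31416 are the ones mentioned in this thread.

```python
import socket


def port_accepts_connections(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    The connection stays on the loopback interface when host is
    127.0.0.1, so no traffic leaves the machine.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    # Port numbers taken from the thread: 1043 (older clients), 31416 (newer).
    for port in (1043, 31416):
        status = "open" if port_accepts_connections(port) else "closed"
        print(f"127.0.0.1:{port} is {status}")
```

A port showing as open only says that some process is listening there. It does not prove that the process is boinc, so the PID lookup with netstat -ao is still needed to tell who owns it.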
onemacguy Joined: 10 Nov 05 Posts: 12 Credit: 2,564,700 RAC: 0 |
You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly? I am not running a firewall of any type. My dual Xeon box was not having an issue getting new work; the boinc manager had lost connection with the client, so Boinc was not running. I usually have to quit the Boinc Manager and restart it to get it back to crunching. Seems to happen more on my dual processor boxes. My computers are hidden because my employer asked me to hide the ones I use at work. I am using the crunch3r client for most of these computers though. :) I have both Macs and Windows crunchers; the Windows ones are the only ones giving me trouble, as usual..... :) |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I am using the crunch3r client for most of these computers though. The latest Crunch3r core clients use 5.2.13 as a base; this means they (and earlier versions) still use port 1043, instead of 31416, which came out in 5.3.31 (I'm starting to wonder if it was 5.3.17, but anyway). When the new "recommended" release comes out it should fix your issue. I know Crunch3r has been working on a new version that supports the new Seti Enhanced applications, so it shouldn't be long. tony |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I am using the crunch3r client for most of these computers though. Especially as they're releasing seti-enhanced across seti slowly at the moment :-) I'm still not sure why they are not releasing 5.4.x, as they work well and they're chasing some bugs that don't affect many people (or so it seems that's what they're doing). I would rather they release it now, fix this LARGE bug and the other alterations, then fix the little bugs after, since they're already working with the people that have the bugs. I think it's a screen saver bug (caused by firewalls) and a proxy bug/problem (which I have :( and although I've sent a report to the alpha mailing list it never seemed to get there). Why they cannot use a forum instead of the list I don't know; forums give just the same mailing features and restrictions you have on the lists. Plus it would make it much easier to follow what comes where and who responds to what.... err RANT off, sorry, must be the sunny weather. But give 5.4.8 a go on that dual; your benchmarks will be back to normal but at least it should work. Team mauisun.org |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Just for information: the reason I say there is not much point in hiding computers, even work ones, is that people cannot see a lot about them. Just OS, CPU and benchmarks really, i.e. this much: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=212252 and https://boinc.bakerlab.org/rosetta/hosts_user.php?userid=2322 That's pretty much it. Nothing to even remotely say where the computer is or anything. You can see a lot more about your own computers, e.g. IP address, names of them etc., but we cannot. Team mauisun.org |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
You can see a lot more about your own computers, e.g. IP address, names of them etc., but we cannot. Great point. And another point is just that "show" doesn't mean your machine is in any way exposed on the internet by your selection. It doesn't fire up a server or anything to "show" it. Mostly it helps people see your basic configuration (whether it's undersized, slow, old, which operating system etc.), and your WUs, which show your errors and successes and give a feel for your typical crunch time. It basically helps provide a lot of the information that people generally neglect to post with their question. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
Great point. And another point is just that "show" doesn't mean your machine is in any way exposed on the internet by your selection. It doesn't fire up a server or anything to "show" it. So, being cautious, I marked my account as 'NO SHOW' when I joined. Now I've been thinking about changing it, but I can't find the control button. Where is it? dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
here it is |
Cureseekers~Kristof Joined: 5 Nov 05 Posts: 80 Credit: 689,603 RAC: 0 |
I have boinc installed as a service. I have already had the case where I suddenly noticed that the service wasn't running anymore... How can it stop by itself? Member of Dutch Power Cows |
onemacguy Joined: 10 Nov 05 Posts: 12 Credit: 2,564,700 RAC: 0 |
Just for the information OK, I unhid my computers based on what you guys have provided for feedback. Hope this helps. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
I have boinc installed as a service. No idea (and you should start another thread if you need to find out), but check your EventLog to see any error information. But try this to see if the service actually starts: Start > Run, type cmd, then type (in the new window) sc start boinc. That starts the service called boinc. (Note: sc stop boinc stops it, which is a good way to control it. You only need to type it in the Run dialog box, but using the command prompt (cmd) you can see what it is doing.) Team mauisun.org |
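For anyone who would rather not check by hand every day, the `sc` commands above could be wrapped in a small watchdog along these lines. This is only a sketch under assumptions: the service is named `boinc` as in the post, `sc query` prints an English `STATE : <number> <NAME>` line (localized Windows versions may not), and the helper names are invented for illustration. Starting a service normally needs an administrator account.

```python
import re
import subprocess
import sys


def parse_service_state(sc_query_output):
    """Extract the state name (e.g. 'RUNNING') from `sc query` output.

    Assumes a line shaped like 'STATE : 4  RUNNING'; returns None if
    no such line is found.
    """
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_query_output, re.MULTILINE)
    return match.group(1) if match else None


def ensure_service_running(service="boinc"):
    """Query the service and issue `sc start` if it is not RUNNING."""
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True, check=False
    )
    state = parse_service_state(result.stdout)
    if state != "RUNNING":
        subprocess.run(["sc", "start", service], check=False)
    return state  # the state observed BEFORE any restart attempt


if __name__ == "__main__":
    if sys.platform == "win32":
        print("State before check:", ensure_service_running())
    else:
        print("sc is a Windows tool; nothing to do on this platform.")
```

Run from the Windows Task Scheduler every few minutes, something like this would restart a stopped service, although the EventLog is still the place to find out why it stopped.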
©2024 University of Washington
https://www.bakerlab.org