Losing Connection from Boinc Manager to Client

onemacguy
Joined: 10 Nov 05
Posts: 12
Credit: 2,564,700
RAC: 0
Message 15489 - Posted: 4 May 2006, 10:56:51 UTC

I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost. I have Boinc running as a service under an Admin account. I lost 3 days on one of my top machines because of this. This is ridiculous! When are you gonna fix this? Between this and there being no timeout on stuck workunits, I lose thousands of points per day. I have to monitor my farm EVERY DAY to check and see if every machine is reporting. I should not have to do this.

Do ALL of the Boinc projects have these problems? Folding@Home did not do this to me. When can I expect a stable client? I am almost ready to switch to a new project after finding my new Dual 3.8 Xeon workstation not working for two days.
ID: 15489 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15492 - Posted: 4 May 2006, 11:20:57 UTC - in response to Message 15489.  

I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost. I have Boinc running as a service under an Admin account. I lost 3 days on one of my top machines because of this. This is ridiculous! When are you gonna fix this? Between this and there being no timeout on stuck workunits, I lose thousands of points per day. I have to monitor my farm EVERY DAY to check and see if every machine is reporting. I should not have to do this.

Do ALL of the Boinc projects have these problems? Folding@Home did not do this to me. When can I expect a stable client? I am almost ready to switch to a new project after finding my new Dual 3.8 Xeon workstation not working for two days.



Rosetta does time out stuck workunits, at least if they are using the newer, fixed Rosetta 5.07. Of course, older jobs may still use an older version until they are finished.

The disconnection from BoincManager is a BOINC issue; Rosetta has little control over that.
But one factor that affects it is something else using the port BOINC uses. Often a Microsoft program does this (and Microsoft should not be using that port).
Clients later than 5.2.13 have this fixed, as they now use a different port by default. But they are still classed as development versions (though the BBC uses one of the development versions for their BBC-CCE project).

Hopefully an official version should be released soon (a week or so).


Though neither of these should stop your Dual Xeon from getting new work.
Maybe a firewall is blocking boinc.exe?

I say this as BOINC Manager has no effect on your ability to get new tasks (work); it never actually needs to be running. 'boinc' (boinc.exe for Windows) does all the hard work.


Team mauisun.org
ID: 15492 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15493 - Posted: 4 May 2006, 11:21:38 UTC
Last modified: 4 May 2006, 11:23:59 UTC

Since you've hidden your computers (which is a pain when trying to help), that's about all I can say at the moment.

(The only reason I can assume for hiding them is so people don't see you're using a superbech (mac) or Crunch3r/Trux (win) optimised client to inflate benchmarks.... like pretty much everyone does ;-))
Team mauisun.org
ID: 15493 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15494 - Posted: 4 May 2006, 11:26:24 UTC - in response to Message 15489.  
Last modified: 4 May 2006, 11:27:09 UTC

I am almost ready to switch to a new project after finding my new Dual 3.8 Xeon workstation not working for two days.

Are you a Windows user? Which Boinc core client are you using?
ID: 15494 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15495 - Posted: 4 May 2006, 11:29:39 UTC

Yes, Fluffy chicken, this is most likely a "boinc" issue. I've seen many reports of this since the recent MS update came out.
ID: 15495 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15497 - Posted: 4 May 2006, 11:35:43 UTC

http://boinc.berkeley.edu/download.php?min_version=5.0&dev=1

They've always seemed to work fine, so give one a go :-)


Note: you will probably lose the ability to use Crunch3r's client and you'll need to use Truxoft.

Crunch3r's client is based on the older (currently recommended 5.2.13) version and is affected by the port problem.
Team mauisun.org
ID: 15497 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15508 - Posted: 4 May 2006, 15:30:14 UTC - in response to Message 15489.  

I have seen an increasing number of machines stalled with the Boinc Manager disconnected from LocalHost.

I see this as well. In my case: Windows, single-user installation. The manager just loses contact with the running threads, so no data shows in the tabs at all. That means you can't see the projects to do an update to get any work, etc.

It seems to occur only after running for several days without incident, so I tend to think there's a bug in the Windows TCP stack rather than something directly trying to use the port, which I know BOINC HAD been attached to when it was originally started.

I basically have to reboot to get TCP working again. If I recall, once this occurs, my browser doesn't even work. Any suggestions for how to study it further when it fails again?

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15508 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15509 - Posted: 4 May 2006, 15:36:31 UTC
Last modified: 4 May 2006, 15:44:48 UTC

You could open a command prompt and enter "netstat -a" or "netstat -ao" and see what's grabbed port 1043. It should be a loopback to some other random port and 127.0.0.1 on 1043. Unfortunately the MS product is grabbing the port at boot and this stops boinc. Installing as a service has helped some. A cold boot (not just a restart) has helped others. There has been a rash of these issues since the latest MS update.

Note: this doesn't apply to Boinc V 5.3.31 and higher

On my 5.4.8 I see the following using netstat -ao:


TCP myputername:1043 myputername:31416 Established 3768
TCP myputername:31416 myputername::0 Listening 3928
TCP myputername:31416 myputername:1043 Established 3928

See how it loops back between 31416 and 1043, and is listening on 31416?

On the affected puters, this looping doesn't show up.
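
If you have many machines to check, the netstat rows above can be sifted with a few lines of Python. This is only a rough sketch: the helper names (`parse_netstat_line`, `port_of`, `owners_of_port`) are made up for illustration, and it assumes each row has the five columns shown in the sample above with numeric ports (adding `-n` to netstat should give that).

```python
from typing import List, NamedTuple


class Connection(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str
    pid: int


def parse_netstat_line(line: str) -> Connection:
    """Split one 'netstat -ao' TCP row into its five columns."""
    proto, local, remote, state, pid = line.split()
    return Connection(proto, local, remote, state, int(pid))


def port_of(address: str) -> int:
    """Return the numeric port at the end of a 'host:port' string."""
    return int(address.rsplit(":", 1)[1])


def owners_of_port(lines: List[str], port: int) -> List[Connection]:
    """Return the rows whose local endpoint uses the given port."""
    rows = [parse_netstat_line(line) for line in lines if line.strip()]
    return [row for row in rows if port_of(row.local) == port]


# The three rows quoted in the post above.
SAMPLE = [
    "TCP myputername:1043 myputername:31416 Established 3768",
    "TCP myputername:31416 myputername::0 Listening 3928",
    "TCP myputername:31416 myputername:1043 Established 3928",
]

for row in owners_of_port(SAMPLE, 31416):
    print(row.state, row.pid)
```

The PID in the last column is what you would then look up in Task Manager to see which program actually holds the port.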
ID: 15509 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15511 - Posted: 4 May 2006, 15:46:37 UTC - in response to Message 15509.  

You could open a command prompt and enter "netstat -a" or "netstat -ao" and see what's grabbed port 1043. It should be a loopback to some other random port and 127.0.0.1 on 1043. Unfortunately the MS product is grabbing the port at boot and this stops boinc. Installing as a service has helped some. A cold boot (not just a restart) has helped others. There has been a rash of these issues since the latest MS update.

Note: this doesn't apply to Boinc V 5.3.31 and higher


You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly? (Mine does start properly.) And once BOINC does get started, I thought it was not possible to steal the port away from an application that's already bound to it. That is more what seems to be happening, if indeed another application is found using the port.

I'll have to do the -ao next time it happens and check the process IDs against the task manager.
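
The "can't steal a bound port" point is easy to try out with plain sockets. A minimal sketch, not BOINC code: `try_bind` is a made-up helper, the port is whatever free one the OS hands out, and it assumes default socket options (no address-reuse flags set).

```python
import socket


def try_bind(port: int, host: str = "127.0.0.1"):
    """Return a listening socket on host:port, or None if the bind is refused."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # Typically "address already in use": someone else holds the port.
        sock.close()
        return None


first = try_bind(0)              # port 0 asks the OS for any free port
port = first.getsockname()[1]    # the port the first listener actually got
second = try_bind(port)          # a second program asking for the same port

print("second bind succeeded:", second is not None)
first.close()
```

With default options the second bind is refused for as long as the first listener is alive, which fits the idea that whoever gets the port first keeps it.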
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15511 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15513 - Posted: 4 May 2006, 15:58:01 UTC - in response to Message 15511.  

You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly?

This I cannot explain, since I don't know how MS does its updates. It might do a refresh or something, I just don't know. What I do know is that, of all the people I've helped with this issue, none see this loopback. By the way, when classic closed down I added more than 2 thousand posts at the Q&A board over there, so when I say "all" it's a considerable number of users.

The boinc manager uses one port, the daemon the other, and this is the only way they speak to each other. Since it loops back, firewalls ask for permission for these, but this loopback communication never actually leaves your puter.
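
The loopback pairing described above can be reproduced with two plain sockets. This is only a sketch with stand-in sockets, not BOINC code: the "daemon" listens on a port the OS picks (per this thread the real client listens on 1043 or 31416 depending on version), and the "manager" connects from whatever port it is given. The name `loopback_pair` is made up for illustration.

```python
import socket


def loopback_pair():
    """Open a listener and a client on 127.0.0.1 and report the ports involved."""
    daemon = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    daemon.bind(("127.0.0.1", 0))      # stand-in for the client's listening port
    daemon.listen(1)
    daemon_port = daemon.getsockname()[1]

    # The "manager" connects; the OS assigns it a port of its own.
    manager = socket.create_connection(("127.0.0.1", daemon_port))
    manager_port = manager.getsockname()[1]

    # The daemon side sees the manager's port as the remote end.
    conn, peer = daemon.accept()
    seen_by_daemon = peer[1]

    for sock in (conn, manager, daemon):
        sock.close()
    return daemon_port, manager_port, seen_by_daemon


daemon_port, manager_port, seen = loopback_pair()
print("daemon listens on", daemon_port)
print("manager connects from", manager_port)
print("daemon sees peer port", seen)
```

The two "Established" rows in netstat are the two ends of this one connection, and nothing here ever leaves 127.0.0.1.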



ID: 15513 · Rating: 0
onemacguy
Joined: 10 Nov 05
Posts: 12
Credit: 2,564,700
RAC: 0
Message 15518 - Posted: 4 May 2006, 16:51:05 UTC - in response to Message 15513.  
Last modified: 4 May 2006, 16:52:08 UTC

You say MS is grabbing the port during bootup, but then how would BOINC ever get started properly?

This I cannot explain, since I don't know how MS does its updates. It might do a refresh or something, I just don't know. What I do know is that, of all the people I've helped with this issue, none see this loopback. By the way, when classic closed down I added more than 2 thousand posts at the Q&A board over there, so when I say "all" it's a considerable number of users.

The boinc manager uses one port, the daemon the other, and this is the only way they speak to each other. Since it loops back, firewalls ask for permission for these, but this loopback communication never actually leaves your puter.


I am not running a firewall of any type.

My dual Xeon box was not having an issue getting new work; the boinc manager had lost connection with the client, so Boinc was not running. I usually have to quit the Boinc Manager and restart it to get it back to crunching. Seems to happen more on my dual processor boxes.

My computers are hidden because my employer asked me to hide the ones I use at work. I am using the crunch3r client for most of these computers though. :)

I have both Macs and Windows crunchers, the Windows ones are the only ones giving me trouble, as usual..... :)

ID: 15518 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15520 - Posted: 4 May 2006, 16:58:49 UTC - in response to Message 15518.  

I am using the crunch3r client for most of these computers though.

The latest Crunch3r core clients use 5.2.13 as a base, which means they (and earlier versions) still use port 1043 instead of 31416, which came out in 5.3.31 (I'm starting to wonder if it was 5.3.17, but anyway). When the new "recommended" release comes out it should fix your issue. I know Crunch3r has been working on a new version that supports the new Seti Enhanced applications, so it shouldn't be long.

tony
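
To see which of the two ports mentioned above (1043 for 5.2.13-based clients, 31416 for the newer ones) is actually answering on a machine, a quick probe along these lines might help. It is a sketch only: `is_listening` is a made-up helper, and a successful connect only shows that something is listening there, not that it is BOINC.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or otherwise unreachable.
        return False


for port in (1043, 31416):
    print(port, "listening" if is_listening(port) else "no answer")
```

If neither port answers while boinc.exe is running, that would match the "no loopback" symptom; pair it with netstat -ao to find out who owns the port.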
ID: 15520 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15524 - Posted: 4 May 2006, 18:27:25 UTC - in response to Message 15520.  

I am using the crunch3r client for most of these computers though.

The latest Crunch3r core clients use 5.2.13 as a base, which means they (and earlier versions) still use port 1043 instead of 31416, which came out in 5.3.31 (I'm starting to wonder if it was 5.3.17, but anyway). When the new "recommended" release comes out it should fix your issue. I know Crunch3r has been working on a new version that supports the new Seti Enhanced applications, so it shouldn't be long.

tony


Especially as they're releasing seti-enhanced across seti slowly at the moment :-)


I'm still not sure why they are not releasing 5.4.x, as they work well and they're chasing some bugs that don't affect many people (or so it seems that's what they're doing).

I would rather they release it now, fix this LARGE bug and the other alterations, then fix the little bugs after, since they're already working with the people that have the bugs.
I think it's a screen saver bug (caused by firewalls) and a proxy bug/problem (which I have :( and although I've sent a report to the alpha mailing list, it never seemed to get there). Why they cannot use a forum instead of the list I don't know; forums give just the same mailing features and restrictions you have on the lists.
Plus it would make it much easier to follow what comes where and who responds to what....


err RANT off, sorry must be the Sunny weather.


But give 5.4.8 a go on that dual; your benchmarks will be back to normal but at least it should work.


Team mauisun.org
ID: 15524 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15525 - Posted: 4 May 2006, 18:34:08 UTC
Last modified: 4 May 2006, 18:36:27 UTC

Just for information:

The reason I say there is not much point in hiding computers, even work ones, is that people cannot see a lot about them.

Just OS, CPU and benchmarks really.


i.e. this much:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=212252
and
https://boinc.bakerlab.org/rosetta/hosts_user.php?userid=2322

That's pretty much it.

Nothing to even remotely say where the computer is or anything.


You can see a lot more about your own computers, e.g. IP address, names of them etc but we cannot.

Team mauisun.org
ID: 15525 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15528 - Posted: 4 May 2006, 18:56:42 UTC - in response to Message 15525.  

You can see a lot more about your own computers, e.g. IP address, names of them etc but we cannot.

Great point. And another point is just that "show" doesn't mean your machine is in any way exposed on the internet by your selection. It doesn't fire up a server or anything to "show" it. It mostly helps people see your basic configuration (whether it's undersized, slow, old, which operating system etc.) and your WUs, which show your errors and successes and give a feel for your typical crunch time. It basically helps provide a lot of the information that people generally neglect to post with their question.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15528 · Rating: 0
dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 15533 - Posted: 4 May 2006, 20:46:17 UTC - in response to Message 15528.  
Last modified: 4 May 2006, 20:46:41 UTC

Great point. And another point is just that "show" doesn't mean your machine is in any way exposed on the internet by your selection. It doesn't fire up a server or anything to "show" it.


So, being cautious I marked my account as 'NO SHOW' when I joined. Now I've been thinking about changing it, but I can't find the control button. Where is it?
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 15533 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15536 - Posted: 4 May 2006, 21:39:04 UTC

here it is
ID: 15536 · Rating: 0
Cureseekers~Kristof
Joined: 5 Nov 05
Posts: 80
Credit: 689,603
RAC: 0
Message 15552 - Posted: 5 May 2006, 5:32:27 UTC

I have boinc installed as a service.
I have already had a case where I suddenly noticed that the service wasn't running anymore... How can it stop by itself?
Member of Dutch Power Cows
ID: 15552 · Rating: 0
onemacguy
Joined: 10 Nov 05
Posts: 12
Credit: 2,564,700
RAC: 0
Message 15554 - Posted: 5 May 2006, 5:43:07 UTC - in response to Message 15525.  

Just for information:

The reason I say there is not much point in hiding computers, even work ones, is that people cannot see a lot about them.

Just OS, CPU and benchmarks really.


i.e. this much:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=212252
and
https://boinc.bakerlab.org/rosetta/hosts_user.php?userid=2322

That's pretty much it.

Nothing to even remotely say where the computer is or anything.


You can see a lot more about your own computers, e.g. IP address, names of them etc but we cannot.

OK, I unhid my computers based on the feedback you guys have provided. Hope this helps.

ID: 15554 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 15574 - Posted: 5 May 2006, 14:41:11 UTC - in response to Message 15552.  

I have boinc installed as a service.
I have already had a case where I suddenly noticed that the service wasn't running anymore... How can it stop by itself?


No idea (and you should start another thread if you need to find out),
but check your EventLog to see any error information.

But try this to see if the service actually starts:
Start > Run
type cmd
now type (in the new window) sc start boinc
That starts the service called boinc.
(Note: sc stop boinc stops it, which is a good way to control it. You only need to type it in the Run dialog box, but using the command prompt (cmd) you can see what it is doing.)
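
For a whole farm, the same check could be scripted. A hedged sketch: it assumes the service is named boinc as above, that `sc query` prints its usual English "STATE : 4 RUNNING" style line, and the helper names (`parse_sc_state`, `service_state`) are made up for illustration.

```python
import subprocess
import sys


def parse_sc_state(output: str) -> str:
    """Pull the state word (e.g. RUNNING, STOPPED) out of 'sc query' text."""
    for line in output.splitlines():
        if line.strip().startswith("STATE") and ":" in line:
            # Expected shape: "STATE              : 4  RUNNING"
            fields = line.split(":", 1)[1].split()
            if len(fields) >= 2:
                return fields[1]
    return "UNKNOWN"


def service_state(name: str = "boinc") -> str:
    """Run 'sc query <name>' and return the reported state (Windows only)."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if sys.platform == "win32":
    state = service_state("boinc")
    print("boinc service state:", state)
    if state == "STOPPED":
        # Same command as typed by hand above.
        subprocess.run(["sc", "start", "boinc"])
```

Run from a scheduled task, something like this would at least restart a stopped service, though the EventLog is still the place to look for why it stopped.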
Team mauisun.org
ID: 15574 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org