Rosetta Process Stalls

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34335 - Posted: 8 Jan 2007, 1:24:51 UTC

Seems to have just started doing this a week or two ago. New PC:

CPU: Intel Pentium D 915 2800MHz @ 2800MHz
Motherboard: Intel D945Gpm
Memory: 1024 MB of Corsair DDR2-667
PS: OCZ Modstream
Video Card: ATI X700 Pro 256MB Radeon
Hard Drive: Seagate 7200.10 320.0 GB @ 7200 RPM
OS: XP Pro with all updates

The BOINC client shows two processes running, but the time is not incrementing and neither core shows a load in Task Manager. If I shut down the client and restart it, it runs fine... for a while. Today only ONE process was running, on a single core. I restarted BOINC and everything was fine, then came back 4-5 hours later to find Rosetta "stalled" again.

Screensaver is set to blank, and there are no other running processes except for antivirus (SAV 10.0.1). I know a lot about PCs, so there are no viruses or malware on the system; it's very clean. Temperatures and voltages are fine, and the PC is working perfectly.

Any ideas as to why the Rosetta processes are stalling?
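
(Editor's sketch, not part of the original post.) The symptom described above, tasks listed as running while their time does not advance, can be checked by comparing two samples of per-task CPU time taken a few minutes apart. The function name `find_stalled`, the task names, and the sample values below are all hypothetical; how the CPU times are collected (Task Manager, the BOINC client, or some other tool) is left open.

```python
from typing import Dict, List


def find_stalled(before: Dict[str, float],
                 after: Dict[str, float],
                 min_progress: float = 1.0) -> List[str]:
    """Return names of tasks whose CPU time advanced by less than
    min_progress seconds between two samples."""
    stalled = []
    for name, cpu_before in before.items():
        cpu_after = after.get(name)
        if cpu_after is None:
            continue  # task finished or was removed between samples
        if cpu_after - cpu_before < min_progress:
            stalled.append(name)
    return sorted(stalled)


# Hypothetical CPU-time samples (seconds) taken a few minutes apart.
sample_1 = {"task_a": 3600.0, "task_b": 7200.0}
sample_2 = {"task_a": 3600.0, "task_b": 7380.0}

print(find_stalled(sample_1, sample_2))  # task_a made no progress
```

A task that shows up in both samples with no CPU-time increase matches the "running but stalled" state described in the post.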



Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 34356 - Posted: 8 Jan 2007, 14:38:57 UTC

Which BOINC version are you running?
Rosetta Moderator: Mod.Sense

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34367 - Posted: 8 Jan 2007, 16:02:54 UTC - in response to Message 34356.  

Whatever the latest version is. Last night I tried a simple uninstall/reinstall, but it didn't help. Last I checked both processes were "running" but stalled.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 34370 - Posted: 8 Jan 2007, 16:22:45 UTC

Looks like most of your hosts are on 5.4.11, but this one is on 5.2.11. Similar BOINC issues have been reported. It might be helpful if you could identify the specific host. BOINC Manager seems to lose contact with the running threads, and seems not to detect when they end (generally with a no-heartbeat indication), so it does not schedule more tasks.
Rosetta Moderator: Mod.Sense

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 34377 - Posted: 8 Jan 2007, 18:22:15 UTC - in response to Message 34335.  

I have the same problem on one of my machines, and it appears to affect only my dual-core Pentium D processor. I have suspended Rosetta on that machine.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34379 - Posted: 8 Jan 2007, 19:03:35 UTC - in response to Message 34370.  
Last modified: 8 Jan 2007, 19:07:11 UTC

This is the machine I'm having the problem with.

Tonight I'll check the core temps at 100% load using Intel TAT to make sure it's not throttling. I'd be surprised, because the regular Intel temp monitor utility shows load temps at about 70°C per core. The P4/PD CPUs don't normally start throttling until about 85+°C, and I'm way below that... but one never knows.

Looking over the results, there's a ton of "compute errors". I guess that could be the root of the problem? Maybe I need to run some diagnostics on that PC to check for a bad CPU or bad RAM modules?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 34387 - Posted: 8 Jan 2007, 20:58:58 UTC

It looks like they are all -107 exit codes. This is one of the primary symptoms of PCs that are having problems running BOINC/Rosetta as the screensaver... yet you said you aren't using the screensaver. Do you display the graphics? (If so, it can be prone to the same problems as the screensaver.)

Interesting: that host shows significantly more floating point ops per second than integer.
Measured floating point speed 1411.95 million ops/sec
Measured integer speed 1189.46 million ops/sec
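
(Editor's sketch, not part of the original post.) To put a number on "significantly more", the ratio of the two benchmark figures quoted above can be computed directly. The variable and function names are illustrative only.

```python
# Benchmark figures quoted in the post, in million ops/sec.
float_mops = 1411.95
integer_mops = 1189.46


def speed_ratio(fp: float, integer: float) -> float:
    """Ratio of floating point speed to integer speed."""
    return fp / integer


ratio = speed_ratio(float_mops, integer_mops)
print(f"floating point / integer = {ratio:.2f}")
```

On these figures the floating point benchmark comes out roughly 19% higher than the integer one.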

Also interesting, when a task completes normally, you are granted roughly double the credit you claim.

Please let us know the result of your tests tonight.
Rosetta Moderator: Mod.Sense

Dotsch
Joined: 12 Feb 06
Posts: 111
Credit: 241,579
RAC: 0
Message 34388 - Posted: 8 Jan 2007, 20:59:07 UTC - in response to Message 34335.  

This is a problem in the BOINC API. David Anderson has debugged it and written a fix, which is included in BOINC API 5.8.0. Picking up the fix requires recompiling the science application.
This can also happen with BOINC 5.4.x, but it is mostly seen when CPU throttling is used (BOINC client 5.6.x/5.7.x and 5.8.x).
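
(Editor's sketch, not part of the original post.) The point above is that the fix lives in the API version the science application was built against, not in the client alone. Assuming that API version is available as a dotted string (how to obtain it is not covered here), a comparison against 5.8.0 could look like this; `parse_version` and `has_fix` are hypothetical helper names.

```python
from typing import Tuple

# API version named in the post as the first to include the fix.
FIXED_API: Tuple[int, int, int] = (5, 8, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '5.4.11' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(api_version: str) -> bool:
    """True if the given API version is at least 5.8.0."""
    return parse_version(api_version) >= FIXED_API


for version in ("5.2.11", "5.4.11", "5.8.0"):
    print(version, has_fix(version))
```

Comparing tuples of integers avoids the string-comparison trap where "5.10.0" would sort before "5.8.0".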

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34409 - Posted: 9 Jan 2007, 4:04:28 UTC - in response to Message 34387.  

To be honest, I never looked at it that closely. I have a somewhat large number of PCs running the project, and micro-management is time-consuming.

When I install the client, I (as always) uncheck the "set as screensaver" option. On this PC, I just run the normal client.

The Intel TAT tool will not run on a PD processor, so all I have is the regular Intel utility. But my load temps are still fine, about 63°C with two processes of R@H running. I backed the RAM speed down from 667 to 533 for now, even though the RAM is rated at 667MHz. I'll see if that has any effect.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34410 - Posted: 9 Jan 2007, 4:06:43 UTC - in response to Message 34388.  

Hummm...that's interesting, thanks. I seem to only have this problem on one PC, which is a bit strange. I'm wondering if it may have something to do with the ATI video driver? That's the most unique characteristic on this PC compared to my others.

Dotsch
Joined: 12 Feb 06
Posts: 111
Credit: 241,579
RAC: 0
Message 34417 - Posted: 9 Jan 2007, 8:21:00 UTC - in response to Message 34410.  
Last modified: 9 Jan 2007, 8:21:52 UTC

No, the ATI video driver is not the cause.
The problem occurs only occasionally on some hosts and some projects, but on other hosts and projects it happens more often, mostly when the science app is suspended too often.
To include the BOINC API fix, the science application must be recompiled.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34431 - Posted: 9 Jan 2007, 14:46:48 UTC - in response to Message 34417.  

Thanks! This will happen soon hopefully.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34537 - Posted: 11 Jan 2007, 16:53:42 UTC
Last modified: 11 Jan 2007, 16:54:40 UTC

OK, it looks like the CPU is definitely throttling. I traced back my completed work, and the problem seems to have started when I uninstalled the Intel Active Monitor program. It seems the BIOS started making the decision to throttle when I did that; now it's the Intel program doing it, but it appears to have a higher threshold.

I came home yesterday to find the PC turned off. My wife said it was beeping with an error about being "too hot", which was the Intel software. I increased the warning threshold to 85°C in the software, but this morning I found the R@H process stalled again. The temp seems to be maxing out around 73°C with R@H. This dumb Pentium D 920 should not be throttling at that low a temperature. Apparently, I'm going to have to buy an after-market cooler to tame this beast. I swear, Intel never designed this CPU to run under 100% load 24/7. I'm about ready to toss the thing out the window, lol.

The system is adequately cooled IMO. There's a huge 120mm fan in the bottom of the PS, which is just above the CPU. Then I have a 60mm rear exhaust fan and a front 80mm, all of this in a 100% aluminum case with managed cables. I guess I'll have to do some aggressive work here to cool this Pentium D!

Thanks to all that replied!

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 34573 - Posted: 12 Jan 2007, 14:01:40 UTC - in response to Message 34537.  

I have a 3 GHz Pentium D running 24/7 that never gets above 35°C.

It doesn't run Rosetta at the moment because it stalls.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 34575 - Posted: 12 Jan 2007, 14:43:16 UTC - in response to Message 34573.  
Last modified: 12 Jan 2007, 14:43:50 UTC

Sure, idle temp is fairly low; my system idles around 40°C. At least now you know why it stalls.

I'm testing a little program I found on the web (since last night) that controls the throttling. I'm not letting the CPU throttle! It doesn't need to throttle until about 80-85°C, and I'm not running that hot. Last time I checked, the system was running at 68-72°C under R@H. If this program works, I'll be happy.

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 34651 - Posted: 13 Jan 2007, 17:43:38 UTC - in response to Message 34575.  

It runs at 35°C while working on other projects; it's not idle, it's at 100%.

BadThad
Joined: 8 Nov 05
Posts: 30
Credit: 71,834,523
RAC: 0
Message 35088 - Posted: 19 Jan 2007, 20:29:04 UTC - in response to Message 34651.  

Wow, 35°C with 100% CPU load? That's unreal!

Well, I've given up on this PC for my daughter; it's just too hot. I've taken the Pentium D and put it in a new case with an Arctic Freezer cooler. I just got this going last night; in the new setup it's running at 35°C idle and a max of 48°C with Rosetta. I've decided to make this PC nothing but a full-time cruncher in my herd of PCs. Good news for my daughter: I'm putting a Gigabyte DS3 with a C2D E6300 into her case... that should take care of the temperature problems with her case for good.

©2024 University of Washington
https://www.bakerlab.org