Persistent Compute Errors

Message boards : Number crunching : Persistent Compute Errors


tomba

Joined: 29 May 06
Posts: 43
Credit: 1,558,972
RAC: 0
Message 63193 - Posted: 7 Sep 2009, 18:05:51 UTC

One of my three Rosetta crunchers has not done any useful work for two days. Every WU returns a compute error. I've detached twice, with no change.

I have GPUGRID WUs running without a problem.

BOINC 6.6.36, XP Pro.

Any thoughts?

Thanks, Tom
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 63199 - Posted: 7 Sep 2009, 23:55:44 UTC - in response to Message 63193.  


I did a little research, and here is something that should help:

Why am I getting a 'Reason: Access Violation (0xc0000005) error'?

1. Change your preferences to leave Rosetta@home in memory. Log in to General Preferences (if you're not already logged in) -> Edit Preferences (at the bottom) -> "Leave applications in memory while preempted?" Select yes and click the Update preferences button. Also remember to update the BOINC client software so that the changes are downloaded: open BOINC Manager, select the Projects tab, left-click Rosetta@home to select the project, and click the Update button.
2. An error occurred somewhere on the computer; it could have been the BOINC client software, the Rosetta@home science application, or any other program your computer was running at the time. This is not a Rosetta@home-specific error; as far as I am aware, it happens occasionally in all BOINC-powered projects with all of their science applications. Keep Rosetta@home in memory and ignore this problem unless it gets out of hand.

Can you check and make sure that step 1 is done?

I also saw some general Windows chatter about memory problems (though I doubt that is the case, since the other project is working OK).
There was also some chatter about device drivers and Data Execution Prevention (DEP).
More about DEP and Windows can be found here: http://www.updatexp.com/0xC0000005.html
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 63214 - Posted: 8 Sep 2009, 23:29:32 UTC - in response to Message 63202.  
Last modified: 8 Sep 2009, 23:29:51 UTC

Quoting Message 63202: "Are you overclocked? That could cause problems."


I was looking at his stdout text and wondered that myself.
I know Einstein will take serious amounts of OC without crashing, but Rosie is VERY particular about how much you overclock before she starts breaking plates and crashing tasks.
tomba

Joined: 29 May 06
Posts: 43
Credit: 1,558,972
RAC: 0
Message 63391 - Posted: 18 Sep 2009, 15:08:03 UTC

Sorry I've been so long coming back on this one. About two minutes after my post I got a BSOD. Then another five minutes later. I rebooted. I got a BSOD in the middle of the boot! It went on and on. Time for action...

Microsoft's responses to the BSODs were exclusively about drivers. Since in the previous week I'd done a format/Windows reinstall, using the latest drivers from Dell, that had to be rubbish.

A year ago I bought four 1-gig sticks of Crucial's Ballistix RAM. I took out three sticks and ran for two days without a hint of a problem. I added another stick two days ago; no problems. Tomorrow I add a third...

Tom
mikey

Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 932
Message 63399 - Posted: 19 Sep 2009, 12:08:54 UTC - in response to Message 63391.  


Here you can download Microsoft's own memory tester:
http://www.softpedia.com/get/Tweak/Memory-Tweak/Microsoft-Windows-Memory-Diagnostic.shtml

I ran it a while back, and it didn't find anything wrong with the memory on a machine I was having trouble with. I ended up reinstalling BOINC altogether to get it fixed.

Some folks run Prime95:
http://www.playtool.com/pages/prime95/prime95.html
This mainly tests your CPU, but the CPU has to be fed by the memory, so it tests the RAM too, sort of.
Stacey Baird

Joined: 11 Apr 06
Posts: 19
Credit: 74,745
RAC: 0
Message 63437 - Posted: 24 Sep 2009, 1:52:09 UTC

I too am having trouble with a 50 percent error rate; about every other run seems to fail with a computation error. I always leave applications in memory when suspended.

I use Windows XP SP3 (32-bit) and BOINC 6.10.6. I was using the earlier recommended version, but it also generated computation errors. I also run NVIDIA CUDA and have 3+ GB of memory on an Intel quad-core.


Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 63439 - Posted: 24 Sep 2009, 8:31:31 UTC - in response to Message 63437.  

6.10.6 is not an official release. I tried it, and it kept knocking my modem driver offline and freezing my system. It does say, in red letters, that it is a beta version and could be unstable.
6.6.36 is stable, but from the talk on the boards it has scheduling issues.
6.4.7 is rock steady, with no scheduling problems; the only errors that show up are when the tasks themselves have code errors.

Another thing to consider is whether you overclock your cores. Rosie is very particular about how much you OC; other projects don't seem to care, but Rosie does. Check these things and see if your errors go away.
tomba

Joined: 29 May 06
Posts: 43
Credit: 1,558,972
RAC: 0
Message 63451 - Posted: 25 Sep 2009, 16:56:54 UTC - in response to Message 63391.  


Adding the third and fourth sticks produced five long beeps at power-on and nothing else. I downloaded the MS RAM checker. Two gigs in slots 1 & 2 ran clean. The other two of the four sticks in slots 1 & 2 ran clean too.

I was about to try sticks in slots 3 & 4 one more time when my torch revealed this lying over slot 3:

A 5/16" dust ball!

I carefully removed it, installed the 3rd & 4th sticks and I'm up and running.

Wow! Bizarre!!

Tom



Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 63454 - Posted: 26 Sep 2009, 0:14:49 UTC - in response to Message 63451.  

The tribbles got you, lmao.
Good to see you're back online and working at warp speed.




©2024 University of Washington
https://www.bakerlab.org