Message boards : Number crunching : Beyond newbie Q&A
Author | Message |
---|---|
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I'm not new to BOINC as I have about 750,000 units completed on SETI. There are bugs in the program; if you're running 1.82 tasks, be sure to list the task in the 1.82 thread and include the error code and message. Computation errors happen randomly, so you just got unlucky. |
David Send message Joined: 22 Nov 05 Posts: 4 Credit: 11,853,048 RAC: 0 |
I don't know where else to go for help. Rosetta has stopped downloading new tasks--for over a week now. I have reset the project to no avail. Communication always gets deferred. I have left it alone for days but it has not started working again. What's going on? Anybody know? Boinc version 6.6.36 |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I don't know where else to go for help. Rosetta has stopped downloading new tasks--for over a week now. I have reset the project to no avail. Communication always gets deferred. I have left it alone for days but it has not started working again. What's going on? Anybody know? Boinc version 6.6.36 A lot of us are seeing scheduler errors with 6.6.x, so try downloading and installing 6.4.7 instead, then reset the project and see if you get new tasks. |
Dom Send message Joined: 2 Aug 10 Posts: 14 Credit: 187,991 RAC: 0 |
As a newbie around here I hope somebody can help me with my Client error / Compute error problem. I don't want to waste time crunching numbers just to get errors back days later. I have had around 20 errors in approximately 10 days. I have run Prime95 for 10 hours with no problems. Can anybody look at my results and find out what is going on? My Results Thank You |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like Dom is getting "Unhandled Exception Detected" errors after 1+ hours of computation. If memory stress tests are passing, that is a good sign. But if you are doing any overclocking, you should always see if stock speeds improve your error rate. It is also possible that such errors are produced by a bug in the application; however, I note that others do not seem to be reporting similar error rates, so this tends to point to something in your environment. Rosetta Moderator: Mod.Sense |
Dom Send message Joined: 2 Aug 10 Posts: 14 Credit: 187,991 RAC: 0 |
The PC is not overclocked; it's just an off-the-shelf gaming PC that I no longer have time to play games on :) Is it worth trying to update my version of BOINC, for example? As far as I know I have all the updates for my hardware and Windows 7. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
No, BOINC did not encounter the exception, the Rosetta application did. So I would not expect a different BOINC version to affect the outcome. Rosetta Moderator: Mod.Sense |
Dom Send message Joined: 2 Aug 10 Posts: 14 Credit: 187,991 RAC: 0 |
Well, I am open to ideas & suggestions. I don't want to waste days of computing time. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So the test to try would be to set the BOINC preferences to use only, say, 50% of CPU time for a while and see if your results improve. This will tell BOINC to run only 5 seconds out of every 10 and thus allow some idle CPU time for things to cool a bit. Rosetta Moderator: Mod.Sense |
Dom Send message Joined: 2 Aug 10 Posts: 14 Credit: 187,991 RAC: 0 |
Since I started this project approximately 18 days ago I have had over 30 Client error/Compute error results. I have run Prime95 for hours with no problem. I have monitored the heat on my CPUs and don't have any problems. I have slowed down the processing to 75% and 50% and I am still getting the same errors. If nobody can come up with a solution I will have to abandon this project. :( |
Chris Holvenstot Send message Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
The fact that your 'wingmen' encounter no problems with the same tasks points, IMO, to an issue with your computer. Although the chance that it is involved in this issue is slight, are he and his wingman running on the same type of system? For example, Linux vs Windows: it is possible that he is hitting a fault in a system library that his "wingman" does not have to deal with ... |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
It might be a driver issue, but before going into this, I would intensively check the memory. Memtest86 should do, or Prime95. How did you run Prime95? Did you run the stress tests? For how long exactly and which ones? I remember having problems on one of my machines, 2 or 3 years ago. I had to run Prime95 for almost 16 hours, before it reported a faulty memory module. |
Chris Holvenstot Send message Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Jochen reported ... "I had to run Prime95 for almost 16 hours, before it reported a faulty memory module." Damn, that's almost like waterboarding: run it long enough and you'll get the answer you want! |
Dom Send message Joined: 2 Aug 10 Posts: 14 Credit: 187,991 RAC: 0 |
I ran all the Prime95 tests for 10 hours each with no problems. I will try each test for longer and see if I can track down the problem. |
Jochen Send message Joined: 6 Jun 06 Posts: 133 Credit: 3,847,433 RAC: 0 |
I ran all the Prime95 tests for 10 hours each with no problems. I would recommend running only the test with the highest RAM usage. CPU errors are usually detected faster, and a 10-hour test for the CPU should do. |
joseps Send message Joined: 25 Jun 06 Posts: 72 Credit: 8,173,820 RAC: 0 |
Has Rosetta@Home ever considered doing research on a cure for or prevention of Type 2 diabetes? It's a greater threat to the ageing population, just like Alzheimer's disease. I hope David Baker will consider this as part of DC PROJECTS. More daily news makes diabetes stand out among disease concerns. The other day they called India the diabetes country of the world. Just curious and concerned. joseps I turned off my 5 computers when I went on vacation. When I returned today, I could not upload work. Need work units to run computers. joseps |
tomba Send message Joined: 29 May 06 Posts: 43 Credit: 1,558,972 RAC: 0 |
I stopped running Rosetta many months ago when I bought an i7. The problem was that it grabbed all eight CPUs and the fan noise was unacceptable. I did try TThrottle but that affected all CPUs and my normal work was slowed significantly. Is there a way to assign n CPUs to Rosetta where n = Fan Noise OK? Thanks, Tom |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,526,853 RAC: 6,993 |
I stopped running Rosetta many months ago when I bought an i7. Problem was that he grabbed all eight CPUs and the fan noise was unacceptable. I did try Tthrottle but that affected all CPUs and my normal work was downgraded significantly. I don't know of a way to manage it that smartly against temperature. You can force BOINC to use 4 processors and see if that is acceptable. A bigger heatsink would of course be another option, but then that isn't free and isn't an option if it's a laptop. |
tomba Send message Joined: 29 May 06 Posts: 43 Credit: 1,558,972 RAC: 0 |
You can force BOINC to use 4 processors and see if that is acceptable. BINGO!! That's exactly what I wanted: I set 25% of the processors === two === and Task Manager tells me three are in use; two for Rosetta and one for GPUGRID. And there's no delay in normal work. Perfect solution! Many thanks, Tom |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, so the one for GPUGrid is a small fraction of one CPU. The bulk of the work would be running on the GPU, not the CPU. You can probably run somewhere between 5 & 7 CPUs and still have modest fan speed. Basically we're just saying that it is an exercise left up to you to determine how many CPUs bring you to acceptable fan noise. It only takes a few minutes after an adjustment to find out. Rosetta Moderator: Mod.Sense |
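
For anyone who wants to check the arithmetic behind the two preferences discussed in this thread (the percentage of processors and the percentage of CPU time), here is a rough Python sketch. The function names `cpus_allowed` and `run_seconds` are made up for illustration and are not part of BOINC. The rounding rule (round down, but never below one CPU) and the 10-second window are assumptions taken from the posts above, not from the BOINC source.

```python
import math


def cpus_allowed(total_cpus: int, percent_of_processors: float) -> int:
    """Estimate how many CPUs a 'use at most N% of processors' setting allows.

    Assumption: the result is rounded down and never drops below one CPU.
    """
    return max(1, math.floor(total_cpus * percent_of_processors / 100))


def run_seconds(percent_of_cpu_time: float, window_seconds: float = 10.0) -> float:
    """Estimate the running time per window for a 'use at most N% of CPU time' setting.

    Assumption: throttling works over a 10-second window, as described in the thread.
    """
    return window_seconds * percent_of_cpu_time / 100


if __name__ == "__main__":
    # An i7 that shows up as eight CPUs, as in tomba's post.
    for pct in (25, 50, 75, 100):
        print(f"{pct:>3}% of 8 processors -> {cpus_allowed(8, pct)} CPUs")

    # The throttle settings Dom tried.
    for pct in (50, 75):
        print(f"{pct}% CPU time -> about {run_seconds(pct)} s running out of every 10 s")
```

Under these assumptions, 25% of eight processors works out to two CPUs, which matches what tomba saw in Task Manager for Rosetta, and 50% CPU time works out to 5 seconds of every 10. Finding the highest percentage that keeps the fan quiet is still a matter of trying a value and listening for a few minutes.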
©2024 University of Washington
https://www.bakerlab.org