Beyond newbie Q&A

Message boards : Number crunching : Beyond newbie Q&A


AuthorMessage
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5664
Credit: 5,710,566
RAC: 1,965
Message 62258 - Posted: 17 Jul 2009, 8:51:17 UTC - in response to Message 62256.  

I'm not new to BOINC as I have about 750,000 units completed on SETI.

I have noticed that since I joined Rosetta yesterday two units ran and then did not upload due to a computation error - is this normal?


There are bugs in the program. If you're running 1.82 tasks, be sure to list the task in the 1.82 thread and include the error code and message. Computation errors happen randomly, so you just got unlucky.
ID: 62258 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile David

Send message
Joined: 22 Nov 05
Posts: 4
Credit: 11,713,725
RAC: 2,136
Message 62290 - Posted: 18 Jul 2009, 22:50:11 UTC

I don't know where else to go for help. Rosetta has stopped downloading new tasks, and it has been over a week now. I have reset the project to no avail. Communication always gets deferred. I have left it alone for days but it has not started working again. What's going on? Anybody know? BOINC version 6.6.36
ID: 62290 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5664
Credit: 5,710,566
RAC: 1,965
Message 62294 - Posted: 19 Jul 2009, 7:23:45 UTC - in response to Message 62290.  

I don't know where else to go for help. Rosetta has stopped downloading new tasks, and it has been over a week now. I have reset the project to no avail. Communication always gets deferred. I have left it alone for days but it has not started working again. What's going on? Anybody know? BOINC version 6.6.36


A lot of us are seeing scheduler errors with 6.6.x, so try downloading and installing 6.4.7 instead, then reset the project and see if you get new tasks.
ID: 62294 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dom

Send message
Joined: 2 Aug 10
Posts: 14
Credit: 187,991
RAC: 0
Message 67211 - Posted: 14 Aug 2010, 10:43:22 UTC

As a newbie around here, I hope somebody can help me with my Client error / Compute error problem.

I don't want to waste time crunching numbers just to get errors back days later. I have had around 20 errors in approximately 10 days.

I have run Prime95 for 10 hours with no problems.

Can anybody look at my results and find out what is going on?
My Results

Thank You
ID: 67211 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67218 - Posted: 14 Aug 2010, 17:54:22 UTC

Looks like Dom is getting "Unhandled Exception Detected" errors after 1+ hours of computation. If memory stress tests are passing, that is a good sign. But if you are doing any overclocking, you should always see if stock speeds improve your error rate. It is also possible that such errors are produced by a bug in the application; however, I note that there do not seem to be similar error rates reported by others, so this tends to point to something in your environment.
Rosetta Moderator: Mod.Sense
ID: 67218 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dom

Send message
Joined: 2 Aug 10
Posts: 14
Credit: 187,991
RAC: 0
Message 67219 - Posted: 14 Aug 2010, 19:25:03 UTC
Last modified: 14 Aug 2010, 19:27:18 UTC

The PC is not overclocked; it's just an off-the-shelf gaming PC that I no longer have time to play games on :)

Is it worth trying to update my version of BOINC, for example?
As far as I know I have all the updates for my hardware and Windows 7.
ID: 67219 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67220 - Posted: 14 Aug 2010, 19:57:37 UTC

No, BOINC did not encounter the exception; the Rosetta application did. So I would not expect a different BOINC version to affect the outcome.
Rosetta Moderator: Mod.Sense
ID: 67220 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dom

Send message
Joined: 2 Aug 10
Posts: 14
Credit: 187,991
RAC: 0
Message 67222 - Posted: 14 Aug 2010, 21:49:22 UTC

Well I am open to ideas & suggestions.

I don't want to waste days of computing time.
ID: 67222 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 67225 - Posted: 15 Aug 2010, 12:02:41 UTC
Last modified: 15 Aug 2010, 12:04:14 UTC

So the test to try would be to set the BOINC preferences to use only, say, 50% of CPU time for a while and see if your results improve. This will tell BOINC to run only 5 seconds out of every 10 and thus allow some idle CPU time for things to cool a bit.
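
As a rough illustration of the arithmetic above, here is a minimal Python sketch of how a CPU usage limit maps to run and idle time within a fixed window. The function name `throttle_schedule` and the 10-second window are assumptions taken from the "5 seconds out of every 10" description; this is not BOINC's actual scheduling code.

```python
def throttle_schedule(cpu_usage_limit_pct, window_seconds=10.0):
    """Split one throttling window into (run_seconds, idle_seconds).

    Hypothetical helper: the 10-second window is an assumption based on
    the description in this post, not a value read from BOINC itself.
    """
    if not 0 < cpu_usage_limit_pct <= 100:
        raise ValueError("cpu_usage_limit_pct must be in the range (0, 100]")
    run_seconds = window_seconds * cpu_usage_limit_pct / 100.0
    idle_seconds = window_seconds - run_seconds
    return run_seconds, idle_seconds


if __name__ == "__main__":
    # Show the split for a few candidate limits.
    for pct in (50, 75, 100):
        run, idle = throttle_schedule(pct)
        print(f"{pct}% limit: run {run:.1f} s, idle {idle:.1f} s per window")
```

At 50% the split is even, so the CPU idles for as long as it computes; at 75% it idles for only a quarter of each window, which gives noticeably less time to cool.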
Rosetta Moderator: Mod.Sense
ID: 67225 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dom

Send message
Joined: 2 Aug 10
Posts: 14
Credit: 187,991
RAC: 0
Message 67283 - Posted: 20 Aug 2010, 14:46:17 UTC
Last modified: 20 Aug 2010, 14:49:03 UTC

Since I started this project approximately 18 days ago, I have had over 30 Client error/Compute error results.

I have run Prime95 for hours with no problem.
I have monitored the heat on my CPUs and don't have any problems.
I have slowed down the processing to 75% and 50%, and I am still getting the same errors.

If nobody can come up with a solution I will have to abandon this project.

:(
ID: 67283 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Chris Holvenstot
Avatar

Send message
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67285 - Posted: 20 Aug 2010, 17:27:06 UTC

The fact that your 'wingmen' encounter no problems with the same tasks points, IMO, to an issue with your computer.


Although the chance that it is involved with this issue is slight, are he and his wingman running on the same type of system? For example, Linux vs Windows: it is possible that he is hitting a fault in a system library that his "wingman" does not have to deal with ...

ID: 67285 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jochen

Send message
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67286 - Posted: 20 Aug 2010, 18:20:11 UTC

It might be a driver issue, but before going into this, I would intensively check the memory. Memtest86 should do, or Prime95.
How did you run Prime95? Did you run the stress tests? For how long exactly and which ones?
I remember having problems on one of my machines, 2 or 3 years ago. I had to run Prime95 for almost 16 hours, before it reported a faulty memory module.
ID: 67286 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Chris Holvenstot
Avatar

Send message
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 67287 - Posted: 20 Aug 2010, 18:46:03 UTC

Jochen reported ...

I had to run Prime95 for almost 16 hours, before it reported a faulty memory module


Damn, that's almost like water boarding - run it long enough and you'll get the answer you want!

ID: 67287 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dom

Send message
Joined: 2 Aug 10
Posts: 14
Credit: 187,991
RAC: 0
Message 67291 - Posted: 20 Aug 2010, 23:59:31 UTC

I ran all the Prime95 tests for 10 hours each with no problems.

I will try each test for longer and see if I can track down the problem.

ID: 67291 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jochen

Send message
Joined: 6 Jun 06
Posts: 133
Credit: 3,847,433
RAC: 0
Message 67294 - Posted: 21 Aug 2010, 7:30:04 UTC - in response to Message 67291.  

I ran all the Prime95 tests for 10 hours each with no problems.

I will try each test for longer and see if I can track down the problem.


I would recommend running only the one with the most RAM usage. CPU errors are usually detected faster, and a 10-hour test for the CPU should do.
ID: 67294 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile joseps

Send message
Joined: 25 Jun 06
Posts: 72
Credit: 8,173,820
RAC: 0
Message 71251 - Posted: 14 Sep 2011, 14:41:22 UTC

Has Rosetta@Home ever considered doing research on the cure or prevention of Type 2 diabetes? It's a major threat to the ageing population, just like Alzheimer's disease. I hope David Baker will consider this as part of the DC projects. More and more daily news makes diabetes stand out among diseases of concern. The other day they called India the diabetes country of the world. Just curious and concerned. joseps

I turned off my 5 computers when I went on vacation. When I returned today, I could not upload work. I need work units to run the computers.
joseps
ID: 71251 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
tomba

Send message
Joined: 29 May 06
Posts: 43
Credit: 1,558,972
RAC: 0
Message 73133 - Posted: 22 May 2012, 5:35:22 UTC

I stopped running Rosetta many months ago when I bought an i7. The problem was that it grabbed all eight CPUs and the fan noise was unacceptable. I did try TThrottle, but that affected all CPUs and my normal work was slowed down significantly.

Is there a way to assign n CPUs to Rosetta where n = Fan Noise OK?

Thanks, Tom
ID: 73133 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1829
Credit: 115,820,350
RAC: 58,523
Message 73137 - Posted: 22 May 2012, 8:18:40 UTC - in response to Message 73133.  

I stopped running Rosetta many months ago when I bought an i7. The problem was that it grabbed all eight CPUs and the fan noise was unacceptable. I did try TThrottle, but that affected all CPUs and my normal work was slowed down significantly.

Is there a way to assign n CPUs to Rosetta where n = Fan Noise OK?

Thanks, Tom

I don't know of a way to manage it that smartly against temperature. You can force BOINC to use 4 processors and see if that is acceptable. A bigger heatsink would of course be another option, but then that isn't free and isn't an option if it's a laptop.
ID: 73137 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
tomba

Send message
Joined: 29 May 06
Posts: 43
Credit: 1,558,972
RAC: 0
Message 73139 - Posted: 22 May 2012, 8:44:37 UTC - in response to Message 73137.  

You can force BOINC to use 4 processors and see if that is acceptable.


BINGO!! That's exactly what I wanted:



I set 25% of the processors (= two), and Task Manager tells me three are in use: two for Rosetta and one for GPUGRID. And there's no delay in normal work.

Perfect solution! Many thanks, Tom
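
For anyone wanting to work out which percentage to enter, here is a minimal Python sketch of the arithmetic behind "25% of the processors = two" on an eight-CPU machine. The function name `cpus_in_use` is hypothetical, and the rounding rule (round down, but never below one CPU) is an assumption for illustration, not a statement of how BOINC itself rounds.

```python
def cpus_in_use(total_cpus, max_ncpus_pct):
    """Estimate how many CPUs a 'use at most N% of the processors' setting allows.

    Hypothetical helper: rounding down with a minimum of one CPU is an
    assumption made for this sketch.
    """
    if total_cpus < 1:
        raise ValueError("total_cpus must be at least 1")
    if not 0 < max_ncpus_pct <= 100:
        raise ValueError("max_ncpus_pct must be in the range (0, 100]")
    return max(1, int(total_cpus * max_ncpus_pct / 100))


if __name__ == "__main__":
    # Eight logical CPUs, as on the i7 discussed in this thread.
    for pct in (25, 50, 75, 100):
        print(f"{pct}% of 8 CPUs -> {cpus_in_use(8, pct)} CPU(s)")
```

Stepping through the percentages this way and listening to the fan after each change is the quickest way to find the highest setting with acceptable noise.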
ID: 73139 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73144 - Posted: 22 May 2012, 23:23:52 UTC
Last modified: 22 May 2012, 23:25:58 UTC

Yes, so the one for GPUGrid is a small fraction of one CPU. The bulk of the work would be running on the GPU, not the CPU.

You can probably run somewhere between 5 & 7 CPUs and still have modest fan speed. Basically we're just saying that it is an exercise left up to you to determine how many CPUs brings you to acceptable fan noise. Only takes a few minutes after adjustment to find out.
Rosetta Moderator: Mod.Sense
ID: 73144 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org