Message boards : Number crunching : Anyone else getting computation errors?
Author | Message |
---|---|
darkstar Send message Joined: 19 Oct 06 Posts: 1 Credit: 2,359 RAC: 0 |
Is this normal? Since Dec 6th I've had 11 computation errors and only 8 successes! In my Tasks page it says client error for all of them. The last time it happened (an hour ago) I explicitly suspended all my tasks in BOINC Manager by going to Activities | Suspend, since I needed all my CPU power for something. And Rosetta immediately gave me a computation error! Grr! I'm using Ubuntu 7.04 and the latest version of BOINC (5.10.21). If it's normal it's pretty lame: I hate wasting all that time for something that's just going to fail. If it's not normal that's equally lame! I'm becoming convinced there's a problem with suspending/resuming Rosetta@Home projects. I'm thinking I should find a different medical project to balance seti@home. |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
No, everything is going smoothly. I often suspend and resume and there doesn't seem to be a problem. The only errors I have had recently were self-induced. |
eric Send message Joined: 2 Jan 07 Posts: 23 Credit: 815,696 RAC: 0 |
I also have been getting a lot of compute errors on my Ubuntu box. Here is a link to the results from it. https://boinc.bakerlab.org/rosetta/results.php?hostid=687195 I have it set not to get more work from Rosetta until the problem gets fixed. Hopefully, it will be soon. |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"I also have been getting a lot of compute errors on my Ubuntu box. Here is a link to the results from it." eric, I'm getting them on Mandriva too. Well, last night they also began getting the -193 sigsegv errors. I'm running 64-bit BOINC; are you running 32-bit or 64-bit BOINC? |
eric Send message Joined: 2 Jan 07 Posts: 23 Credit: 815,696 RAC: 0 |
I am running 32bit. |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,222,545 RAC: 7,910 |
"I am running 32bit." I am running XP and well over half of my recent work units have ended in compute errors. The problems started around midday Friday and seem to occur only about half the time. You can review my results: https://boinc.bakerlab.org/rosetta/results.php?hostid=43057&offset=0 Let me know if I am doing something wrong. Thx! Paul |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"I am running 32bit." Hi Paul, You have many computers attached; it's only the quad, hostid=43057, that has the computation errors. The first listed error shows: Unhandled Exception Detected... The rest of them on the first page show: - Unhandled Exception Record - Since it's just the one machine and not the others, I'd check your case for dirt accumulation and check the CPU/RAM/Northbridge temps. Failing that, if you overclock you might turn it down a smidge. If that's not it, then run Memtest86+ to check your memory. Or you can wait and see if this becomes a common issue among users of the new app 5.90, which was released just prior to the time you say this started happening. It might be more project-wide if the problem is within 5.90, but I'd think that it would affect more than just your one computer. Hope this helps |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,222,545 RAC: 7,910 |
"I am running 32bit." Thx for the help. I am pushing this system really hard so I expect a few problems. It looks like 3 successful WUs in a row so maybe the system needed some time to stabilize. We will wait to see if it is a project issue or if it is just me. As you suggested, I could always slow things down a little. Keep crunching R@H! Thx! Paul |
Leonzio Send message Joined: 19 Nov 07 Posts: 8 Credit: 2,731 RAC: 0 |
Yes, I did. 1mz9__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1mz9_-crystal_foldanddock__2468_1693 1qx8__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1qx8_-crystal_foldanddock__2468_1693 In the end I had to abort these WUs. :( |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"Yes, I did." Hi, Leonzio, There is a known problem with 5.90 and Linux. You should probably abort any that don't start running normally, or just abort all the 5.90 work you have. They have implemented a fix and released 5.91 for us Linux users. tony |
Leonzio Send message Joined: 19 Nov 07 Posts: 8 Credit: 2,731 RAC: 0 |
1tif__BOINC_ABINITIO_VF-S25-9-S3-3--1tif_-vf__2450_9783 WUs like this one work very well. Are they version 5.90 too? |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"1tif__BOINC_ABINITIO_VF-S25-9-S3-3--1tif_-vf__2450_9783" I don't know. I went through all 14 result IDs showing for your host and don't see that WU, so I can't check it. I'm not entirely sure I comprehend what you are asking. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I think he is saying that he was able to complete tasks with that name. And now he's asking if v5.90 has tasks with that name as well. Leonzio, there will be a lot of task names all the time. The new v5.90 should use less virtual memory than the prior versions. So, you should be OK with any task name. [edit] Now I see Leonzio's machine is Linux... so you will see v5.91. And yes, the v5.90 had a problem on Linux. So it might be best if you abort any v5.90 tasks that you have. Rosetta Moderator: Mod.Sense |
Leonzio Send message Joined: 19 Nov 07 Posts: 8 Credit: 2,731 RAC: 0 |
"1tif__BOINC_ABINITIO_VF-S25-9-S3-3--1tif_-vf__2450_9783" The link to this WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=116854959 Well, the version isn't shown there. "I think he is saying that he was able to complete tasks with that name. And now he's asking if v5.90 has tasks with that name as well." I had some problems in the first days of December. So, I changed my client: I compiled it from source code downloaded from the Debian wiki. After that, I had problems only with the first two WUs I listed in this thread. Sorry for my English. I studied it many, many years ago, and it isn't my native language. |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Ahh, I see..... We won't be able to see the version until the WU is finished and returned. Only YOU can see what version BOINC will use to run that WU. If you're using a GUI BOINC manager, then it should show up under "application" in the "tasks" tab. If it says 5.90 then I'd just abort it and any others listed as 5.90; then they'll issue you 5.91 WUs. |
Leonzio Send message Joined: 19 Nov 07 Posts: 8 Credit: 2,731 RAC: 0 |
Thanks, I see that they are 5.89. |
Luuklag Send message Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
If they give you problems, I think it is best to abort them and wait for some 5.91s. |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"Thanks, I see that they are 5.89." Leonzio, According to the records, you've reported 14 WUs: 10 of them were done with 5.89 without error, two were done with 5.90 and both had errors, and two were 5.91, of which you aborted one and returned the other successfully. So, if it's a 5.89, it'll probably run just fine. Here's more info on your returned work: |
Leonzio Send message Joined: 19 Nov 07 Posts: 8 Credit: 2,731 RAC: 0 |
"Thanks, I see that they are 5.89." It's very interesting that a successfully completed WU is a "5.90". :-) What is the link from which [OT: is "from which" correct?] that image is taken? |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"Thanks, I see that they are 5.89." Leonzio, I have software to extract and calculate that data. I input a host ID number and run it. It fills in the rest. PM me if you have MS Excel and give your email addy, I'll send you a copy of the software (note: it has trouble with Office 2007). tony |
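For anyone who wants to reproduce the kind of per-version tally Astro describes above, here is a minimal sketch. It is not Astro's software, which is not shown in this thread. The `(version, outcome)` tuple layout and the function names are hypothetical, chosen only for illustration; the counts themselves are the ones quoted in the thread for Leonzio's 14 reported results.

```python
from collections import Counter

# Outcomes as summarized in the thread: (app version, outcome).
# This tuple layout is an illustrative assumption, not a BOINC data format.
results = (
    [("5.89", "success")] * 10
    + [("5.90", "error")] * 2
    + [("5.91", "aborted"), ("5.91", "success")]
)


def tally_by_version(results):
    """Count outcomes per application version."""
    counts = {}
    for version, outcome in results:
        counts.setdefault(version, Counter())[outcome] += 1
    return counts


def versions_with_only_errors(counts):
    """Return versions for which every reported result ended in an error."""
    return sorted(
        version
        for version, outcomes in counts.items()
        if set(outcomes) == {"error"}
    )


counts = tally_by_version(results)
for version in sorted(counts):
    print(version, dict(counts[version]))
print("Only errors:", versions_with_only_errors(counts))
```

With the numbers from the thread, the only version that never produced a good result is 5.90, which matches the advice to abort queued 5.90 work and wait for 5.91.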
©2024 University of Washington
https://www.bakerlab.org