Anyone else getting computation errors?

Message boards : Number crunching : Anyone else getting computation errors?

darkstar

Joined: 19 Oct 06
Posts: 1
Credit: 2,359
RAC: 0
Message 49656 - Posted: 12 Dec 2007, 20:03:53 UTC

Is this normal? Since Dec 6th I've had 11 computation errors and only 8 successes!
In my Tasks page it says client error for all of them.
The last time it happened (an hour ago) I explicitly suspended all my tasks in BOINC Manager by going to Activities | Suspend, since I needed all my CPU power for something. And Rosetta immediately gave me a computation error!
Grr!
I'm using Ubuntu 7.04 and the latest version of BOINC (5.10.21).

If it's normal it's pretty lame: I hate wasting all that time for something that's just going to fail.
If it's not normal that's equally lame!

I'm becoming convinced there's a problem with suspending/resuming Rosetta@Home projects.

I'm thinking I should find a different medical project to balance seti@home.
ID: 49656 · Rating: 0
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 49657 - Posted: 12 Dec 2007, 21:06:38 UTC

No, everything is going smoothly. I often suspend and resume, and there doesn't seem to be a problem. The only errors I have had recently were self-induced.
ID: 49657 · Rating: 0
eric
Joined: 2 Jan 07
Posts: 23
Credit: 815,696
RAC: 0
Message 49773 - Posted: 18 Dec 2007, 12:53:52 UTC

I also have been getting a lot of compute errors on my Ubuntu box. Here is a link to the results from it.

https://boinc.bakerlab.org/rosetta/results.php?hostid=687195

I have it set not to get more work from Rosetta until the problem gets fixed. Hopefully, it will be soon.
ID: 49773 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49775 - Posted: 18 Dec 2007, 13:24:50 UTC - in response to Message 49773.  
Last modified: 18 Dec 2007, 13:25:11 UTC

eric, I'm getting them on Mandriva too. Well, last night they also began getting the -193 SIGSEGV errors. I'm running 64-bit BOINC; are you running 32-bit or 64-bit BOINC?
ID: 49775 · Rating: 0
eric
Joined: 2 Jan 07
Posts: 23
Credit: 815,696
RAC: 0
Message 49790 - Posted: 19 Dec 2007, 3:20:19 UTC

I am running 32bit.
ID: 49790 · Rating: 0
Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,467,877
RAC: 9,658
Message 49940 - Posted: 22 Dec 2007, 13:26:18 UTC - in response to Message 49790.  

I am running XP, and well over half of my recent work units have ended in compute errors. The problems started around midday Friday and appear to occur only about half the time.

You can review my results:
https://boinc.bakerlab.org/rosetta/results.php?hostid=43057&offset=0

Let me know if I am doing something wrong.
Thx!

Paul

ID: 49940 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49941 - Posted: 22 Dec 2007, 13:41:39 UTC - in response to Message 49940.  
Last modified: 22 Dec 2007, 13:43:56 UTC

Hi Paul, you have many computers attached, but it's only the quad (hostid=43057) that has the computation errors. The first listed error shows:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C901230


The rest of them on the first page show:

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004026E2 read attempt to address 0xFFFFFFFF
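
For reference, the two codes in these records are standard Windows exception (NTSTATUS) values: 0x80000003 is STATUS_BREAKPOINT and 0xC0000005 is STATUS_ACCESS_VIOLATION. A minimal Python sketch for picking them out of a pasted stderr dump might look like this. The regular expression and the `describe_exceptions` helper are illustrative assumptions, not part of BOINC or Rosetta.

```python
import re

# Standard Windows NTSTATUS exception codes (only the two seen above).
KNOWN_CODES = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

# Hypothetical pattern for the "Reason: ... (0x...)" lines quoted above.
REASON_RE = re.compile(r"Reason: (.+?) \((0x[0-9A-Fa-f]{8})\)")

def describe_exceptions(stderr_text):
    """Return (reason, code, symbolic name) for each 'Reason:' line found."""
    found = []
    for reason, code_hex in REASON_RE.findall(stderr_text):
        code = int(code_hex, 16)
        found.append((reason, code, KNOWN_CODES.get(code, "unknown")))
    return found

# The two records from this post, pasted as plain text.
sample = """- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C901230
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004026E2 read attempt to address 0xFFFFFFFF
"""

for reason, code, name in describe_exceptions(sample):
    print(f"{reason}: 0x{code:08X} -> {name}")
```

Neither code says on its own whether the cause is hardware or the application, which is why the checks below are still worth doing.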


Since it's just the one machine and not the others, I'd check your case for dust accumulation and your CPU/RAM/northbridge temperatures. Failing that, if you overclock, you might turn it down a smidge. If that's not it, run Memtest86+ to check your memory.

Or you can wait and see whether this becomes a common issue for users of the new app, 5.90, which was released just before the time you say this started happening. If the problem is in 5.90 it could be project-wide, but then I'd expect it to affect more than just your one computer.

Hope this helps
ID: 49941 · Rating: 0
Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,467,877
RAC: 9,658
Message 49942 - Posted: 22 Dec 2007, 14:16:27 UTC - in response to Message 49941.  

Thx for the help. I am pushing this system really hard so I expect a few problems. It looks like 3 successful WUs in a row so maybe the system needed some time to stabilize. We will wait to see if it is a project issue or if it is just me. As you suggested, I could always slow things down a little.

keep crunching R@H!
Thx!

Paul

ID: 49942 · Rating: 0
Leonzio
Joined: 19 Nov 07
Posts: 8
Credit: 2,731
RAC: 0
Message 49946 - Posted: 22 Dec 2007, 18:37:00 UTC - in response to Message 49942.  

Yes, I did.

1mz9__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1mz9_-crystal_foldanddock__2468_1693

1qx8__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1qx8_-crystal_foldanddock__2468_1693

In the end I had to abort these WUs. :(
ID: 49946 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49947 - Posted: 22 Dec 2007, 18:55:44 UTC - in response to Message 49946.  

Hi Leonzio, there is a known problem with 5.90 on Linux. You should probably abort any tasks that don't start running normally, or just abort all the 5.90 work you have. They have implemented a fix and released 5.91 for us Linux users.

tony
ID: 49947 · Rating: 0
Leonzio
Joined: 19 Nov 07
Posts: 8
Credit: 2,731
RAC: 0
Message 49951 - Posted: 22 Dec 2007, 21:42:58 UTC

1tif__BOINC_ABINITIO_VF-S25-9-S3-3--1tif_-vf__2450_9783
WUs like this one work very well.
Are they version 5.90 too?
ID: 49951 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49953 - Posted: 22 Dec 2007, 22:08:35 UTC - in response to Message 49951.  

I don't know. I went through all 14 result IDs showing for your host and don't see that WU, so I can't check it.

I'm not entirely sure I comprehend what you are asking.
ID: 49953 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49956 - Posted: 22 Dec 2007, 22:33:39 UTC
Last modified: 22 Dec 2007, 22:36:40 UTC

I think he is saying that he was able to complete tasks with that name. And now he's asking if v5.90 has tasks with that name as well.

Leonzio, there will be a lot of task names all the time. The new v5.90 should use less virtual memory than the prior versions. So, you should be OK with any task name.

[edit]
Now I see Leonzio's machine is Linux... so you will see v5.91. And yes, the v5.90 had a problem on Linux. So it might be best if you abort any v5.90 tasks that you have.
Rosetta Moderator: Mod.Sense
ID: 49956 · Rating: 0
Leonzio
Joined: 19 Nov 07
Posts: 8
Credit: 2,731
RAC: 0
Message 49957 - Posted: 22 Dec 2007, 22:38:56 UTC - in response to Message 49953.  
Last modified: 22 Dec 2007, 22:53:01 UTC

Here is the link to this WU:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=116854959
Well, the version isn't shown there.

I had some problems in the first days of December, so I changed my client: I compiled it from source code downloaded from the Debian wiki.
After that, I had problems only with the two WUs I listed earlier in this thread.
Sorry for my English. I studied it many, many years ago, and it isn't my native language.
ID: 49957 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49958 - Posted: 22 Dec 2007, 22:52:17 UTC - in response to Message 49957.  
Last modified: 22 Dec 2007, 22:52:35 UTC


Ahh, I see... We won't be able to see the version until the WU is finished and returned. Only YOU can see what version BOINC will use to run that WU. If you're using a GUI BOINC Manager, it should show up under "Application" in the "Tasks" tab. If it says 5.90, I'd just abort it and any others listed as 5.90; then they'll issue you 5.91 WUs.
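
The rule of thumb above can be sketched as a simple filter. This is only an illustration: the `tasks` list is made-up data, as if typed in by hand from the Manager's Tasks tab, and `tasks_to_abort` is a hypothetical helper, not a BOINC API.

```python
# Known-bad application version on Linux, per this thread.
BAD_VERSION = "5.90"

def tasks_to_abort(tasks, bad_version=BAD_VERSION):
    """Return the names of tasks whose application version matches bad_version."""
    return [t["name"] for t in tasks if t["app_version"] == bad_version]

# Hypothetical entries; the names are placeholders, not real work units.
tasks = [
    {"name": "task_a", "app_version": "5.89"},
    {"name": "task_b", "app_version": "5.90"},
    {"name": "task_c", "app_version": "5.91"},
]

print(tasks_to_abort(tasks))
```

Anything the filter returns would then be aborted by hand in the Manager.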
ID: 49958 · Rating: 0
Leonzio
Joined: 19 Nov 07
Posts: 8
Credit: 2,731
RAC: 0
Message 49959 - Posted: 22 Dec 2007, 22:56:25 UTC - in response to Message 49958.  


Thanks, I see that they are 5.89.
ID: 49959 · Rating: 0
Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49961 - Posted: 22 Dec 2007, 23:17:40 UTC - in response to Message 49959.  


If they give you problems, I think it is best to abort them and wait for some 5.91s.
ID: 49961 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49964 - Posted: 23 Dec 2007, 1:29:03 UTC - in response to Message 49959.  
Last modified: 23 Dec 2007, 2:02:21 UTC

Leonzio, according to the records, you've reported 14 WUs: 10 of them were done with 5.89 without error, two were done with 5.90 and both had errors, and two were done with 5.91, of which you aborted one and returned the other successfully. So, if it's a 5.89, it'll probably run just fine.
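
The same breakdown as a quick tally, using only the counts stated above. The `reported` dictionary and the `summarize` helper are just an illustrative sketch, not output from any BOINC tool.

```python
# Counts as stated in the post above: (application version, outcome) -> work units.
reported = {
    ("5.89", "success"): 10,
    ("5.90", "error"): 2,
    ("5.91", "aborted"): 1,
    ("5.91", "success"): 1,
}

def summarize(counts):
    """Return {version: (successes, total)} from (version, outcome) counts."""
    summary = {}
    for (version, outcome), n in counts.items():
        ok, total = summary.get(version, (0, 0))
        summary[version] = (ok + (n if outcome == "success" else 0), total + n)
    return summary

summary = summarize(reported)
for version in sorted(summary):
    ok, total = summary[version]
    print(f"{version}: {ok}/{total} successful")

# Sanity check that the per-version totals add up to the 14 reported WUs.
total_wus = sum(total for _, total in summary.values())
print("total work units:", total_wus)
```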

Here's more info on your returned work: [image not preserved]
ID: 49964 · Rating: 0
Leonzio
Joined: 19 Nov 07
Posts: 8
Credit: 2,731
RAC: 0
Message 49971 - Posted: 23 Dec 2007, 15:13:51 UTC - in response to Message 49964.  

It's very interesting that one of the successfully completed WUs is a "5.90". :-)
What is the link from which [OT: is "from which" correct?] that image is taken?
ID: 49971 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49974 - Posted: 23 Dec 2007, 16:32:49 UTC - in response to Message 49971.  

Leonzio, I have software to extract and calculate that data. I input a host ID number and run it, and it fills in the rest.

If you have MS Excel, PM me with your email address and I'll send you a copy of the software (note: it has trouble with Office 2007).

tony
ID: 49974 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org