WUs fail with code 131

Questions and Answers : Unix/Linux : WUs fail with code 131


Kenneth Larsen
Joined: 17 Sep 05
Posts: 3
Credit: 112,217
RAC: 0
Message 410 - Posted: 24 Sep 2005, 9:28:58 UTC

Most of my work units have started failing over the last few days with this code:


4.43
process exited with code 131 (0x83)


[0x864ff8f]
[0x86c183c]
[0xb7f5d420]
[0x8511831]
[0x85148b6]
[0x80d4914]
[0x823a774]
[0x8232651]
[0x8363677]
[0x86c6e84]
[0x8048121]
SIGSEGV: segmentation violation
Stack trace (11 frames):

Exiting...
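
For readers decoding the exit status above: 131 in decimal is 0x83 in hex, which is why the client prints both. The following is a minimal Python sketch of that conversion. It also applies the common Unix shell convention that a status above 128 means 128 plus a signal number; that convention is an assumption here, since nothing in this thread confirms that BOINC or Rosetta follow it. The helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Show an exit code as decimal and hex, plus a shell-style signal guess."""
    text = f"{code} (0x{code:02x})"
    if code > 128:
        # Assumption: shell convention of 128 + signal number.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "unknown signal"
        text += f", 128 + {signum} ({name})"
    return text


print(describe_exit_code(131))
```

On Linux this reading maps 131 to signal 3 (SIGQUIT), whereas the trace reports SIGSEGV (signal 11, which would give 139). So the 131 may instead be a value chosen by the application's own crash handler; that is a guess, not something the thread establishes.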



I'm using BOINC v4.43 under Fedora Core 4 on an Athlon XP 3000+ with 512 MB of RAM (no overclocking).

Any idea what's wrong?
ralic

Joined: 22 Sep 05
Posts: 16
Credit: 46,481
RAC: 0
Message 445 - Posted: 25 Sep 2005, 8:49:05 UTC
Last modified: 25 Sep 2005, 8:52:18 UTC

I've also had one do this.
Looking at the logs, it happened just at the point where BOINC had instructed the WUs to be removed from RAM prior to starting the 5-day benchmark.

Relevant log extract below:

2005-09-24 09:17:43 [---] Suspending computation and network activity - running CPU benchmarks
2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18978_0 (removed from memory)
2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18992_0 (removed from memory)
2005-09-24 09:17:44 [rosetta@home] Unrecoverable error for result 1pvaA_abrelax_18992_0 (process exited with code 131 (0x83))
2005-09-24 09:17:44 [---] request_reschedule_cpus: process exited
2005-09-24 09:17:45 [---] Running CPU benchmarks
2005-09-24 09:18:13 [---] Aborting CPU benchmarks, one or more active tasks are still running.
2005-09-24 09:18:13 [---] Resuming computation and network activity
2005-09-24 09:18:13 [---] request_reschedule_cpus: Resuming activities
2005-09-24 09:18:13 [rosetta@home] Deferring communication with project for 31 seconds
2005-09-24 09:18:13 [rosetta@home] Computation for result 1pvaA_abrelax_18992_0 finished
2005-09-24 09:18:13 [rosetta@home] Starting result 1pvaA_abrelax_18993_0 using rosetta version 4.77
2005-09-24 09:18:14 [---] ACTIVE_TASK_SET::check_app_exited(): pid 9550 not found
2005-09-24 09:18:15 [---] ACTIVE_TASK_SET::check_app_exited(): pid 9551 not found
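
When a log is long, a small filter can pull out just the failed results. The sketch below is an illustration in Python, not a BOINC tool: the function name `find_failed_results` is hypothetical, and the pattern is written only against the "Unrecoverable error" wording quoted in this thread, so other client versions may phrase the line differently.

```python
import re

# Matches the "Unrecoverable error" line regardless of the timestamp/prefix layout.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(process exited with code (?P<code>\d+) \(0x[0-9a-fA-F]+\)\)"
)


def find_failed_results(log_text: str) -> list[tuple[str, int]]:
    """Return (result_name, exit_code) for every unrecoverable-error line."""
    return [
        (m.group("result"), int(m.group("code")))
        for m in ERROR_RE.finditer(log_text)
    ]


# Sample lines copied from the log extracts in this thread.
sample = """\
2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18992_0 (removed from memory)
2005-09-24 09:17:44 [rosetta@home] Unrecoverable error for result 1pvaA_abrelax_18992_0 (process exited with code 131 (0x83))
26/09/2005 04:38:02|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_16851_1 (process exited with code 131 (0x83))
"""

for name, code in find_failed_results(sample):
    print(name, code)
```

Because the pattern ignores everything before "Unrecoverable error", it works on both the bracketed and the pipe-separated log layouts that appear in this thread.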



[edit]
Here's a link to the result id: 28059
[/edit]
Juerschi

Joined: 17 Sep 05
Posts: 8
Credit: 14,145
RAC: 0
Message 461 - Posted: 25 Sep 2005, 13:53:09 UTC

My Linux host had a problem quite similar to the one on ralic's host. The WU was removed from memory but didn't error out. Benchmarking started but was aborted because one or more active tasks were still running. The ACTIVE_TASK_SET error message was much the same as the one ralic posted; only the pid number is different.
Desti

Joined: 16 Sep 05
Posts: 50
Credit: 3,018
RAC: 0
Message 500 - Posted: 25 Sep 2005, 23:35:18 UTC - in response to Message 445.  

ralic

Joined: 22 Sep 05
Posts: 16
Credit: 46,481
RAC: 0
Message 529 - Posted: 26 Sep 2005, 6:50:33 UTC - in response to Message 445.  
Last modified: 26 Sep 2005, 6:51:42 UTC

I've also had one do this.

Well, I've had another one do it, and benchmarks were not in the picture this time.

Relevant log extract below:

26/09/2005 04:37:30|rosetta@home|Starting result 1pvaA_abrelax_16851_1 using rosetta version 4.77
26/09/2005 04:38:02|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_16851_1 (process exited with code 131 (0x83))
26/09/2005 04:38:02||request_reschedule_cpus: process exited
26/09/2005 04:38:02|rosetta@home|Deferring communication with project for 1 minutes and 0 seconds
26/09/2005 04:38:02|rosetta@home|Computation for result 1pvaA_abrelax_16851_1 finished


result id: 28244
The error message in the result is slightly different this time, since it includes the following line:
No heartbeat from core client for 31 sec - exiting
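
The heartbeat line suggests the application stops itself once the core client has been silent for too long. The sketch below illustrates that general watchdog idea in Python; it is not BOINC's actual implementation. The class name `HeartbeatWatchdog`, the 30-second threshold, and the fake clock used for the demonstration are all assumptions made for illustration.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical watchdog that flags a timeout when heartbeats stop arriving."""

    def __init__(self, timeout_sec: float = 30.0, clock=time.monotonic):
        self.timeout_sec = timeout_sec  # assumed threshold, not taken from BOINC
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat was just received."""
        self.last_beat = self.clock()

    def seconds_silent(self) -> float:
        """Seconds elapsed since the last recorded heartbeat."""
        return self.clock() - self.last_beat

    def timed_out(self) -> bool:
        return self.seconds_silent() > self.timeout_sec


# Demonstration with a fake clock so no real waiting is needed.
now = [0.0]
dog = HeartbeatWatchdog(timeout_sec=30.0, clock=lambda: now[0])
now[0] = 31.0  # pretend 31 seconds passed without a heartbeat
if dog.timed_out():
    print(f"No heartbeat for {dog.seconds_silent():.0f} sec - exiting")
```

If the real mechanism resembles this, anything that stalls the client for about half a minute could make an otherwise healthy task give up; whether that explains the failure reported here is not established by the thread.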

daniele

Joined: 12 Oct 06
Posts: 18
Credit: 20,328
RAC: 0
Message 30176 - Posted: 28 Oct 2006, 11:53:39 UTC

Last night I had the same error on one WU, but there was nothing remarkable in stderr.txt. I have two other WUs with nearly the same name; in a few hours I'll see whether they get the same error.