Report Problems with Rosetta Version 5.12

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 15789 - Posted: 10 May 2006, 15:48:11 UTC

Looks like I'm the first to report an error on v5.12...

Computer ID = 182506

5/10/2006 10:31:38 AM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_1291_0 ( - exit code -1073741811 (0xc000000d))

5/10/2006 10:31:43 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_1333_0 ( - exit code -1073741811 (0xc000000d))
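
For anyone comparing the two numbers in these messages: the decimal exit code and the hex value in parentheses are the same 32-bit value, printed once as signed and once as unsigned. A minimal Python sketch of the conversion follows; the function name `to_ntstatus_hex` is made up for illustration and is not part of BOINC or Rosetta. As far as I know, Windows lists 0xC000000D as STATUS_INVALID_PARAMETER, but that name alone does not say what went wrong inside the application.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex.

    Hypothetical helper for reading the log lines above; not a BOINC API.
    """
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit unsigned equivalent.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_ntstatus_hex(-1073741811))  # prints 0xc000000d
```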


Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Stu D.
Joined: 3 Mar 06
Posts: 8
Credit: 575,867
RAC: 0
Message 15796 - Posted: 10 May 2006, 16:21:56 UTC
Last modified: 10 May 2006, 16:23:27 UTC

5/10/2006 12:16:23 PM|rosetta@home|Unrecoverable error for result JUMP_NATIVEBARCODE_ANTIPARALLEL_1tul__SAVE_ALL_OUT_492_877_0 ( - exit code -1073741811 (0xc000000d))
5/10/2006 12:16:19 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_4396_0 ( - exit code -1073741811 (0xc000000d))

Computer...175527
Stu

Stu D.
Joined: 3 Mar 06
Posts: 8
Credit: 575,867
RAC: 0
Message 15834 - Posted: 10 May 2006, 20:36:21 UTC
Last modified: 10 May 2006, 20:40:13 UTC

5/10/2006 4:32:33 PM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_9050_0 ( - exit code -1073741811 (0xc000000d))
5/10/2006 4:32:38 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_7224_0 ( - exit code -1073741811 (0xc000000d))

Computer...175527 AMD x2 3800
No problem like this with prior versions.
Stu

andreas
Joined: 22 Nov 05
Posts: 2
Credit: 10,465
RAC: 0
Message 15845 - Posted: 10 May 2006, 21:53:41 UTC

Trying to resume Rosetta@home after the machine was offline for over a month, I get only errors:

2006-05-10 13:13:11 [rosetta@home] execv(../../projects/boinc.bakerlab.org_rosetta/rosetta_5.12_i686-pc-linux-gnu) failed: error -1
2006-05-10 13:13:11 [rosetta@home] Starting result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 using rosetta version 512
2006-05-10 13:13:12 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:12 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:12 [---] request_reschedule_cpus: process exited
2006-05-10 13:13:12 [rosetta@home] Computation for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 finished
2006-05-10 13:13:12 [rosetta@home] Starting result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 using rosetta version 512
2006-05-10 13:13:12 [rosetta@home] execv(../../projects/boinc.bakerlab.org_rosetta/rosetta_5.12_i686-pc-linux-gnu) failed: error -1
2006-05-10 13:13:13 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:13 [rosetta@home] Unrecoverable error for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 (process exited with code 26 (0x1a))
2006-05-10 13:13:13 [---] request_reschedule_cpus: process exited
2006-05-10 13:13:13 [rosetta@home] Computation for result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec320_1.pdb_501_7_0 finished

For now, I have suspended this project.

Kerwin
Joined: 19 Sep 05
Posts: 10
Credit: 1,773,393
RAC: 0
Message 15854 - Posted: 11 May 2006, 0:20:42 UTC
Last modified: 11 May 2006, 0:51:29 UTC

I was looking at the graphics for 5.12 and suddenly they stopped, the computer froze, my screen went blank and all of a sudden it reappeared with a message from my Catalyst driver stating that the video card was no longer accepting commands from the driver and that it had to be reset. Once this error message popped up, everything went back to normal, graphics and all.

I'm using an ATI X850 XT. This is the second time this happened. It also happened with a result using 5.07 with the same outcome.

Edit:
I continued watching the graphics and it happened again, but this time 4 in a row a few seconds apart. By the fourth time, the video card stopped responding and the driver reverted to software rendering. At this point I chose to restart my machine. This happened 4.70% into JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_4697_0

andreas
Joined: 22 Nov 05
Posts: 2
Credit: 10,465
RAC: 0
Message 15952 - Posted: 11 May 2006, 19:33:26 UTC - in response to Message 15849.  

Trying to resume Rosetta@home after the machine was offline for over a month, I get only errors:

2006-05-10 13:13:11 [rosetta@home] execv(../../projects/boinc.bakerlab.org_rosetta/rosetta_5.12_i686-pc-linux-gnu) failed: error -1
2006-05-10 13:13:11 [rosetta@home] Starting result HOMO_7486_h008_1_LOOPRLX_IGNORE_THE_REST7486h008_dec452_1.pdb_501_7_0 using rosetta version 512
...For now, I have suspended this project.


This is a file error related to an open file. Since you have been offline for a while try resetting the project.


I reset this project and it upgraded itself to version 5.13, which seems to be running fine so far.

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 15954 - Posted: 11 May 2006, 19:37:52 UTC

JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_23413_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=19786764

Got to 100% and then just sat there, and sat there, and ...

Got an interesting error though: *** glibc detected *** corrupted double-linked list: 0x0b4af0f8 ***

dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 16116 - Posted: 13 May 2006, 0:25:04 UTC
Last modified: 13 May 2006, 0:29:21 UTC

This computer
16420818
16420797
16420710

BOINC ver 5.4.9
Win XP home sp2 3.00 GHz 1GB RAM



12/05/2006 11:28:04 AM||Suspending network activity - user request - running CPU benchmarks
12/05/2006 11:28:06 AM||Running CPU benchmarks
12/05/2006 11:29:05 AM||Benchmark results:
12/05/2006 11:29:05 AM|| Number of CPUs: 2
12/05/2006 11:29:05 AM|| 1311 floating point MIPS (Whetstone) per CPU
12/05/2006 11:29:05 AM|| 1209 integer MIPS (Dhrystone) per CPU
12/05/2006 11:29:05 AM||Finished CPU benchmarks
12/05/2006 11:29:06 AM|rosetta@home|Resuming task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 using rosetta version 512
12/05/2006 11:29:06 AM|rosetta@home|Resuming task JUMP_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16802_0 using rosetta version 512
12/05/2006 11:29:06 AM||Resuming computation
12/05/2006 11:29:06 AM||Rescheduling CPU: Resuming computation
12/05/2006 12:58:06 PM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0: exceeded disk limit: 101397335.000000 > 100000000.000000
12/05/2006 12:58:06 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 (Maximum disk usage exceeded)

12/05/2006 12:58:06 PM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/05/2006 12:58:08 PM||Rescheduling CPU: application exited
12/05/2006 12:58:08 PM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16829_0 finished
12/05/2006 12:58:08 PM|rosetta@home|Starting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 using rosetta version 512
12/05/2006 1:38:25 PM||Rescheduling CPU: application exited
12/05/2006 1:38:25 PM|rosetta@home|Computation for task JUMP_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16802_0 finished
12/05/2006 1:38:25 PM|rosetta@home|Starting task JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_16837_0 using rosetta version 512
12/05/2006 9:28:15 PM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0: exceeded disk limit: 100258025.000000 > 100000000.000000
12/05/2006 9:28:15 PM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 (Maximum disk usage exceeded)

12/05/2006 9:28:15 PM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/05/2006 9:28:17 PM||Rescheduling CPU: application exited
12/05/2006 9:28:17 PM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16836_0 finished
12/05/2006 9:28:17 PM|rosetta@home|Starting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 using rosetta version 512
13/05/2006 7:38:36 AM|rosetta@home|Aborting task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0: exceeded disk limit: 102689604.000000 > 100000000.000000
13/05/2006 7:38:36 AM|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 (Maximum disk usage exceeded)

13/05/2006 7:38:36 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
13/05/2006 7:38:38 AM||Rescheduling CPU: application exited
13/05/2006 7:38:38 AM|rosetta@home|Computation for task JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16800_0 finished
13/05/2006 7:38:38 AM|rosetta@home|Starting task JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_16815_0 using rosetta version 512
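
To see how far over the limit each aborted task went, the "exceeded disk limit" lines can be parsed directly. This is a rough Python sketch that assumes the line format shown in the log above (it also accepts the "Aborting result" wording that other client versions print). The names `DISK_LIMIT_RE` and `disk_overage` are hypothetical, and the values are presumably bytes.

```python
import re

# Matches lines such as:
#   ...|Aborting task NAME: exceeded disk limit: 101397335.000000 > 100000000.000000
# The pattern is an assumption based only on the log lines quoted in this thread.
DISK_LIMIT_RE = re.compile(
    r"Aborting (?:task|result) (?P<name>\S+): exceeded disk limit: "
    r"(?P<used>[\d.]+) > (?P<limit>[\d.]+)"
)


def disk_overage(line):
    """Return (task name, amount over limit, percent over limit), or None."""
    match = DISK_LIMIT_RE.search(line)
    if match is None:
        return None
    used = float(match.group("used"))
    limit = float(match.group("limit"))
    over = used - limit
    return match.group("name"), over, 100.0 * over / limit
```

Fed the first abort line above, this reports the task as 1397335 over the limit, or roughly 1.4%, so the tasks are overshooting by a small margin rather than running away.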


Edit to add highlighting

Darren
Joined: 6 Oct 05
Posts: 27
Credit: 43,535
RAC: 0
Message 16141 - Posted: 13 May 2006, 5:24:59 UTC

I've also got an error with 5.12 for "exceeded disk limit". This is on a Gentoo Linux system, so the Windows debug code shouldn't be the problem.

Sat May 13 00:41:08 2006|rosetta@home|Aborting result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0: exceeded disk limit: 103092700.000000 > 100000000.000000
Sat May 13 00:41:08 2006|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0 (Maximum disk usage exceeded)
Sat May 13 00:41:09 2006||request_reschedule_cpus: process exited
Sat May 13 00:41:09 2006|rosetta@home|Computation for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_493_16783_0 finished


It's this work unit, which is my only 5.12 work unit - the next is 5.13.



senatoralex85
Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16229 - Posted: 14 May 2006, 5:59:06 UTC

I still have a Version 5.12 work unit suspended in my queue until I finish crunching workunits for LHC. Should I abort this workunit?

Delk
Joined: 20 Feb 06
Posts: 25
Credit: 995,624
RAC: 0
Message 16348 - Posted: 16 May 2006, 0:02:15 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=19754192

checkpoint CPU time: 28235.070000
current CPU time: 28241.740000
fraction done: 1.000000


https://boinc.bakerlab.org/rosetta/result.php?resultid=19771884

checkpoint CPU time: 28621.150000
current CPU time: 28626.950000
fraction done: 1.000000

Both of these are JUMP_ALLBARCODES_* workunits, and they are the first real problems I've noticed since the recent Rosetta updates added the watchdog. They differ from past errors in two ways. First, fraction done is exactly 1.00 even though checkpoints have been written. Second, both were completely frozen (on two different Linux servers); by frozen I mean that not even the CPU time was increasing, unlike the old stuck WUs of past Rosetta versions. Both result URLs show glibc errors, which to the best of my knowledge is a first for either server, and since both WUs are JUMP_ALLBARCODES_*, I figure this is not a coincidence.

I manually aborted both in the end. Since I don't cache WUs, the time sent shows that each of these WUs had been stuck for 5 days.
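
The "frozen" condition described here (fraction done at 1.0 and CPU time no longer advancing) can be written down as a simple check over two samples taken some time apart. The Python sketch below only illustrates that description. `Sample`, `looks_frozen`, and the `min_progress` threshold are hypothetical names, and this is not how the Rosetta watchdog is implemented.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a running task (hypothetical structure)."""
    cpu_time: float        # CPU seconds used so far
    fraction_done: float   # 0.0 .. 1.0 as reported by the client


def looks_frozen(earlier: Sample, later: Sample, min_progress: float = 1.0) -> bool:
    """True if the task reports 100% done but used almost no CPU between samples.

    min_progress is an assumed threshold in CPU seconds.
    """
    cpu_advanced = later.cpu_time - earlier.cpu_time
    return later.fraction_done >= 1.0 and cpu_advanced < min_progress
```

Two samples taken a few minutes apart that both show the same CPU time with fraction done at 1.0 would be flagged. A task that is merely slow but still accumulating CPU time would not.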
