Message boards : Number crunching : Problems with Rosetta version 5.80
Author | Message |
---|---|
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
This result says it is "invalid", even though the stderr.txt, the exit status, and the message log look perfectly normal. https://boinc.bakerlab.org/rosetta/result.php?resultid=112607927 |
MM Sihombing Send message Joined: 22 May 06 Posts: 15 Credit: 1,424,082 RAC: 0 |
1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_5799_0 Compute error Exit status -1073741819 (0xc0000005) |
Xaak Send message Joined: 20 Mar 06 Posts: 17 Credit: 3,701,702 RAC: 0 |
The ridiculously high credit WUs are still happening. Latest example: https://boinc.bakerlab.org/rosetta/result.php?resultid=113518482 Result ID 113518482 Name 1r69__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1r69_-frags83__2179_18461_1 Workunit 103081085 Created 18 Oct 2007 8:10:52 UTC Sent 18 Oct 2007 8:11:00 UTC Received 18 Oct 2007 17:48:39 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 567404 Report deadline 28 Oct 2007 8:11:00 UTC CPU time 7703.15625 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 3319170 ====================================================== DONE :: 1 starting structures 7702.38 cpu seconds This process generated 165 decoys from 165 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> Validate state Valid Claimed credit 34.169257800865 Granted credit 1238.8941257865 application version 5.80 XaaK |
Luuklag Send message Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
"The ridiculously high credit WUs are still happening" That one just made an awful lot of decoys. If you divide the granted credit by the number of decoys, it comes to somewhere between 7 and 8 per decoy. The claimed credit is normal, because it is calculated from the CPU seconds spent. But this one made way more decoys than usual; I normally make 10 to 20 decoys in 3 hours on my single-core machine. |
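The arithmetic in the reply above can be checked directly. This is only a sketch that reuses the figures from the quoted result page (result 113518482); the variable names are made up for illustration and nothing here reflects how the project actually computes credit.

```python
# Figures copied from the result page quoted in the thread (result 113518482).
granted_credit = 1238.8941257865
claimed_credit = 34.169257800865
decoys = 165
cpu_seconds = 7702.38

# Credit granted per decoy: the post estimates "between 7 and 8".
credit_per_decoy = granted_credit / decoys

# Average CPU time spent on each decoy.
seconds_per_decoy = cpu_seconds / decoys

# How far granted credit exceeds the CPU-time-based claim.
granted_to_claimed = granted_credit / claimed_credit

print(f"credit per decoy:  {credit_per_decoy:.2f}")
print(f"seconds per decoy: {seconds_per_decoy:.1f}")
print(f"granted / claimed: {granted_to_claimed:.1f}x")
```

The per-decoy figure is in line with the estimate in the post, while the gap between granted and claimed credit comes from the unusually high decoy count for a run of about two hours.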
mikus Send message Joined: 7 Nov 05 Posts: 58 Credit: 700,115 RAC: 0 |
aborted beta - https://boinc.bakerlab.org/rosetta/result.php?resultid=112298830 Went to my computer (to make a connection), and saw (gkrellm) that one of the cores was idle. Boincmgr status showed two Rosetta WUs running. Top showed one of them using CPU, the other sitting there "stuck". Manually aborted the second. I have plenty of memory; "leave work in memory" is specified. Judging by the CPU time accumulated by the "stuck" workunit, it had completed its quota of decoys, and was in the process of shutting down when it got "stuck". Dual core Linux 32-bit system, BOINC 5.10.21. Rosetta tasks usually complete just fine. The problem that "stuck" workunits cause is that BOINC keeps track of the number of seconds given to tasks. As near as I can tell, my system spent so much wall clock time __not__ executing that "stuck" WU that its BOINC-calculated efficiency has now been severely reduced. I run off-line, and connect only occasionally. The lowered efficiency value means that for a while I will be given *less* work each time I connect, and will therefore have to connect more often. Not good. |
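To illustrate why a stuck task drags the efficiency figure down, here is a minimal sketch that treats efficiency as CPU time divided by wall-clock time. That definition, the function name `cpu_efficiency`, and all the numbers are assumptions for illustration; the client's real bookkeeping (including any averaging over time) is not reproduced here.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of a task's wall-clock time that was spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Hypothetical numbers, not taken from the post.
HOUR = 3600

# A normal 8-hour task that finishes with ten minutes of overhead.
healthy = cpu_efficiency(8 * HOUR, 8 * HOUR + 600)

# The same task left "stuck" for a further 24 hours before being aborted:
# the CPU time stops growing, the wall-clock time does not.
stuck = cpu_efficiency(8 * HOUR, 8 * HOUR + 24 * HOUR)

print(f"healthy task: {healthy:.2f}")
print(f"stuck task:   {stuck:.2f}")
```

Under this assumed definition, a scheduler that scales its work requests by efficiency would fetch far less work after the stuck task, which matches the effect described in the post.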
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mikus, please join the discussion on Linux preemption issues in this thread. Rosetta Moderator: Mod.Sense |
(_KoDAk_) Send message Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=113112849 ???? |
Dr Who Fan Send message Joined: 28 May 06 Posts: 79 Credit: 273,880 RAC: 243 |
Another crunching error https://boinc.bakerlab.org/rosetta/result.php?resultid=113450720 stderr out CPU type AuthenticAMD AMD Athlon(tm) 64 X2 Dual-Core Processor TK-53 [x86 Family 15 Model 104 Stepping 1] CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_1052746_0 <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 2226295 == </stderr_txt> ]]> Validate state Invalid Claimed credit 20.426126431417 Granted credit 0 application version 5.69 |
mikus Send message Joined: 7 Nov 05 Posts: 58 Credit: 700,115 RAC: 0 |
"Mikus, please join the discussion on Linux preemption issues in this thread." I may do so -- but the reason I did not originally is that I believe that all of the recommendations in that thread were already in place on my system. As far as I can tell, the Rosetta workunit got "stuck" __after__ it had completed crunching. So to my mind there was no "task preemption" involved (only "task exit"). Also, if it were a preemption issue, I would expect other Rosetta tasks on my system to be failing in a similar fashion. But only that *beta* 5.80 has failed so far. I suspect the problem was triggered by something about that particular task. [Note: My 'Rosetta time to crunch' is 8 hours, meaning I run Rosetta applications (including 5.80) for a longer time than typical participants do.] |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Work unit 104545488 stuck after 3 seconds: aborting. Mac OS X 10.4.19, Intel-based iMac, BOINC 5.10.20 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I just got these two units, and it seems they both failed for the same user. Is there a problem with the work units, the app, or their computer? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=104775454 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=104776777 This is from their results, same for both. <core_client_version>5.10.20</core_client_version> <![CDATA[ <message> No main program specified </message> ]]> Pete. |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
I suspect the computer. I have eight of his and am working on the third without problems. |
Markus Schuhmacher Send message Joined: 29 May 06 Posts: 4 Credit: 1,455,542 RAC: 0 |
Since I updated the BOINC client to 5.10.23, the service hasn't crashed anymore; BOINC has been running stably. |
Keith T. Send message Joined: 1 Mar 07 Posts: 58 Credit: 34,135 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=115759805 cryb__BOINC_ABRELAX_SAVE_ALL_OUT-cryb_-_2227_32333_0 Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) CPU time 1314.3125 stderr out <core_client_version>5.10.7</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 2304098 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7C910E23 read attempt to address 0x00150586 Engaging BOINC Windows Runtime Debugger... Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010 Engaging BOINC Windows Runtime Debugger... </stderr_txt> ]]> Validate state Invalid I have seen this error before but it is rare. Last one was 16 Sep 2007 according to BoincView logs. Keith T. |
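The two forms of the exit status shown above are the same 32-bit value: the negative number is the signed reading and the hex number is the unsigned one. A small sketch of the conversion follows; the helper name `to_unsigned32` is made up for illustration, and 0xC0000005 is the Windows status code for an access violation, as the stderr text itself states.

```python
def to_unsigned32(exit_status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return exit_status & 0xFFFFFFFF


# Windows status code reported in the stderr output as "Access Violation".
STATUS_ACCESS_VIOLATION = 0xC0000005

code = to_unsigned32(-1073741819)
print(hex(code))  # 0xc0000005
print(code == STATUS_ACCESS_VIOLATION)  # True
```

The same conversion applies to the "Compute error" with exit status -1073741819 reported earlier in the thread.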
ziegenmelker Send message Joined: 26 Jul 06 Posts: 10 Credit: 26,061 RAC: 0 |
Some more: 5.80: SIGSEGV and '*** glibc detected *** corrupted double-linked list: 0x0a01aa28 ***', but valid and granted credits (32 for 4h ???); 5.80: process got signal 11 and 2 SIGSEGV: Invalid; 5.69: process exited with code 193 (0xc1) and 3 SIGSEGV: Invalid; 5.80: process exited with code 193 (0xc1) and 1 SIGSEGV: Invalid. I shortened the crunching time from 4 to 1 h. 5.80: *** glibc detected *** corrupted double-linked list: 0x097ea480 *** and 1 SIGSEGV: Valid; 5.69: resultid=116839781: Valid; 5.69: resultid=116896290 1 SIGSEGV: Valid. The '*** glibc detected *** corrupted double-linked list:' is an error in the app. One of the last (valid) WUs got stuck, so I shut down boinc, restarted and the WU was successfully finished. This host is doing work for Einstein(32Bit), ABC(64Bit), Seti(64Bit) and WCG(32Bit) without problems. cu, Michael |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Workunit 106397169 (trunc_cryb__BOINC_ABRELAX_-trunc_cryb_-_2238_78739_0) stuck at 0.751%. Intel iMac2: Mac OSX 10.4.10; Boinc 5.10.20. Aborting. |
Trey Send message Joined: 3 Oct 06 Posts: 11 Credit: 110,142 RAC: 0 |
I had a problem with 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0. I did just re-install my computer with openSUSE 10.3 (from 10.1) a few hours previous. However, WUs on the new O/S before/after the problem one seem OK.
2007-11-03 10:36:31 [---] Starting BOINC client version 5.10.21 for x86_64-pc-linux-gnu
2007-11-03 10:36:31 [---] log flags: task, file_xfer, sched_ops
2007-11-03 10:36:31 [---] Libraries: libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5
2007-11-03 10:36:31 [---] Executing as a daemon
2007-11-03 10:36:31 [---] Data directory: /home/trey/BOINC
2007-11-03 10:36:31 [---] Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ [Family 15 Model 43 Stepping 1]
2007-11-03 10:36:31 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
2007-11-03 10:36:31 [---] OS: Linux: 2.6.22.9-0.4-default
2007-11-03 10:36:31 [---] Memory: 1.97 GB physical, 4.01 GB virtual
2007-11-03 10:36:31 [---] Disk: 98.44 GB total, 89.08 GB free
2007-11-03 10:36:31 [---] Local time is UTC -5 hours
2007-11-03 10:36:31 [rosetta@home] URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 647624; location: home; project prefs: default
2007-11-03 10:36:31 [---] General prefs: from http://www.worldcommunitygrid.org/ (last modified 2007-10-28 21:44:36)
2007-11-03 10:36:31 [---] Host location: home
2007-11-03 10:36:31 [---] General prefs: no separate prefs for home; using your defaults
2007-11-03 10:36:31 [---] Reading preferences override file
2007-11-03 10:36:31 [---] Preferences limit memory usage when active to 1007.34MB
2007-11-03 10:36:31 [---] Preferences limit memory usage when idle to 1813.21MB
2007-11-03 10:36:31 [---] Preferences limit disk usage to 1.86GB
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 using rosetta version 569
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 using rosetta version 569
2007-11-03 11:23:31 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 finished
2007-11-03 11:23:31 [rosetta@home] Starting 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0
2007-11-03 11:23:31 [rosetta@home] Starting task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 using rosetta_beta version 580
2007-11-03 11:23:33 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Throughput 42065 bytes/sec
2007-11-03 12:04:45 [rosetta@home] Deferring communication for 1 min 0 sec
2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 (process exited with code 1 (0x1, -255))
2007-11-03 12:04:45 [rosetta@home] Computation for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 finished
2007-11-03 12:04:45 [rosetta@home] Output file 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0_0 for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 absent
2007-11-03 12:04:45 [rosetta@home] Starting 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0
2007-11-03 12:04:45 [rosetta@home] Starting task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 using rosetta_beta version 580
2007-11-03 13:02:50 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 finished
2007-11-03 13:02:50 [rosetta@home] Starting 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0
2007-11-03 13:02:50 [rosetta@home] Starting task 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0 using rosetta_beta version 580
2007-11-03 13:02:52 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Throughput 43435 bytes/sec
2007-11-03 14:33:06 [rosetta@home] Computation for task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 finished
2007-11-03 14:33:06 [rosetta@home] Starting 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0
2007-11-03 14:33:06 [rosetta@home] Starting task 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0 using rosetta_beta version 580
2007-11-03 14:33:09 [rosetta@home] [file_xfer] Started upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Finished upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Throughput 7927 bytes/sec |
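For anyone wanting to see how long each task ran before it finished or failed, a message log like the one above can be scanned for matching "Starting task" and "Computation for task ... finished" entries. The sketch below is one way to do that; the regular expressions and the function name `task_runtimes` are assumptions based only on the line layout visible in this log, not on any documented log format, and the sample lines are copied from the post.

```python
import re
from datetime import datetime

# Timestamp, [project] tag, then the message text, as seen in the posted log.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")
START_RE = re.compile(r"^Starting task (\S+) using")
FINISH_RE = re.compile(r"^Computation for task (\S+) finished$")


def task_runtimes(log_lines):
    """Map task name -> wall-clock time between its start and finish entries."""
    started = {}
    runtimes = {}
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a log entry in the expected layout
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        start = START_RE.match(message)
        finish = FINISH_RE.match(message)
        if start:
            started[start.group(1)] = when
        elif finish and finish.group(1) in started:
            runtimes[finish.group(1)] = when - started[finish.group(1)]
    return runtimes


FAILED = "2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0"
OK = "1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0"

sample_log = [
    f"2007-11-03 11:23:31 [rosetta@home] Starting task {FAILED} using rosetta_beta version 580",
    f"2007-11-03 12:04:45 [rosetta@home] Computation for task {FAILED} finished",
    f"2007-11-03 12:04:45 [rosetta@home] Starting task {OK} using rosetta_beta version 580",
    f"2007-11-03 14:33:06 [rosetta@home] Computation for task {OK} finished",
]

runtimes = task_runtimes(sample_log)
for task, elapsed in runtimes.items():
    print(task, elapsed)
```

Tasks that were resumed show up in the log as "Restarting task", which this sketch deliberately ignores, so it only measures tasks whose first start appears in the log. On the sample lines, the failed 2reb task ran for roughly 41 minutes before exiting with code 1, while the following 1di2 task ran for about two and a half hours.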
Eric Send message Joined: 20 Jan 06 Posts: 3 Credit: 47,910 RAC: 0 |
I have a computation error on 1n0u__TREEJUMP_ABRELAX_NOTOR-1n0u_-_BARCODE__2241_50881_1 This is a first for me. |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I'm getting errors with exit from pose.cc in some TREEJUMP_ABRELAX WUs: https://boinc.bakerlab.org/rosetta/result.php?resultid=117501958 https://boinc.bakerlab.org/rosetta/result.php?resultid=117467416 https://boinc.bakerlab.org/rosetta/result.php?resultid=117368350 https://boinc.bakerlab.org/rosetta/result.php?resultid=117362156 https://boinc.bakerlab.org/rosetta/result.php?resultid=117342990 https://boinc.bakerlab.org/rosetta/result.php?resultid=117297062 |
Trey Send message Joined: 3 Oct 06 Posts: 11 Credit: 110,142 RAC: 0 |
I've had another failure on a different computer (running Windoze this time): 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8157_0 |
©2024 University of Washington
https://www.bakerlab.org