Message boards : Number crunching : Problems with Rosetta version 5.80
Author | Message |
---|---|
Dr Who Fan Joined: 28 May 06 Posts: 80 Credit: 273,880 RAC: 100 |
Several errors to report: https://boinc.bakerlab.org/rosetta/result.php?resultid=110371301 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 3975811 == </stderr_txt> <message> <file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4190_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Validate state Invalid ------------- https://boinc.bakerlab.org/rosetta/result.php?resultid=110418515 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 3948666 ====================================================== DONE :: 1 starting structures 8134.02 cpu seconds This process generated 5 decoys from 5 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31335_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Validate state Invalid ------------- https://boinc.bakerlab.org/rosetta/result.php?resultid=110838918 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 1657484 == </stderr_txt> ]]> Validate state Invalid |
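For anyone collecting these reports, the `<file_xfer_error>` blocks in the stderr dumps above can be pulled out mechanically. This is only a minimal sketch: the function name `find_xfer_errors` is made up for illustration, and it assumes the dump has the same tag layout as the results pasted in this thread.

```python
import re

# Matches one <file_xfer_error> block as it appears in the pasted stderr dumps.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*</file_xfer_error>"
)

def find_xfer_errors(stderr_text):
    """Return (file_name, error_code) pairs found in a result's stderr dump.

    Hypothetical helper, not part of BOINC or Rosetta.
    """
    return [(m.group("name"), int(m.group("code")))
            for m in XFER_ERROR.finditer(stderr_text)]

sample = (
    "<message> <file_xfer_error> "
    "<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4190_0_0</file_name> "
    "<error_code>-161</error_code> </file_xfer_error> </message>"
)
print(find_xfer_errors(sample))
```

A dump with no transfer error, like the third result above, simply yields an empty list.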
ziegenmelker Joined: 26 Jul 06 Posts: 10 Credit: 26,061 RAC: 0 |
|
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
1bq9A_SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1bq9A-_BARCODE__2166_3994_0 Crashed after I opened the graphics window about 1 hour and 11 minutes in, as it was initializing the second model result. |
Oliver Joined: 11 Oct 07 Posts: 4 Credit: 525 RAC: 0 |
Hi all: We're trying to track down several sources of error. Work units with the batch number 2155 (sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED) appear to be flawed. I've cancelled the job; you should also feel free to abort these jobs if you see them. There aren't that many. I just fixed the problem and sent out a similar job with ID 2163. Thanks *very* much for posting! |
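To spot tasks from a given batch in a local task list, the batch number can be read off the end of the task name. This is a sketch under an assumption inferred from the names posted in this thread, not a documented naming rule: task names end in `_<batch>_<index>_<replica>`. The helper names `batch_of` and `tasks_in_batch` are hypothetical.

```python
import re

# Assumed name tail: _<batch>_<index>_<replica>, e.g. ..._2155_12875_0
NAME_TAIL = re.compile(r"_(?P<batch>\d+)_(?P<index>\d+)_(?P<replica>\d+)$")

def batch_of(task_name):
    """Return the batch number embedded in a task name, or None if absent."""
    m = NAME_TAIL.search(task_name)
    return int(m.group("batch")) if m else None

def tasks_in_batch(task_names, batch):
    """Filter task names down to those belonging to the given batch."""
    return [name for name in task_names if batch_of(name) == batch]

names = [
    "sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0",
    "HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_65537_0",
    "CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0",
]
print(tasks_in_batch(names, 2155))
```

Note that this applies to task names only. Output file names carry one extra trailing `_0`, so the same pattern would read the wrong field from them.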
hugothehermit Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_65537_0 Exit status = -1073741819 (0xc0000005) Edit: Only a couple more edits and this might make sense :) |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
|
ziegenmelker Joined: 26 Jul 06 Posts: 10 Credit: 26,061 RAC: 0 |
A valid WU, but still with errors: <core_client_version>5.10.8</core_client_version> <![CDATA[ <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 # random seed: 3647667 SIGSEGV: segmentation violation Stack trace (12 frames): [0x8d7cf2f] [0x8d77d1c] [0xffffe500] [0x8e024c7] [0x8dd2715] [0x8dd2481] [0x83f9b8b] [0x8de873f] [0x8d79987] [0x8d7afa5] [0x8d73f9d] [0x8e1487a] Exiting... Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 ERROR:: Exit from: fragments.cc line: 465 FILE_LOCK::unlock(): close failed.: Bad file descriptor *** glibc detected *** double free or corruption (fasttop): 0x0909e348 *** Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 *** glibc detected *** corrupted double-linked list: 0x09757f20 *** Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 *** glibc detected *** corrupted double-linked list: 0x09511408 *** Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 ====================================================== DONE :: 1 starting structures 14211.6 cpu seconds This process generated 19 decoys from 19 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> I really wonder about all these SIGSEGV errors. I don't think they are hardware related. "glibc detected *** corrupted double-linked list" should be caused by the app itself. System: AMD 64 X2 4400, 2 GB, standard clock, 64-bit OpenSUSE 10.2, glibc-2.5-25. cu, Michael |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 345 |
I believe there is a problem with the 2168 WUs. They are producing too many decoys. To the project people: please also look at the thread "4000 credit wus?", which discusses this issue. The following is one of those posts from Xaak (in the quotes) and what I found when I looked at that host. Found another one: I don't believe the owner knows anything about this, as the WUs making the huge claims all have the same name. They all start with mcr1, for example mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-short_mfr__2168_67176_0 On the other side of the coin, the same computer is getting a lot of stuck WUs that start with STM0082_BOINC_MFR_ABRELAX_PICKED_ and are being terminated by the Watchdog, and they get just 20 credits for each of these WUs. |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
[quote]I believe there is a problem with the 2168 WUs. They are producing too many decoys. To the project people: please also look at the thread "4000 credit wus?", which discusses this issue. The following is one of those posts from Xaak (in the quotes) and what I found when I looked at that host.[/quote] That is the same thread my 2168 points to. Jmarks |
Christoph Jansen Joined: 6 Jun 06 Posts: 248 Credit: 267,153 RAC: 0 |
The machines affected by this "multi-decoy" bug all seem to be Core2Quads. And, contrary to what the "4000 credit wus?" thread implies, WU types other than mcr1 are also affected; see these two: https://boinc.bakerlab.org/rosetta/result.php?resultid=111755669 https://boinc.bakerlab.org/rosetta/result.php?resultid=111753474 "I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." R.M. Nixon |
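One rough way to compare hosts for the "too many decoys" symptom is to read the `DONE ::` summary out of the stderr dump and work out CPU seconds per decoy. This is a sketch only: `seconds_per_decoy` is a hypothetical helper, it assumes the summary is worded as in the results pasted in this thread, and the thread does not say what rate counts as abnormal, so no threshold is built in.

```python
import re

# Summary block as it appears in the pasted stderr dumps.
DONE_SUMMARY = re.compile(
    r"DONE :: \d+ starting structures\s+(?P<cpu>[\d.]+) cpu seconds\s+"
    r"This process generated\s+(?P<decoys>\d+) decoys from\s+(?P<attempts>\d+) attempts"
)

def seconds_per_decoy(stderr_text):
    """Return CPU seconds spent per decoy, or None if no usable summary is found."""
    m = DONE_SUMMARY.search(stderr_text)
    if m is None or int(m.group("decoys")) == 0:
        return None
    return float(m.group("cpu")) / int(m.group("decoys"))

sample = (
    "DONE :: 1 starting structures 8134.02 cpu seconds "
    "This process generated 5 decoys from 5 attempts"
)
print(seconds_per_decoy(sample))
```

Comparing this figure for the same WU type across different hosts may show whether one machine is reporting far more decoys per CPU second than the others.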
M.L. Joined: 21 Nov 06 Posts: 182 Credit: 180,462 RAC: 0 |
Result ID 111953379 Name HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_55153_0 Workunit 101758314 Created 11 Oct 2007 16:10:04 UTC Sent 11 Oct 2007 16:12:47 UTC Received 14 Oct 2007 2:04:59 UTC Server state Over Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 510574 Report deadline 21 Oct 2007 16:12:47 UTC CPU time 9247.8125 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 1886848 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00B77A52 write attempt to address 0x1FF63C2C Engaging BOINC Windows Runtime Debugger... ******************** |
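The negative exit status and the hex code in these reports are the same 32-bit value shown two ways. The sketch below converts between them and looks up the two Windows status codes that appear in this thread. The helper names are hypothetical, and the lookup table is deliberately limited to those two codes.

```python
# Windows NTSTATUS values seen in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

def to_ntstatus(exit_status):
    """Reinterpret a signed 32-bit exit status as an unsigned 32-bit code."""
    return exit_status & 0xFFFFFFFF

def describe(exit_status):
    """Format an exit status as hex plus a name, if the code is in the table."""
    code = to_ntstatus(exit_status)
    return f"0x{code:08x} ({KNOWN_NTSTATUS.get(code, 'unknown')})"

print(describe(-1073741819))
```

An access violation means the application read or wrote memory it did not own, which matches the "write attempt to address" line in the log above.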
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 1,448 |
Here's one that didn't work out: Result ID 112315759 Name STM0082_BOINC_MFR_ABRELAX_PICKED_2175_1860_0 Workunit 102095676 Created 13 Oct 2007 5:23:55 UTC Sent 13 Oct 2007 5:25:15 UTC Received 14 Oct 2007 18:15:54 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 307276 Report deadline 23 Oct 2007 5:25:15 UTC CPU time 3577.953125 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 3485771 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score 0 for 900 seconds ********************************************************************** GZIP SILENT FILE: .aaSTM1.out </stderr_txt> ]]> Validate state Valid Claimed credit 9.24557159531191 Granted credit 20 application version 5.80 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Result ID 110380482 Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 Workunit 100312279 client error and compute error CPU time 21566.46875 stderr out <core_client_version>5.10.20</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 3967126 ====================================================== DONE :: 1 starting structures 21565.9 cpu seconds This process generated 10 decoys from 10 attempts ====================================================== ================================================================================= from BOINC Manager time is CET (gmt+2) 10/13/2007 1:57:01 AM|rosetta@home|Computation for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 finished 10/13/2007 1:57:01 AM|rosetta@home|Output file sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0 for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 absent 10/13/2007 1:57:01 AM|rosetta@home|Starting CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0 10/13/2007 1:57:01 AM|rosetta@home|Starting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0 using rosetta version 569 10/13/2007 1:57:02 AM|rosetta@home|Deferring communication for 1 min 0 sec 10/13/2007 1:57:02 AM|rosetta@home|Reason: Unrecoverable error for result sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 (<file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name> <error_code>-161</error_code></file_xfer_error>) BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
Result ID 110380482 For some reason this strikes me as very funny. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Doh! It got lost in the clutter of the rest of my work. I don't look at my work unit list that much, so I let that one slip through. Thanks for the reminder. I'll have to go hunt for the rest, if any. Result ID 110380482 |
Bletchley Park Joined: 4 Oct 07 Posts: 4 Credit: 18,052 RAC: 0 |
Version 5.80 BETA computation error: unknown software exception 0xc0000409 occurred in the application at 0x00c2ec4a. The work unit lc26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_16844 was using a lot of system CPU time. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The scaling for this work unit is way off in the graphs: mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_72362_0 Accepted energy of -275 and lower goes off the bottom of the window, and the RMSD is often off the left edge of its window. I can't get any screenshots of this for some reason, but you guys know what I mean. |
Z3r0 Joined: 3 Aug 06 Posts: 2 Credit: 8,453 RAC: 0 |
I have BOINC 5.10.20 and have only been running Rosetta since last week. It reported a computation error. I don't have a log to report. I hibernate my laptop a lot; maybe this is the cause? |
Z3r0 Joined: 3 Aug 06 Posts: 2 Credit: 8,453 RAC: 0 |
I have BOINC 5.10.20 and have only been running Rosetta since last week. It reported a computation error. I don't have a log to report. I hibernate my laptop a lot; maybe this is the cause? 2007-10-17 13:13:01 [rosetta@home] Deferring communication for 1 min 11 sec 2007-10-17 13:13:01 [rosetta@home] Reason: Unrecoverable error for result 1opd__SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1opd_-_BARCODE__2166_6199_0 (Incorrect function. (0x1) - exit code 1 (0x1)) |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
"I have 5.10.20 boinc and I have just used Rosetta since last week." In your General Preferences, make sure "Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes')" is set to yes. Jmarks |