Message boards : Number crunching : Rosetta@Home Version 3.24
Author | Message |
---|---|
Yank Joined: 18 Apr 06 Posts: 71 Credit: 1,752,514 RAC: 0 |
Out of about 60 work units, 5 have "computer error". They ran anywhere from 1,000 to 3,000 CPU seconds. Is this about normal for Rosetta? Not too concerned, just asking. |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
<message> Maximum disk usage exceeded </message> It seems unlikely I'm the only one seeing this. My preferred CPU run time is currently 8 hours, but this one maxed out in just over 2 hours. CASP9_bq_benchmark_hybridization_run43_T0617_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45316_32_0 Best, Snags |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Why are tasks with names similar to this one crashing on my system? rb_03_20_29193_58732__t000__SAVE_ALL_OUT_IGNORE_THE_REST_45388_2181_0 I have not changed any settings and can run everything else just fine, other than CASP9. I get the error: Exit status -1073741819 (0xffffffffc0000005) Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00E82D90 read attempt to address 0x1E5D5000 The wingman ran this just fine. He's running an Opteron CPU. I hardly have any OC on my CPU, so is this just a touchy work unit or what? |
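The two numbers in the exit status above are the same value written two ways: a negative signed integer and its two's-complement hex form, whose low 32 bits match the Access Violation code (0xc0000005) shown in the exception record. A minimal sketch of that conversion follows. The helper names `to_unsigned32` and `to_signed32` are hypothetical and not part of BOINC or Rosetta.

```python
# Hypothetical helpers, for illustration only; not BOINC or Rosetta code.

def to_unsigned32(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned 32-bit value."""
    return status & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


status = -1073741819  # exit status quoted in the post above

# Low 32 bits: the Windows access-violation status code.
print(hex(to_unsigned32(status)))          # 0xc0000005

# Sign-extended to 64 bits: the form printed in parentheses in the post.
print(hex(status & 0xFFFFFFFFFFFFFFFF))    # 0xffffffffc0000005

# Round trip back to the signed value.
print(to_signed32(0xC0000005))             # -1073741819
```

The conversion only shows that the crash report is internally consistent. It says nothing about why the invalid memory read happened.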
Dirk Broer Joined: 16 Nov 05 Posts: 22 Credit: 3,404,708 RAC: 1,336 |
bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 3 5.69 4.151 0.009 5.717 4.151 0.007 18.3 7.3 0.984 ERROR: Fatal SOGFunc_Impl error. ?? Running on a P4-3200 (Prescott), WinXP 32-bit |
Louis A. Hatton Joined: 5 Oct 05 Posts: 2 Credit: 47,821 RAC: 0 |
Rosetta@Home has been updated to version 3.24. If you encounter any problems, please let us know. Thank you for your continued support. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. This is my first error with the new app; it ran for just over 3 hrs. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=449462958 CASP9_bq_benchmark_hybridization_run43_T0550_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45178_168_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 SIGSEGV: segmentation violation Stack trace (15 frames): [0xa858057] [0xf7773400] [0xa09e4cf] [0xa09f0ec] [0x9d2da83] [0x9288587] [0x8a3bbb0] [0x93c09a0] [0x93c336a] [0x954a627] [0x95b06f5] [0x95adf25] [0x80547ed] [0xa8e7f78] [0x8048131] Exiting... </stderr_txt> ]]> |
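The three numbers in "process exited with code 193 (0xc1, -63)" appear to be one value shown as decimal, hex, and a signed 8-bit reading. That is an inference from the arithmetic, not something confirmed against the BOINC source. A small sketch, with hypothetical helper names, that reproduces the triple and converts the logged `cpu_run_time_pref` from seconds to hours:

```python
# Hypothetical helpers, for illustration only; not taken from BOINC.

def signed8(code: int) -> int:
    """Read the low 8 bits of an exit code as a signed 8-bit integer."""
    code &= 0xFF
    return code - 0x100 if code & 0x80 else code


def format_exit(code: int) -> str:
    """Render an exit code in the same 'decimal (hex, signed)' shape as the log."""
    return f"{code} ({hex(code)}, {signed8(code)})"


print(format_exit(193))   # 193 (0xc1, -63)

# cpu_run_time_pref is logged in seconds; 14400 s is a 4-hour target,
# so a task that died after just over 3 hours never reached it.
print(14400 / 3600)       # 4.0
```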
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Another one errored out; I lost over 3 hrs of work. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=449624823 CASP9_bq_benchmark_hybridization_run43_T0518_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45099_1038_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 SIGSEGV: segmentation violation Stack trace (23 frames): [0xa858057] [0xf77ec400] [0xa55e3f3] [0xa36da78] [0xa55c6b1] [0xa36d66d] [0xa36d448] [0x9b2b581] [0x9a01564] [0x9d26907] [0x99193e3] [0x9d729f0] [0x9d6c74e] [0x928aae8] [0x8a3bbb0] [0x93c09a0] [0x93c336a] [0x954a627] [0x95b06f5] [0x95adf25] [0x80547ed] [0xa8e7f78] [0x8048131] Exiting... </stderr_txt> ]]> |
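The two stack traces posted by P . P . L . can be compared mechanically. The sketch below copies the addresses from the two posts, parses them, and counts how many trailing frames they share. Without debug symbols the addresses cannot be mapped to function names, so this only shows where the two call paths coincide, not what the code was doing. The function names are hypothetical.

```python
import re

# Frame addresses copied verbatim from the two posts above.
TRACE_A = (
    "[0xa858057] [0xf7773400] [0xa09e4cf] [0xa09f0ec] [0x9d2da83] [0x9288587] "
    "[0x8a3bbb0] [0x93c09a0] [0x93c336a] [0x954a627] [0x95b06f5] [0x95adf25] "
    "[0x80547ed] [0xa8e7f78] [0x8048131]"
)
TRACE_B = (
    "[0xa858057] [0xf77ec400] [0xa55e3f3] [0xa36da78] [0xa55c6b1] [0xa36d66d] "
    "[0xa36d448] [0x9b2b581] [0x9a01564] [0x9d26907] [0x99193e3] [0x9d729f0] "
    "[0x9d6c74e] [0x928aae8] [0x8a3bbb0] [0x93c09a0] [0x93c336a] [0x954a627] "
    "[0x95b06f5] [0x95adf25] [0x80547ed] [0xa8e7f78] [0x8048131]"
)


def parse_frames(trace):
    """Extract the bracketed hex addresses from a stack-trace line, in order."""
    return re.findall(r"\[(0x[0-9a-f]+)\]", trace)


def common_suffix(a, b):
    """Return the longest run of frames shared at the end of both traces."""
    shared = []
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        shared.append(x)
    return shared[::-1]


frames_a = parse_frames(TRACE_A)
frames_b = parse_frames(TRACE_B)
shared = common_suffix(frames_a, frames_b)

print(len(frames_a), len(frames_b), len(shared))  # 15 23 9
print(shared[0])                                  # 0x8a3bbb0
```

Both traces also start with the same first frame (0xa858057) and end with the same nine frames, so the crashes diverge only in the middle of the call chain.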
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What's with all the errors in CASP9 and rb_03_xxxxxx? If a task name begins with one of these, it errors out on my system. Bugs, bugs and more bugs. I thought RALPH was supposed to find these problems and let you know before you release them here?!! |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I've got two errored tasks: one ran 11 sec, the other 9 sec, same error. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=451344754 2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1325_0 ===================================================================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=451406648 2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1843_0 BOINC:: Worker startup. Starting watchdog... Watchdog active. ERROR: Unable to set up interface foldtree because there are no movable jumps ERROR:: Exit from: src/protocols/docking/util.cc line: 289 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
> "What's with all the errors in CASP9 and rb_03_xxxxxx?" It seems that these tasks are sensitive to OC. Even the little bit I had going on was causing them to fail. I turned it off. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2002 Credit: 9,790,281 RAC: 4,437 |
495343739 Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev47790.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/if3dimer_design10monomer_fold_data.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 # cpu_run_time_pref: 7200 Starting work on structure: _00002 </stderr_txt> ]]> Validate state Invalid |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
CASP9_bs_benchmark_hybridization_run45_T0581_2_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45708_285 Both copies failed. On my Mac: Maximum disk usage exceeded after a CPU time of 13288.65, nan is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range On the second machine, a Windows machine on which the workunit failed much faster (1963.429 sec): Incorrect function. (0x1) - exit code 1 (0x1) Hbond tripped: [2012- 4- 1 0:15:53:] bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 1 7.725 4.214 1 ERROR: Fatal SOGFunc_Impl error. ERROR:: Exit from: ..\..\src\core\scoring\constraints\SOGFunc_Impl.cc line: 181 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
CASP9 tasks stink! I can't run them, but my wingman can. I shut down any overclocking I had on and they still die on me, similar to Snags' post. I get this error: Exit status -1073741819 (0xffffffffc0000005). I looked through all the gobbledygook error info and found something about an execution delay. I have bombed 3 tasks in a row in just over 24 hrs now. Gets kind of old. With no remarks from the team about this problem, why should I donate more time? So I can crash more tasks? |
DmGun Joined: 21 Nov 10 Posts: 6 Credit: 706,645 RAC: 0 |
> "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it." Has this already been fixed? |
Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
> "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it." Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions. We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version. It sucks, but unfortunately it's the position we're stuck with for the foreseeable future. |
DmGun Joined: 21 Nov 10 Posts: 6 Credit: 706,645 RAC: 0 |
> "It sucks, but unfortunately it's the position we're stuck with for the foreseeable future." It is a pity. We'll have to go to F@H... |
b1llyb0y Joined: 16 May 11 Posts: 7 Credit: 4,142,933 RAC: 14 |
> "Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions. We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version. It sucks, but unfortunately it's the position we're stuck with for the foreseeable future." Maybe you could solve the problem by not sending those particular work units to Macs? |
Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
> "Maybe you could solve the problem by not sending those particular work units to Macs?" Unfortunately, it's an application-compilation-level problem, rather than a workunit-level problem, so it affects all workunits. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Rocco, since you are reading this thread, I have a question that I cannot find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem. What is going on? Also, the tasks with RB03 were having trouble on my system. |
Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
> "Rocco, ..." My understanding is that there is an edge case in some of the runs with the very new hybridize protocol (which are mainly being sent out as CASP9 and rb_ runs) that results in numerical instability and range errors in calculations for a substantial fraction of workunits for particular protein systems. The crashes were happening somewhat randomly, so it makes sense that the next person on the same workunit could complete it fine. The issue should hopefully be fixed in the new version of Rosetta@home we are currently testing on Ralph@home. |
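The "nan is outside of [-1,+1] sin and cos value legal range" message reported earlier in the thread is the kind of range error this explanation describes. The sketch below shows the general mechanism under stated assumptions. It is not the Rosetta implementation: the function name `sin_cos_range` is borrowed from the error text, while the tolerance and the clamping behavior are assumptions made for the example. Once a calculation produces NaN, every comparison against it is false, so any bounds check written as "is the value inside the interval" fails.

```python
import math


def sin_cos_range(x, tol=1e-6):
    """Hypothetical range guard: accept values in [-1, +1] (within tol), clamp
    tiny overshoots from rounding, and reject everything else, including NaN."""
    # Every comparison with NaN is False, so NaN can never pass this test.
    if not (-1.0 - tol <= x <= 1.0 + tol):
        raise ValueError(f"{x} is outside of [-1,+1] sin and cos value legal range")
    return max(-1.0, min(1.0, x))


# A slight overshoot from floating-point rounding is clamped, not rejected.
print(sin_cos_range(1.0000001))        # 1.0

# One unstable step (here inf - inf) yields NaN, which then propagates
# through all later arithmetic.
bad = float("inf") - float("inf")
downstream = 0.5 * bad + 0.25
print(math.isnan(downstream))          # True

try:
    sin_cos_range(downstream)
except ValueError as err:
    print(err)  # nan is outside of [-1,+1] sin and cos value legal range
```

The "-1.#IND" values in the Windows reports are how older Microsoft C runtimes print an indeterminate NaN, so the Mac and Windows failures plausibly describe the same kind of event.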
©2024 University of Washington
https://www.bakerlab.org