Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
rembertw Joined: 21 Apr 07 Posts: 14 Credit: 628,529 RAC: 0 |
[Mod.Sense] "And am only recommending a change to BOINC version because problems are occurring with the version installed now." I set up BOINC 6.4.5 on that computer, and it seems to be running fine with Rosetta. I think I will still wait for a general upgrade until there are new BOINC versions. To robertmiles: "Current", for me, is the version that the BOINC site gives as the standard download. Researching older versions and installing those is too much micromanagement for me. The same goes for posting on the boards... If this problem gets solved with 6.4.5 (and it seems to be), then I'm off again. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Hi: Looks like all of these were the ss-neg-1i17s that most people have been having trouble with. It's something specific to 1i17; the other ss-negs do not seem to be having any trouble. The exception is the last one on your list, which got a "Too many restarts with no progress. Keep application in memory while preempted." error. Perhaps you rebooted your machine several times in a row to install fixes or something? Rosetta Moderator: Mod.Sense |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
-161 error on 230728890 |
RodrigoPS Joined: 28 Nov 08 Posts: 3 Credit: 1,336,719 RAC: 9 |
I noticed that with minirosetta 1.54 the granted credit was very low on the Athlon X2 processors, sometimes half the claimed credit. This did not occur with the single-core Athlon. |
RodrigoPS Joined: 28 Nov 08 Posts: 3 Credit: 1,336,719 RAC: 9 |
[RodrigoPS] I noticed that with minirosetta 1.54 the granted credit was very low on the Athlon X2 processors, sometimes half the claimed credit. This did not occur with the single-core Athlon.
Problem solved. Updating the motherboard BIOS (F8 > F9) caused a considerable performance loss on PCs with Athlon X2 processors. Restoring BIOS F8 returned the system to normal. |
Mike* Joined: 16 Feb 09 Posts: 5 Credit: 102,030 RAC: 0 |
Hi all,
I had the error below show up. I initially downloaded 3 WUs; the first 2 bombed, and I aborted the 3rd. I then detached, re-attached, and downloaded 11 new ones. Every one of them went south.
BOINC Manager is 6.2.18. Free disk: 88 GB. Used by BOINC: 4.81. Use at most 100 GB. Leave 0. Use up to 50% of disk. Leave apps in memory.
The only other project (which was suspended) was CPDN at 55% @ 1004 hrs (I do not want to lose this).
My host is 1008545 (should be viewable).
At this point, I will wait until next week (SIMAP is starting soon with its monthly run :)) and will try again. I don't want to keep trashing WUs for no reason.
I do have the messages from BOINC stored if they would be useful, but here is one thing I see, though it may only be due to the process crashing:
2/26/2009 8:04:04 PM|rosetta@home|Starting lr8_A_score12_rlbd_2ci2_IGNORE_THE_REST_DECOY_SAVE_ALL_OUT_7089_1093_0
2/26/2009 8:04:05 PM|rosetta@home|Starting task lr8_A_score12_rlbd_2ci2_IGNORE_THE_REST_DECOY_SAVE_ALL_OUT_7089_1093_0 using minirosetta version 154
2/26/2009 8:04:19 PM|rosetta@home|Computation for task lr8_A_score12_rlbd_2ci2_IGNORE_THE_REST_DECOY_SAVE_ALL_OUT_7089_1093_0 finished
2/26/2009 8:04:19 PM|rosetta@home|Output file lr8_A_score12_rlbd_2ci2_IGNORE_THE_REST_DECOY_SAVE_ALL_OUT_7089_1093_0_0 for task lr8_A_score12_rlbd_2ci2_IGNORE_THE_REST_DECOY_SAVE_ALL_OUT_7089_1093_0 absent
Thanks
mike
(extra blank lines removed)
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 2-26 20:10: 2:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910193 write attempt to address 0x009882EA
Engaging BOINC Windows Runtime Debugger...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910193 write attempt to address 0x0040118E
Engaging BOINC Windows Runtime Debugger...
</stderr_txt>
]]> |
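The two numbers in "exit code -1073741819 (0xc0000005)" are the same 32-bit value read two ways: as a signed integer and as unsigned hex. 0xC0000005 is the Windows access-violation status that the stderr also reports. A minimal Python sketch of the conversion; the helper names are hypothetical:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned (hex) form."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    exit_code = -1073741819  # value from the <message> block in Mike's post
    print(hex(to_unsigned32(exit_code)))  # 0xc0000005
    print(to_signed32(0xC0000005))        # -1073741819
```

This is why the same crash can show up as a large negative number in one place and as 0xc0000005 in another.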
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Hi all, A few questions that may help pin down the problem: Are you able to find BOINC 6.2.28, and are you willing to upgrade to it? That's the only version I have used since 5.10.45, and I don't have that problem. Have you made any extra effort to tell BOINC that it can use more virtual memory than the default? Have you made any extra effort to tell your copy of Windows to allow a bigger swap file than the default? How many BOINC projects is your BOINC Manager set up to recognize? I've seen some signs, so far rather indistinct, that BOINC divides the disk space it is allowed to use into equal sections for each BOINC project it recognizes before it divides those sections into smaller subsections for each workunit. If so, workunits for a project that is heavy on disk space might run out of room even if some other BOINC project doesn't need all of the space reserved for it. Does this site tell you how much memory your machine has now and what the maximum for that model of computer is? http://www.crucial.com/ I had problems getting my dual-core CPU to run two Rosetta@home workunits at the same time back when I had only 1 GB of memory to share between Vista and the two workunits, so I ordered an upgrade to the 2 GB maximum my model of computer can handle; now I can run two such workunits at once even while typing this. |
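To put rough numbers on the equal-split idea, here is a sketch of that model applied to the figures Mike posted (a 100 GB "use at most" limit, 11 tasks, and 7 attached projects). The model is only the hypothesis described above, not BOINC's documented allocation policy, and it ignores the separate 50%-of-disk cap; the function name is hypothetical.

```python
def per_task_share_gb(limit_gb: float, n_projects: int, n_tasks: int) -> float:
    """Disk per task under a HYPOTHETICAL equal-split model:
    the BOINC limit is split evenly across attached projects,
    then one project's slice is split evenly across its tasks.
    This is not BOINC's documented allocation policy."""
    if n_projects < 1 or n_tasks < 1:
        raise ValueError("need at least one project and one task")
    per_project = limit_gb / n_projects
    return per_project / n_tasks


if __name__ == "__main__":
    share = per_task_share_gb(limit_gb=100, n_projects=7, n_tasks=11)
    print(f"{share:.2f} GB per task")  # 1.30 GB per task
```

Under these inputs even the pessimistic model leaves about 1.3 GB per task, so the arithmetic alone does not settle whether disk space was the limiting factor.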
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
Hi: Right, the last one was a multi-fix from our "beloved" Microsoft... |
Mike* Joined: 16 Feb 09 Posts: 5 Credit: 102,030 RAC: 0 |
Hi all, The odd thing is that I had successfully finished 3 models a few days ago, and a couple before that (I can't remember the version offhand; only 1 WU at a time), with no issues. I am attached to 7 projects but am not running them all. (I set NNT on the projects and keep a small buffer so I don't have to worry about having too much. Yeah, I know BOINC manages it, but I want to make sure everything gets done quickly.) When you mentioned BOINC dividing the disk space, I started wondering whether I had the non-active projects suspended, which I usually have done in the past. I will retry after I get through the SIMAP run (this is why I keep the tasks low), making sure my buffer is small so as to hopefully not grab 11 tasks. Thanks Mike |
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
Another bug: https://boinc.bakerlab.org/rosetta/result.php?resultid=231152575 loopbuild_reference_allmodels_hb_t360 |
rembertw Joined: 21 Apr 07 Posts: 14 Credit: 628,529 RAC: 0 |
[Mod.Sense] I still have not seen anyone else reporting such a problem, and you've got a score of other hosts running fine. Last update: everything seems to be OK after I updated BOINC to 6.4.5. The exact reason for the 0% progress with Mini Rosetta is still a mystery, but at least that computer is crunching again. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Hi all, Another question that may help pin down the problem: Did you have graphics enabled at any time during those runs? When I run minirosetta 1.58 for RALPH@home, it completes successfully if I never enable graphics, but fails if I have graphics enabled for a short time during the run. |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
Another bunch of Hbond tripped errors:
hw_mamaln_t290_3_hb_1xyh__IGNORE_THE_REST_1ihg_1_SAVE_ALL_OUT_7736_375_0
hw_mamaln_t290_3_hb_1ihg__IGNORE_THE_REST_1cyn_1_SAVE_ALL_OUT_7729_256_0
hw_mamaln_t290_3_hb_t290__IGNORE_THE_REST_1zkc_1_SAVE_ALL_OUT_7743_255_0
hw_mamaln_t290_3_hb_t290__IGNORE_THE_REST_1xwn_1_SAVE_ALL_OUT_7743_255_0
The first three of them have valid status and show:
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
called boinc_finish |
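The "dis==0" message points at a division by an interatomic distance. For any pair energy E(d) that depends only on the distance d = |x_i - x_j| between two atoms, the chain rule gives dE/dx_i = E'(d) * (x_i - x_j) / d, which is undefined when two atoms sit on the same spot. The sketch below is a generic illustration of that guard, not Rosetta's PairEnergy.cc code; the function name and the example energy slope are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def pair_gradient(
    xi: Sequence[float],
    xj: Sequence[float],
    dE_dd: Callable[[float], float],
) -> List[float]:
    """Gradient of a distance-only pair energy E(d) with respect to atom i.

    dE/dx_i = E'(d) * (x_i - x_j) / d, which needs d != 0.
    """
    diff = [a - b for a, b in zip(xi, xj)]
    d = math.sqrt(sum(c * c for c in diff))
    if d == 0.0:
        # Coincident atoms: (x_i - x_j) / d is 0/0, so there is no direction.
        raise ZeroDivisionError("dis == 0: derivative undefined for coincident atoms")
    scale = dE_dd(d) / d
    return [scale * c for c in diff]


if __name__ == "__main__":
    # Hypothetical harmonic term E = (d - 2)^2, so E'(d) = 2 * (d - 2).
    print(pair_gradient((3.0, 0.0, 0.0), (0.0, 0.0, 0.0), lambda d: 2.0 * (d - 2.0)))
```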
Mike* Joined: 16 Feb 09 Posts: 5 Credit: 102,030 RAC: 0 |
Hi all, No, I did not have the graphics running; the process crashed immediately upon startup (or at least within a few seconds). Interesting thing: normally I only have 1 to 3 projects unsuspended at a time. I had more than that unsuspended, but set to No New Tasks. I suspended ALL projects, shut down, and rebooted. I started up BOINC, set it to not keep projects in memory and to use 50% CPU (to use the 1 core, non-HT), unsuspended Rosetta, allowed new tasks, and hit Update. It gave me 6, and then I let it do its thing. Guess what: no issues. I suspended 5 of the tasks to let the 1 run. I also re-adjusted to 100% to use HT, restarted Docking, and had several Docking tasks and 1 Rosetta task finish. It might be due to allocating memory among the active projects. I'm wondering whether any of the other bugs I saw here are the same issue with too many "active projects". The programmer in me suspects that, but not knowing what goes on inside BOINC, I couldn't tell (besides, I don't do C++ or later). Thanks for the "insight". Mike p.s. added answer on graphics and spellings. |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
Very long WU (25000 seconds), probably ended by timeout (intended runtime + 4 hours):
wt_ub_BOINC_ABRELAX_3MERS_NOHOMS_t482_SAVE_ALL_OUT_IGNORE_THE_REST-S25-3-S3-3--wt_ub-_7707_42783_0
It slows down at about 90%, and in the graphics I can see that it spends about 4 hours in the SmallMoverMoverBase+Minimization stage. It's also an Hbond tripped result :( |
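If the cutoff really is the intended runtime plus 4 hours, the arithmetic is easy to check. This sketch assumes a 3-hour (10800 s) target, the cpu_run_time_pref value that appears in hedera's stderr later in the thread; Yaroslav did not post his own setting, and the 4-hour grace period is his estimate rather than a documented constant. Function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
GRACE_HOURS = 4  # Yaroslav's estimate, not a documented constant


def cutoff_seconds(intended_runtime_s: int, grace_hours: int = GRACE_HOURS) -> int:
    """Runtime at which a task would be stopped: target plus grace period."""
    return intended_runtime_s + grace_hours * SECONDS_PER_HOUR


def implied_runtime_seconds(observed_s: int, grace_hours: int = GRACE_HOURS) -> int:
    """Work backwards from an observed stop time to the implied target."""
    return observed_s - grace_hours * SECONDS_PER_HOUR


if __name__ == "__main__":
    print(cutoff_seconds(10800))           # 25200
    print(implied_runtime_seconds(25000))  # 10600
```

A 25200 s cutoff is within about 200 s of the roughly 25000 s reported, which is consistent with (but does not prove) a 3-hour target plus a 4-hour cutoff.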
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
[Yaroslav Isakov] Very long WU (25000 seconds), probably ended by timeout (intended runtime + 4 hours):
This one is interesting, as it was completed successfully by a second computer in less than half the time, and both were Linux machines. |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
Very long WU (25000 seconds), probably ended by timeout (intended runtime + 4 hours): Maybe it's because I have a 64-bit Linux? |
root Joined: 16 Feb 09 Posts: 6 Credit: 24,387 RAC: 0 |
I'm getting this same error for nearly all WUs on two Linux boxes running FC8 and FC9, with kernels 2.6.23.1-42.fc8 and 2.6.25.14-108.fc9.x86_64 respectively. I also have a third Linux laptop running FC9 with no problems whatsoever. All 3 machines are running with leave_apps_in_memory=0.
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
Any ideas? |
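The three numbers in "code 193 (0xc1, -63)" are one 8-bit exit status written as decimal, hex, and a signed byte. On POSIX systems only the low 8 bits of a process's exit status reach its parent, so the byte-sized view is the relevant one. A minimal Python sketch of the conversion; the helper name is hypothetical:

```python
def to_signed8(value: int) -> int:
    """Reinterpret the low 8 bits of an exit status as a signed byte."""
    value &= 0xFF
    # If the top bit of the byte is set, the value is negative.
    return value - 0x100 if value & 0x80 else value


if __name__ == "__main__":
    status = 193
    print(hex(status), to_signed8(status))  # 0xc1 -63
```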
hedera Joined: 15 Jul 06 Posts: 76 Credit: 5,263,150 RAC: 144 |
I've had 2 Windows error messages in the last couple of days from Rosetta. This is on a Win XP Pro SP2 system. The last one was this morning. I looked at my results today, and this WU crashed at 15:13:50 UTC: 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431
Checking my message log, I found these messages:
03/03/2009 6:00:54 AM|rosetta@home|Restarting task 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431_2 using rosetta_beta version 598
03/03/2009 6:01:41 AM|rosetta@home|Task 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431_2 exited with zero status but no 'finished' file
03/03/2009 6:01:41 AM|rosetta@home|If this happens repeatedly you may need to reset the project.
Identical messages repeated until 7:12 AM, when I got this:
03/03/2009 7:12:14 AM|rosetta@home|Computation for task 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431_2 finished
03/03/2009 7:12:14 AM|rosetta@home|Output file 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431_2_0 for task 2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_431_2 absent
If you look at the task details for WU 209583003 on computer 272841, you'll see this error followed by a dump:
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2834914
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x008BB955 read attempt to address 0x09A9C000
Engaging BOINC Windows Runtime Debugger...
********************
I'm sure it isn't meant to do this...
--hedera
Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic. |
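The log lines hedera quoted describe a pattern: the application keeps exiting with status 0 but without its 'finished' file, the client restarts it, and eventually the task is reported as "too many exit(0)s". The sketch below models that bookkeeping purely as an illustration; the class, the function, and the limit of 5 are hypothetical and are not taken from the BOINC client source.

```python
from dataclasses import dataclass

MAX_ZERO_EXITS = 5  # hypothetical limit, not BOINC's real value


@dataclass
class TaskState:
    zero_exits_without_finish: int = 0


def on_app_exit(state: TaskState, exit_code: int, finished_file_exists: bool) -> str:
    """Decide what to do after the science app exits (illustrative only)."""
    if exit_code != 0:
        return f"error: exit code {exit_code}"
    if finished_file_exists:
        return "done"
    # Zero status but no 'finished' file: count it and restart, up to a limit.
    state.zero_exits_without_finish += 1
    if state.zero_exits_without_finish >= MAX_ZERO_EXITS:
        return "error: too many exit(0)s"
    return "restart"


if __name__ == "__main__":
    state = TaskState()
    print([on_app_exit(state, 0, False) for _ in range(5)])
    # ['restart', 'restart', 'restart', 'restart', 'error: too many exit(0)s']
```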
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
I've had 2 Windows error messages in the last couple of days from Rosetta. This is on a Win XP Pro SP2 system. The last one was this morning. I looked at my results today, and this WU crashed at 15:13:50 UTC:
Could you check the results uploaded for this one and see if they include any mention of lockfile problems? Also, a few questions that may help pin down the problem:
1. Do the error messages shown above repeat several times, and do the lockfile error messages, if any, repeat several times?
2. What version of BOINC are you using?
3. Have you enabled the leave in memory option?
4. What percentage of CPU time do you let BOINC projects use? The 60% setting typical for laptops, the 100% setting typical for desktops, or something else?
5. Did this workunit start with graphics enabled? Did you enable graphics later? Did you then shut down graphics for it? |
©2024 University of Washington
https://www.bakerlab.org