Message boards : Number crunching : Minirosetta v1.40 bug thread
Author | Message |
---|---|
Christian Joined: 11 Jun 06 Posts: 1 Credit: 215,203 RAC: 0 |
ALLCON, For some reason Minirosetta v1.40 continues to lock up my machine. This condition has existed for about two weeks now. It gets a little exasperating when I need my machine to do something and have to reboot it... consequently I will stop running BOINC and Rosetta until someone cleans up the problem, never mind the CPU tasking! Hardware stats: EVGA nForce 590 SLI mobo; AMD Athlon 64 X2 5000+; Corsair XMS 4 GB RAM (2x2 GB); EVGA/Nvidia 8800 GT (512 MB x 2 in SLI); BFG 650 W PS; etc. Software stats: Win XP SP2 (up to date); Trend Micro IS 2009 (up to date). This is a fairly new build (6 months) and has very little in the way of garbage on it. I've never had a problem with BOINC or Rosetta before these past few weeks with the introduction of Minirosetta v1.40. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
I too, have 8 WU's of Mini 1.40 in progress for 15+ hours and stuck at above 98% completion, and still showing 9 hours 57 minutes left to completion. In fact the time-to-completion hasn't changed in over 10 hours. Are you sure that isn't 9 minutes 57 seconds to go? Rosetta@home workunits tend to stick at about that estimated time to go if they come with a serious underestimate of how much CPU time they need, until they finally reach a time when the actual time to go is less than that. I've read of some of these workunits needing about 800 MB to run well, but if any one core on your machine can get this much, suspending jobs on other cores won't help it run any faster. Exception: If some of the reported cores on your machine are due to hyperthreading, telling it to use only as many cores as are available without hyperthreading often at least doubles the speed on that number of cores. I've seen one of these workunits that seemed to get stuck actually take about 19.5 hours of CPU time, but it seemed to complete normally otherwise. It got a bad credit-granted to credit-requested ratio, though. Adjusting my settings so that BOINC is allowed to use more than the default of about 10 GB of disk space seemed to help my more recent jobs, though. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
ALLCON, Christian, Who or what is ALLCON? Is that a 32-bit version of Windows XP SP2, which is unlikely to be able to actually use more than about 3.5 GB of your RAM, or a 64-bit version, which can use more of it? When I had a similar problem on my machine, I found that it was helpful to tell BOINC that it could make use of more of my disk space than the default of about 10 GB; I had significantly more free disk space than that. |
Martin Johnson Joined: 18 Oct 05 Posts: 19 Credit: 171,164 RAC: 0 |
1.4 units WILL NOT STOP / wait / suspend. So my Rosetta RAC is rising, and the others are falling !!! |
Sarel Joined: 11 May 06 Posts: 51 Credit: 81,712 RAC: 0 |
Sorry for being away for a while. I was busy in the wet lab testing some of my older designs (some of which show promise! When I get verification on this, I'll post an update on the protein-protein interactions thread). Using the information that you posted on this thread, I've been able to reproduce on the lab's machines the long run-time problems that you have reported. I now have a good idea about how to avoid such occurrences, so that future runs will not be poorly behaved. Also, we have found a way to lower the memory signature of our runs, but for at least a while we'll keep the current 512 MB restriction, just in case. We will probably submit more protein-interface jobs to BOINC over the next week or so, and I will look for your messages to see whether we've completely resolved this issue. So, I'm planning to sift through the 500 thousand designed models that you have produced over the next few days and am extremely excited about seeing all these new possibilities! |
Martin Johnson Joined: 18 Oct 05 Posts: 19 Credit: 171,164 RAC: 0 |
What about this "refusal to stop" issue ? |
Jim_Clark Joined: 11 Sep 07 Posts: 7 Credit: 38,439 RAC: 0 |
On my AMD Athlon 64 X2 Dual Core with Windows XP Pro SP3, and with 2 GB RAM and 100 GB available HD, no Rosetta Mini WU of any version has ever completed successfully since Rosetta Mini came into existence. They fail with a compute error or sometimes lock up my computer after wasting time that could be applied to WUs that can complete OK. So I abort all Rosetta Mini WUs until I finally get a Rosetta Beta WU. This is a lot of work, since I generally need to abort about 30 or more Rosetta Mini WUs to get one Rosetta Beta WU. About once a week, I allow one Rosetta Mini WU to run, to see if the problem is fixed yet -- which hasn't happened yet. Other project sites such as World Community Grid and PrimeGrid allow me to choose which applications my computer will run. Why can't Rosetta provide this feature, too? I would like to run the Rosetta Beta WUs, but if I get tired of aborting hundreds of Rosetta Mini WUs, I may feel forced to abandon Rosetta altogether. |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
This isn't a bug. Is there a way to delete old database files without the project re-downloading them once you restart BOINC? I have database rev 23035 25/6, 23035 7/8 & 25538 11/11, 54.8 MB in total. Do I need them all? Thanks for any advice. Have a crunching good day!! |
Alec Rosa Joined: 11 Nov 08 Posts: 18 Credit: 2,635 RAC: 0 |
What about this "refusal to stop" issue ? Yes, and what about the recurring "exited with zero status but no 'finished' file" issue? |
Alec Rosa Joined: 11 Nov 08 Posts: 18 Credit: 2,635 RAC: 0 |
On my AMD Athlon 64 X2 Dual Core with Windows XP Pro SP3, and with 2 GB RAM and 100 GB available HD, no Rosetta Mini WU of any version has ever completed successfully since Rosetta Mini came into existence. They fail with a compute error or sometimes lock up my computer after wasting time that could be applied to WUs that can complete OK. Well, well, it seems I'm not alone here. |
Martin Johnson Joined: 18 Oct 05 Posts: 19 Credit: 171,164 RAC: 0 |
No, you're not. But no one has yet admitted this is a problem, so I too will be forced to abort until it is settled. |
Alec Rosa Joined: 11 Nov 08 Posts: 18 Credit: 2,635 RAC: 0 |
I generally need to abort about 30 or more Rosetta Mini WUs to get one Rosetta Beta WU. About once a week, I allow one Rosetta Mini WU to run, to see if the problem is fixed yet -- which hasn't happened yet. Well, well, it seems I'm not alone here. No, you're not. But no one has yet admitted this is a problem, Abort. Abort. Abort. I too am starting to feel like an abort-robot. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
A workunit that refused to give up its CPU core when its timeslot ended and it was time for a workunit from a different BOINC project to take over that CPU core: 11/20/2008 10:46:25 PM|rosetta@home|Starting loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t326__olange_IGNORE_THE_REST_2GHRA_8_4830_404_0 11/20/2008 10:46:26 PM|rosetta@home|Starting task loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t326__olange_IGNORE_THE_REST_2GHRA_8_4830_404_0 using minirosetta version 140 I've now told the BOINC interface to suspend both that task and the whole Rosetta@home project, but it's still taking about half the CPU time on that CPU core. I'm using the leave-in-memory option, in case that matters. The BOINC version is 5.10.45, under 32-bit Windows Vista SP1. |
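The two log lines quoted in the post above follow a `timestamp|project|message` layout. As a minimal sketch (not part of BOINC; the names `parse_log_line` and `started_tasks` are hypothetical, and the month/day/year timestamp format is inferred from these two lines only), they could be split like this to see which task started under which application version:

```python
import re
from datetime import datetime

# Task name copied verbatim from the post above.
TASK = ("loopbuild_minimalist_core_control_standardloopfile2_homo_bench_"
        "looprelax_cheat_chunk_control_standard_loopfiles_t326__olange_"
        "IGNORE_THE_REST_2GHRA_8_4830_404_0")

LOG_LINES = [
    "11/20/2008 10:46:25 PM|rosetta@home|Starting " + TASK,
    "11/20/2008 10:46:26 PM|rosetta@home|Starting task " + TASK
    + " using minirosetta version 140",
]

# Assumed message shape, based only on the second quoted line.
STARTED = re.compile(r"^Starting task (\S+) using (\S+) version (\d+)$")


def parse_log_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    # Assumption: 12-hour clock with AM/PM, month first.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def started_tasks(lines):
    """Return (time, project, task, app, version) for each 'Starting task' line."""
    found = []
    for line in lines:
        when, project, message = parse_log_line(line)
        match = STARTED.match(message)
        if match:
            task, app, version = match.groups()
            found.append((when, project, task, app, int(version)))
    return found


for when, project, task, app, version in started_tasks(LOG_LINES):
    print(when.isoformat(), project, app, version)
```

Comparing the timestamps of such "Starting task" entries with the time a suspend request was issued is one way to document a task that keeps running after it should have stopped.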
A Few Good Men Joined: 25 Mar 07 Posts: 14 Credit: 2,031,382 RAC: 0 |
72 hours of crunching on a QX6700 with BOINC 6.2.19 and Mini 1.40, and I have received 25 credits. "Computational errors". I have reset the project 3 times. |
droople Joined: 19 Aug 08 Posts: 18 Credit: 3,348,943 RAC: 30 |
Hi, I ran a minirosetta and got an error message as follows: https://boinc.bakerlab.org/rosetta/result.php?resultid=206624050 <core_client_version>6.2.19</core_client_version> <![CDATA[ <stderr_txt> Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 1 starting structures 3.59375 cpu seconds This process generated 0 decoys from 0 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>1hzh_1xk5_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_206_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> I did check "keep application in memory while preempted", but got this error. Any idea? Cheers |
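The error report above carries two machine-readable facts: the decoy count in the stderr text and the `<file_xfer_error>` entry. A rough sketch for pulling both out of such a report follows. It assumes the report looks like the excerpt in this post; the helper name `summarize_report` is hypothetical, the line breaks in `REPORT` are a guess because the post shows the text on one line, and the post does not say what error code -161 means, so the code only extracts it:

```python
import re

# Excerpt from the post above (separator lines of '=' omitted).
REPORT = """\
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<stderr_txt>
Too many restarts with no progress. Keep application in memory while preempted.
DONE :: 1 starting structures 3.59375 cpu seconds
This process generated 0 decoys from 0 attempts
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>1hzh_1xk5_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_206_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
"""

DECOYS = re.compile(r"generated (\d+) decoys from (\d+) attempts")
XFER = re.compile(
    r"<file_xfer_error>\s*<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>",
    re.S,
)


def summarize_report(report):
    """Collect the decoy counts and any file transfer errors from a report."""
    decoys = DECOYS.search(report)
    return {
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "xfer_errors": [(name, int(code)) for name, code in XFER.findall(report)],
    }


print(summarize_report(REPORT))
```

In this report the task produced 0 decoys from 0 attempts, so the missing output file is plausibly a consequence of the task ending without results rather than a separate upload fault; that reading is an inference, not something the post confirms.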
Daniel Kohn Joined: 30 Dec 05 Posts: 18 Credit: 2,899,939 RAC: 0 |
I noticed the other day that I "Snoozed" BOINC and one of my 2 Rosetta work-units kept crunching anyway. |
craig_bye Joined: 30 Nov 06 Posts: 1 Credit: 84,909 RAC: 0 |
I too keep seeing an issue that the Rosetta Mini 1.40 just keeps running although BOINC reports it as "Waiting to Run". I've seen this twice now and I end up having to kill off the minirosetta_1.40_windows_intelx86.exe process. |
sarha1 Joined: 23 Sep 05 Posts: 5 Credit: 6,339,735 RAC: 0 |
Really, "loopbuild_" WUs seem to ignore all the requests to suspend and use the full CPU. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 274 |
Crashed-out WUs. 208927642 Watchdog after 20,991 seconds? 208802490 after 16,202 seconds, NAN in HBonding. 208717837 after 7,255 seconds, NAN in HBonding. Machines are all set to a 6-hour WU time. Leave in memory. Core Client 6.2.19. All "loopbuild_....." WUs - man, they have long names. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
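To put the three run times above next to the 6-hour WU setting, here is a small arithmetic sketch. The result IDs and second counts are copied from the post; the function name `runtime_vs_target` is hypothetical:

```python
TARGET_SECONDS = 6 * 3600  # the 6-hour WU time mentioned in the post

# (result ID, CPU seconds before the failure), copied from the post above
CRASHES = [
    (208927642, 20991),
    (208802490, 16202),
    (208717837, 7255),
]


def runtime_vs_target(seconds, target=TARGET_SECONDS):
    """Return (hours run, percent of the target run time), both rounded."""
    hours = seconds / 3600
    percent = 100 * seconds / target
    return round(hours, 2), round(percent)


for result_id, seconds in CRASHES:
    hours, percent = runtime_vs_target(seconds)
    print(f"{result_id}: {hours} h, about {percent}% of the 6-hour target")
```

The watchdog case ran for about 5.8 hours, close to the 6-hour setting, while the two NAN failures stopped at about 4.5 and 2.0 hours.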
rochester new york Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
|
©2024 University of Washington
https://www.bakerlab.org