Message boards : Number crunching : Problems with Minirosetta 1.80
Author | Message |
---|---|
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I'm having a lot of errors from sel_core WUs. They crunch over 16 hours (4 hours over my 12 hour preference), then they exit claiming 1 decoy (although I suspect they didn't produce any decoys), then they error out with code -161 (file_xfer_error, probably because no decoys were generated). https://boinc.bakerlab.org/rosetta/result.php?resultid=264546035 https://boinc.bakerlab.org/rosetta/result.php?resultid=264511109 https://boinc.bakerlab.org/rosetta/result.php?resultid=264503575 https://boinc.bakerlab.org/rosetta/result.php?resultid=264490536 https://boinc.bakerlab.org/rosetta/result.php?resultid=264476503 https://boinc.bakerlab.org/rosetta/result.php?resultid=264456236 https://boinc.bakerlab.org/rosetta/result.php?resultid=264403527 https://boinc.bakerlab.org/rosetta/result.php?resultid=264394564 |
Jimmy McNulty Send message Joined: 13 Nov 05 Posts: 2 Credit: 74,396 RAC: 0 |
Just came back to this project after a few months' break because I was previously having problems with every single WU. I've run a couple dozen in the past week or so with no problems, then got an error with WU lb_alnmatrix_within_2_hb_t370__IGNORE_THE_REST_1DNLA_12_13913_6_0. It ran 31 hours with my preference set for 8 and didn't budge past 69.725%. Additionally, I don't dare click "show graphics" on any work unit since I've returned to the project, because 3 times I've gotten an error message for minirosetta 1.80 and progress stops. My computer keeps trying to crunch it but I'm forced to abort; however, that was not the case with the WU I mentioned above. |
pandem Send message Joined: 12 Nov 08 Posts: 2 Credit: 1,043,807 RAC: 0 |
In reference to message 62095 [quote]Hi, I'm experiencing issues with 1.80 whereby: 1) A WU does not exit memory. I currently have 25 minirosetta_1.80_windows_intelx86.exe processes in memory, only 2 of which are using any CPU time. Memory utilization ranges from 400 KB to 200 MB. The fact that they are not exiting is causing my virtual memory to run out. 2) I get error messages in the BOINC client. 3) The ...BOINCslots folder is filling up with numbered folders, most of which have only three files: boinc_lockfile, stderr.txt and stdout.txt.[/quote] Has anyone else noticed this type of issue? It is very annoying and I have resorted to suspending the project. There is a setting in the global_prefs_override that says remove from memory when in use. It seems this setting has no effect on this project. minirosetta 1.81/1.82 (dual-core Intel, 2.66 GHz, 2 GB, XP SP3, BOINC 6.6.36) |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
In reference to message 62095 1.81 and 1.82??? These are on Ralph. What you are reporting is new. Sounds like tasks are not completing properly. Rosetta Moderator: Mod.Sense |
bruce Send message Joined: 15 Sep 07 Posts: 10 Credit: 839,797 RAC: 0 |
In reference to message 62095 I've been experiencing this issue since before 1.67, so, new? Not to me... but perhaps something not seen before by most. I haven't tried RALPH, so I couldn't report any issues there. I've been experiencing this on 3 separate machines, all running into the same basic problem, where the Minirosetta application does not exit memory. I have a plethora of errored WUs. https://boinc.bakerlab.org/rosetta/results.php?userid=205458&offset=40 On the three machines where I've been experiencing this, free memory runs out and I begin getting messages about running out of virtual memory, so I've suspended the project on those machines until I see some forward motion towards a resolution. I do continue to have R@H running on one machine that does not seem to have this same issue. Any ideas on what information I can supply that may help work towards a resolution? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Any ideas on what information I can supply that may help work towards a resolution? BOINC version, Rosetta version (which is shown in the tasks tab), computing platform (Windows edition, Linux, Mac), any information you might have about when the problem does and does not occur, whether you display the graphic, whether you use BOINC as your screensaver, and whether the tasks completed normally and reported back valid results with reasonable credit. Those are the general questions I'd always ask. Bruce, in your case, is there anything unique about these 3 machines, compared to any others that you have experience with, that might explain why they see the problem and others do not? I'm guessing BOINC version perhaps? Just so others are clear, in general, you would *prefer* that BOINC leave tasks in memory while preempted. It runs more efficiently that way. This is set up in the preferences. But what is being described here is tasks that are completed (i.e. not preempted) and are not leaving memory. And regardless of your preference, the program should free up all memory and BOINC slots when a task completes. Rosetta Moderator: Mod.Sense |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I'll keep this thread sticky until the remaining 1.80 work reaches the 10 day deadline. Rosetta Moderator: Mod.Sense |
bruce Send message Joined: 15 Sep 07 Posts: 10 Credit: 839,797 RAC: 0 |
Any ideas on what information I can supply that may help work towards a resolution? Hi, here's some additional information on my situation. While I describe the situation on one computer, the same situation exists on two others. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4953&nowrap=true#62095 These 4 computers are capable of running only two tasks at a time, and I've had no problem with the applications staying in memory during their normal processing or while waiting for resources/pausing during usage. But there really isn't any reason why 25 WUs should remain in memory (yes, I've observed upwards of 25 jobs in memory; typically I'll begin to notice when around 10 Rosetta jobs are in memory). My observation over the last few years has been that they do exit when there was an error or when they've completed running, and at any given time I would only see two jobs running. I'm not referring to remaining in memory during the normal course of waiting for resources. This behavior is not normal. That being said, here is the information requested. Some tasks complete normally soon after a reboot, but within a day or so I begin seeing errored WUs. No credit is given for the errored WUs. Common to all 4 computers: BOINC 6.6.36; Rosetta 1.80. All computers have all the latest service packs and patches on operating systems and applications. 
Preferences: switch between apps every 200 minutes; use at most 100% of processors; use at most 75% of CPU time; use at most 20 GB HD space; use at most 50% memory when in use; use at most 90% memory when idle. Projects: rosetta@home (Resource Share: 600); seti@home (Resource Share: 75). Computer 1 (no observed errors): 2 AMD Opteron 250 processors; 8 GB RAM; Windows 7 RC (64-bit); 4 146 GB HDs (SCSI); no screen saver active (light usage: email, internet, etc.). Computer 2: 3.0 GHz Pentium 4 (with hyperthreading on); 2.0 GB RAM; WinXP Pro (32-bit); 1 76 GB HD (SATA), 1 160 GB HD (SATA); no screen saver active (almost no usage; runs BOINC and Tomcat only). Computer 3: Dell Latitude D830; T7500 Duo Core2; 2 GB RAM; Windows XP Pro (32-bit); 1 80 GB HD; BOINC screen saver active and displays graphics (heavy usage). Computer 4: Compaq/HP CQ60-215DX; AMD Athlon Dual-Core QL-62 2.0 GHz; 2 GB RAM; Windows Vista Home Premium (32-bit); 250 GB HD; non-BOINC screen saver, no BOINC/Rosetta graphics (light usage: internet, email). |
alpha Send message Joined: 4 Nov 06 Posts: 27 Credit: 1,550,107 RAC: 0 |
Compute error after 101,115 seconds (1 decoy): https://boinc.bakerlab.org/rosetta/result.php?resultid=265225509 <file_xfer_error> <file_name>sel_core_4.5_low200_beta_low200_start_hb_t374__IGNORE_THE_REST_14057_526_1_0</file_name> <error_code>-161</error_code> </file_xfer_error> |
rochester new york Send message Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
|
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Any ideas on what information I can supply that may help work towards a resolution? Bruce, Rosetta@home is known to have problems running properly when the CPU time percentage is set to less than 100%. Usually shows up as a lockfile problem. Both my desktop computers seem to run properly with the CPU percentage set to 100%, since programs you start yourself normally get higher priority than BOINC workunits. However, I currently have one of them set to use only 95%, to help track down the lockfile problems. Laptops and some computers with poor cooling cannot use 100% without overheating, though. |
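
For anyone who wants to check whether a machine shows the leftover slot folders pandem describes (numbered folders holding only boinc_lockfile, stderr.txt and stdout.txt), here is a minimal Python sketch. It is only an illustration: the function name `find_leftover_slots` is made up for this example and is not part of BOINC or Rosetta, and you have to supply the path to your own BOINC slots folder, since the thread does not give the full path.

```python
import os

# File names taken from the report in this thread; nothing else is assumed
# about what BOINC writes into a slot folder.
LEFTOVER_FILES = {"boinc_lockfile", "stderr.txt", "stdout.txt"}


def find_leftover_slots(slots_dir):
    """Return the numbered slot folders that contain exactly the three leftover files."""
    leftovers = []
    for name in os.listdir(slots_dir):
        path = os.path.join(slots_dir, name)
        # Only look at numbered subfolders, as described in the report.
        if not (name.isdigit() and os.path.isdir(path)):
            continue
        if set(os.listdir(path)) == LEFTOVER_FILES:
            leftovers.append(name)
    # Sort numerically so that "10" comes after "2".
    return sorted(leftovers, key=int)
```

The sketch only lists folders and never deletes anything. Calling `find_leftover_slots` with the path to your slots folder, along with the BOINC and Rosetta versions, might be useful information to post alongside an error report.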
©2025 University of Washington
https://www.bakerlab.org