Message boards : Number crunching : Problems with Minirosetta Version 1.67
Author | Message |
---|---|
Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
I've experienced 10 compute errors out of the last 22 tasks -- not a good track record. I think this is the most errors I've ever had with a particular version of Rosetta, or perhaps it is the tasks themselves. |
CharlyD Joined: 1 Dec 06 Posts: 5 Credit: 135,227 RAC: 0 |
I also had two errors in the last three WUs... |
Christopher Woods Joined: 13 May 07 Posts: 2 Credit: 43,235 RAC: 0 |
1.67 is also still triggering Kaspersky Internet Security's detection (and, being automatically moved into the Untrusted apps category, it cannot be loaded by BOINC and just exits with 0 status constantly). You have to manually reclassify the minirosetta executable every day, but it continues to occur. This has been noted as happening for almost a year; how come the Rosetta exe triggers KIS but none of the other BOINC projects' crunchers do? (I run ClimatePrediction, SETI, Predictor @ Home and a bunch of others, and they all play nice.) It's frustrating to wake up and see KIS just constantly notifying me of blocked launch attempts :( waste of good CPU cycles... |
Hammeh Joined: 11 Nov 08 Posts: 63 Credit: 211,283 RAC: 0 |
I am also getting no screen graphics on all my Mini 1.67 WUs that begin with pp_ |
HW&JC Joined: 2 May 08 Posts: 21 Credit: 7,957,412 RAC: 2,913 |
1.67 is also still triggering Kaspersky Internet Security's detection, (and being automatically moved into the Untrusted apps category, it cannot be loaded by BOINC and just exits with 0 status constantly). My sympathies. Same for Norton Internet Security 2009 as posted by me at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4876&nowrap=true#61205 As I said there, is there no way of getting both Symantec and Kaspersky on board with Mini Rosetta to whitelist it? How many people are running away from Rosetta because all the WUs abort? Maybe we'll never know :( |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 274 |
Speedfan shows I am using only about 75% of one of my systems (quad core). When I suspend the miniRosetta 1.67 WU, it goes back to 100% (crunching Einstein and CPDN), and yet the Rosetta task appears to be running? I'll look at this some more in the morning. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Cesium_133* Joined: 1 Dec 08 Posts: 28 Credit: 225,332 RAC: 0 |
I aborted these two projects: threading_lb_test1_hb_t303__IGNORE_THE_REST_11824_2251_0; state 5 threading_lb_test1_hb_t327__IGNORE_THE_REST_11836_2238_1; state 5 I think they were both on 1.67... but in any case, they were at about 10% and 37%, respectively, and apparently stopped computing. I updated, suspended and restarted, and basically did everything but a reach-around. They just were not computing, so I aborted and started the next 2 Rosetta tasks. No problem. I don't know, just relaying the info to the community. I haven't had many problems like this. The kicker, though, was when I tried looking at graphics on 1.67 Mini versus the advanced view, which is what I keep BOINC on most of the time, and nothing happened. It locked up and then closed. Nothing to be seen... Cesium... The lovely lady you see isn't I, but Hayley Westenra, a classical crossover singer from Christchurch, NZ. There is no known voice as hers. Check her out- she's seraphic. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 274 |
The job ran to completion at a shade over 6 hours. That doesn't alter the facts though: I don't know what it was doing, but I certainly got a load of free CPU time showing while it ran for its specified 6 hours. Something is not right. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
bruce Joined: 15 Sep 07 Posts: 10 Credit: 839,797 RAC: 0 |
Hi, I'm hoping someone might be able to help me out. Over the last few days, I've seen this on the results web page: 253893990 231705743 25 May 2009 7:20:31 UTC 25 May 2009 11:26:52 UTC Over Client error Compute error 3,835.50 12.70 --- 253872624 231686047 25 May 2009 5:14:25 UTC 25 May 2009 9:10:57 UTC Over Client error Compute error 100.81 0.33 --- I'm getting this for two different computers: 1) WinXP SP3 on an Intel T7500 w/2gb ram. 2) WinVista SP1 on an AMD Dual Core QL-62 w/2gb ram. What I'm seeing on the BOINC Manager Messages is: 18-May-2009 00:25:32 [rosetta@home] Restarting task gen2_seqrelax_200_oldfrag_cst_hb_t297__IGNORE_THE_REST_1FXWF_10_12356_4_0 using minirosetta version 167 18-May-2009 00:26:13 [rosetta@home] Task gen2_seqrelax_200_oldfrag_cst_hb_t297__IGNORE_THE_REST_1FXWF_10_12356_4_0 exited with zero status but no 'finished' file 18-May-2009 00:26:13 [rosetta@home] If this happens repeatedly you may need to reset the project. This is occurring repeatedly and I have reset the project through the BOINC manager. The same occurs. This was occurring on BOINC 6.6.20, so I upgraded to 6.6.28. I'm still getting the same results and messages even after letting it run for a few hours on different downloaded tasks. Any help would be appreciated. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Any help would be appreciated. Are you running with less than 100% BOINC CPU? What other projects are you running on these machines? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
I recently built a new PC and figured all the problems I was having with Rosetta Mini on my old machine would go away, however I'm still seeing most of the mini WUs crashing, at least all the crashed ones checked were mini WUs. Yet with the new machine the reason for the crashes is different. Could someone please look at my results and provide some feedback as to what the problem may be? The errors checked all showed a lot of "Can't acquire lockfile - exiting" messages. I'm running the same OS (Win XP Pro, SP3) and antivirus (Kaspersky v6.0) as on the old machine. Could the antivirus or some other application be causing these errors? BTW, as with the old machine the Rosetta Beta WUs run just fine. Are you aware that once the lockfile problem starts, it cascades to all the workunits (at least minirosetta workunits) that try to run in the same slot, until the next reboot? Try suspending network communications, then suspending all workunits, then rebooting, then undoing the suspends. It seems that the lockfile problem has something to do with a failed workunit that does not clean up the files in its slot, and one of the files left behind interferes with any future workunits that try to use that slot. A BOINC restart after a reboot cleans up any files left behind, though. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
Looks like 1.67 still has that cascading lockfile problem. I just encountered it myself. https://boinc.bakerlab.org/rosetta/result.php?resultid=253564667 https://boinc.bakerlab.org/rosetta/result.php?resultid=253687069 This is on a computer tuned to look for the lockfile problem - 95% CPU, 32-bit Vista SP1, BOINC 6.2.28, with enough CPU time devoted to other BOINC projects that minirosetta workunits are unlikely to finish without some time in the same CPU core being granted to some other BOINC project. Could minirosetta be modified to start out looking for a leftover lockfile from a previous workunit, and if one is found, abort immediately with an error message suggesting a reboot, instead of first wasting CPU time for a while and then ending with an error message that does not suggest how to fix the problem? At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work. |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work. Can they not be deleted if Boinc is closed, so that no processes are running in Task Manager, then deleting the boinc_lockfiles in the slots folder before rebooting? I thought that worked in the past, as described here? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work. Might be usable if I knew any way to shut down the BOINC program safely other than shutting down all of Windows Vista. That reference doesn't seem to include that detail. Shutting down the boincmgr program is easy, but just doing that isn't enough. |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work. Oh. I thought it was a case of closing the icon in the system tray, going to Task Manager and ending process of boincmgr.exe, then doing the same with boinc.exe as well. Then go to C:\Program Files\BOINC\slots (in Vista) and removing the 0-byte boinc_lockfile files in those folders that don't have running WUs (if you can work out which ones those are). |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr. |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr. Ok, but aren't there boinc related files on the Process tab of Windows Task Manager that you can end? That should release hold of the lockfiles. If not, I'm surprised. Would chkdsk be your only option? |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr. I'm not familiar with the Windows Task Manager and when it's safe to use it on BOINC. However, I've just finished uninstalling BOINC 6.6.20 and installing BOINC 6.6.28 on one of my machines; the initial workunits download for 6.6.28 downloaded a few days worth of workunits from other BOINC projects, but none from Rosetta@home. Trying to get some gives these error messages instead: 5/26/2009 3:33:00 PM rosetta@home update requested by user 5/26/2009 3:33:01 PM rosetta@home Sending scheduler request: Requested by user. 5/26/2009 3:33:01 PM rosetta@home Requesting new tasks 5/26/2009 3:33:06 PM rosetta@home Scheduler request completed: got 0 new tasks 5/26/2009 3:33:06 PM rosetta@home Message from server: Server error: can't attach shared memory The server status says that the feeder program isn't running. Is that enough to disable getting new workunits for now? |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr. A few months ago this came up and I think it was ok to suspend all tasks, close all those files in the system tray and processes tab, delete lockfiles, reboot and unsuspend - and the lockfiles released. However, I've just finished uninstalling BOINC 6.6.20 and installing BOINC 6.6.28 on one of my machines; the initial workunits download for 6.6.28 downloaded a few days worth of workunits from other BOINC projects, but none from Rosetta@home. Trying to get some gives these error messages instead: You just can't get a break, can you. I don't know what the feeder is, but a change to srv4 a couple of months back caused errors like this and I can't up or download either, so you aren't on your own. What do they say? 'If it wasn't for bad luck you wouldn't have any luck at all...' In the meantime, did those lockfiles clear up when you upgraded? If so, good. If not, maybe you could try manually tidying them up again in the way described. You may as well get some use out of the downtime... |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,264,359 RAC: 4,479 |
I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr. I rebooted during the upgrade process; that's probably what cleared up the lockfiles. Also, after I posted my last message, I finally read a thread which mentioned how to use boincmgr to shut down boinc, and one which said that it isn't unusual for new workunits to be unavailable when the feeder process isn't running. |
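
The manual cleanup discussed in this thread (shut BOINC down fully, then remove the 0-byte boinc_lockfile files left in the slots folders) could be sketched roughly as below. This is only an illustration, not anything shipped with BOINC or Rosetta: the function names are made up, the slots path must be supplied by the user, and the script cannot tell which slots belong to running workunits, so it assumes BOINC is already stopped. It defaults to a dry run that only reports what it would remove.

```python
from pathlib import Path

LOCKFILE_NAME = "boinc_lockfile"  # file name as quoted in the thread


def find_leftover_lockfiles(slots_root):
    """Return zero-byte lockfiles found in slots_root/<slot>/ (hypothetical helper)."""
    root = Path(slots_root)
    found = []
    if not root.is_dir():
        return found
    for slot in sorted(root.iterdir()):
        candidate = slot / LOCKFILE_NAME
        # Only consider real slot directories holding an empty lockfile.
        if slot.is_dir() and candidate.is_file() and candidate.stat().st_size == 0:
            found.append(candidate)
    return found


def remove_lockfiles(paths, dry_run=True):
    """Delete the given files unless dry_run is set.

    Returns the paths that were removed (or, in a dry run, would be removed).
    """
    removed = []
    for path in paths:
        if not dry_run:
            try:
                path.unlink()
            except OSError:
                # File still held by a process, or permission denied: skip it.
                continue
        removed.append(path)
    return removed
```

For example, `remove_lockfiles(find_leftover_lockfiles(slots_dir))` lists the candidates without touching them, where `slots_dir` is whatever slots folder your installation uses; passing `dry_run=False` actually deletes them. If a file is skipped because it is still locked, that suggests some BOINC process is still running.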
©2024 University of Washington
https://www.bakerlab.org