Problems with Minirosetta Version 1.67

Message boards : Number crunching : Problems with Minirosetta Version 1.67



Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 61317 - Posted: 21 May 2009, 23:10:59 UTC

I've experienced 10 compute errors out of my last 22 tasks, which is not a good track record. I think this is the most errors I've ever had with a particular version of Rosetta, or perhaps it is the tasks themselves.
CharlyD
Joined: 1 Dec 06
Posts: 5
Credit: 135,227
RAC: 0
Message 61318 - Posted: 22 May 2009, 0:09:04 UTC

I also had two errors in my last three WUs...
Christopher Woods
Joined: 13 May 07
Posts: 2
Credit: 43,235
RAC: 0
Message 61329 - Posted: 22 May 2009, 11:37:26 UTC

1.67 is also still triggering Kaspersky Internet Security's detection: it is automatically moved into the Untrusted apps category, so BOINC cannot load it and it just keeps exiting with 0 status.

You have to manually reclassify the minirosetta executable every day, but it keeps happening. This has been reported for almost a year; how come the Rosetta exe triggers KIS but none of the other BOINC projects' crunchers do? (I run ClimatePrediction, SETI, Predictor @ Home and a bunch of others, and they all play nicely.)

It's frustrating to wake up and see KIS constantly notifying me of blocked launch attempts :( What a waste of good CPU cycles...
Hammeh
Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 61332 - Posted: 22 May 2009, 13:50:57 UTC

I am also getting no screen graphics on any of my Mini 1.67 WUs that begin with pp_
HW&JC
Joined: 2 May 08
Posts: 20
Credit: 7,613,222
RAC: 2,519
Message 61343 - Posted: 23 May 2009, 15:13:05 UTC - in response to Message 61329.  

1.67 is also still triggering Kaspersky Internet Security's detection: it is automatically moved into the Untrusted apps category, so BOINC cannot load it and it just keeps exiting with 0 status.

My sympathies. Same for Norton Internet Security 2009 as posted by me at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4876&nowrap=true#61205

As I said there, is there no way of getting both Symantec and Kaspersky on board with Mini Rosetta to whitelist it?

How many people are running away from Rosetta because all the WUs abort? Maybe we'll never know :(
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,663,494
RAC: 723
Message 61353 - Posted: 24 May 2009, 21:09:23 UTC

SpeedFan shows that one of my systems (a quad core) is only at about 75% usage. When I suspend the miniRosetta 1.67 WU, it goes back to 100% (crunching Einstein and CPDN), and yet the Rosetta task appears to be running. I'll look at this some more in the morning.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Cesium_133*
Joined: 1 Dec 08
Posts: 28
Credit: 225,332
RAC: 0
Message 61355 - Posted: 25 May 2009, 5:17:51 UTC - in response to Message 61353.  
Last modified: 25 May 2009, 5:18:11 UTC

I aborted these two tasks:

threading_lb_test1_hb_t303__IGNORE_THE_REST_11824_2251_0; state 5
threading_lb_test1_hb_t327__IGNORE_THE_REST_11836_2238_1; state 5

I think they were both on 1.67... but in any case, they were at about 10% and 37%, respectively, and apparently stopped computing. I updated, suspended and restarted, and basically did everything but a reach-around. They just were not computing, so I aborted and started the next 2 Rosetta tasks. No problem. I don't know, just relaying the info to the community.

I haven't had many problems like this. The kicker, though, was when I tried looking at graphics on 1.67 Mini versus the advanced view, which is what I keep BOINC on most of the time, and nothing happened. It locked up and then closed. Nothing to be seen... Cesium...
The lovely lady you see isn't I, but Hayley Westenra, a classical crossover singer from Christchurch, NZ. There is no known voice as hers. Check her out- she's seraphic.

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,663,494
RAC: 723
Message 61357 - Posted: 25 May 2009, 9:15:18 UTC

The job ran to completion in a shade over 6 hours. That doesn't alter the facts, though: I don't know what it was doing, but I certainly had a lot of free CPU time showing while it ran for its specified 6 hours. Something is not right.
bruce
Joined: 15 Sep 07
Posts: 10
Credit: 839,797
RAC: 0
Message 61359 - Posted: 25 May 2009, 12:25:45 UTC

Hi,

I'm hoping someone might be able to help me out. Over the last few days, I've seen this on the results web page:
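Task ID | Work unit ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
253893990 | 231705743 | 25 May 2009 7:20:31 UTC | 25 May 2009 11:26:52 UTC | Over | Client error | Compute error | 3,835.50 | 12.70 | ---
253872624 | 231686047 | 25 May 2009 5:14:25 UTC | 25 May 2009 9:10:57 UTC | Over | Client error | Compute error | 100.81 | 0.33 | ---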

I'm getting this for two different computers:
1) WinXP SP3 on an Intel T7500 with 2 GB RAM.
2) WinVista SP1 on an AMD Dual Core QL-62 with 2 GB RAM.

What I'm seeing on the BOINC Manager Messages is:
18-May-2009 00:25:32 [rosetta@home] Restarting task gen2_seqrelax_200_oldfrag_cst_hb_t297__IGNORE_THE_REST_1FXWF_10_12356_4_0 using minirosetta version 167
18-May-2009 00:26:13 [rosetta@home] Task gen2_seqrelax_200_oldfrag_cst_hb_t297__IGNORE_THE_REST_1FXWF_10_12356_4_0 exited with zero status but no 'finished' file
18-May-2009 00:26:13 [rosetta@home] If this happens repeatedly you may need to reset the project.


This is occurring repeatedly. I have reset the project through the BOINC Manager, but the same thing happens.
This was happening on BOINC 6.6.20, so I upgraded to 6.6.28. I'm still getting the same results and messages, even after letting it run for a few hours on different downloaded tasks.

Any help would be appreciated.
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 61362 - Posted: 25 May 2009, 15:42:22 UTC - in response to Message 61359.  

Any help would be appreciated.

Are you running with less than 100% BOINC CPU?

What other projects are you running on these machines?
robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61367 - Posted: 25 May 2009, 23:06:13 UTC - in response to Message 61197.  

I recently built a new PC and figured all the problems I was having with Rosetta Mini on my old machine would go away. However, I'm still seeing most of the mini WUs crash; at least, all the crashed ones I checked were mini WUs. On the new machine, though, the reason for the crashes is different. Could someone please look at my results and give some feedback on what the problem may be? All the failed tasks I checked showed a lot of "Can't acquire lockfile - exiting" messages. I'm running the same OS (Win XP Pro, SP3) and antivirus (Kaspersky v6.0) as on the old machine. Could the antivirus or some other application be causing these errors? BTW, as on the old machine, the Rosetta Beta WUs run just fine.

If this problem persists, is there a way to block mini WUs and only allow processing of beta WUs? That would let my machine do a lot more useful work instead of wasting time on WUs that will keep failing.

My tasks: https://boinc.bakerlab.org/rosetta/results.php?userid=254884

Thank you!


Are you aware that once the lockfile problem starts, it cascades to all the workunits (at least minirosetta workunits) that try to run in the same slot, until the next reboot? Try suspending network communications, then suspending all workunits, then rebooting, then undoing the suspends.

It seems that the lockfile problem has something to do with a failed workunit that does not clean up the files in its slot, and one of the files left behind interferes with any future workunits that try to use that slot. A BOINC restart after a reboot cleans up any files left behind, though.
robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61368 - Posted: 25 May 2009, 23:45:44 UTC

Looks like 1.67 still has that cascading lockfile problem. I just encountered it myself.

https://boinc.bakerlab.org/rosetta/result.php?resultid=253564667

https://boinc.bakerlab.org/rosetta/result.php?resultid=253687069

This is on a computer tuned to look for the lockfile problem - 95% CPU, 32-bit Vista SP1, BOINC 6.2.28, with enough CPU time devoted to other BOINC projects that minirosetta workunits are unlikely to finish without some time in the same CPU core being granted to some other BOINC project.

Could minirosetta be modified to start out looking for a leftover lockfile from a previous workunit, and if one is found, abort immediately with an error message suggesting a reboot, instead of first wasting CPU time for a while and then ending with an error message that does not suggest how to fix the problem?
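For illustration only, here is a minimal Python sketch of the fail-fast check suggested above. minirosetta is not written in Python, and this is not how the BOINC API actually implements its lockfile; the sketch simply assumes the slot lock is an exclusive, non-blocking lock on a file named boinc_lockfile in the slot directory. The function names and the message text are hypothetical.

```python
import os
import sys

# File name taken from this thread; the locking scheme below is an assumption.
LOCKFILE_NAME = "boinc_lockfile"


def try_lock_slot(slot_dir):
    """Try once, without waiting, to take an exclusive lock on the slot's lockfile.

    Returns the open file (keep it open to keep holding the lock), or None if
    another process (or another open handle) already holds the lock.
    """
    f = open(os.path.join(slot_dir, LOCKFILE_NAME), "a+b")  # creates the file if missing
    try:
        if os.name == "nt":
            import msvcrt
            f.seek(0)
            # LK_NBLCK: lock one byte and fail immediately instead of retrying
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            import fcntl
            # LOCK_NB: raise instead of blocking if the lock is already held
            fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def check_slot_or_exit(slot_dir):
    """Hypothetical startup check: give up before doing any work if the slot is locked."""
    lock = try_lock_slot(slot_dir)
    if lock is None:
        sys.stderr.write(
            "Can't acquire lockfile in %s: a previous task may still be holding it. "
            "Rebooting usually clears this.\n" % slot_dir
        )
        sys.exit(1)
    return lock
```

The non-blocking flags are the point of the design: the task learns immediately that the slot is unusable, instead of retrying and failing only after burning CPU time, and the message tells the user what to try.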

At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work.
LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61370 - Posted: 26 May 2009, 0:57:42 UTC - in response to Message 61368.  

At least with this version of BOINC under Vista SP1, manual efforts to remove the lockfile without a reboot do not work.

Can't they be deleted by closing BOINC, so that no BOINC processes are running in Task Manager, and then deleting the boinc_lockfile files in the slots folder before rebooting?

I thought that worked in the past, as described here?

robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61376 - Posted: 26 May 2009, 8:43:32 UTC - in response to Message 61370.  
Last modified: 26 May 2009, 8:52:44 UTC

Can't they be deleted by closing BOINC, so that no BOINC processes are running in Task Manager, and then deleting the boinc_lockfile files in the slots folder before rebooting?

I thought that worked in the past, as described here?



Might be usable if I knew any way to shut down the BOINC program safely other than shutting down all of Windows Vista. That reference doesn't seem to include that detail.

Shutting down the boincmgr program is easy, but just doing that isn't enough.
LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61379 - Posted: 26 May 2009, 11:00:45 UTC - in response to Message 61376.  

Might be usable if I knew any way to shut down the BOINC program safely other than shutting down all of Windows Vista. That reference doesn't seem to include that detail.

Shutting down the boincmgr program is easy, but just doing that isn't enough.

Oh. I thought it was a case of closing the icon in the system tray, going to Task Manager and ending the boincmgr.exe process, then doing the same with boinc.exe.

Then go to C:\Program Files\BOINC\slots (in Vista) and remove the 0-byte boinc_lockfile files in the folders that don't have running WUs (if you can work out which ones those are).
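For anyone who prefers a script, here is a minimal Python sketch of that manual step. It assumes boinc.exe and boincmgr.exe have already been ended, that you pass in your own slots folder (the path above is from this post; yours may differ depending on how BOINC was installed), and that you work out yourself which slots still have running WUs. The function name is hypothetical, and note that robertmiles reported earlier that removing the lockfile without a reboot did not work for him on Vista, so this may not be enough on its own.

```python
from pathlib import Path


def remove_stale_lockfiles(slots_dir, busy_slots=(), dry_run=True):
    """Find 0-byte boinc_lockfile files in each slot folder and (optionally) delete them.

    slots_dir  -- the BOINC slots folder, e.g. the path quoted above (installs vary)
    busy_slots -- names of slot folders that still have running WUs, e.g. {"0", "2"}
    dry_run    -- when True (the default), only report what would be removed

    Only run this after boinc.exe and boincmgr.exe have been stopped.
    Returns the list of lockfile paths that were (or would be) removed.
    """
    removed = []
    for slot in sorted(Path(slots_dir).iterdir()):
        # Skip stray files and any slot the user says is still in use.
        if not slot.is_dir() or slot.name in busy_slots:
            continue
        lock = slot / "boinc_lockfile"
        # Only touch the empty lockfile described in the post, nothing else.
        if lock.is_file() and lock.stat().st_size == 0:
            if not dry_run:
                lock.unlink()
            removed.append(str(lock))
    return removed
```

Running it first with the default dry_run=True lists the candidates without deleting anything, which is a cheap way to check that you are pointing at the right folder.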
robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61383 - Posted: 26 May 2009, 15:55:56 UTC - in response to Message 61379.  

I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr.
LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61385 - Posted: 26 May 2009, 17:48:57 UTC - in response to Message 61383.  

I used the installation method that makes BOINC run for all users. As a result, I don't have a symbol for it in my system tray; only one for boincmgr.

OK, but aren't there BOINC-related processes on the Processes tab of Windows Task Manager that you can end? That should release the hold on the lockfiles. If not, I'm surprised. Would chkdsk be your only option?
robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61390 - Posted: 26 May 2009, 20:51:42 UTC - in response to Message 61385.  

OK, but aren't there BOINC-related processes on the Processes tab of Windows Task Manager that you can end? That should release the hold on the lockfiles. If not, I'm surprised. Would chkdsk be your only option?


I'm not familiar with the Windows Task Manager or when it's safe to use it on BOINC. However, I've just finished uninstalling BOINC 6.6.20 and installing BOINC 6.6.28 on one of my machines. The initial work fetch under 6.6.28 downloaded a few days' worth of workunits from other BOINC projects, but none from Rosetta@home. Trying to get some gives these error messages instead:

5/26/2009 3:33:00 PM rosetta@home update requested by user
5/26/2009 3:33:01 PM rosetta@home Sending scheduler request: Requested by user.
5/26/2009 3:33:01 PM rosetta@home Requesting new tasks
5/26/2009 3:33:06 PM rosetta@home Scheduler request completed: got 0 new tasks
5/26/2009 3:33:06 PM rosetta@home Message from server: Server error: can't attach shared memory

The server status says that the feeder program isn't running. Is that enough to disable getting new workunits for now?
LizzieBarry
Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61396 - Posted: 26 May 2009, 22:46:35 UTC - in response to Message 61390.  

I'm not familiar with the Windows Task Manager or when it's safe to use it on BOINC.

A few months ago this came up, and I think it was OK to suspend all tasks, end all those programs in the system tray and on the Processes tab, delete the lockfiles, reboot and unsuspend, and the lockfiles were released.

However, I've just finished uninstalling BOINC 6.6.20 and installing BOINC 6.6.28 on one of my machines. The initial work fetch under 6.6.28 downloaded a few days' worth of workunits from other BOINC projects, but none from Rosetta@home. Trying to get some gives these error messages instead:

5/26/2009 3:33:00 PM rosetta@home update requested by user
5/26/2009 3:33:01 PM rosetta@home Sending scheduler request: Requested by user.
5/26/2009 3:33:01 PM rosetta@home Requesting new tasks
5/26/2009 3:33:06 PM rosetta@home Scheduler request completed: got 0 new tasks
5/26/2009 3:33:06 PM rosetta@home Message from server: Server error: can't attach shared memory

The server status says that the feeder program isn't running. Is that enough to disable getting new workunits for now?

You just can't get a break, can you? I don't know what the feeder is, but a change to srv4 a couple of months back caused errors like this, and I can't upload or download either, so you aren't on your own. What do they say? 'If it wasn't for bad luck you wouldn't have any luck at all...'

In the meantime, did those lockfiles clear up when you upgraded? If so, good. If not, maybe you could try manually tidying them up again in the way described. You may as well get some use out of the downtime...
robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,861,012
RAC: 2,322
Message 61397 - Posted: 26 May 2009, 23:46:27 UTC - in response to Message 61396.  

A few months ago this came up, and I think it was OK to suspend all tasks, end all those programs in the system tray and on the Processes tab, delete the lockfiles, reboot and unsuspend, and the lockfiles were released.

You just can't get a break, can you? I don't know what the feeder is, but a change to srv4 a couple of months back caused errors like this, and I can't upload or download either, so you aren't on your own. What do they say? 'If it wasn't for bad luck you wouldn't have any luck at all...'

In the meantime, did those lockfiles clear up when you upgraded? If so, good. If not, maybe you could try manually tidying them up again in the way described. You may as well get some use out of the downtime...


I rebooted during the upgrade process; that's probably what cleared up the lockfiles.

Also, after I posted my last message, I finally read a thread which mentioned how to use boincmgr to shut down boinc, and one which said that it isn't unusual for new workunits to be unavailable when the feeder process isn't running.




©2024 University of Washington
https://www.bakerlab.org