Problems with Rosetta version 5.46

Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 36637 - Posted: 13 Feb 2007, 2:34:32 UTC

Please report here for problems you have observed with Rosetta version 5.46.

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 36758 - Posted: 13 Feb 2007, 22:31:51 UTC


Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 36781 - Posted: 14 Feb 2007, 14:18:18 UTC

Marky's WUs are from 5.45 and 5.46, and all seem to end with -107. And MOST of the WUs on this Win/XP machine are now failing: one after 47 seconds, others after more than two hours.
Rosetta Moderator: Mod.Sense

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 36783 - Posted: 14 Feb 2007, 14:26:36 UTC - in response to Message 36781.  

And MOST of the WUs on this Win/XP machine are now failing.

Grrr, so they are. I've set that host to "no new work" on Rosetta for now until the cause is found.

Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 36789 - Posted: 14 Feb 2007, 17:33:15 UTC - in response to Message 36758.  

Have you tried resetting the project to see if it helps? Those workunits themselves seem to be fine, and if this happens all the time on a single host, my guess is that some files have become corrupted. Another possibility is a hardware problem, though this can be ruled out if the host has no problems running other programs.

What could be causing these compute errors? It's only happening on one of my hosts in the last few weeks.

https://boinc.bakerlab.org/rosetta/result.php?resultid=62506015
https://boinc.bakerlab.org/rosetta/result.php?resultid=62470017
https://boinc.bakerlab.org/rosetta/result.php?resultid=62378522
https://boinc.bakerlab.org/rosetta/result.php?resultid=62351637
https://boinc.bakerlab.org/rosetta/result.php?resultid=61390501

That host has been fine running Rosetta for ages.



StephenYavorsky
Joined: 24 Mar 06
Posts: 9
Credit: 87,195
RAC: 0
Message 36800 - Posted: 14 Feb 2007, 22:05:36 UTC - in response to Message 36637.  

Please report here for problems you have observed with Rosetta version 5.46.

"Waiting for memory"
I have never seen this message previously, but two Rosetta units, the final one in the queue from 5.45 and the first from 5.46, have both just now stopped, with the message "waiting for memory." This has not happened before.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 36802 - Posted: 14 Feb 2007, 22:45:11 UTC - in response to Message 36800.  

Please report here for problems you have observed with Rosetta version 5.46.

"Waiting for memory"
I have never seen this message previously, but two Rosetta units, the final one in the queue from 5.45 and the first from 5.46, have both just now stopped, with the message "waiting for memory." This has not happened before.

There are two new settings in your "General Preferences". They are "use at most X percent ram while active", and "use at most X percent of ram when not active". By default they are set to 50 and 90 respectively. Try increasing them.

tony
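
The effect of the two memory preferences described above can be modelled roughly as follows. This is only a simplified sketch, not BOINC's actual scheduling code: the function names, the 512 MB machine and the 300 MB task are all made up for illustration, and only the 50 and 90 percent defaults come from the post.

```python
def ram_limit_mb(total_ram_mb, max_pct):
    """RAM (in MB) that tasks may use under a 'use at most X percent' preference."""
    return total_ram_mb * max_pct / 100.0


def waits_for_memory(task_mb, total_ram_mb, user_active,
                     busy_pct=50.0, idle_pct=90.0):
    """Return True if a task of task_mb would exceed the applicable limit.

    busy_pct applies while the computer is in use, idle_pct when it is not.
    The defaults mirror the 50 and 90 mentioned in the post.
    """
    pct = busy_pct if user_active else idle_pct
    return task_mb > ram_limit_mb(total_ram_mb, pct)


# Hypothetical example: a 300 MB task on a 512 MB machine.
print(waits_for_memory(300, 512, user_active=True))                 # limit is 256 MB
print(waits_for_memory(300, 512, user_active=True, busy_pct=75.0))  # limit is 384 MB
```

Under this model, raising the "while active" percentage is what lets a large work unit keep running while the machine is in use.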

StephenYavorsky
Joined: 24 Mar 06
Posts: 9
Credit: 87,195
RAC: 0
Message 36808 - Posted: 15 Feb 2007, 2:53:28 UTC - in response to Message 36802.  

Please report here for problems you have observed with Rosetta version 5.46.

"Waiting for memory"
I have never seen this message previously, but two Rosetta units, the final one in the queue from 5.45 and the first from 5.46, have both just now stopped, with the message "waiting for memory." This has not happened before.

There are two new settings in your "General Preferences". They are "use at most X percent ram while active", and "use at most X percent of ram when not active". By default they are set to 50 and 90 respectively. Try increasing them.

tony


Thanks, Tony, I've found the settings and I'm sure it will help.

meshmar
Joined: 1 Apr 06
Posts: 26
Credit: 176,432
RAC: 0
Message 36852 - Posted: 15 Feb 2007, 20:11:59 UTC - in response to Message 36802.  

Please report here for problems you have observed with Rosetta version 5.46.

"Waiting for memory"
I have never seen this message previously, but two Rosetta units, the final one in the queue from 5.45 and the first from 5.46, have both just now stopped, with the message "waiting for memory." This has not happened before.

There are two new settings in your "General Preferences". They are "use at most X percent ram while active", and "use at most X percent of ram when not active". By default they are set to 50 and 90 respectively. Try increasing them.

tony

I was aware of the change in preferences, and had changed mine already. Only one of my 'crunchers' had a problem - and only with some of the Rosetta WUs. These WUs seem to grab a LOT more memory than others, and that leads to the problem with 'waiting for memory' ....

Viromancy
Joined: 23 Sep 06
Posts: 8
Credit: 125,713
RAC: 0
Message 36865 - Posted: 16 Feb 2007, 7:23:42 UTC

Still some watchdog terminations with version 5.46:
https://boinc.bakerlab.org/rosetta/result.php?resultid=62694055
https://boinc.bakerlab.org/rosetta/result.php?resultid=62738141

Haven't seen this type of error before.


Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 36866 - Posted: 16 Feb 2007, 8:18:25 UTC - in response to Message 36789.  

Have you tried resetting the project to see if it helps? Those workunits themselves seem to be fine, and if this happens all the time on a single host, my guess is that some files have become corrupted. Another possibility is a hardware problem, though this can be ruled out if the host has no problems running other programs.

Have tried that now, and that host is still failing - and on almost every WU now.

The same host now also fails to run the new Human Proteome Folding WUs from WCG, and that's Rosetta too. But every other project, including the others from WCG, runs fine, as does every other bit of software I run on it.

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 36873 - Posted: 16 Feb 2007, 9:12:37 UTC
Last modified: 16 Feb 2007, 9:13:49 UTC

I got this error on one of the WU's that was recently completed:

<core_client_version>5.8.8</core_client_version>
<![CDATA[
<stderr_txt>
# random seed: 1489227
# cpu_run_time_pref: 28800
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -252.879 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .aac4z1.out


https://boinc.bakerlab.org/rosetta/result.php?resultid=62674491
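
For anyone wondering what a "stuck at score ... for 3600 seconds" check might look like, here is a minimal sketch. It is not Rosetta's actual watchdog: the class and method names are hypothetical, it only covers the "stuck" case (not "going too long"), and the only values taken from the log above are the 3600 second limit and the score.

```python
class ScoreWatchdog:
    """Flags a run whose score has not changed for stuck_limit_s seconds."""

    def __init__(self, stuck_limit_s=3600.0):
        self.stuck_limit_s = stuck_limit_s
        self.last_score = None      # most recent distinct score seen
        self.last_change_s = None   # time at which that score first appeared

    def update(self, score, now_s):
        """Record a score sample; return True if the run should be ended."""
        if self.last_score is None or score != self.last_score:
            # Score moved (or this is the first sample): restart the timer.
            self.last_score = score
            self.last_change_s = now_s
            return False
        # Score unchanged: end the run once the limit has elapsed.
        return now_s - self.last_change_s >= self.stuck_limit_s


watchdog = ScoreWatchdog()
watchdog.update(-252.879, now_s=0)
print(watchdog.update(-252.879, now_s=3600))  # True: unchanged for 3600 s
```

A real watchdog would run alongside the computation and sample the score periodically; here the timestamps are simply passed in to keep the sketch easy to follow.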

Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 36884 - Posted: 16 Feb 2007, 15:58:35 UTC

Same error on the next WU to complete:
<core_client_version>5.8.8</core_client_version>
<![CDATA[
<stderr_txt>
# random seed: 1432700
# cpu_run_time_pref: 28800
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -192.27 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .aand73.out^
https://boinc.bakerlab.org/rosetta/result.php?resultid=62736709

Viromancy
Joined: 23 Sep 06
Posts: 8
Credit: 125,713
RAC: 0
Message 36886 - Posted: 16 Feb 2007, 16:58:35 UTC - in response to Message 36866.  

...

Have tried that now, and that host is still failing - and on almost every WU now.

The same host now also fails to run the new Human Proteome Folding WUs from WGC, and that's Rosetta too. But every other project, including the others from WGC, run fine, so does any other bit of software I run on it.


Is your machine overclocked?

I had the same problem: Rosetta worked fine for months, then these access violation errors started creeping in, then they became more common, then almost overnight 4 of every 5 WUs were failing with the same type of error. No other project was affected (wasn't running Human Proteome Folding at the time) and no other piece of software ever showed any kind of instability.

I'd been running my processor and memory at the highest stable clock setting I could find; and when I stepped the overclock down by a tiny amount (about 1.5% from 3.46GHz to 3.40GHz) the result was that Rosetta suddenly became completely stable again. Hardly had any access violation errors since.

If you've overclocked that machine, even if everything else runs okay, it might be worth dropping the speed down a little bit and seeing what happens.
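
To put a number on how small that step down was, the relative change in clock speed can be worked out like this. The helper name is made up for illustration; the two frequencies are the ones quoted above.

```python
def clock_change_pct(old_ghz, new_ghz):
    """Relative change in clock speed, in percent (negative means slower)."""
    return (new_ghz - old_ghz) / old_ghz * 100.0


# Stepping down from 3.46 GHz to 3.40 GHz:
print(round(clock_change_pct(3.46, 3.40), 2))  # about -1.73
```

So the reduction is a little under 2%, in the same ballpark as the "about 1.5%" mentioned above.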


EigenState
Joined: 16 Feb 07
Posts: 4
Credit: 1,667
RAC: 0
Message 36890 - Posted: 16 Feb 2007, 19:48:49 UTC

I have only been running BOINC for a week, and that only for Einstein@Home until yesterday, 15 February 2007, when I attempted to attach to Rosetta as well. From all I can tell, the attachment and download went smoothly enough. However, as soon as any Rosetta Work Unit began its calculations, I immediately received a Compute Error. More surprisingly perhaps, Rosetta was then detached from my BOINC Manager.

I tried to re-attach two more times with basically identical results. Inspection of my Results Log indicates that three WU’s were terminated as Compute Errors, three were terminated claiming the user had detached, and one remains In Progress despite Rosetta having been detached. What is common to all is that after each attachment, Rosetta was spontaneously detached in that I did not request the detach action.

Examples from my Results Log follow:

Compute Error:
https://boinc.bakerlab.org/rosetta/result.php?resultid=62900991

Client Detached:
https://boinc.bakerlab.org/rosetta/result.php?resultid=62885782

In Progress:
https://boinc.bakerlab.org/rosetta/result.php?resultid=62902733

If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems and I can wait and re-attach to Rosetta once those problems are resolved successfully.

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 36893 - Posted: 16 Feb 2007, 20:35:02 UTC - in response to Message 36890.  
Last modified: 16 Feb 2007, 20:35:57 UTC

If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems and I can wait and re-attach to Rosetta once those problems are resolved successfully.


@EigenState

Do you use BAM?

EigenState
Joined: 16 Feb 07
Posts: 4
Credit: 1,667
RAC: 0
Message 36894 - Posted: 16 Feb 2007, 20:39:43 UTC

Yes, I do use BAM. Whether I use it properly is an entirely different question; I hope the answer is yes, but I am not certain of that.

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 36895 - Posted: 16 Feb 2007, 20:43:26 UTC - in response to Message 36893.  

If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems and I can wait and re-attach to Rosetta once those problems are resolved successfully.


@EigenState

Do you use BAM?



i.e. an account manager: BAM is the BoincStats Account Manager, and GridRepublic is a similar one.

If so, you need to attach through the account manager itself and check that it has updated. If not, some more computer info and a spin-off to a new post would be good.

At this point I would recommend uninstalling BOINC, downloading the latest version of BOINC (now updated again, to 5.8.11) and reinstalling. You could also just install 5.8.11 over the top; uninstalling first simply makes sure everything is cleaned out.
Team mauisun.org

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 36896 - Posted: 16 Feb 2007, 20:44:56 UTC - in response to Message 36894.  
Last modified: 16 Feb 2007, 20:46:51 UTC

Yes, I do use BAM. Whether I use it properly is an entirely different question; I hope the answer is yes, but I am not certain of that.


As above, you have to attach using the host options in BAM. If you try to attach yourself and it is a project BAM supports, then when the client contacts BAM it will kick the project off (unfortunately, no questions asked).
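
The behaviour described above can be pictured as a simple comparison between two project lists. This is only an illustrative model of what the post describes, not the real BOINC or BAM account-manager protocol; the function name and the project labels are hypothetical.

```python
def projects_to_detach(client_projects, manager_projects):
    """Projects attached on the client that the account manager does not list.

    In this simplified model, anything attached by hand (and so missing from
    the account manager's list for the host) gets detached on the next sync.
    """
    return set(client_projects) - set(manager_projects)


# Hypothetical host: Einstein@Home was attached through the account manager,
# Rosetta@home was attached by hand in the BOINC Manager.
client = {"einstein", "rosetta"}
manager = {"einstein"}
print(projects_to_detach(client, manager))  # {'rosetta'}
```

Attaching through the account manager adds the project to the manager's list first, so under this model the next sync would leave it alone.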

To help you out

http://www.boincstats.com/bam/host_list.php
Link to your host list


Since you have a Rosetta@home account you may have to find it first in BAM
http://www.boincstats.com/bam/project_sign_up.php

Team mauisun.org

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 36898 - Posted: 16 Feb 2007, 21:18:28 UTC - in response to Message 36886.  

Is your machine overclocked?

Nope, it was running at the standard speed. Just for the heck of it though, I've now underclocked it 6% to see how it goes.