minirosetta 2.17

Message boards : Number crunching : minirosetta 2.17



Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68640 - Posted: 16 Nov 2010, 15:51:18 UTC

cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine?
Rosetta Moderator: Mod.Sense
ID: 68640 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68643 - Posted: 16 Nov 2010, 17:27:36 UTC

Cleaner, this topic was posted about earlier in the thread.

I would like to write some details on this topic, but when I downloaded the new BOINC version it says it has a 25% CPU threshold (in the startup messages), but it doesn't seem to be enforcing it. I've updated other local preferences, see the 25% in the global_prefs_override.xml file, updated to R@h, restarted BOINC, but it still doesn't seem to suspend when CPU usage gets high.

Does anyone have specifics on the combination of updating to another project, or account manager, and what causes people to see the CPU threshold being enforced? I'd like to make it happen on my machine, study it in more detail and verify alternatives for establishing the desired setting for the CPU threshold.

Rosetta Moderator: Mod.Sense
ID: 68643 · Rating: 0
[AF>france>pas-de-calais]symaski62

Joined: 19 Sep 05
Posts: 47
Credit: 33,871
RAC: 0
Message 68645 - Posted: 16 Nov 2010, 20:50:50 UTC

Yes :) I am French.

1 CPU => BOINC 0% CPU & rosetta 100%

2 CPU => BOINC 25% & rosetta 50%

2 CPU => BOINC 0% & rosetta 100%

4 CPU => BOINC 25% & rosetta 25%, 50%, 75%.

4 CPU => BOINC 0% & rosetta 100%




ID: 68645 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68646 - Posted: 16 Nov 2010, 23:24:48 UTC

Yes, merci à symaski62, I am aware of the setting you are showing. But BOINC is still running at 100% of CPU when low priority permits it, even if another task is using more than 25% of the CPU for several minutes. The 25% threshold, as shown in the startup messages and the display you are showing, is being ignored.
Rosetta Moderator: Mod.Sense
ID: 68646 · Rating: 0
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 68647 - Posted: 16 Nov 2010, 23:55:12 UTC - in response to Message 68643.  
Last modified: 16 Nov 2010, 23:56:08 UTC

I would like to write some details on this topic, but
when I downloaded the new BOINC version it says it has a 25%
CPU threshold (in the startup messages), but it doesn't seem
to be enforcing it. I've updated other local preferences,
see the 25% in the global_prefs_override.xml file, updated
to R@h, restarted BOINC, but it still doesn't seem to
suspend when CPU usage gets high.


I assume in your activity menu you have "Run based on preferences"
selected? It is a simple option that I expect you have probably
checked already, but it is often the simple things that trip
people up.
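
The interaction being described can be sketched in a few lines of Python. This is only an illustration of the behaviour discussed in this thread, not BOINC's actual code: the function name `should_suspend` is hypothetical, the mode strings mirror the `always`/`auto`/`never` values that `boinccmd --set_run_mode` accepts, and treating a threshold of 0 as "no limit" is an assumption.

```python
def should_suspend(run_mode, other_cpu_percent, threshold_percent):
    """Decide whether computation would be suspended (illustrative only).

    run_mode: "always", "auto" (run based on preferences), or "never".
    other_cpu_percent: CPU used by non-BOINC programs, 0-100.
    threshold_percent: the "suspend if CPU usage is above" preference;
        0 is treated here as "no limit" (an assumption).
    """
    if run_mode == "never":
        return True            # user suspended everything
    if run_mode == "always":
        return False           # preferences, including the threshold, are ignored
    if run_mode != "auto":
        raise ValueError(f"unknown run mode: {run_mode!r}")
    # Only in "auto" mode is the CPU threshold consulted at all.
    return threshold_percent > 0 and other_cpu_percent > threshold_percent


if __name__ == "__main__":
    for mode in ("always", "auto", "never"):
        print(mode, should_suspend(mode, 90, 25))
```

With the activity menu on "Run always", the 25% threshold is never consulted in this model, which would match what Mod.Sense was seeing.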
ID: 68647 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68649 - Posted: 17 Nov 2010, 0:19:25 UTC

<<---- smacks forehead. Been a while since I've tripped up on that one. Thanks.
Rosetta Moderator: Mod.Sense
ID: 68649 · Rating: 0
cleaner

Joined: 22 Aug 10
Posts: 6
Credit: 26,245
RAC: 0
Message 68650 - Posted: 17 Nov 2010, 9:46:34 UTC - in response to Message 68640.  

cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine?


I ran the memory test tool from Microsoft 3 or 4 weeks ago and it tested okay. My machine is not overclocked. I will reset to default prefs and see what happens.
ID: 68650 · Rating: 0
wolfpat

Joined: 1 May 10
Posts: 4
Credit: 2,415,305
RAC: 658
Message 68667 - Posted: 19 Nov 2010, 14:44:09 UTC

I've had so much trouble with minirosetta 2.17 that I had to stop running it on two of my machines. The only result I get on them anyway is "Computation Error".

There's no problem with it on my Windows 7 computer. But with my Windows 2000 and my XP machines, it totally louses up Explorer. I have to restart using the reset button to get them to do anything. The only consistent symptoms are that all text disappears and clicking on icons has no response.
ID: 68667 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68668 - Posted: 19 Nov 2010, 17:17:45 UTC

The responsiveness of your computer is often related to memory. The active programs are using the memory, and when you sit down and start something else, you first have to bring the programs that control the desktop etc. into memory again.

Your XP machine only has 1GB of memory for 2 processors. That is on the small side, and the tasks that have been running recently are taking memory more on the large side.

You can configure how much memory BOINC is allowed to use, and this will help reserve some space for your other applications. I'd suggest that allowing BOINC to use only one CPU on that machine might be a good compromise. It will only need memory for one task rather than two, and you probably won't have to worry too much about setting any specific memory limitations.

Your Win2000 machine has 512MB for one CPU. Again on the small side for what Rosetta would like to have to run well. Your Win7 machine by comparison, where you say things are running well, has 4GB for 2 CPUs.
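
To put rough numbers on that, here is a small Python sketch of memory per running task for the three machines, using only the RAM and CPU figures quoted above. It assumes one task per CPU and ignores what the operating system and other programs need, so the real headroom is smaller.

```python
# (RAM in MB, number of CPUs) as quoted in this post.
machines = {
    "Windows XP": (1024, 2),
    "Windows 2000": (512, 1),
    "Windows 7": (4096, 2),
}


def mb_per_task(ram_mb, running_tasks):
    """RAM available to each concurrently running task, ignoring OS overhead."""
    return ram_mb / running_tasks


# One task per CPU on every machine.
per_task = {name: mb_per_task(ram, cpus) for name, (ram, cpus) in machines.items()}

# The suggested compromise: let BOINC use only one CPU on the XP machine.
xp_one_cpu = mb_per_task(machines["Windows XP"][0], 1)

if __name__ == "__main__":
    for name, mb in per_task.items():
        print(f"{name}: {mb:.0f} MB per task")
    print(f"Windows XP limited to one CPU: {xp_one_cpu:.0f} MB per task")
```

Limiting the XP machine to one CPU doubles the memory available to the single remaining task, which is the point of the suggestion.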

Having said all of that, now in looking at the task details I see they all seem to fail with failures accessing files. In some cases the file named is the v2.17 program itself. So it was running for an hour, and then disappeared? It sounds like you have something else going on. Perhaps a virus checker discovering new files and placing them under quarantine?
Rosetta Moderator: Mod.Sense
ID: 68668 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68679 - Posted: 21 Nov 2010, 20:21:47 UTC
Last modified: 21 Nov 2010, 20:22:10 UTC

Anyone else seeing a large number of validate errors on tasks whose names start with "rb_11_20"?

I am seeing these on several machines, both OSX and Linux, AMD and Intel.

They are generally really short running - only a few minutes. The joblog says they are shutting down cleanly, but they all seem to get validate errors.

A few sample tasks would be:

380698404
380672414
380781443
380752525

380728668
380708503
380737556
ID: 68679 · Rating: 0
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 68682 - Posted: 21 Nov 2010, 23:06:01 UTC - in response to Message 68679.  
Last modified: 21 Nov 2010, 23:08:04 UTC

Anyone else seeing a large number of validate errors on tasks whose names start with "rb_11_20"?

I am seeing these on several machines, both OSX and Linux, AMD and Intel.


Each of the tasks you listed also returned errors for your wingmen, though 380672414
got a compute error rather than a validate error. It doesn't appear to be platform
specific as your wingmen were using a mixture of machines including Darwin,
Windows XP and Windows 7.

I have only had one rb_11_20 task go through so far, but it appears to have validated
okay.
ID: 68682 · Rating: 0
googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,813,645
RAC: 2,151
Message 68684 - Posted: 23 Nov 2010, 1:56:39 UTC

This task:

rb_11_22_20682_38744_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22593_1483_1

ended after 13:16.
ID: 68684 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68704 - Posted: 24 Nov 2010, 18:47:00 UTC

Is anyone else seeing this?
<message>
Maximum memory exceeded
</message>


See details here
Rosetta Moderator: Mod.Sense
ID: 68704 · Rating: 0
LigH
Joined: 7 Sep 09
Posts: 25
Credit: 9,241,214
RAC: 0
Message 68777 - Posted: 7 Dec 2010, 10:29:51 UTC

At the moment, 3 of my 4 tasks are hung ("Processor time" much lower than "Elapsed time", 0% CPU, ~300 MB RAM):


Fun and success!

Jobs: holzon + 12angebote
Hobbies: doom9/Gleitz + PlaneShift
ID: 68777 · Rating: 0
LigH
Joined: 7 Sep 09
Posts: 25
Credit: 9,241,214
RAC: 0
Message 68788 - Posted: 8 Dec 2010, 7:47:37 UTC
Last modified: 8 Dec 2010, 8:40:19 UTC

A reboot in the meantime must have unlocked the tasks.

That means I cannot trust BOINC to run unattended.
__

P.S.: Quitting and restarting the BOINC manager helped as well.

I wonder if BOINC should implement a detector for hung tasks and restart those up to # times when one is detected "active" but not progressing for at least # minutes.
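
A minimal sketch of what such a detector might look like, in Python. Nothing here is an existing BOINC feature or API: `TaskWatch` and `check_task` are hypothetical names, and the caller is assumed to supply each task's accumulated CPU seconds at a fixed polling interval.

```python
from dataclasses import dataclass


@dataclass
class TaskWatch:
    """Per-task bookkeeping between polls (hypothetical helper)."""
    last_cpu_seconds: float = 0.0
    stalled_seconds: float = 0.0
    restarts: int = 0


def check_task(watch, cpu_seconds, poll_interval, stall_limit, max_restarts):
    """Return "ok", "restart", or "give_up" for one polling step.

    cpu_seconds: CPU time the "active" task has accumulated so far.
    poll_interval: seconds between polls.
    stall_limit: seconds without CPU progress before the task counts as hung.
    max_restarts: how many restarts to attempt before giving up.
    """
    if cpu_seconds > watch.last_cpu_seconds:
        # The task made progress since the last poll.
        watch.last_cpu_seconds = cpu_seconds
        watch.stalled_seconds = 0.0
        return "ok"
    watch.stalled_seconds += poll_interval
    if watch.stalled_seconds < stall_limit:
        return "ok"
    if watch.restarts >= max_restarts:
        return "give_up"
    watch.restarts += 1
    watch.stalled_seconds = 0.0
    # A restarted task may resume from a checkpoint with less CPU time,
    # so forget the old reading and measure progress afresh.
    watch.last_cpu_seconds = 0.0
    return "restart"
```

The caller would act on "restart" by stopping and relaunching the task, and on "give_up" by reporting it as failed; both actions are left out because they depend on how the client manages tasks.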
Fun and success!

Jobs: holzon + 12angebote
Hobbies: doom9/Gleitz + PlaneShift
ID: 68788 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 68790 - Posted: 10 Dec 2010, 1:27:35 UTC

I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours:

rhoA8Dec2010_1lb1_2a1i_ProteinInterfaceDesign_8Dec2010_22762_101

mem_prog_run05_centroid_round01_E_subrun_000003_SAVE_ALL_OUT_IGNORE_THE_REST_22743_66868

The delays on uploading seem to be holding up any requests for downloading more workunits from Rosetta@home, and somewhat for workunits from other projects as well.

Is your server for accepting uploads having problems?
ID: 68790 · Rating: 0
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 68792 - Posted: 10 Dec 2010, 1:51:06 UTC - in response to Message 68790.  

I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours:

rhoA8Dec2010_1lb1_2a1i_ProteinInterfaceDesign_8Dec2010_22762_101

mem_prog_run05_centroid_round01_E_subrun_000003_SAVE_ALL_OUT_IGNORE_THE_REST_22743_66868

The delays on uploading seem to be holding up any requests for downloading more workunits from Rosetta@home, and somewhat for workunits from other projects as well.

Is your server for accepting uploads having problems?


R@H was down for about a whole day or so.
ID: 68792 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68793 - Posted: 10 Dec 2010, 1:51:11 UTC

robertmiles, all the servers were down for about 36 hours here and are just recovering now. Pending uploads do not impair downloads, but both servers are currently very busy and you may be seeing the BOINC-imposed delays before it tries again.
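
For readers unfamiliar with those delays, the general idea is a randomized exponential backoff: each failed attempt roughly doubles the wait, up to a cap. The Python sketch below illustrates the technique only; the base interval, the cap, and the function name `backoff_delay` are assumptions and not BOINC's real values.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before the next retry (illustrative values only).

    failures: number of consecutive failed attempts so far (0 or more).
    base: wait ceiling after zero failures, in seconds (assumed value).
    cap: upper bound on the wait, in seconds (assumed value).
    """
    ceiling = min(cap, base * (2 ** failures))
    # Pick a random point in the upper half of the window so that many
    # clients do not all retry at the same moment after an outage.
    return rng.uniform(ceiling / 2, ceiling)


if __name__ == "__main__":
    for n in range(8):
        print(n, round(backoff_delay(n)))
```

After a long outage, a client that has failed several times in a row can therefore sit idle for a while even though the server is reachable again.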
Rosetta Moderator: Mod.Sense
ID: 68793 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68809 - Posted: 15 Dec 2010, 11:50:21 UTC

Anyone else seeing this type of error?

TaskID: 386452405

Name: SerineHydrolase_relax_oh37_010_22774_173_0

ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context
ID: 68809 · Rating: 0
Mad_Max

Joined: 31 Dec 09
Posts: 209
Credit: 26,536,623
RAC: 17,489
Message 68827 - Posted: 17 Dec 2010, 21:02:09 UTC

A few members of my team reported on our forum that some tasks complete much earlier than the target CPU time. There seem to be no other errors in these cases: the tasks are reported as usual and validated by the server. The calculation time is just much (several times) shorter than the target time. For example, in this task: https://boinc.bakerlab.org/rosetta/result.php?resultid=386683334
# cpu_run_time_pref: 21600
======================================================
DONE :: 2 starting structures 3104.28 cpu seconds
This process generated 2 decoys from 2 attempts

So is this normal? That is, is there some criterion by which the client finishes the calculation this early (similar to how the watchdog forces the calculation to end when the target time + 4 hours is exceeded, only in reverse)? Or is it some sort of bug?

I have never seen this myself, but that is probably because I use a short target time (2 hours), while everyone who reported these tasks uses a long target time (above the default), i.e. 6-12 hours.
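
For scale, the figures in the log excerpt above can be compared directly. This is a small Python sketch that only parses the three quoted lines; it says nothing about why the task stopped early.

```python
import re

# The three informative lines from the task output quoted above.
log = """\
# cpu_run_time_pref: 21600
DONE :: 2 starting structures 3104.28 cpu seconds
This process generated 2 decoys from 2 attempts
"""

target = int(re.search(r"cpu_run_time_pref:\s*(\d+)", log).group(1))
used = float(re.search(r"([\d.]+) cpu seconds", log).group(1))
decoys = int(re.search(r"generated (\d+) decoys", log).group(1))

fraction = used / target    # share of the target runtime actually used
per_decoy = used / decoys   # average CPU seconds per decoy

if __name__ == "__main__":
    print(f"used {used:.0f} s of a {target} s target ({fraction:.1%})")
    print(f"about {per_decoy:.0f} s per decoy")
```

The task used roughly one seventh of its 6-hour target, at about 26 minutes per decoy.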
ID: 68827 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org