Message boards : Number crunching : minirosetta 2.17
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine? Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Cleaner, this topic was discussed earlier in the thread. I would like to write up some details on it, but although the new BOINC version I downloaded reports a 25% CPU threshold in its startup messages, it doesn't seem to enforce it. I've updated other local preferences, confirmed the 25% in the global_prefs_override.xml file, run a project update for R@h, and restarted BOINC, but it still doesn't suspend when CPU usage gets high. Does anyone know which combination of project updates or account manager synchronization causes the CPU threshold to actually be enforced? I'd like to reproduce it on my machine, study it in more detail, and verify alternative ways of establishing the desired CPU threshold. Rosetta Moderator: Mod.Sense
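To confirm that the override file really carries the threshold, a small Python sketch can read it. This assumes the threshold is stored in a `<suspend_cpu_usage>` element under `<global_preferences>`, as in BOINC 6.10-era clients; the sample text and the helper name `cpu_threshold` are hypothetical. Note that a check like this only shows the preference is present in the file, not that the client is enforcing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of global_prefs_override.xml. The element name
# suspend_cpu_usage is an assumption about the BOINC preference
# "suspend work if non-BOINC CPU usage is above X%".
SAMPLE = """<global_preferences>
    <suspend_cpu_usage>25.000000</suspend_cpu_usage>
</global_preferences>"""


def cpu_threshold(xml_text):
    """Return the CPU-usage threshold in percent, or None if it is not set."""
    root = ET.fromstring(xml_text)
    node = root.find("suspend_cpu_usage")
    if node is None or node.text is None:
        return None
    return float(node.text)


print(cpu_threshold(SAMPLE))  # 25.0
```

To check a real installation, pass the contents of the global_prefs_override.xml file from the BOINC data directory instead of `SAMPLE`.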
[AF>france>pas-de-calais]symaski62 Joined: 19 Sep 05 Posts: 47 Credit: 33,871 RAC: 0 |
Yes :) I am French.
1 CPU => BOINC 0% CPU & Rosetta 100%
2 CPUs => BOINC 25% & Rosetta 50%
2 CPUs => BOINC 0% & Rosetta 100%
4 CPUs => BOINC 25% & Rosetta 25%, 50%, 75%
4 CPUs => BOINC 0% & Rosetta 100%
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, thank you, symaski62; I am aware of the setting you are showing. But BOINC still runs at 100% of the CPU whenever its low priority permits it, even if another task has been using more than 25% of the CPU for several minutes. The 25% threshold, shown both in the startup messages and in the display you are showing, is being ignored. Rosetta Moderator: Mod.Sense
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Quoting Mod.Sense: "I would like to write some details on this topic, but..." I assume you have "Run based on preferences" selected in your Activity menu? It is a simple option that I expect you have already checked, but it is often the simple things that trip people up.
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
<<---- smacks forehead. Been a while since I've tripped up on that one. Thanks. Rosetta Moderator: Mod.Sense |
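The fix turned out to be the Activity menu setting. A simplified model of why the threshold was ignored, written as a sketch rather than BOINC's actual client logic: it assumes "Run always" bypasses the computing preferences (including the CPU threshold) while "Run based on preferences" honours them. The `RunMode` names, `should_suspend`, and the treatment of a 0% threshold as "no limit" are hypothetical.

```python
from enum import Enum


class RunMode(Enum):
    # Hypothetical names for the three Activity-menu choices.
    ALWAYS = "Run always"
    AUTO = "Run based on preferences"
    NEVER = "Suspend"


def should_suspend(mode, non_boinc_cpu_pct, threshold_pct):
    """Simplified model: should BOINC work be suspended right now?"""
    if mode is RunMode.NEVER:
        return True
    if mode is RunMode.ALWAYS:
        # Preferences, including the CPU threshold, are bypassed.
        return False
    # AUTO: honour the threshold; 0 is treated as "no limit" (assumption).
    return threshold_pct > 0 and non_boinc_cpu_pct > threshold_pct


print(should_suspend(RunMode.ALWAYS, 80.0, 25.0))  # False
print(should_suspend(RunMode.AUTO, 80.0, 25.0))    # True
```

With the mode set to "Run always", no amount of other CPU load triggers a suspension in this model, which matches the symptom described earlier in the thread.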
cleaner Joined: 22 Aug 10 Posts: 6 Credit: 26,245 RAC: 0 |
Quoting Mod.Sense: "cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine?" I ran Microsoft's memory test tool 3 or 4 weeks ago and the memory tested okay. My machine is not overclocked. I will reset to the default preferences and see what happens.
wolfpat Joined: 1 May 10 Posts: 4 Credit: 2,415,305 RAC: 658 |
I've had so much trouble with minirosetta 2.17 that I had to stop running it on two of my machines. The only result I get on them anyway is "Computation Error". There's no problem with it on my Windows 7 computer, but on my Windows 2000 and XP machines it totally louses up Explorer, and I have to press the reset button to get them to do anything. The only consistent symptoms are that all text disappears and clicking on icons gets no response.
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The responsiveness of your computer is often related to memory. Active programs occupy the memory, so when you sit down and start something else, the programs that control the desktop etc. first have to be brought back into memory. Your XP machine has only 1GB of memory for 2 processors. That is on the small side, and the tasks that have been running recently use a fairly large amount of memory. You can configure how much memory BOINC is allowed to use, which will help reserve some space for your other applications. As a compromise, I'd suggest allowing BOINC to use only one CPU on that machine. It will then need memory for one task rather than two, and you probably won't have to worry much about setting any specific memory limits. Your Win2000 machine has 512MB for one CPU, which is again on the small side for what Rosetta would like to have to run well. By comparison, your Win7 machine, where you say things are running well, has 4GB for 2 CPUs. Having said all of that, looking at the task details I see they all seem to fail with errors accessing files. In some cases the file named is the v2.17 program itself. So it was running for an hour and then disappeared? It sounds like you have something else going on. Perhaps a virus checker is discovering new files and placing them in quarantine? Rosetta Moderator: Mod.Sense
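The memory comparison above is simple division. This sketch uses the RAM and CPU counts quoted in the post to compute an upper bound on memory per concurrently running task. It ignores what the operating system and desktop need, and the `boinc_pct` parameter is a hypothetical stand-in for BOINC's "use at most X% of memory" preference.

```python
# Machines described in the post: (label, total RAM in MB, CPUs BOINC uses).
MACHINES = [
    ("XP, both CPUs", 1024, 2),
    ("XP, one CPU", 1024, 1),
    ("Win2000", 512, 1),
    ("Win7", 4096, 2),
]


def mb_per_task(total_mb, ncpus, boinc_pct=100):
    """Upper bound on RAM per task if BOINC may use boinc_pct of memory.

    Assumes one task per CPU and ignores memory used by the OS and desktop.
    """
    return total_mb * boinc_pct / 100 / ncpus


for label, total_mb, ncpus in MACHINES:
    print(f"{label}: at most {mb_per_task(total_mb, ncpus):.0f} MB per task")
```

Dropping the XP machine from two tasks to one doubles the memory each task can have, which is the trade-off the post suggests.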
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Is anyone else seeing a large number of validate errors on tasks whose names start with "rb_11_20"? I am seeing these on several machines, both OSX and Linux, AMD and Intel. They generally run very briefly, only a few minutes. The job log says they shut down cleanly, but they all seem to get validate errors. A few sample tasks: 380698404, 380672414, 380781443, 380752525, 380728668, 380708503, 380737556
Murasaki Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Quoting Chris Holvenstot: 'Anyone else seeing a large number of validate errors on tasks whose name start out with "rb_11_20" ??' Each of the tasks you listed also returned errors for your wingmen, though 380672414 got a compute error rather than a validate error. It doesn't appear to be platform specific, as your wingmen were using a mixture of machines including Darwin, Windows XP and Windows 7. I have had only one rb_11_20 task go through so far, and it appears to have validated okay.
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,813,645 RAC: 2,151 |
This task: rb_11_22_20682_38744_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22593_1483_1 ended after 13:16. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Is anyone else seeing this? <message> See details here Rosetta Moderator: Mod.Sense |
LigH Joined: 7 Sep 09 Posts: 25 Credit: 9,241,214 RAC: 0 |
At the moment, 3 of my 4 tasks are hung ("Processor time" much lower than "Elapsed time", 0% CPU, ~300 MB RAM):
Fun and success! Jobs: holzon + 12angebote Hobbies: doom9/Gleitz + PlaneShift |
LigH Joined: 7 Sep 09 Posts: 25 Credit: 9,241,214 RAC: 0 |
A reboot in the meantime must have unlocked the tasks. That means I cannot trust BOINC to run unattended. P.S.: Quitting and restarting the BOINC manager helped as well. I wonder whether BOINC should implement a detector for hung tasks that restarts a task up to # times when it is detected as "active" but has not progressed for at least # minutes.
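LigH's suggestion can be sketched using the symptom from the earlier post: processor time advancing far more slowly than elapsed time. This is a hypothetical watchdog, not an existing BOINC feature. The restart limit (3), observation window (10 minutes) and 5% CPU ratio are arbitrary example values, not numbers from the post, and the class name `TaskWatch` is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskWatch:
    """Hypothetical hung-task detector for one running task."""
    max_restarts: int = 3          # example restart limit
    window_s: float = 600.0        # example window: 10 minutes
    min_cpu_ratio: float = 0.05    # under 5% CPU across the window = stalled
    restarts: int = 0
    samples: list = field(default_factory=list)  # (elapsed_s, cpu_s) pairs

    def record(self, elapsed_s, cpu_s):
        self.samples.append((elapsed_s, cpu_s))

    def is_stalled(self):
        if not self.samples:
            return False
        last_elapsed, last_cpu = self.samples[-1]
        # Compare against the newest sample that is at least window_s older.
        for elapsed, cpu in reversed(self.samples):
            span = last_elapsed - elapsed
            if span >= self.window_s:
                return (last_cpu - cpu) / span < self.min_cpu_ratio
        return False  # not enough history yet

    def check(self):
        """Return 'ok', 'restart' (and count it), or 'give_up'."""
        if not self.is_stalled():
            return "ok"
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.samples.clear()  # start observing the restarted task afresh
            return "restart"
        return "give_up"
```

A client would call `record()` periodically with the task's elapsed and CPU time and act on `check()`; capping restarts keeps a genuinely broken task from being retried forever.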
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,227 |
I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours: rhoA8Dec2010_1lb1_2a1i_ProteinInterfaceDesign_8Dec2010_22762_101 mem_prog_run05_centroid_round01_E_subrun_000003_SAVE_ALL_OUT_IGNORE_THE_REST_22743_66868 The delays on uploading seem to be holding up any requests for downloading more workunits from Rosetta@home, and somewhat for workunits from other projects as well. Is your server for accepting uploads having problems? |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Quoting robertmiles: "I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours." R@H was down for about a whole day or so.
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
robertmiles, all the servers were down for about 36 hours here and are just recovering now. Pending uploads do not block downloads, but both servers are currently very busy, and you may be seeing the delays BOINC imposes before it tries again. Rosetta Moderator: Mod.Sense
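The retry delays mentioned above follow a back-off pattern. The sketch shows generic exponential back-off with random jitter, which spreads retries out so a recovering server is not hit by every client at once. The 1-minute base and 4-hour cap are illustrative assumptions, not BOINC's actual constants or algorithm.

```python
import random

BASE_S = 60.0        # ceiling for the first retry: 1 minute (illustrative)
CAP_S = 4 * 3600.0   # never wait longer than 4 hours (illustrative)


def backoff_ceiling(attempt, base_s=BASE_S, cap_s=CAP_S):
    """Upper bound on the wait before retry number `attempt` (0-based)."""
    return min(cap_s, base_s * 2 ** attempt)


def next_delay(attempt, rng=random):
    """Random wait in [0, ceiling] ("full jitter") so clients don't retry in lockstep."""
    return rng.uniform(0.0, backoff_ceiling(attempt))


for attempt in range(6):
    print(attempt, backoff_ceiling(attempt), round(next_delay(attempt), 1))
```

After a long outage, a client that has failed several times in a row can therefore sit idle for a while even though the server is back, which fits the delays described in the post.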
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Is anyone else seeing this type of error?
TaskID: 386452405
Name: SerineHydrolase_relax_oh37_010_22774_173_0
ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,536,623 RAC: 17,489 |
A few members of my team reported on our forum that some tasks complete much earlier than the target CPU time. There seem to be no other errors in these cases: the tasks are reported as usual and validated by the server. The calculation time is simply much (several times) shorter than the target time. For example, in this task: https://boinc.bakerlab.org/rosetta/result.php?resultid=386683334
# cpu_run_time_pref: 21600
======================================================
DONE :: 2 starting structures 3104.28 cpu seconds
This process generated 2 decoys from 2 attempts
So is this normal? That is, is there some criterion by which the client finishes the calculation this early (similar to how the watchdog forces the calculation to end when the target time + 4 hours is exceeded, only in reverse)? Or is it some sort of bug? I have never run into this myself, but probably because I use a short target time (2 hours), while everyone who reported these tasks uses a long target time (above the default), i.e. 6-12 hours.
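The numbers in that log excerpt can be checked with a little arithmetic. This sketch parses the quoted lines and compares the CPU time used with the 21600 s target and with the watchdog limit described in the post (target + 4 hours). The function name `summarize` is hypothetical, and the sketch says nothing about why the task stopped early.

```python
import re

# The log lines quoted in the post.
LOG = """# cpu_run_time_pref: 21600
======================================================
DONE :: 2 starting structures 3104.28 cpu seconds
This process generated 2 decoys from 2 attempts"""

WATCHDOG_GRACE_S = 4 * 3600  # "target time + 4 hours", as described in the post


def summarize(log):
    """Extract target and used CPU time from the log and derive a few ratios."""
    target = float(re.search(r"cpu_run_time_pref:\s*(\d+)", log).group(1))
    used = float(re.search(r"([\d.]+)\s+cpu seconds", log).group(1))
    decoys = int(re.search(r"generated\s+(\d+)\s+decoys", log).group(1))
    return {
        "target_h": target / 3600,
        "used_h": used / 3600,
        "fraction_of_target": used / target,
        "cpu_s_per_decoy": used / decoys,
        "watchdog_limit_h": (target + WATCHDOG_GRACE_S) / 3600,
    }


for key, value in summarize(LOG).items():
    print(f"{key}: {value:.2f}")
```

For this task the run used roughly 0.86 of the 6 target hours, about 14% of the target, with a watchdog ceiling of 10 hours.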
©2024 University of Washington
https://www.bakerlab.org