Message boards : Number crunching : Problems with Rosetta version 5.67
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I am a bit baffled why the one Rosetta WU might have gotten stopped, and even more baffled why it would have started a new Rosetta with one partially completed. The one waiting has a deadline closer than the one running. This is normal BOINC behavior when the memory limits you have configured are exceeded (see General Preferences). BOINC suspends the task that crossed the memory limit and then fires up the next one, in hopes that it might be able to run in less memory, thus keeping the CPU active. Later, perhaps when the computer is not in use and BOINC is allowed more memory, it can continue with the task that got suspended. If you have a list of tasks, BOINC might begin each one in turn and run it until it reaches the memory limits. So it is just a matter of those tasks needing significantly more memory than normal. That caused many of us to hit these memory limits when we are not used to seeing them. Rosetta Moderator: Mod.Sense |
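To make the suspend-and-move-on behavior described above concrete, here is a rough sketch. It is not BOINC's actual scheduler code: the `Task` class, the `pick_running_task` function, and the memory footprints are all hypothetical, and the 1024 MB and 50%/90% figures are only illustrative preference values.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    working_set_mb: float  # hypothetical memory footprint, not a measured value
    suspended: bool = False


def pick_running_task(tasks, ram_mb, allowed_fraction):
    """Suspend tasks over the memory limit; return the first task that fits."""
    limit_mb = ram_mb * allowed_fraction
    for task in tasks:
        task.suspended = task.working_set_mb > limit_mb
    for task in tasks:
        if not task.suspended:
            return task
    return None  # nothing fits, so the CPU would sit idle


queue = [Task("large_task", 600.0), Task("small_task", 200.0)]

# Computer in use: 50% of 1024 MB = 512 MB, so the large task is suspended.
in_use = pick_running_task(queue, ram_mb=1024, allowed_fraction=0.5)

# Computer idle: 90% of 1024 MB = 921.6 MB, so the large task can resume.
idle = pick_running_task(queue, ram_mb=1024, allowed_fraction=0.9)
```

With these assumed numbers, the smaller task runs while the computer is in use, and the larger one picks up again once the idle limit applies. The real client tracks memory use over time rather than from a fixed footprint, so treat this only as an illustration of the idea.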
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Why do 1gidA WU's always show very little movement in the graphs of accepted energy and RMSD as well as the model plotting in the bottom box? See my 1gidA thread further below, it has links to some screen shots of 5.64 and I am seeing the same behavior now in 5.67. |
Bill Hepburn Send message Joined: 18 Sep 05 Posts: 14 Credit: 14,953,680 RAC: 1,424 |
I am a bit baffled why the one Rosetta WU might have gotten stopped, and even more baffled why it would have started a new Rosetta with one partially completed. The one waiting has a deadline closer than the one running. That's probably what happened, although it initially ran into the problem in the middle of the night when the computer was otherwise idle. Later, after I had posted the message and several other WUs went by, it started up by itself and finished. I have 1 GB of RAM and BOINC can have 90% when idle, 50% when in use. I had to work on some PowerPoint slides the morning I reported this and may have rebooted before I started... I often do that for the very purpose of freeing up RAM that misbehaving applications may have left tied up. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
additional note: now on 3rd 1gidA WU and see some different but the same issues as below. There is a continuous line drawing for accepted energy, but there is no line for RMSD. Also there is no plot of the combined numbers of RMSD and Accepted energy in the bottom box. Is this the new standard of RAH for this particular protein? Why do 1gidA WU's always show very little movement in the graphs of accepted energy and RMSD as well as the model plotting in the bottom box? |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 4 |
Also issues with these FOLD_AND_DOCK_SUBSYSTEM jobs. This result, and this one. Messages...
29/05/2007 10:06:09|rosetta@home|Restarting task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 using rosetta version 567
29/05/2007 14:30:56|rosetta@home|Deferring communication for 1 min 0 sec
29/05/2007 14:30:56|rosetta@home|Reason: Unrecoverable error for result gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
29/05/2007 14:30:56|rosetta@home|Computation for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 finished
29/05/2007 14:30:56|rosetta@home|Output file gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0_0 for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 absent
29/05/2007 14:30:56|malariacontrol.net beta|Starting wu_27_112_45189_0_1250717515_1
29/05/2007 14:30:56|malariacontrol.net beta|Starting task wu_27_112_45189_0_1250717515_1 using malariacontrol version 550
29/05/2007 14:31:05|rosetta@home|Deferring communication for 1 min 0 sec
29/05/2007 14:31:05|rosetta@home|Reason: Unrecoverable error for result gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
29/05/2007 14:31:05|rosetta@home|Computation for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 finished
29/05/2007 14:31:05|rosetta@home|Output file gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0_0 for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 absent
Intel P-IV HT, Windows XP, BOINC 5.8.16, no graphics, leave in memory. Also saw the low VM message.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
From Rhiju in another thread: "Hi Everybody: Sorry for checking in a little late on this thread. I'm a bit puzzled that the FOLD_AND_DOCK_SUBSYSTEM workunits are taking up so much memory, but I've canceled all those jobs, and won't send any more out until we reduce the memory requirement! Apologies! Thanks for posting so quickly about the problem. It wasn't apparent on ralph. Also: if you have one of these workunits in your queue, please feel free to cancel it rather than risk a system slowdown due to virtual memory problems." Also issues with these FOLD_AND_DOCK_SUBSYSTEM jobs. This result, and this one. |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi all: sorry, the work queue went blank for a little while this afternoon; it should be full again! In case you're sick of seeing 1gid, we're going to be sending out some work related to our protein projects now... Hi all. This version has been tested a lot on ralph, but please let us know if you see anything unusual. Thanks for all your posts on 5.64 before, too! |
Admin Send message Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
Incorrect function after 29mins with this guy https://boinc.bakerlab.org/rosetta/result.php?resultid=83322361 |
skildude Send message Joined: 13 Dec 05 Posts: 7 Credit: 1,295,582 RAC: 0 |
I had a problem on my Linux Mandriva Free 2k7 with the gp04 and now had a problem with this WU https://boinc.bakerlab.org/rosetta/result.php?resultid=82785184 It got to 67% completed and froze. I checked my processes and the Rosetta process was running but not using any CPU. I restarted BOINC and allowed the WU to sit for several hours like this and finally aborted it. Space is a vast empty space. Let us hope that it does not occupy the region between your ears. Come visit Team Starfire at www.TSWB.org |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
This WU came up with a compute error in BOINC, and in the results it shows as a client error with an incorrect function message. When I peeked at the graphics, they were all over the place on the scale for energy and RMSD. The plots looked OK. BOINC has moved on to the next WU in line and downloaded a new one to take its place. |
B-Roy Send message Joined: 26 Sep 05 Posts: 26 Credit: 46,951 RAC: 0 |
My workunits all work fine and also validate, but I have a problem with the checkpointing. While the %complete bar keeps moving up, shutting down my PC sets the progress back. Before the shutdown I was at 50%, while after booting the PC up again the WU starts at only 40%. Does this mean that the WU saves its results less frequently than the progress bar would indicate, and if so, why? Does the 5.68 version improve the situation? |
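One way to picture the hypothesis in the question above is the small sketch below. It assumes, purely for illustration, that state is only saved at a few discrete checkpoints while the progress bar updates continuously; the `resume_point` function and the checkpoint positions are made up and say nothing about how Rosetta actually checkpoints.

```python
def resume_point(progress_at_shutdown, checkpoints):
    """Return the latest checkpointed progress not beyond the shutdown point."""
    saved = [c for c in checkpoints if c <= progress_at_shutdown]
    return max(saved, default=0.0)  # no checkpoint yet means starting over


# Hypothetical checkpoint positions, as fractions of the whole workunit.
checkpoints = [0.10, 0.25, 0.40, 0.60]

# Shut down at 50%: the last saved state is the 40% checkpoint.
restart = resume_point(0.50, checkpoints)
lost = 0.50 - restart
```

Under these assumed checkpoints, a shutdown at 50% resumes from 40%, so whatever was computed since the last save has to be redone.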
©2024 University of Washington
https://www.bakerlab.org