Problems with Rosetta version 5.67

Message boards : Number crunching : Problems with Rosetta version 5.67

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 41554 - Posted: 28 May 2007, 14:22:55 UTC

I am a bit baffled why the one Rosetta WU might have gotten stopped, and even more baffled why it would have started a new Rosetta with one partially completed. The one waiting has a deadline closer than the one running.


This is normal BOINC behavior when a task exceeds the memory limits you have configured (see General Preferences). BOINC suspends the task that crossed the limit and starts the next one, in the hope that it can run in less memory, which keeps the CPU busy. Later, perhaps when the computer is not in use and BOINC is allowed more memory, it can resume the suspended task. If you have a list of tasks, BOINC might begin each one in turn and run it until it reaches the memory limit.

So it is just a matter of those tasks needing significantly more memory than normal. That caused many of us to hit these memory limits, which we are not used to seeing.
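For anyone who finds it easier to read as code, here is a toy model of the behavior described above. It is only a sketch of the idea (suspend the task that is over the limit, try the next one), not BOINC's actual scheduler; the `Task` class, the `schedule` function, and the memory figures are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_mb: int           # hypothetical memory need of the task
    state: str = "waiting"   # "waiting", "running", or "suspended"


def schedule(tasks, limit_mb):
    """Toy model: walk the queue in order and run the first task that fits.

    Tasks ahead of it that need more than limit_mb are marked suspended;
    tasks behind it simply keep waiting.
    """
    running = None
    for task in tasks:
        if running is not None:
            task.state = "waiting"     # the CPU is already busy
        elif task.memory_mb <= limit_mb:
            task.state = "running"
            running = task
        else:
            task.state = "suspended"   # over the limit: set aside, try the next
    return running


# Hypothetical queue: one memory-hungry task ahead of a small one.
queue = [Task("big_task", 700), Task("small_task", 200)]

in_use = schedule(queue, limit_mb=512)   # computer in use, tighter limit
print(in_use.name, [t.state for t in queue])

idle = schedule(queue, limit_mb=920)     # computer idle, looser limit
print(idle.name, [t.state for t in queue])
```

With the tighter limit the big task is set aside and the small one runs; once the limit is raised, the big task fits again and is picked up first.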
Rosetta Moderator: Mod.Sense

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 4
Message 41559 - Posted: 28 May 2007, 17:17:21 UTC
Last modified: 28 May 2007, 17:18:17 UTC

Why do 1gidA WUs always show very little movement in the graphs of accepted energy and RMSD, as well as in the model plot in the bottom box?
See my 1gidA thread further below; it has links to some screenshots from 5.64, and I am seeing the same behavior now in 5.67.

Bill Hepburn
Joined: 18 Sep 05
Posts: 14
Credit: 14,810,959
RAC: 28
Message 41563 - Posted: 28 May 2007, 18:48:54 UTC - in response to Message 41554.  

That's probably what happened, although it initially ran into the problem in the middle of the night when the computer was otherwise idle. Later, after I had posted the message and several other WUs went by, it started up by itself and finished. I have 1 GB of RAM and BOINC can have 90% when idle, 50% when in use. I had to work on some PowerPoint slides the morning I reported this and may have rebooted before I started... I often do that for the very purpose of freeing up RAM that misbehaving applications may have left tied up.
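To put rough numbers on those settings, here is a small worked example. It assumes 1 GB means 1024 MB and that the percentages apply to total physical RAM; the function name is just for illustration.

```python
def boinc_memory_budget_mb(total_ram_mb, percent_allowed):
    """Memory BOINC may use, given total RAM and the allowed percentage."""
    return total_ram_mb * percent_allowed / 100


TOTAL_RAM_MB = 1024  # assumption: 1 GB = 1024 MB

idle_budget = boinc_memory_budget_mb(TOTAL_RAM_MB, 90)    # computer idle
in_use_budget = boinc_memory_budget_mb(TOTAL_RAM_MB, 50)  # computer in use

print(f"idle:   {idle_budget:.1f} MB")
print(f"in use: {in_use_budget:.1f} MB")
```

So a task needs to stay under roughly 920 MB when the machine is idle and about 512 MB when it is in use.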

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 4
Message 41574 - Posted: 28 May 2007, 22:22:58 UTC - in response to Message 41559.  
Last modified: 28 May 2007, 22:23:31 UTC

Additional note: I am now on my third 1gidA WU and see slightly different, but related, issues to those in my earlier post. There is a continuous line for accepted energy, but no line for RMSD. There is also no combined plot of RMSD and accepted energy in the bottom box. Is this the new standard of RAH for this particular protein?

adrianxw
Posts: 653
Credit: 11,833,020
RAC: 1,601
Message 41587 - Posted: 29 May 2007, 12:45:02 UTC
Last modified: 29 May 2007, 12:46:12 UTC

Also issues with these FOLD_AND_DOCK_SUBSYSTEM jobs. This result, and this one.

Messages...

29/05/2007 10:06:09|rosetta@home|Restarting task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 using rosetta version 567
29/05/2007 14:30:56|rosetta@home|Deferring communication for 1 min 0 sec
29/05/2007 14:30:56|rosetta@home|Reason: Unrecoverable error for result gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
29/05/2007 14:30:56|rosetta@home|Computation for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 finished
29/05/2007 14:30:56|rosetta@home|Output file gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0_0 for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 absent
29/05/2007 14:30:56|malariacontrol.net beta|Starting wu_27_112_45189_0_1250717515_1
29/05/2007 14:30:56|malariacontrol.net beta|Starting task wu_27_112_45189_0_1250717515_1 using malariacontrol version 550
29/05/2007 14:31:05|rosetta@home|Deferring communication for 1 min 0 sec
29/05/2007 14:31:05|rosetta@home|Reason: Unrecoverable error for result gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
29/05/2007 14:31:05|rosetta@home|Computation for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 finished
29/05/2007 14:31:05|rosetta@home|Output file gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0_0 for task gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_26234_0 absent

Intel P-IV HT Windows XP BOINC 5.8.16 no graphics, leave in memory.

Also saw the low VM message.
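In case it helps anyone sift through a long message log, here is a minimal sketch for pulling the failed results out of lines like the ones above. It assumes the `date time|project|message` layout seen in this log and the wording "Unrecoverable error for result ... exit code N"; the function name is my own and other BOINC versions may format messages differently.

```python
import re

# Matches the "Reason:" lines shown in the log above.
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) .*exit code (\d+)")


def failed_results(log_lines):
    """Return one dict per 'Unrecoverable error' line in a BOINC message log."""
    failures = []
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not in the timestamp|project|message layout
        timestamp, project, message = parts
        match = ERROR_RE.search(message)
        if match:
            failures.append({
                "time": timestamp,
                "project": project,
                "result": match.group(1),
                "exit_code": int(match.group(2)),
            })
    return failures


sample = [
    "29/05/2007 14:30:56|rosetta@home|Deferring communication for 1 min 0 sec",
    "29/05/2007 14:30:56|rosetta@home|Reason: Unrecoverable error for result "
    "gp04__BOINC_SYMM_FOLD_AND_DOCK_SUBSYSTEM-gp04_-delC126__1761_38597_0 "
    "(The system cannot find the path specified. (0x3) - exit code 3 (0x3))",
]

for failure in failed_results(sample):
    print(failure["time"], failure["result"], failure["exit_code"])
```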
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 4
Message 41588 - Posted: 29 May 2007, 13:01:08 UTC - in response to Message 41587.  

from Rhiju in another thread:

"Hi Everybody:
Sorry for checking in a little late on this thread. I'm a bit puzzled that the FOLD_AND_DOCK_SUBSYSTEM workunits are taking up so much memory, but I've canceled all those jobs, and won't send any more out until we reduce the memory requirement! Apologies! Thanks for posting so quickly about the problem. It wasn't apparent on ralph.

Also: if you have one of these workunits in your queue, please feel free to cancel it rather than risk a system slowdown due to virtual memory problems."


Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 41626 - Posted: 30 May 2007, 2:19:39 UTC - in response to Message 41527.  

Hi all: sorry, the work queue went blank for a little while this afternoon; it should be full again! In case you're sick of seeing 1gid, we're going to be sending out some work related to our protein projects now...

Hi all. This version has been tested a lot on ralph, but please let us know if you see anything unusual. Thanks for all your posts on 5.64 before, too!

Everything appears to be running smoothly now.


Admin
Joined: 13 Apr 07
Posts: 42
Credit: 260,782
RAC: 0
Message 41700 - Posted: 1 Jun 2007, 3:15:41 UTC

"Incorrect function" error after 29 minutes with this one:

https://boinc.bakerlab.org/rosetta/result.php?resultid=83322361

skildude
Joined: 13 Dec 05
Posts: 7
Credit: 1,295,582
RAC: 0
Message 41740 - Posted: 2 Jun 2007, 0:37:19 UTC

I had a problem with the gp04 WU on my Linux machine (Mandriva Free 2k7), and have now had a problem with this WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=82785184

It got to 67% complete and froze. I checked my processes, and the Rosetta process was present but not using any CPU. I restarted BOINC, let the WU sit like this for several hours, and finally aborted it.
Space is a vast empty space. Let us hope that it does not occupy the region between your ears.

Come visit Team Starfire at www.TSWB.org

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 4
Message 41771 - Posted: 2 Jun 2007, 20:10:59 UTC

This WU came up with a compute error in BOINC, and in the results it shows as a client error with an "incorrect function" message. When I peeked at the graphics, they were all over the place on the scale for energy and RMSD. The plots looked OK.
BOINC has moved on to the next WU in line and downloaded a new one to take its place.

B-Roy
Joined: 26 Sep 05
Posts: 26
Credit: 46,121
RAC: 0
Message 41859 - Posted: 5 Jun 2007, 16:08:36 UTC

My workunits all run fine and also validate, but I have a problem with checkpointing. Although the % complete bar keeps moving up, shutting down my PC sets it back: before the shutdown I was at 50%, but after booting the PC up again the WU started at only 40%. Does this mean that the WU saves its results less frequently than the progress bar would indicate, and if so, why? Does version 5.68 improve the situation?
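To show what I mean, here is a toy model of that guess. It assumes a checkpoint is written only each time progress crosses a fixed interval; the thread does not say how Rosetta actually decides when to checkpoint, so both the function and the 20% interval are hypothetical.

```python
def resume_point(progress_percent, checkpoint_interval_percent):
    """Progress restored after a restart in the toy model.

    Assumes the last checkpoint was written at the most recent multiple
    of checkpoint_interval_percent at or below the current progress.
    """
    completed_intervals = progress_percent // checkpoint_interval_percent
    return completed_intervals * checkpoint_interval_percent


# Hypothetical case: checkpoints every 20% of the run, shutdown at 50%.
print(resume_point(50, 20))
```

Under that assumption, a shutdown at 50% would bring the WU back at 40%, which is the kind of gap I am seeing.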

©2024 University of Washington
https://www.bakerlab.org