WU run times out of whack

ramostol

Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 39606 - Posted: 19 Apr 2007, 15:44:27 UTC - in response to Message 39601.  

Well, my problem is this:
I understand that Rosetta currently saves a checkpoint only after finishing a model (decoy). I have had models run for 5 h 30 min, and since I move around a bit I have to avoid such sessions: I cannot handle long models if they appear at inopportune moments. With the current version (5.59) I receive models that need less time to complete, but even now I have seen models exceed 2 h by a not insignificant amount. If no intermediate saving is performed I am in trouble, either wasting too much computing time or, worse, getting stuck with a model I cannot complete within a reasonable period. So far I seem able to limit my problems to the length of the first model of a WU (though I have noted exceptions), and most of the time 5.59 manages quite nicely.

I am aware that I do not lose 8 h of work when shutting down the computer, but I question my effectiveness even when I lose 1 h of computing. I have also observed the quirks of the statistics, so I no longer pay too much attention to the completion reports. Maybe WUs have generally become shorter, but I cannot spend too much time experimenting with this. I imagine I can cope until a new version appears and I can find out how often you plan to save a WU in progress. After that, it all depends on whether I find my computing to be of sufficient use or not.

-- R. A. Mostol
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 39610 - Posted: 19 Apr 2007, 16:19:25 UTC

ramostol, you are correct that there are some longer-running models, and also that some long-running models do not (yet) save checkpoints partway through a model. That's why they are working on the checkpointing. At present (Rosetta v5.59), some types of work units can take checkpoints in mid-model, and others cannot.

My point was just that the amount of work lost or preserved, for a given type of work unit, is the same, regardless of the runtime preference.
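
To make that point concrete, here is a minimal sketch in Python. It assumes every model takes the same time, that a checkpoint is written only when a model finishes, and that a new model starts only while elapsed time is below the target runtime. The function name and all numbers are hypothetical and are not taken from Rosetta.

```python
def work_lost_s(interrupt_at_s, model_length_s, target_runtime_s):
    """Seconds of computing discarded if the task is stopped at interrupt_at_s.

    Simplified model (an assumption, not Rosetta's actual scheduler):
    models run back to back, each one checkpoints only when it finishes,
    and a new model is started only while elapsed time is below the target.
    """
    last_checkpoint_s = 0.0
    elapsed_s = 0.0
    while elapsed_s < target_runtime_s:
        model_end_s = elapsed_s + model_length_s
        if interrupt_at_s < model_end_s:
            # Stopped in mid-model: everything since the last finished model is lost.
            return interrupt_at_s - last_checkpoint_s
        elapsed_s = model_end_s
        last_checkpoint_s = model_end_s
    return 0.0  # the task finished before the interruption


# Same 2 h models, same shutdown 3 h in, three different runtime preferences.
for target_h in (4, 8, 24):
    print(target_h, "h target ->", work_lost_s(3 * 3600, 2 * 3600, target_h * 3600), "s lost")
```

Under these assumptions the loss depends only on the model length and on when the shutdown happens, not on the target runtime.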
Rosetta Moderator: Mod.Sense
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 39614 - Posted: 19 Apr 2007, 17:10:14 UTC

The checkpointing within a model will be for the pose and jumping jobs. The ab relax jobs already have checkpointing within a model. We will be able to set the checkpointing interval and will probably start at 5 minutes or so and see how it goes.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 39620 - Posted: 19 Apr 2007, 21:21:47 UTC

See the section on checkpointing here. Call the BOINC API function boinc_time_to_checkpoint() at a point where a checkpoint is possible, and it will tell you whether you should take one.

I presume the returned value is based upon the user's General Preference (see <disk_interval> property in global_prefs.xml) for how frequently to write to disk.
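
The polling pattern described above can be sketched as a small simulation. The real BOINC API is C/C++, so this Python version only imitates it: `CheckpointClock`, `time_to_checkpoint`, `checkpoint_completed` and `run_model` are hypothetical stand-ins, and the sketch assumes (as presumed above) that the answer depends only on the user's disk-write interval. In the real API, a checkpoint is followed by a call to boinc_checkpoint_completed().

```python
class CheckpointClock:
    """Hypothetical stand-in for the BOINC client's checkpoint timer."""

    def __init__(self, disk_interval_s):
        # Assumed to come from the user's "write to disk at most every N seconds" preference.
        self.disk_interval_s = disk_interval_s
        self.last_checkpoint_s = 0.0

    def time_to_checkpoint(self, now_s):
        # Plays the role of boinc_time_to_checkpoint(): answers "yes" only once
        # the minimum interval since the last checkpoint has elapsed.
        return now_s - self.last_checkpoint_s >= self.disk_interval_s

    def checkpoint_completed(self, now_s):
        # Plays the role of boinc_checkpoint_completed(): restarts the timer.
        self.last_checkpoint_s = now_s


def run_model(total_steps, step_cost_s, clock):
    """Simulate one model and return the simulated times at which state was saved."""
    saved_at = []
    now_s = 0.0
    for _ in range(total_steps):
        now_s += step_cost_s  # one unit of science work
        # The end of a step is a point where saving is possible; ask whether we should.
        if clock.time_to_checkpoint(now_s):
            saved_at.append(now_s)  # a real application would write its state file here
            clock.checkpoint_completed(now_s)
    return saved_at


print(run_model(10, 30.0, CheckpointClock(60.0)))   # saves every second step
print(run_model(10, 30.0, CheckpointClock(900.0)))  # no save within this 5-minute run
```

The application offers a checkpoint at every safe point, and the clock decides how often one is actually written, so a longer user interval means fewer disk writes without any change to the science code.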

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 39624 - Posted: 19 Apr 2007, 21:55:25 UTC

Feet1st, I'll add the API call and see how it goes with the default interval (60 sec). It may be overkill, so I might increase the minimum interval.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 39626 - Posted: 19 Apr 2007, 23:06:14 UTC - in response to Message 39624.  

David E K wrote: "Feet1st, I'll add the API call and see how it goes with the default interval (60 sec). It may be overkill, so I might increase the minimum interval."

Perfect! Yes, BOINC only allows me to say the MOST I'd like it to use my disk... not the AVERAGE. But if a user has specified 300 seconds, or 900 seconds (15 min.) as I think I have mine set, then you don't want to be writing any more often than that.
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 39627 - Posted: 19 Apr 2007, 23:21:05 UTC

The checkpoint files are small.
ramostol

Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 39640 - Posted: 20 Apr 2007, 9:09:00 UTC

The information in the recent messages in this thread is just what we need to calm down and let the developers do their best.

In spite of some remaining irregularities, I feel that Rosetta 5.59 is the most stable release to date for the Mac platform, and secure, predictable checkpointing will make most of these irregularities unimportant in ensuring that Rosetta functions satisfactorily.

Thanks.

-- R. A. Mostol