Remembering last position after exit

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 15155 - Posted: 1 May 2006, 12:42:54 UTC

Hi,

I am sure this must have been raised elsewhere, but is there any way for me to force Rosetta to remember its last position before I exit the BOINC client for the day?

Last night I was working on AB_CASP6_t248_465_6867_0, which had reached 18.53% complete and was part way through its 4th model.

This morning - after starting BOINC again - I found it had dropped to 18.50% and a check of the graphics showed it had also dropped to step 0 of model 4. Whilst in this particular case it had only lost 15 minutes of processing time, I expect this loss is more significant when the number of users is factored in.

I hope someone has the answer or can point me in the direction of an existing thread. Thanks.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 15156 - Posted: 1 May 2006, 12:54:59 UTC

There's no way you can force it to. Checkpoints are built into the system, and when you stopped the client, the work unit fell back to its last checkpoint on restart. Do you have the application set to be "kept in memory"?
anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 15157 - Posted: 1 May 2006, 12:55:54 UTC - in response to Message 15155.  

Hi Murasaki

Welcome to Rosetta.

This thread hopefully gives you the info you seek.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1449

Anders n

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 15159 - Posted: 1 May 2006, 13:18:57 UTC

Thanks for the information.

I was thinking it was a problem with the checkpoints because the FAQs say "checkpoints occur each time the percentage advances" and the WU had lost 0.03%. However, if this is how checkpoints work then that is fine.

I have now set my preferences to stay in memory and switch applications every 120 minutes as suggested in the FAQs, so hopefully the effect of this problem will reduce.

Thanks again.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15170 - Posted: 1 May 2006, 16:02:02 UTC - in response to Message 15159.  

"checkpoints occur each time the percentage advances"

I don't think that is strictly accurate. Especially now that they fairly recently made the percentage advance more frequently. Could you post a reference to where you saw that? Was that in the FAQs? It should probably be reworded.

But yes, what you saw is normal. The project just released new code which does the "checkpoints" much more frequently. As a result you "only" lost a tiny fraction of a model, rather than an hour (or more) of work. And so this new software will improve the efficiency of almost all the clients in the user base, YEAH! We should see credit issued per day take a nice upward trend this week.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 15176 - Posted: 1 May 2006, 16:20:01 UTC - in response to Message 15170.  

"checkpoints occur each time the percentage advances"

I don't think that is strictly accurate. Especially now that they fairly recently made the percentage advance more frequently. Could you post a reference to where you saw that?


I spotted it in the answer to:
Why should I set BOINC to keep application in Memory?
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15177 - Posted: 1 May 2006, 16:26:07 UTC

Thanks for the reference, it says (in part):
However, if the application is removed from memory during an application swap, it will lose the work performed since the last checkpoint. In the case of Rosetta the checkpoints occur each time the percentage advances. If you do not keep the application in memory, and you set the swap interval to less than the time it takes your machine to reach a checkpoint, the work units can appear to be "hung".


I suggest a moderator remove the sentence I highlighted in bold (the one saying the checkpoints occur each time the percentage advances). It would be nice if we could get a clearer description of when the new checkpoints are now possible. Are they evenly distributed across a WU? Or each "Stage"? Or based on how the energy estimates progress? But I think we just proved that the sentence as it stands now is not accurate.

Thanks for posting back Murasaki. The project is already improved with you here :)
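
To make the "hung" case in the FAQ text concrete, here is a rough Python sketch. It is only a toy model, not Rosetta or BOINC code: the function name, its parameters, and the assumption that checkpoints arrive at a fixed interval are all made up for illustration.

```python
def retained_work(total_minutes, swap_interval, checkpoint_interval, keep_in_memory):
    """Return the minutes of work still held after total_minutes of run time.

    Hypothetical model: the task runs in slices of swap_interval minutes and
    is swapped out at the end of each slice. If it is not kept in memory,
    everything done since the last checkpoint is discarded at the swap.
    """
    saved = 0      # work captured by the most recent checkpoint
    current = 0    # work done so far, including uncheckpointed work
    elapsed = 0
    while elapsed < total_minutes:
        slice_len = min(swap_interval, total_minutes - elapsed)
        current += slice_len
        elapsed += slice_len
        # Record every checkpoint that was reached during this slice.
        while current - saved >= checkpoint_interval:
            saved += checkpoint_interval
        if not keep_in_memory:
            current = saved  # swap-out throws away uncheckpointed work
    return current


# Swap every 10 min, checkpoint every 20 min, not kept in memory:
# no slice ever reaches a checkpoint, so the work unit looks "hung".
hung = retained_work(120, 10, 20, keep_in_memory=False)

# Same settings, but kept in memory: nothing is lost at a swap.
kept = retained_work(120, 10, 20, keep_in_memory=True)

# Swap every 30 min: each slice reaches one checkpoint and loses the rest.
partial = retained_work(120, 30, 20, keep_in_memory=False)
```

In this model the first case retains nothing after two hours, which is the situation the FAQ warns about when the swap interval is shorter than the time needed to reach a checkpoint.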
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15230 - Posted: 2 May 2006, 0:23:14 UTC - in response to Message 15177.  
Last modified: 2 May 2006, 2:06:30 UTC

The checkpoints occur "about" every 20 min. When a checkpoint occurs the percent will advance. If you stop the Work Unit in a way that removes it from memory, when it restarts it will fall back to the last checkpoint.

The time of 20 min. is an estimate based on a project benchmark system. Faster systems do it more often than slower ones.

This is precisely the behavior you have described. The Work Unit had checkpointed, it continued to process for some period of time short of the next checkpoint, and you removed it from memory. When you restarted it, it began processing at the last checkpoint, and the percent complete fell back accordingly.

EDIT: OK, I see the problem. There are now smaller increments to the percent increase, so the percent can increase by .001. The checkpoints do not occur at that degree of fineness. The small increments were added as a diagnostic tool to identify the hang point on the Work Units. I will have to think about how this should read, because when the work unit checkpoints the percent does increase. But it is still possible for the percent to fall back if the work is interrupted.
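
As a rough illustration of the fall-back described above, here is a toy Python model using the numbers from the original post (18.53% dropping to 18.50%). The `WorkUnit` class and its methods are invented for this sketch and are not Rosetta's actual code; progress is counted in thousandths of a percent so that integer arithmetic stays exact.

```python
class WorkUnit:
    """Toy work unit whose displayed progress is finer than its checkpoints."""

    def __init__(self):
        self.progress = 0  # displayed progress, in units of 0.001%
        self.saved = 0     # progress recorded at the last checkpoint

    def advance(self, units):
        # The display can move in small steps without any checkpoint.
        self.progress += units

    def checkpoint(self):
        self.saved = self.progress

    def restart(self):
        # After removal from memory, only checkpointed progress survives.
        self.progress = self.saved


def as_percent(units):
    return f"{units / 1000:.2f}%"


wu = WorkUnit()
wu.advance(18500)
wu.checkpoint()           # checkpoint written at 18.50%
wu.advance(30)            # display creeps up to 18.53% with no new checkpoint
before_exit = as_percent(wu.progress)
wu.restart()              # BOINC exited and started again
after_restart = as_percent(wu.progress)
```

So the percent does rise when a checkpoint is written, but not every rise in the percent means a checkpoint was written.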

Moderator9
ROSETTA@home FAQ
Moderator Contact
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15235 - Posted: 2 May 2006, 2:53:40 UTC

Well, if I understand it correctly, they took me up on my suggestion. Leave the checkpoints in ALL WUs, (not just the long ones) and if you hit one of these points where you are ABLE to checkpoint, and it's been more than 20min since your last checkpoint, then do so.

That was a balance between wasting resources writing all the checkpoints, and losing work due to not having enough. And it meant that the shorter proteins might go through 60% of a model before a checkpoint, and a longer protein might checkpoint after say 10% of a model. But I never heard how frequently they are ABLE to take a checkpoint.

Here's Bin's post about it. This basically means you will checkpoint no more often than every 20 min (unless you hit the end of a model, I suppose)... but it doesn't say how long it might be before you reach "a stage where checkpointing is possible".

Bottom line, how many checkpointable stages are there in a model? Or does it depend upon how large the protein is?
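
The rule as I understand it could be sketched roughly like this in Python. This is a guess at the logic, not the project's code: the function name and the stage timings are made up, and the 20-minute figure is just the number quoted in this thread.

```python
MIN_INTERVAL_MIN = 20  # figure quoted in the thread; assumed, not confirmed


def checkpoint_times(stage_end_times, min_interval=MIN_INTERVAL_MIN):
    """Return the times (in minutes) at which a checkpoint would be written.

    stage_end_times lists the moments when the task reaches a stage where
    checkpointing is possible. A checkpoint is written at such a moment only
    if more than min_interval minutes have passed since the previous one.
    """
    written = []
    last = 0
    for t in stage_end_times:
        if t - last > min_interval:
            written.append(t)
            last = t
    return written


# Hypothetical short protein: many checkpointable stages, most are skipped.
short_protein = checkpoint_times([5, 12, 25, 30, 44, 50])

# Hypothetical long protein: stages are 45 min apart, so every one is used
# and the gap between checkpoints is well over 20 min.
long_protein = checkpoint_times([45, 90])
```

The second case is the open question: the 20 minutes is only a lower bound, and the real gap depends on how far apart the checkpointable stages are.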
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15240 - Posted: 2 May 2006, 3:49:31 UTC - in response to Message 15235.  
Last modified: 2 May 2006, 3:56:47 UTC

It has a lot to do with the protein size. There is no set number of checkpoints for a model, and there is no fixed time interval either. My understanding is that it is based on a combination of factors, but it will generally work out to around 20 min. It varies significantly with machine speed.

Moderator9



©2024 University of Washington
https://www.bakerlab.org