Any objections to reducing the maximum run time to 12-16 hours?

Message boards : Number crunching : Any objections to reducing the maximum run time to 12-16 hours?


David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12483 - Posted: 22 Mar 2006, 2:22:03 UTC

Rom is hot on the trail of the remaining bugs, and I'm optimistic that he will make rapid progress. In the meantime, I see from the "stuck at 1%" thread that some people are having processes stuck for a very long time, which is very annoying. We can solve this problem temporarily by reducing the maximum run time. We set this quite high when we introduced the user-adjustable run time setting so that dial-up users could have very long runs, but this inadvertently exacerbated the 1% problem. Rom recommends we reduce the maximum time to 12 hours or so. The only drawback is for dial-up users who want very long work units. Rom suggests they load up on work units each time they log in; with 12-hour work units, this could last for a while.

So the question is: are there any objections to reducing the maximum time to 12 hours? Unfortunately, this is a work-unit-level parameter and cannot be changed by the user.
Darren
Joined: 6 Oct 05
Posts: 27
Credit: 43,535
RAC: 0
Message 12484 - Posted: 22 Mar 2006, 3:08:41 UTC - in response to Message 12483.  
Last modified: 22 Mar 2006, 3:10:34 UTC

So the question is: are there any objections to reducing the maximum time to 12 hours? Unfortunately, this is a work-unit-level parameter and cannot be changed by the user.


I'm one of those (probably few) who use long runtimes. I would just ask that you keep us updated as to the time frame; specifically, let us know when the option for long runtimes is back. My preference for it is a little different from that of the dial-up users in that my internet connection is by cellular modem. The amount of data transferred isn't a problem, but Rosetta for some reason gives me much more grief than other projects when there are gaps in the data flow. If a particular file times out on other projects, it starts right back up 60 seconds later and the work unit runs fine after it all finally gets here. When any individual file in the download times out with Rosetta, the download always starts back up, but once it all gets here the work unit immediately reports a download error.

This requires me to "babysit" the connection and manually suspend network access during any data flow gaps while Rosetta is downloading work. Oddly, if I manually suspend it, it picks right back up without causing an error, but if I let it time out and resume, it doesn't work. With that, I'll probably just suspend Rosetta unless I know I'll be home long enough to babysit the connection through a few downloads; thus my request that you be sure to let us know when the option is available again.

Thanks.



BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 12493 - Posted: 22 Mar 2006, 8:11:42 UTC

If it will help lessen the hardship of those suffering the stuck at 1% problems and their loss of CPU time, then I'd be willing to switch down to only a 12-16 hour max CPU time. Unfortunately, I no longer need low bandwidth usage, so my vote doesn't really count.


Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 12494 - Posted: 22 Mar 2006, 8:12:12 UTC
Last modified: 22 Mar 2006, 8:20:01 UTC

Johnathon

Joined: 5 Nov 05
Posts: 120
Credit: 138,226
RAC: 0
Message 12500 - Posted: 22 Mar 2006, 9:35:33 UTC

I'm running one machine on dial-up, and I've set it to 48-hour runtimes. It is just so much easier for me to upload one job and download one job every 2 days, instead of babysitting the machine daily for each network run. That machine would probably just get shut down permanently again if the run times go down.
Marky-UK

Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 12503 - Posted: 22 Mar 2006, 10:33:15 UTC
Last modified: 22 Mar 2006, 10:34:11 UTC

I'm a little confused - how is reducing the maximum run time going to help with the 1% problem? I have my maximum run time set to 2 hours, but the WUs that get stuck on 1% go way past this - sometimes several days past. So it's fairly evident that stuck WUs are ignoring the run time setting anyway, so changing the maximum possible setting isn't going to make much difference.

If you're adding extra code to catch stuck WUs that's great.
nozi

Joined: 15 Nov 05
Posts: 11
Credit: 566,793
RAC: 141
Message 12506 - Posted: 22 Mar 2006, 10:42:19 UTC
Last modified: 22 Mar 2006, 10:57:22 UTC

I would think a limit of 24 hours may be good enough. My setting is 4 hours, but since I am not able to reach every machine every day, the 1% bug has taken up to 3 days. Upload and download were never a problem for me, though. 12 or 16 hours seems too short for some. The only really good solution would still be to eliminate the bug.
On the one hand, you can lose several hours of CPU time from all users.
On the other, you force several users out of Rosetta completely.
So it is your decision to find a sufficient compromise.

IMHO, I would prefer not to lose anyone completely.

Communication Basic No.1 :

Freedom is always the Freedom to dissent.
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 12508 - Posted: 22 Mar 2006, 13:00:44 UTC - in response to Message 12503.  

I'm a little confused - how is reducing the maximum run time going to help with the 1% problem? I have my maximum run time set to 2 hours, but the WUs that get stuck on 1% go way past this - sometimes several days past. So it's fairly evident that stuck WUs are ignoring the run time setting anyway, so changing the maximum possible setting isn't going to make much difference.

If you're adding extra code to catch stuck WUs that's great.



The user adjustable time setting and the Max run time setting are separate. If the Max run time is set to a lower value by the project it will override any user setting for time. When the WU hits the Max time set by the project it will abort. So if a WU gets stuck, the Max time setting will cause it to abort automatically when it hits 12 hours.
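As a rough illustration of the rule described above, here is a minimal Python sketch. The function names and the hour-based units are hypothetical and chosen only for this example; this is not actual Rosetta or BOINC code, just the "project cap overrides the user setting, and a WU is aborted at the cap" logic written out.

```python
def effective_limit_hours(user_target_hours, project_max_hours):
    """Hypothetical helper: the project maximum wins whenever it is
    lower than the run time the user asked for."""
    return min(user_target_hours, project_max_hours)


def should_abort(elapsed_hours, project_max_hours):
    """Hypothetical watchdog check: abort once the project maximum is
    reached, regardless of how much progress the work unit reports."""
    return elapsed_hours >= project_max_hours


# A user who asked for 48-hour runs is capped at a 12-hour project maximum.
print(effective_limit_hours(48, 12))
# A WU stuck at 1% for 30 hours would be aborted under a 12-hour maximum.
print(should_abort(30, 12))
```

Under these assumptions, a stuck WU can waste at most the project maximum, whatever the user's own setting is.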

Moderator9
ROSETTA@home FAQ
Moderator Contact
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12516 - Posted: 22 Mar 2006, 15:41:47 UTC

Based on the responses below, I think we should set the maximum time to 24 hours. How many people would this be a hardship for?
Nite Owl
Joined: 2 Nov 05
Posts: 87
Credit: 3,019,449
RAC: 0
Message 12524 - Posted: 22 Mar 2006, 18:41:43 UTC

Works for me! :thumbsup:
uioped1
Joined: 9 Feb 06
Posts: 15
Credit: 1,058,481
RAC: 0
Message 12604 - Posted: 24 Mar 2006, 6:25:03 UTC - in response to Message 12500.  

I'm running one machine on dial-up, and I've set it to 48-hour runtimes. It is just so much easier for me to upload one job and download one job every 2 days, instead of babysitting the machine daily for each network run. That machine would probably just get shut down permanently again if the run times go down.


If you set your cache size to connect every two days, your BOINC client should download enough work for 2 days, regardless of whether your runtime is set to 48 hours or 12.
(Of course, this applies only after your machine has adjusted to the new runtime of your work units once you make the change.)
Johnathon

Joined: 5 Nov 05
Posts: 120
Credit: 138,226
RAC: 0
Message 12612 - Posted: 24 Mar 2006, 9:41:38 UTC

It means I've got to sit through 8 MB or more of downloads instead of 4 MB if I update once every 2 days. If it's better for bakerlab, then I don't mind the 24-hour limit too much, but for me it would be easier at 48 hours.
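The arithmetic behind this can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate under stated assumptions: one CPU running continuously, roughly 4 MB of download per work unit (inferred from the figures in this post, not a documented value), and hypothetical function names. The real BOINC client's work-fetch logic is more involved.

```python
import math


def work_units_for_cache(cache_days, runtime_hours):
    """Work units needed to keep one CPU busy for cache_days,
    assuming each unit runs for exactly runtime_hours."""
    return math.ceil(cache_days * 24 / runtime_hours)


def download_mb(cache_days, runtime_hours, mb_per_work_unit=4):
    """Estimated download per connection; the 4 MB per work unit
    default is an assumption taken from the figures in this thread."""
    return work_units_for_cache(cache_days, runtime_hours) * mb_per_work_unit


# Compare a two-day cache at different run time settings.
for runtime in (48, 24, 12):
    units = work_units_for_cache(2, runtime)
    print(f"{runtime}h runtime: {units} work unit(s), about {download_mb(2, runtime)} MB")
```

Under these assumptions, halving the run time doubles the number of work units, and so roughly doubles the download, needed to cover the same two days.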
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12622 - Posted: 24 Mar 2006, 16:13:42 UTC - in response to Message 12612.  

It means I've got to sit through 8 MB or more of downloads instead of 4 MB if I update once every 2 days. If it's better for bakerlab, then I don't mind the 24-hour limit too much, but for me it would be easier at 48 hours.



For us it really doesn't matter; our science depends only on the overall throughput. But we thought this would be better until we have the 1% problem solved, so people don't have to babysit machines.




©2024 University of Washington
https://www.bakerlab.org