Report long-running models here

Profile adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 60832 - Posted: 26 Apr 2009, 17:42:55 UTC

This wu https://boinc.bakerlab.org/rosetta/workunit.php?wuid=225141500 strikes me as odd. I have the run time set to the 6 hour standard, but this one seems to be in a loop. It has 10:03 to go for completion, drops to 10:02 then quickly jumps back to 10:03. It has been doing that for at least the last 30 minutes, maybe more. Collectively it has run 08:17:46. I have suspended it pending advice.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60843 - Posted: 27 Apr 2009, 13:28:28 UTC

adrianxw, mini's watchdog kicks in after preferred runtime plus 4 hours. So you haven't quite made it there yet. You are looking at time remaining, which is simply computed from the % complete and the CPU time. The watchdog will look at CPU time used. Since the task is running long, Rosetta is slowing the rate at which it increases the % complete, and so it makes the resulting estimated completion time bounce around. But it is better than reaching 100% and not being done yet.

Since your runtime is 6hrs, let it run for at least 10.25hrs before you worry about it. The watchdog should cleanly take care of it before that time. And if you are running the new 6.6 client, use the runtime shown in the Rosetta graphic, NOT the one shown in the BOINC manager. I've seen cases where they are significantly different.
Rosetta Moderator: Mod.Sense
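
A minimal sketch of the arithmetic described in this post, assuming the remaining-time estimate is a plain linear extrapolation from the fraction complete and the CPU time used. The exact formula Rosetta and BOINC use is not given in this thread, and the function names here are hypothetical. Only the 4-hour figure comes from the post.

```python
WATCHDOG_GRACE_HOURS = 4.0  # from the post: preferred runtime plus 4 hours


def estimated_hours_remaining(cpu_hours: float, fraction_done: float) -> float:
    """Linear extrapolation (an assumption, not the project's actual code)."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = cpu_hours / fraction_done
    return projected_total - cpu_hours


def watchdog_limit_hours(preferred_runtime_hours: float) -> float:
    """CPU time after which the watchdog is expected to end the task."""
    return preferred_runtime_hours + WATCHDOG_GRACE_HOURS


def watchdog_should_stop(cpu_hours: float, preferred_runtime_hours: float) -> bool:
    """The watchdog looks at CPU time used, not at the remaining-time estimate."""
    return cpu_hours > watchdog_limit_hours(preferred_runtime_hours)
```

With a 6-hour preference the limit is 10 hours, so a task at 08:17:46 of CPU time has not reached it yet. Because the estimate divides by the fraction complete, slowing the growth of that fraction makes the estimate creep upward or bounce, which matches the 10:03 -> 10:02 -> 10:03 behaviour reported above.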
Profile adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 60845 - Posted: 27 Apr 2009, 14:15:21 UTC
Last modified: 27 Apr 2009, 15:03:48 UTC

Fair enough, it is running again, and doing the same thing with respect to the 10:03->10:02->10:03. BOINC is 6.6.20. BOINC Manager says it is 08:29:30 now, whilst the BOINC graphic, which I don't normally look at, says 08:27:30.

I notice the run time is on the parameters list now, (may have been for ages, it is not something I usually look at), and that the 6 hours comes from there. Is that a scientific optimum or a twitchy cruncher limit?

(Edit spelling, gee, shakes head and mumbles...)

(Edit again, job was still 10:03 when suddenly it went to "Ready to report" after 08:48:27)

(Edit again, hmm and miserly credit as well, ah well)
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 60850 - Posted: 27 Apr 2009, 18:40:00 UTC

Profile robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 60851 - Posted: 27 Apr 2009, 19:45:25 UTC - in response to Message 60845.  

Fair enough, it is running again, and doing the same thing with respect to the 10:03->10:02->10:03. BOINC is 6.6.20. BOINC Manager says it is 08:29:30 now, whilst the BOINC graphic, which I don't normally look at, says 08:27:30.

I notice the run time is on the parameters list now, (may have been for ages, it is not something I usually look at), and that the 6 hours comes from there. Is that a scientific optimum or a twitchy cruncher limit?


From what I've seen, 6 hours is now the default setting for those who haven't set their workunit size to something else. It's better for handling typical workunits now than the previous default of 2 hours, and larger values put less of a load on the server.
Profile adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 60861 - Posted: 28 Apr 2009, 6:45:51 UTC
Last modified: 28 Apr 2009, 6:49:28 UTC

My question was more, "Is there scientific benefit from having longer run times?". If the team get more out of 2 x 12 hour units than 4 x 6 then I'd like to know. I don't know enough about their models to answer this myself.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60863 - Posted: 28 Apr 2009, 8:54:29 UTC

According to the preferences, the default is 3 hours if not selected. Mine is not selected and, as best I can recall, the run times are still about 3 hours ...
Profile adrianxw
Joined: 18 Sep 05
Posts: 650
Credit: 11,637,805
RAC: 799
Message 60864 - Posted: 28 Apr 2009, 9:15:56 UTC

Well, perhaps that is true for you Paul, but I just checked mine again and it is quite definitely 6.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60868 - Posted: 28 Apr 2009, 13:22:22 UTC

The benefit to the science is based more on how many hours of crunching per day you do, not how many hours per task. But fewer hits to the servers per day from your machine leaves them free to service more participants with the same server hardware.

The preference for runtime is set in the Rosetta-specific preferences on the website. Click on the "[ Participants ]" link above. Keep in mind that preferences exist for each venue you have set up.
Rosetta Moderator: Mod.Sense
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60882 - Posted: 29 Apr 2009, 7:56:29 UTC - in response to Message 60864.  

Well, perhaps that is true for you Paul, but I just checked mine again and it is quite definitely 6.



Target CPU run time
(not selected defaults to 3 hours)


from the settings. If you had changed it to 6, it will be 6 ...

Unless we are accessing different web sites ... :)
Nothing But Idle Time

Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 60886 - Posted: 29 Apr 2009, 16:18:14 UTC - in response to Message 60868.  

Adrianxw: My question was more, "Is there scientific benefit from having longer run times?". If the team get more out of 2 x 12 hour units than 4 x 6 then I'd like to know. I don't know enough about their models to answer this myself.
Mod.Sense: The benefit to the science is based more on how many hours of crunching per day you do, not how many hours per task. But fewer hits to the servers per day from your machine leaves them free to service more participants with the same server hardware.


I'm curious about the topic that adrianxw raises. Bear with me 'cause I'm no linguist nor writer.

Tasks begin with a seed, some starting point from which models are developed, no? Is this akin to a tree with roots and branches that grow from the seed? The longer the tree lives (task run time) the more roots and branches, no? So based on this it "seems" that a long run time is better, more output based on the one input seed.

No two trees look alike. A different seed leads to different roots and branches and different outcome I guess? So if you run a unique seed for 24 hours you get one set of roots, branches, models. For 12 hours work presumably you get say half as many roots and branches. So if you run a task for 12 hours instead of 24 then how do you uncover/discover those roots and branches that would have been revealed if the task was allowed to run 24 hours?

Also, after a 12-hour run you are assigned a new task of whatever kind with another unique starting seed. So, a participant running 4, 6-hour tasks would likely get 4 different kinds of "proteins to model" with 4 different seeds. While a single task of 24 hours length would produce results for one "protein" study.

Anyway, what I'm trying to convey in a peculiar way is that I don't see how one can compare what is ostensibly achieved from a host running 4x6 tasks, or 1x24 task, or 8x3 tasks, or whatever. I don't see how you can say that it doesn't matter whether you run 1x24 or 4x6, because it seems to me that it does matter. The missing link for me to understand this is the whole concept of a "seed", where it comes from and how it relates from one task to another. That is, if I run a task with a given seed for 12 hours, is there another task sent out with a different seed that ostensibly reveals the information that would have been produced if the first task had run for 24 hours? Or does the project not care about that?

I think people want to know the optimal run time that produces the optimum scientific output. You say it doesn't matter. What needs explaining is how the "seed" concept renders the run time as moot and irrelevant with regard to scientific achievement.

Sorry, wish I could do better.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60889 - Posted: 29 Apr 2009, 18:49:45 UTC

Idle, you have the basic idea. I believe the point you are missing is that there are literally trillions of trillions (of trillions...) of potential branches possible. And so the objective is to pursue some small subset of those. It doesn't matter to the science if the completed "trees" are small with 10 branches, or large with 30 branches. Every completed "branch" adds to the subset the scientists are seeking to review.

When they set out to study a protein, they decide which of the approaches could best be applied to the study, and they have some round number of models they would like to complete (say 100,000, it varies depending on many factors). They don't define any specific 100,000 that must be reviewed. They are originated randomly in order to get a sampling of what's out there. And so tasks will be created and sent to clients until the desired 100,000 models are completed and returned. If some specific tasks are not returned (i.e. some seeds die), it doesn't harm the overall growth of the forest. It was an outcome that was anticipated from the start.

To follow your analogy, if the goal was to produce a forest with 100,000 branches in 10 years, you would plant seeds in the appropriate number, and of appropriate tree variety to assure that the goal will be met with some reasonable margin of error for weather, germination rates, lightning strikes, and insect damage.

Rosetta@home works much the same way. They have some idea how many models the average machine is going to produce, and how many WUs they have to create to get the desired number of results.
Rosetta Moderator: Mod.Sense
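
A rough sketch of why task length does not change the science under this scheme, assuming every model takes the same fixed CPU time and that a task always completes at least one model. Both are simplifying assumptions for illustration, and the function names are hypothetical. Only the 100,000-model target is taken from the post.

```python
import math


def models_per_task(task_hours: float, hours_per_model: float) -> int:
    """Models one task returns: at least one, then as many more as fit."""
    return max(1, int(task_hours // hours_per_model))


def tasks_needed(target_models: int, task_hours: float, hours_per_model: float) -> int:
    """Tasks the server must hand out to collect target_models models."""
    return math.ceil(target_models / models_per_task(task_hours, hours_per_model))


def cpu_hours_needed(target_models: int, task_hours: float, hours_per_model: float) -> float:
    """Total volunteer CPU time spent reaching the target."""
    per_task = models_per_task(task_hours, hours_per_model)
    return tasks_needed(target_models, task_hours, hours_per_model) * per_task * hours_per_model


# Same 100,000-model target with 6-hour and 24-hour tasks, 0.5 h per model (assumed)
for hours in (6, 24):
    print(hours, tasks_needed(100_000, hours, 0.5), cpu_hours_needed(100_000, hours, 0.5))
```

Under these assumptions the total CPU time needed is almost identical for both run times (about 50,000 hours), while the 24-hour setting needs roughly a quarter as many tasks, which is the reduced server load mentioned earlier in the thread.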
Nothing But Idle Time

Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 60895 - Posted: 29 Apr 2009, 20:14:45 UTC - in response to Message 60889.  

Maybe if I watched the screen saver and observed how models are generated I might better comprehend the methodology. I just never wanted to waste cpu cycles on it nor visually sate myself with wiggling molecules like some people apparently do. Graphics add nothing to the scientific value.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60896 - Posted: 29 Apr 2009, 20:39:56 UTC

Anyway, the science isn't as concerned about whether it is a birch branch or a maple. So long as the total forest has the desired number of branches in it in the desired timeframe, then they have what they need.
Rosetta Moderator: Mod.Sense
Profile robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 61358 - Posted: 25 May 2009, 11:56:52 UTC

A long-running 1.67 workunit:

5/24/2009 8:56:43 PM rosetta@home Starting epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0
5/24/2009 8:56:44 PM rosetta@home Starting task epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0 using minirosetta version 167

I requested 12-hour workunits.

So far, this one has used nearly 10 CPU hours, is 1.230% completed, and the time to completion estimate is nearly 32 hours and constantly increasing.

Since I'm preparing to upgrade BOINC on this machine to 6.6.28, I may have to abort this workunit if it takes too long to complete.
Sid Celery

Joined: 11 Feb 08
Posts: 1966
Credit: 38,183,238
RAC: 10,603
Message 61360 - Posted: 25 May 2009, 12:55:44 UTC - in response to Message 61358.  

A long-running 1.67 workunit:

5/24/2009 8:56:43 PM rosetta@home Starting epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0
5/24/2009 8:56:44 PM rosetta@home Starting task epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0 using minirosetta version 167

I requested 12-hour workunits.

So far, this one has used nearly 10 CPU hours, is 1.230% completed, and the time to completion estimate is nearly 32 hours and constantly increasing.

Since I'm preparing to upgrade BOINC on this machine to 6.6.28, I may have to abort this workunit if it takes too long to complete.

Officially I think a long-running WU is "Requested time + 3 hours" = 15 hours for you. Don't worry about the time-to-completion figure because that's just an extrapolation of how long decoys are taking in the current WU. When the current one finishes it'll recalculate whether it can complete another in the remainder of 12 hours and finish immediately if it can't.

You may also need to reduce run time to 8 hours to complete what you've got outstanding. Or just abort them all now if you've got time to do the upgrade. No real point hanging around when you're just 3 days before deadline.
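
The end-of-decoy check described above can be sketched like this. It is only an illustration of the behaviour as described in this post, not minirosetta's actual logic, and the function name and the use of an average decoy time are assumptions.

```python
def start_another_decoy(cpu_hours_used: float,
                        target_hours: float,
                        avg_decoy_hours: float) -> bool:
    """After a decoy finishes, decide whether another one is expected to fit.

    Returns True if the average decoy time fits in what is left of the
    requested run time, False if the task should finish and report now.
    """
    remaining = target_hours - cpu_hours_used
    return avg_decoy_hours <= remaining
```

For a 12-hour target with 10 hours already used, another decoy would only be started if decoys have been averaging 2 hours or less.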
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,797,168
RAC: 648
Message 61361 - Posted: 25 May 2009, 15:31:06 UTC - in response to Message 61358.  

A long-running 1.67 workunit:

5/24/2009 8:56:43 PM rosetta@home Starting epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0
5/24/2009 8:56:44 PM rosetta@home Starting task epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0 using minirosetta version 167

I requested 12-hour workunits.

So far, this one has used nearly 10 CPU hours, is 1.230% completed, and the time to completion estimate is nearly 32 hours and constantly increasing.

Since I'm preparing to upgrade BOINC on this machine to 6.6.28, I may have to abort this workunit if it takes too long to complete.


There is something odd going on here though I'm not sure it's indicative of a long-running model. The % complete number is based on your requested runtime and while math is not my strong subject I feel perfectly confident in stating that 10 hours is not 1.23% of 12 hours. I suspect rosetta is not in fact receiving any cpu time but BOINC thinks it is (it's BOINC providing the "to completion" estimate not rosetta). I don't know how to check this on a windows machine, Task Manager, maybe?

Can you open the graphics window? Are the numbers there the same as (or very close to) the numbers in BOINC manager?

Before you give up and abort the WU you might try stopping and restarting BOINC and see if that shakes anything loose.

Snags

Profile robertmiles

Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 61377 - Posted: 26 May 2009, 9:13:02 UTC - in response to Message 61361.  
Last modified: 26 May 2009, 9:23:15 UTC

I tried suspending all workunits, then rebooting, since I haven't seen any other information on how to restart BOINC. This restarted this workunit with only 11 CPU minutes shown as already used. This lost any more time it had already used, but then allowed it to complete successfully in about 12 more hours as far as I can tell. I've postponed the BOINC upgrade long enough to finish all workunits already downloaded, but haven't started the upgrade yet.

I suspect that the workunit had run into the lockfile problem, and therefore had the minirosetta program mainly waiting in hope the lockfile problem would go away, but this restart did not seem to preserve enough information that I could check this.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61381 - Posted: 26 May 2009, 14:53:49 UTC

Just a few observations on the recent posts here:
1) the watchdog kicks in at target runtime plus 4 hours. If a task runs longer than that, the watchdog will pack it up and send it home.

2) the 6.6.x BOINC client versions now show ELAPSED time, rather than CPU time. So it is entirely possible people report tasks using time and not progressing when their machine doesn't have any CPU time available to run low priority tasks such as BOINC applications.

3) You can upgrade BOINC any time. Even with work in progress. The Rosetta application is still the same, and this is what is truly processing the work, so the BOINC upgrade should not pose a problem.
Rosetta Moderator: Mod.Sense
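
The difference between elapsed time and CPU time in point 2 can be seen in a small generic demonstration. This is unrelated to BOINC's own code. It uses Python's standard `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process), with a sleep standing in for a task that is not being given the CPU.

```python
import time


def measure(work):
    """Run work() and return (elapsed_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return elapsed, cpu


# Sleeping advances the wall clock but uses almost no CPU time,
# much like a low-priority task waiting behind other programs.
elapsed, cpu = measure(lambda: time.sleep(0.2))
print(f"elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

A client that reports elapsed time would show the first number growing even though almost no work was done, which is why a task can appear to be using time without progressing.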
LizzieBarry

Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61384 - Posted: 26 May 2009, 17:40:45 UTC - in response to Message 61381.  

3) You can upgrade BOINC any time. Even with work in progress. The Rosetta application is still the same, and this is what is truly processing the work, so the BOINC upgrade should not pose a problem.

I'd expect that to be the case, but it's never worked for me. Queued WUs don't get picked up by the new BOINC version and a load more come down in their place. I can see the old WUs sitting on this website, but they never get run and end up expiring.

I thought that happened to everyone. Am I wrong? Looks like it :(



©2024 University of Washington
https://www.bakerlab.org