Minirosetta 3.52

Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77337 - Posted: 14 Aug 2014, 19:55:45 UTC

Lab members are just not posting to the R@h FB page. I'll add something to our technical news.
ID: 77337
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77338 - Posted: 14 Aug 2014, 20:12:05 UTC - in response to Message 77334.  



Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.




No I didn't but I will. Thanks for the suggestion.
ID: 77338
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77339 - Posted: 15 Aug 2014, 2:24:59 UTC - in response to Message 77333.  

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

6 hours? I didn't realise.

On my two default (24 hour) machines I'm already set to 8hrs, so that's no problem for me. I've got access to one of my less-regular machines on another username, which was set to 4hrs, and notice it's still at 4hrs. I've changed it to the 6hr default to fall in line and will see how it goes over the next week. I'm not expecting an issue.

I have one less regular machine which is only on for a day or two a week for a few hours and set at default, which I'm inclined to knock back down to 3hrs, even though the 14 day deadline will help.

I understand why it's been done with the vast increase in active users and dare say the vast majority are set at default and will be none the wiser. The sign of this being a bad move will be if there's an increase in people missing even the extended deadline as Boinc defaults aren't that productive imo.
ID: 77339
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77340 - Posted: 15 Aug 2014, 2:50:11 UTC - in response to Message 77333.  

I extended the deadline to 14 days and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.

On a similar subject, when the next CASP tasks are loaded up, is it possible to give them appropriate deadlines, but keep non-CASP tasks at 14 days? Iirc CASP tasks had to be back with you within 48 hours (inc runtime) to meet the deadlines you have. I mentioned this before, but it was too late to do anything about it.

ID: 77340
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77341 - Posted: 15 Aug 2014, 12:57:49 UTC
Last modified: 15 Aug 2014, 13:31:18 UTC

thanks for providing the user selectable "Target CPU run time".
as it turns out, when left at the default i'm seeing some jobs with an estimated run time of 10 hours and various others of 5 hours. i've gone ahead and set a run time somewhat lower than 6 hours, as that is more appropriate for me.
However, i'd think the user selectable "Target CPU run time" would really be a useful thing to help the various participants.

the reason i set a lower run time is simply that i only run it (boinc/rosetta) during the 'idle' hours and the pc is switched off when no one is at home.

i'm also of the same opinion as Celery that an extended run time doesn't help much, given that it takes longer to return results and that some of the jobs may simply expire before they can be completed by the deadline

i normally play the 'good citizen': pull just enough jobs, complete them and submit the results. this gives a much better turnaround time, and the jobs often complete without errors; even if one errors out, it is returned as soon as the status is known. i think this is much better for getting results back promptly, possibly for the scientists who are waiting for incremental results, than pulling a lot of jobs which i may not crunch after all and may later simply have to cancel.

that may be different for other participants, who may leave their PCs crunching round the clock and/or use a 'slower' cpu. Hence, the user selectable "Target CPU run time" is a useful thing.
ID: 77341
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77342 - Posted: 15 Aug 2014, 13:28:35 UTC - in response to Message 77338.  



Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.




No I didn't but I will. Thanks for the suggestion.


Of course, doubling it is only correct for the portion of folks that are running at the default runtime. I was thinking when I had posted the suggestion that I'd had a reason for not suggesting this previously and couldn't recall what it was.

Anyway, I suspect more than half of the profiles are using the defaults anyway. And they are also probably the ones that pay the least attention to message boards and user preferences, so in the spirit of "set it and forget it", that would be the portion that it is most important to match with.
Rosetta Moderator: Mod.Sense
ID: 77342
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77343 - Posted: 15 Aug 2014, 16:09:37 UTC - in response to Message 77341.  
Last modified: 15 Aug 2014, 16:15:36 UTC

thanks for providing the user selectable "Target CPU run time".
as it turns out when left in the default i'm seeing some jobs running to the extent of 10 hours (in estimated run) and various others 5 hours (in estimated run)


turns out that the boinc manager (gui) estimates for time left may be somewhat off: most of those '5 hours' tasks seem to complete in the original 3+ hours, and the '10 hours' tasks seem to complete in about 6 hours, which is the default run time if nothing is selected.
the estimates being off the mark may be due to my pc running slower, possibly for various reasons including multitasking with other non boinc tasks

the estimated computational size (gflops) for the 6 and 3 hour jobs seems ok though, 80,000 gflops and 40,000 gflops respectively. not sure if this info is useful
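
A rough sketch of the arithmetic behind these estimates, assuming the client's estimate is simply the work unit's estimated size divided by the host's effective speed, scaled by a correction factor. The function names and the 3.7 GFLOPS host speed are illustrative assumptions, not BOINC's actual code, and the "gflops" figures above are treated as total operation counts (GFLOP).

```python
def estimated_runtime_hours(est_gflop, host_gflops, correction=1.0):
    """Estimated runtime = work size / host speed, scaled by a correction factor."""
    seconds = est_gflop / host_gflops * correction
    return seconds / 3600.0


def correction_factor(actual_hours, estimated_hours):
    """Simplified correction: ratio of what a task really took to what was estimated."""
    return actual_hours / estimated_hours


HOST_GFLOPS = 3.7  # hypothetical effective speed of one core

# Sizes quoted in the post: 40,000 GFLOP for 3-hour jobs, 80,000 GFLOP for 6-hour jobs.
three_hour_job = estimated_runtime_hours(40_000, HOST_GFLOPS)  # roughly 3 hours
six_hour_job = estimated_runtime_hours(80_000, HOST_GFLOPS)    # roughly 6 hours

# If the estimate stayed at 40,000 GFLOP while tasks now run for 6 hours,
# the client would need a correction factor of about 2 to catch up.
stale_factor = correction_factor(6.0, three_hour_job)

print(round(three_hour_job, 2), round(six_hour_job, 2), round(stale_factor, 2))
```

On these assumptions, doubling the size estimate along with the default run time keeps the correction factor near 1 for hosts on the default, which matches the suggestion quoted earlier in the thread.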
ID: 77343
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77344 - Posted: 15 Aug 2014, 17:52:02 UTC
Last modified: 15 Aug 2014, 17:52:16 UTC

Definitely most of the 40+ thousand active hosts are set it and forget it users.
ID: 77344
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77346 - Posted: 16 Aug 2014, 2:07:34 UTC - in response to Message 77342.  

Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.

No I didn't but I will. Thanks for the suggestion.

Of course, doubling it is only correct for the portion of folks that are running at the default runtime. I was thinking when I had posted the suggestion that I'd had a reason for not suggesting this previously and couldn't recall what it was.

I spotted this earlier and guessed it might be related.

On the flipside, being one of those who has tweaked my runtime already, I also tweaked the default buffer from 0.25 days up to 2 days, so by the time those tasks are worked through, it should resolve itself.

In the meantime, the rush of demand for tasks from the new users ought to have settled down too. It's all good. Probably... <cough>

ID: 77346
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77347 - Posted: 16 Aug 2014, 2:26:12 UTC - in response to Message 77341.  

thanks for providing the user selectable "Target CPU run time".
as it turns out, when left at the default i'm seeing some jobs with an estimated run time of 10 hours and various others of 5 hours. i've gone ahead and set a run time somewhat lower than 6 hours, as that is more appropriate for me.
However, i'd think the user selectable "Target CPU run time" would really be a useful thing to help the various participants.

the reason i set a lower run time is simply that i only run it (boinc/rosetta) during the 'idle' hours and the pc is switched off when no one is at home.

I'm also of the same opinion as Celery that an extended run time doesn't help much, given that it takes longer to return results and that some of the jobs may simply expire before they can be completed by the deadline.

It's not quite that. Rosetta keeps a record of how much uptime your machine has, so as long as your processing pattern is consistent it'll make the necessary allowances.

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.
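
To put rough numbers on that scenario, here is a back-of-the-envelope sketch. It assumes no progress is lost between sessions, and the parameter names and the 50% active fraction are hypothetical values chosen for illustration, not measured BOINC behaviour.

```python
def cpu_hours_before_deadline(deadline_days, sessions_per_week,
                              hours_per_session, active_fraction=1.0):
    """CPU hours a host can contribute to one task before its deadline.

    active_fraction is the share of powered-on time in which BOINC is
    actually allowed to compute (hypothetical; depends on user preferences).
    """
    sessions = deadline_days / 7 * sessions_per_week
    return sessions * hours_per_session * active_fraction


def can_finish(task_hours, deadline_days, sessions_per_week,
               hours_per_session, active_fraction=1.0):
    """True if the available CPU hours cover the task's target run time."""
    available = cpu_hours_before_deadline(
        deadline_days, sessions_per_week, hours_per_session, active_fraction)
    return available >= task_hours


# A host switched on for 2.5 hours twice a week, with a 14-day deadline.
print(cpu_hours_before_deadline(14, 2, 2.5))        # 10.0 hours if BOINC always runs
print(cpu_hours_before_deadline(14, 2, 2.5, 0.5))   # 5.0 hours if it computes half the time
print(can_finish(6, 14, 2, 2.5))                    # True
print(can_finish(6, 14, 2, 2.5, 0.5))               # False
```

Under these assumptions the 6 hour task only fits when BOINC is allowed to run for most of each session, which is the point about the defaults being "more stop than start".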
ID: 77347
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77348 - Posted: 16 Aug 2014, 3:05:28 UTC

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.
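
One way such a rule could look, purely as an illustration: the three conditions above map onto a small predicate. Every name here (TaskState, recent_cpu_hours_per_day, safety_margin) is hypothetical and none of it is taken from the Rosetta or BOINC code.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    models_done: int                  # completed models so far
    cpu_hours_used: float             # CPU time spent on this task
    target_cpu_hours: float           # user's target run time
    hours_to_deadline: float          # wall-clock hours until the deadline
    recent_cpu_hours_per_day: float   # hypothetical: how fast this host has been progressing


def should_finish_early(task, safety_margin=1.5):
    """Return True if the task should stop now and report what it has."""
    if task.models_done == 0:
        return False  # nothing useful to report yet
    remaining_cpu = task.target_cpu_hours - task.cpu_hours_used
    if remaining_cpu <= 0:
        return False  # already at the target; the normal completion path applies
    if task.recent_cpu_hours_per_day <= 0:
        return True   # host is making no progress, so report what exists
    # Wall-clock hours this host would need to reach the target run time.
    wall_hours_needed = remaining_cpu / task.recent_cpu_hours_per_day * 24
    return wall_hours_needed * safety_margin > task.hours_to_deadline


slow_host = TaskState(models_done=3, cpu_hours_used=2.0, target_cpu_hours=6.0,
                      hours_to_deadline=24.0, recent_cpu_hours_per_day=1.0)
print(should_finish_early(slow_host))  # True: 4 CPU hours left at 1 CPU hour per day
```

The safety margin is a design choice in this sketch: it makes the task stop a little sooner than strictly necessary, so that upload and reporting still fit before the deadline.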
ID: 77348
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77349 - Posted: 16 Aug 2014, 12:45:41 UTC - in response to Message 77348.  

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.


Love it!

...but I'd have to say, overall, it might generate more TFLOPS if you instead worked on somehow making it easier for developers of the various protocols to implement more frequent checkpointing. If the casual user didn't lose an hour of progress when they power off, they would generally reach completion before the deadline.

Another idea would be to implement the trickle reporting of partial results when a model is completed. This would bring many of the results back to the project much sooner, but no doubt complicate WU validation. This would help eliminate the trade-off between an efficient, long runtime, and having immediate results in the hands of the researcher.
Rosetta Moderator: Mod.Sense
ID: 77349
sgaboinc

Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77351 - Posted: 16 Aug 2014, 15:20:55 UTC - in response to Message 77347.  
Last modified: 16 Aug 2014, 15:26:31 UTC


It's not quite that. Rosetta keeps a record of how much uptime your machine has, so as long as your processing pattern is consistent it'll make the necessary allowances.

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.



i'm also suspecting that a lengthy default run time may *discourage* some users (especially new, novice users). i noted recently that some of the work units i've completed have been aborted by other users or ended with a 'no reply' status

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616151740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616144558
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616162794

while it is uncertain whether users aborted the jobs because of the run time or simply abandoned boinc after trying it out, i'd think 'too lengthy' a default run time could have this *discouragement* as a negative effect

but of course, today there is this "Target CPU run time" that users can define which would help alleviate that for affected users.

perhaps it could be documented in an easily accessible page so that novice users etc could take note of the feature
ID: 77351
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77353 - Posted: 17 Aug 2014, 1:37:43 UTC - in response to Message 77349.  

There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.

Love it!

...but I'd have to say, overall, it might generate more TFLOPS if you instead worked on somehow making it easier for developers of the various protocols to implement more frequent checkpointing. If the casual user didn't lose an hour of progress when they power off, they would generally reach completion before the deadline.

+1 from me.

The biggest problem is those jobs that don't checkpoint at all and run until the watchdog shuts them down. With the current new default, anything less than 10hrs solid running starts them from scratch at every reboot until the deadline passes. For those using Boinc defaults (already stated to be the vast majority of users) it would be more productive to abort them on sight. Chances are they'll hardly ever report back. That should be obvious.
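
A small simulation of that effect, offered as a sketch under simplifying assumptions: every power-on session has the same length, checkpoints fall at fixed intervals of progress, and everything since the last checkpoint is lost at power-off. The function and its parameters are illustrative, not part of Rosetta or BOINC.

```python
def sessions_to_finish(task_hours, session_hours, checkpoint_hours=None,
                       max_sessions=1000):
    """Count power-on sessions needed to finish a task.

    checkpoint_hours=None models a task that never checkpoints, so each
    session starts from scratch. Returns None if the task never finishes.
    """
    saved = 0.0  # progress preserved across a power-off
    for session in range(1, max_sessions + 1):
        progress = saved + session_hours
        if progress >= task_hours:
            return session
        if checkpoint_hours is None:
            saved = 0.0
        else:
            # Keep only whole checkpoint intervals; the remainder is lost.
            saved = (progress // checkpoint_hours) * checkpoint_hours
    return None


# A 10-hour task on a host that runs 3 hours per session.
print(sessions_to_finish(10, 3))                        # None: never completes
print(sessions_to_finish(10, 3, checkpoint_hours=1.0))  # 4 sessions
```

Note that even with checkpointing, a session shorter than the checkpoint interval saves nothing in this model, which is why more frequent checkpoints matter most for hosts with short sessions.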

ID: 77353
Sid Celery

Joined: 11 Feb 08
Posts: 2124
Credit: 41,224,342
RAC: 11,119
Message 77354 - Posted: 17 Aug 2014, 1:52:35 UTC - in response to Message 77351.  

I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.

For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them.

I'm also suspecting that a lengthy default run time may *discourage* some users (especially new, novice users). i noted recently that some of the work units i've completed have been aborted by other users or ended with a 'no reply' status

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616151740
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616144558
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616162794

while it is uncertain whether users aborted the jobs because of the run time or simply abandoned boinc after trying it out, i'd think 'too lengthy' a default run time could have this *discouragement* as a negative effect

but of course, today there is this "Target CPU run time" that users can define which would help alleviate that for affected users.

perhaps it could be documented in an easily accessible page so that novice users etc could take note of the feature

I suspect you have too high an expectation of most users. Target CPU runtime has always existed. It's just a little more flexible now. But the people who post here, like you and me, are very much the exception. The "set & forget" option is much more the norm. A document would be nice - no objection to it - but unlikely to gain much of a readership beyond what it is now.

Aborting tasks is clearly different from tasks being timed out. One is an active choice, the other the result of no choice at all. I doubt there's much of a "discouragement" factor. More that defaults don't coincide with a normal pattern of behaviour for ordinary people.

That's why I suggested the proportion of tasks failing to meet deadline should be monitored following the changes. Personally I'd have gone to 4hrs first, but obviously the vast increase in users required a more extreme and urgent response at the time.

I trust TPTB will make the appropriate assessment, seeing as they're the ultimate beneficiaries.
ID: 77354
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77355 - Posted: 17 Aug 2014, 3:06:49 UTC

You all raise great points and suggestions. Definitely more frequent checkpointing would save a bunch of computing for some protocols, particularly the homology modeling protocol. I will look into this. Forward folding has pretty good checkpointing in place and after CASP, will likely be the most common type of workunit.
ID: 77355
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77356 - Posted: 17 Aug 2014, 12:00:36 UTC

So, short of frequent checkpoints on all protocols, it would be ideal to not send the WUs that do not checkpoint as well to hosts that are not active many hours per day. [arm twist]If you upgrade the BOINC server code, you could use the job size matching to avoid assigning such tasks to machines that have a higher average turnaround, or low % BOINC active.[/arm twist]
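
As a sketch of the idea only: the filter below decides whether a host should receive tasks that checkpoint poorly, based on its average turnaround and the fraction of time BOINC is active. The Host fields and both thresholds are hypothetical; this is not how the BOINC scheduler's job size matching is implemented.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    avg_turnaround_days: float    # average time to return a result
    boinc_active_fraction: float  # share of time BOINC is running, 0.0 to 1.0


def suits_poorly_checkpointed_tasks(host, max_turnaround_days=3.0,
                                    min_active_fraction=0.5):
    """True if the host is likely to run long enough in one go to finish."""
    return (host.avg_turnaround_days <= max_turnaround_days
            and host.boinc_active_fraction >= min_active_fraction)


hosts = [
    Host("always-on cruncher", avg_turnaround_days=1.0, boinc_active_fraction=0.95),
    Host("weekend laptop", avg_turnaround_days=9.0, boinc_active_fraction=0.10),
]
for host in hosts:
    kind = "any task" if suits_poorly_checkpointed_tasks(host) else "well-checkpointed tasks only"
    print(f"{host.name}: {kind}")
```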
Rosetta Moderator: Mod.Sense
ID: 77356
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77357 - Posted: 17 Aug 2014, 12:31:37 UTC

[arm twist]A few other improvements that both the project and the users might enjoy in updated server code (we've had requests for several of the team and user functions, support for badges must be in there somewhere too, might even fix the msg boards so this thread isn't two screens wide):

[25675] Add feature for specifying plan classes in an XML file
[25321] Move antique file deletion to a separate program
[22778] Server support for Virtualbox applications
[22440] Deal correctly with 32-bit apps that require > 2GB RAM
[20807] Improved implementation of locality scheduling
[20149] Client versions include release. Projects may need to update app_version.min_core_version, config options
[19053] Project-specified access control for admin web pages
[18764] All project-specific scheduling policies on a per-job level
[18182] Support read-only DB replica correctly
[17430] Support a combination of locality and regular scheduling
[15543] Fix problem where clients with malformed global prefs get perpetual "Incomplete request" errors; fix bug that broke create_work
[15398] Handle quotes and slashes correctly in profiles and forums; fix bugs in team foundership transfer mechanism
[15281] Add support for matchmaker scheduling
[15137] Add "job size matching" feature (send large jobs to fast hosts)
[14842] Add super-easy mechanism for submitting single jobs
[14767] Add mechanism for assigning work to hosts, users, or teams
[14448] Add uniform/flexible notification mechanism; users can choose 1 email per event, daily digest email, or no email. REQUIRES ADDING NOTIFY.PHP AS A PERIODIC TASK IN CONFIG.XML
[14367] Add 'weak account key' mechanism
[14297] Config option to make team forums visible only to members
[14294] Prevent UOTD from showing big image on front page. Use show_uotd().
[14272] Team search feature
[14240] HTML-escape text in BOINC-wide team export file
[14234] Add "team message board" feature
[14229] Add optional user job submission system
[14084] Add user search feature - link to this from home page
[13964] lines/page in top user/team/host lists is configurable
[13945] Add "merge computers by name" feature
[13732] New and improved "Find a team" function
[13673] Fix an annoyance using team foundership transfer
[13463] Preserve project specific preferences during web RPC
[13231] Let team founders view history of people joining/quitting team
[13223] Support for 'BOINC-wide teams'
[13193] Add 'suspend_if_no_recent_input' preference (let hosts power down)
[13182] Add 'mark all threads as read' feature (forums)
[13127] Improved feeder query; may fix DB performance problems
[13045] Relax restrictions on merging hosts
[12912] Add <no_darwin_6>, <no_amd_k6> options
[12834] Make list of supported platforms visible in get_project_config.php
[12813] Add a forum preference for private message notification
[12785] Add "merge hosts by name" function
[12754] Add Paypal-based donation system
[12743] Add mechanism to end project gracefully
[/arm twist]
Rosetta Moderator: Mod.Sense
ID: 77357
Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 77361 - Posted: 18 Aug 2014, 17:18:29 UTC

I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Priorities for now are first to release our android app and then to add a replica DB and upgrade the server code. The latter may require significant down time so we need to plan this with the ongoing research projects in the lab. We also have to look into hardware upgrades.
ID: 77361
TJ

Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 77363 - Posted: 18 Aug 2014, 22:45:44 UTC - in response to Message 77361.  

I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Priorities for now are first to release our android app and then to add a replica DB and upgrade the server code. The latter may require significant down time so we need to plan this with the ongoing research projects in the lab. We also have to look into hardware upgrades.

Thank you, that would be great!
Greetings,
TJ.
ID: 77363
©2024 University of Washington
https://www.bakerlab.org