Message boards : Number crunching : Minirosetta 3.52
Author | Message |
---|---|
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Lab members are just not posting to the R@h FB page. I'll add something to our technical news. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
No I didn't but I will. Thanks for the suggestion. |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]I extended the deadline to 14 and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.[/quote] 6 hours? I didn't realise. On my two default (24 hour) machines I'm already set to 8hrs, so that's no problem for me. I've got access to one of my less-regular machines on another username, which was set to 4hrs, and notice it's still at 4hrs. I've changed it to the 6hr default to fall in line and will see how it goes over the next week. I'm not expecting an issue. I have one less regular machine which is only on for a day or two a week for a few hours and set at default, which I'm inclined to knock back down to 3hrs, even though the 14 day deadline will help. I understand why it's been done with the vast increase in active users and dare say the vast majority are set at default and will be none the wiser. The sign of this being a bad move will be if there's an increase in people missing even the extended deadline, as Boinc defaults aren't that productive imo. |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]I extended the deadline to 14 and also updated the default run time to 6 hours during the recent chaos. Any input on this is appreciated, good or bad. I can revert to the previous values if necessary. Thanks.[/quote] On a similar subject, when the next CASP tasks are loaded up, is it possible to give them appropriate deadlines, but keep non-CASP tasks at 14 days? Iirc CASP tasks had to be back to you within 48 hours (inc runtime) to meet the deadlines you have. I mentioned this before, but it was too late to do anything about it. |
sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
thanks for providing the user selectable "Target CPU run time". as it turns out, when left at the default i'm seeing some jobs with an estimated run time of 10 hours and various others of 5 hours. i've gone ahead and set a run time somewhat lower than 6 hours, as that is more appropriate for me. i'd think the user selectable "Target CPU run time" is a really useful thing to help the various participants. the reason i set a lower run time is simply that i only run boinc/rosetta during the 'idle' hours, and the pc is switched off when no one is at home. i'm also of the same opinion as Celery that a long run time doesn't help much, given that it takes longer to return results and that some of the jobs may simply expire before they can be completed by the deadline. i normally play the 'good citizen': pull just enough jobs, complete them and submit the results. this gives a much better turnaround time, and the jobs often complete without errors; even if one errors out, it is reverted as soon as the status is known. i think this is much better in terms of turning around the results promptly for the scientists who are waiting for the incremental results, rather than pulling a lot of jobs which i may not crunch after all and may later simply have to cancel. that may be different for other participants who leave their PCs crunching round the clock and/or use a 'slower' cpu. hence, the user selectable "Target CPU run time" is a useful thing. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Of course, doubling it is only correct for the portion of folks that are running at the default runtime. I was thinking when I had posted the suggestion that I'd had a reason for not suggesting this previously and couldn't recall what it was. Anyway, I suspect more than half of the profiles are using the defaults anyway. And they are also probably the ones that pay the least attention to message boards and user preferences, so in the spirit of "set it and forget it", that would be the portion that it is most important to match with. Rosetta Moderator: Mod.Sense |
sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
[quote]thanks for providing the user selectable "Target CPU run time".[/quote] turns out that the boinc manager (gui) estimates for time left may be somewhat off: most of those '5 hours' tasks seemed to complete in the original 3+ hours, and the '10 hours' tasks seemed to complete in about 6 hours, which is the default run time if nothing is selected. the estimates going off the mark may be due to my pc running slower, possibly for various reasons including multitasking with other non-boinc tasks. the estimated computational size (gflops) for the 6 and 3 hour jobs seemed ok though, 80,000 gflops and 40,000 gflops respectively. not sure if this info is useful |
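The two sizes quoted in the post above can be checked against each other with a little arithmetic. The sketch below is a simplified illustration only: it assumes the estimated size is simply a fixed rate multiplied by the target run time, and it ignores the other inputs the BOINC client uses for its time estimates. The function name is made up for this example.

```python
# Estimated computational sizes reported in the post (GFLOP),
# keyed by the target run time in hours.
reported_sizes = {6: 80_000, 3: 40_000}

def implied_gflops(size_gflop, hours):
    """Speed (GFLOP per second) at which a job of this size takes `hours`."""
    return size_gflop / (hours * 3600)

rates = {hours: implied_gflops(size, hours) for hours, size in reported_sizes.items()}
for hours, rate in rates.items():
    print(f"{hours} h job: {rate:.2f} GFLOPS implied")

# A 6 hour job that the manager shows as 10 hours corresponds, under the
# same simplification, to a lower assumed speed for the host.
slow_host_rate = implied_gflops(reported_sizes[6], 10)
print(f"10 h estimate for the 6 h job: {slow_host_rate:.2f} GFLOPS implied")
```

Both job sizes imply the same rate, which is consistent with the sizes having been scaled in step with the run time.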
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Definitely most of the 40+ thousand active hosts are set it and forget it users. |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]Did you double the WU estimated FLOPS as well? That would help BOINC Manager have good duration correction factors right from the start.[/quote] I spotted this earlier and guessed it might be related. On the flipside, being one of those who has tweaked my runtime already, I also tweaked the default buffer from 0.25 days up to 2 days, so by the time those tasks are worked through, it should resolve itself. In the meantime, the rush of demand for tasks from the new users ought to have settled down too. It's all good. Probably... <cough> |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]thanks for providing the user selectable "Target CPU run time".[/quote] It's not quite that. Rosetta keeps a record of how much uptime your machine has, so as long as your processing pattern is consistent it'll make the necessary allowances. I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline. For that kind of reason, I consider Boinc defaults to be very unfriendly for productive task completion - it could even be that Rosetta isn't the project for them. |
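The "2 or 3 hours maybe twice a week" case above can be put into numbers. This is a rough sketch under stated assumptions: a single 6 hour task, a 14 day deadline, and no work lost at shutdown. The function name is hypothetical, and how much of the uptime BOINC actually computes depends on each host's preferences.

```python
def min_active_fraction(task_cpu_hours, sessions, hours_per_session):
    """Fraction of total uptime that must be spent computing to finish the task."""
    uptime_hours = sessions * hours_per_session
    return task_cpu_hours / uptime_hours

# Twice a week over a 14 day deadline gives four sessions.
sessions_in_deadline = 2 * 2

for hours in (2, 3):
    needed = min_active_fraction(6, sessions_in_deadline, hours)
    print(f"{hours} h sessions: BOINC must compute during {needed:.0%} of uptime")
```

With 2 hour sessions the host has 8 hours of uptime before the deadline, so BOINC would need to be computing for three quarters of it. Defaults that pause work whenever the machine is busy can easily fall short of that.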
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code. |
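The three conditions in the post above map onto a simple check. The following is only a sketch of that idea, not the project's code: the `TaskState` fields, the function name, and both thresholds (24 hours before the deadline, 90% of the target run time) are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    models_completed: int      # models generated so far
    cpu_hours_used: float      # CPU time spent on the task
    target_cpu_hours: float    # the user's target run time
    hours_to_deadline: float   # time left before the report deadline

def should_finish_early(task, deadline_margin_hours=24.0, near_target_fraction=0.9):
    """Return True if the task should stop now and report what it has."""
    has_results = task.models_completed > 0
    far_from_target = task.cpu_hours_used < near_target_fraction * task.target_cpu_hours
    deadline_near = task.hours_to_deadline <= deadline_margin_hours
    return has_results and far_from_target and deadline_near

# Two hours into a 6 hour target, three models done, 12 hours to the deadline.
print(should_finish_early(TaskState(3, 2.0, 6.0, 12.0)))
```

A task with no completed model has nothing to report, so it keeps running regardless of the deadline. A task already close to its target is left to finish normally.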
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
[quote]There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.[/quote] Love it! ...but I'd have to say, overall, it might generate more TFLOPS if you instead worked on somehow making it easier for developers of the various protocols to implement more frequent checkpointing. If the casual user didn't lose an hour of progress when they power off, they would generally reach completion before the deadline. Another idea would be to implement the trickle reporting of partial results when a model is completed. This would bring many of the results back to the project much sooner, but no doubt complicate WU validation. This would help eliminate the trade-off between an efficient, long runtime, and having immediate results in the hands of the researcher. Rosetta Moderator: Mod.Sense |
sgaboinc Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
i'm also suspecting that a lengthy default run time may *discourage* some users (especially new, novice users). i noted recently that some of the work units i've completed have been aborted by other users or ended with a 'no reply' status: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616151740 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616144558 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=616162794 while it is uncertain whether users aborted the jobs because of the run time or simply abandoned boinc after trying it out, i'd think 'too lengthy' a default run time could have this *discouragement* as a negative effect. of course, today there is the "Target CPU run time" that users can define, which would help alleviate that for affected users. perhaps it could be documented on an easily accessible page so that novice users could take note of the feature |
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]There should be a way to complete the task if models have been generated, the run time is not close to the target run time, and the deadline is near. I can look into adding such code.[/quote] +1 from me. The biggest problem is those jobs that don't checkpoint at all and run until the watchdog shuts them down. With the current new default, anything less than 10hrs solid running starts them from scratch at every reboot until the deadline passes. For those using Boinc defaults (already stated to be the vast majority of users) it would be more productive to abort them on sight. Chances are they'll hardly ever report back. That should be obvious. |
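The restart-from-scratch effect described above can be illustrated with a toy model. It is a sketch under stated assumptions: the task needs the 10 hours mentioned in the post, the host runs four 3 hour sessions before the deadline, and a checkpoint (where there is one) is written after every full hour of progress. The function name and the one hour interval are hypothetical.

```python
def completing_session(session_hours, required_hours, checkpoint_interval=None):
    """Return the index of the session in which the task finishes, or None.

    With checkpoint_interval=None all progress is lost when a session ends.
    Otherwise progress is kept up to the last full checkpoint interval.
    """
    saved = 0.0
    for index, session in enumerate(session_hours):
        progress = saved + session
        if progress >= required_hours:
            return index
        if checkpoint_interval is None:
            saved = 0.0  # no checkpoint: the next session starts from scratch
        else:
            saved = (progress // checkpoint_interval) * checkpoint_interval
    return None

sessions = [3.0, 3.0, 3.0, 3.0]  # twice a week for two weeks
without_checkpoints = completing_session(sessions, 10.0)
with_checkpoints = completing_session(sessions, 10.0, checkpoint_interval=1.0)
print(without_checkpoints, with_checkpoints)
```

Without checkpoints the task never finishes, however many 3 hour sessions the host gets. With hourly checkpoints it completes in the fourth session.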
Sid Celery Joined: 11 Feb 08 Posts: 2145 Credit: 41,560,787 RAC: 9,320 |
[quote]I was referring more to the Boinc defaults of only running when other processing is below a certain %age. When I first started with Rosetta I found the WU processing was more stop than start, so if people are like you and only run for a certain part of the day and turn the machine off when unneeded, tasks can take an awful lot of time to complete. Whether that exceeds the new extended deadline of 14 days depends on the individual host. I certainly know people who only switch on for 2 or 3 hours maybe twice a week. In their case I can well imagine 14 days not being enough to complete a 6 hour task within the deadline.[/quote] I suspect you have too high an expectation of most users. Target CPU runtime has always existed. It's just a little more flexible now. But the people who post here, like you and me, are very much the exception. The "set & forget" option is much more the norm. A document would be nice - no objection to it - but unlikely to gain much of a readership beyond what it is now. Aborting tasks is clearly different from tasks being timed out. One is an active choice, the other the result of no choice at all. I doubt there's much of a "discouragement" factor. More that defaults don't coincide with a normal pattern of behaviour for ordinary people. That's why I suggested the proportion of tasks failing to meet deadline should be monitored following the changes. Personally I'd have gone to 4hrs first, but obviously the vast increase in users required a more extreme and urgent response at the time. I trust TPTB will make the appropriate assessment, seeing as they're the ultimate beneficiaries. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
You all raise great points and suggestions. Definitely more frequent checkpointing would save a bunch of computing for some protocols, particularly the homology modeling protocol. I will look into this. Forward folding has pretty good checkpointing in place and after CASP, will likely be the most common type of workunit. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So, short of frequent checkpoints on all protocols, it would be ideal to not send the WUs that do not checkpoint as well to hosts that are not active many hours per day. [arm twist]If you upgrade the BOINC server code, you could use the job size matching to avoid assigning such tasks to machines that have a higher average turnaround, or low % BOINC active.[/arm twist] Rosetta Moderator: Mod.Sense |
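The routing idea in the post above could look something like the following. This is a hypothetical sketch and not the BOINC server's job size matching implementation: the `Host` fields, the function name, and the thresholds (3 days average turnaround, 50% BOINC active) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    avg_turnaround_days: float    # average time to return a result
    boinc_active_fraction: float  # share of time BOINC is running, 0.0 to 1.0

def eligible_for_poorly_checkpointed(host, max_turnaround_days=3.0,
                                     min_active_fraction=0.5):
    """Only send work that checkpoints badly to hosts that run long and often."""
    return (host.avg_turnaround_days <= max_turnaround_days
            and host.boinc_active_fraction >= min_active_fraction)

hosts = [
    Host("always-on", avg_turnaround_days=1.2, boinc_active_fraction=0.95),
    Host("weekend-only", avg_turnaround_days=9.0, boinc_active_fraction=0.10),
]
eligible = [h.name for h in hosts if eligible_for_poorly_checkpointed(h)]
print(eligible)
```

Hosts that fail the check would still receive work units from protocols that checkpoint well, so they are not starved of tasks.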
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
[arm twist] A few other improvements that both the project and the users might enjoy in updated server code (we've had requests for several of the team and user functions, support for badges must be in there somewhere too, might even fix the msg boards so this thread isn't two screens wide): [25675] Add feature for specifying plan classes in an XML file [25321] Move antique file deletion to a separate program [22778] Server support for Virtualbox applications [22440] Deal correctly with 32-bit apps that require > 2GB RAM [20807] Improved implementation of locality scheduling [20149] Client versions include release. Projects may need to update app_version.min_core_version, config options [19053] Project-specified access control for admin web pages [18764] All project-specific scheduling policies on a per-job level [18182] Support read-only DB replica correctly [17430] Support a combination of locality and regular scheduling [15543] Fix problem where clients with malformed global prefs get perpetual "Incomplete request" errors; fix bug that broke create_work [15398] Handle quotes and slashes correctly in profiles and forums; fix bugs in team foundership transfer mechanism [15281] Add support for matchmaker scheduling [15137] Add "job size matching" feature (send large jobs to fast hosts) [14842] Add super-easy mechanism for submitting single jobs [14767] Add mechanism for assigning work to hosts, users, or teams [14448] Add uniform/flexible notification mechanism; users can choose 1 email per event, daily digest email, or no email. REQUIRES ADDING NOTIFY.PHP AS A PERIODIC TASK IN CONFIG.XML [14367] Add 'weak account key' mechanism [14297] Config option to make team forums visible only to members [14294] Prevent UOTD from showing big image on front page. Use show_uotd(). [14272] Team search feature [14240] HTML-escape text in BOINC-wide team export file [14234] Add "team message board" feature [14229] Add optional user job submission system [14084] Add user search feature - link to this from home page [13964] lines/page in top user/team/host lists is configurable [13945] Add "merge computers by name" feature [13732] New and improved "Find a team" function [13673] Fix an annoyance using team foundership transfer [13463] Preserve project specific preferences during web RPC [13231] Let team founders view history of people joining/quitting team [13223] Support for 'BOINC-wide teams' [13193] Add 'suspend_if_no_recent_input' preference (let hosts power down) [13182] Add 'mark all threads as read' feature (forums) [13127] Improved feeder query; may fix DB performance problems [13045] Relax restrictions on merging hosts [12912] Add <no_darwin_6>, <no_amd_k6> options [12834] Make list of supported platforms visible in get_project_config.php [12813] Add a forum preference for private message notification [12785] Add "merge hosts by name" function [12754] Add Paypal-based donation system [12743] Add mechanism to end project gracefully [/arm twist] Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Priorities for now are first to release our Android app and then to add a replica DB and upgrade the server code. The latter may require significant downtime, so we need to plan this with the ongoing research projects in the lab. We also have to look into hardware upgrades. |
TJ Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
[quote]I'll look into the server upgrade. It will be a long process since there is a lot of R@h specific code. Priorities for now are first to release our Android app and then to add a replica DB and upgrade the server code. The latter may require significant downtime, so we need to plan this with the ongoing research projects in the lab. We also have to look into hardware upgrades.[/quote] Thank you, that would be great! Greetings, TJ. |
©2024 University of Washington
https://www.bakerlab.org