Feedback, .. bandwidth usage :-(

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 117,372,960
RAC: 52,787
Message 8856 - Posted: 12 Jan 2006, 16:41:34 UTC - in response to Message 8855.  

Something to think about...

Would you add 'trickling'-style reporting for a work unit?

As I see it now, if the work units get longer, then it'll take longer for you to get feedback.

If you can 'trickle' back the results, say after 4 iterations, then you will not need to wait until all 50 are done (for example).
This means you can analyse sooner.

ALSO a way to terminate these longer jobs (say, a 'stop at next iteration' type of thing) if the results are no longer needed (you have enough information, they are bad, just for fun ;-))

How and whether this is possible I have no idea ;)

Personally I have no problem with the longer jobs myself :) Hope to try them out.


yeah - trickle-feeding results back, or having the client maintain some kind of communication if no results are available to send after a certain amount of wall time - even if it's just 'I'm still alive and crunching the following job' - would definitely be beneficial for both the project and members, especially those of us with remote machines.
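
For what it's worth, BOINC already has a 'trickle message' mechanism meant for exactly this kind of mid-run report (CPDN uses it, as mentioned further down this thread). A minimal sketch, assuming the app is linked against the BOINC API library and has called boinc_init(); the variety name, message format, and helper function are invented for illustration, and the project would need a matching server-side trickle handler:

// Sketch only: report that this job is alive and how far along it is.
#include <cstdio>
#include "boinc_api.h"

// Hypothetical helper; "rosetta_status" is an invented variety name
// that a real deployment would pair with a server-side handler.
void report_progress(const char* wu_name, int structures_done) {
    char msg[256];
    std::snprintf(msg, sizeof(msg),
                  "<wu>%s</wu><structures_done>%d</structures_done>",
                  wu_name, structures_done);
    boinc_send_trickle_up(const_cast<char*>("rosetta_status"), msg);
}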
Housing and Food Services
Joined: 1 Jul 05
Posts: 85
Credit: 155,098,531
RAC: 0
Message 8860 - Posted: 12 Jan 2006, 17:12:38 UTC - in response to Message 8856.  
Last modified: 12 Jan 2006, 17:13:03 UTC

Having a flag to change the size of WUs helps alleviate the bandwidth problem for those who are capped monthly and/or on dial-up.

For those who don't have that problem, couldn't you stay with the existing-size work units and not need the 'trickle' feature?

The number of people on dial-up who would also want frequent updates to the server sounds like a pretty small subset.

-Ethan



dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 117,372,960
RAC: 52,787
Message 8865 - Posted: 12 Jan 2006, 18:02:19 UTC - in response to Message 8860.  

Having a flag to change the size of WUs helps alleviate the bandwidth problem for those who are capped monthly and/or on dial-up.

For those who don't have that problem, couldn't you stay with the existing-size work units and not need the 'trickle' feature?

The number of people on dial-up who would also want frequent updates to the server sounds like a pretty small subset.

-Ethan




As I'm imagining it, it would have no effect on users who are happy with the current WU size and bandwidth usage (though obviously lower bandwidth usage is better!). It would only affect those who request large WUs, because bigger WUs mean:
* a longer wait for the results
* greater risk of losing completed work
* psychological effect of massive jobs - no credit for long periods - check out the number of posts at FaD when there were Long Running Molecules (LRMs)!
* No ability to update the computer, for whatever reason, until the WU is complete. As FC says, including the ability to terminate work that is no longer needed when it phones home would be a good feature.

I'd suggest that if the machine hasn't made a connection for new work in, say, four days, then it could return any results and report that it's working on job xxxx if/when an internet connection becomes available, and then do the same again after another four days if it hasn't finished the WU. That way it can check that the WU is still valid and still needs crunching, and the Rosetta servers can record that the job isn't lost and doesn't need to be sent out again.
Andrew

Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 8866 - Posted: 12 Jan 2006, 18:08:22 UTC

What would be best for the project?

Is it "better" to have -nstruct 50, or is -nstruct 10 enough. What about -nstruct 2 or -nstruct 100, 200, 500 etc?
blackbird
Joined: 4 Nov 05
Posts: 15
Credit: 93,414
RAC: 0
Message 8868 - Posted: 12 Jan 2006, 18:42:08 UTC

I've got a question similar to Andrew's - will the scientific value of the long WUs remain the same?
Another question: on my PC the WU CPU time is about 2 hours. Considering that the WU download is about 2 MB now, and an acceptable traffic level for me is about 1 MB/day, would I have the opportunity to set -nstruct 200?
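
Rough arithmetic on that, assuming the current WUs carry -nstruct 10 (as mentioned further down this thread), i.e. about 12 minutes per structure on that PC: -nstruct 200 would stretch one WU to roughly 40 CPU-hours, so a ~2 MB download would only be needed every day or two of round-the-clock crunching, which is roughly in line with a 1 MB/day budget.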
SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8875 - Posted: 12 Jan 2006, 19:51:55 UTC - in response to Message 8810.  

Is it possible to split the data the way Einstein does? That is: send one <large_file> (protein data?) and one WU (parameters for the algorithm) at first contact with the user. Subsequent WUs send only new parameters, as long as there are WUs for this protein. When all of them are done, the next WU will contain the instruction to delete <large_file> and will send <next_large_file>.

Norbert


Yes, that would be the ideal protocol. One WU is one trajectory calculation; the start data (configuration, starting parameters, protocol) and the results are both small. If we work on one protein at a time, we get a high proportion of valid WUs (perhaps ten times fewer errored WUs than now), low traffic, and very fast verification of completed WUs and granting of credit.
When the experiment on a protein is done, or if the user asks to switch to another protein, the old large helper files are invalidated and deleted, and the new large files for the newly selected protein are downloaded.
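
BOINC does have machinery for the big-file part of this: an input file marked 'sticky' in the workunit template stays on the host after the WU finishes, so later WUs for the same protein can reference it without re-downloading. A sketch of what the relevant template fragment might look like (the file and open names are hypothetical; the tags are from BOINC's workunit template format):

<file_info>
    <number>0</number>
    <sticky/>   <!-- keep the protein database on the host between WUs -->
</file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>protein_1abc.db</open_name>
    </file_ref>
    <command_line>-nstruct 10</command_line>
</workunit>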

But I think this new protocol would need a big rework of the server-side control program, and I don't really expect the authors to make that change.

in the WU argument lists, you will see "-nstruct 10", which means make ten structures.

if we can make the number following -nstruct a user-defined parameter, it would be great; people on dial-up connections could use, for example, -nstruct 50. the question is how to do this within the BOINC setup.


Yes... someone will need to read the BOINC documentation. Maybe River~~ can advise which pages to read.
But if one user asks for, say, -nstruct 37, must the server then wait for another matching -nstruct 37 request before it can do the duplicate comparison of the WU? That is not good.
And must such a big WU run until all of its trajectories are finished before it reports anything? Does any error invalidate all the trajectories already done? Can we power the computer off many times while one WU is in progress?
If that's the case, then this option is fritterware.
SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8877 - Posted: 12 Jan 2006, 20:01:23 UTC - in response to Message 8875.  


If that's the case, then this option is fritterware.

I mean, if these problems are present:

* a longer wait for the results
* greater risk of losing completed work
* psychological effect of massive jobs - no credit for long periods - check out the number of posts at FaD when there were Long Running Molecules (LRMs)!
* No ability to update the computer, for whatever reason, until the WU is complete. As FC says, including the ability to terminate work that is no longer needed when it phones home would be a good feature.

then, when the -nstruct option is changed, this control is practically unusable. :(

P.S. I still haven't had any feedback about the promised "COMPRESSION WRAPPER CODE".
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 8892 - Posted: 13 Jan 2006, 1:36:27 UTC

I switched a home system from cable modem to DSL (Athlon 64 3000+ at 2 GHz). It's been left alone for 5 days, as I've been elsewhere, so almost all the bandwidth usage is for Rosetta: it said 11 MB up and 256 MB down. No wonder people are complaining about maxing out their bandwidth caps.
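
For scale: 256 MB over 5 days works out to about 51 MB/day, or roughly 1.5 GB/month from a single host, which would eat a large share of a typical capped plan.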

Reducing the download size would definitely be a benefit. So would letting us fetch a day's worth of crunching data, with an option to send the results back as they're finished.

I'd opt for larger WUs, even though I don't have to worry about reaching a bandwidth cap myself; it would free up bandwidth capacity for others.
Moderator7
Volunteer moderator

Joined: 27 Dec 05
Posts: 10
Credit: 0
RAC: 0
Message 8902 - Posted: 13 Jan 2006, 4:55:09 UTC

The ability to select the number of structures as part of the project preferences seems to be the easiest/quickest way to reduce the bandwidth; the code changes are (hopefully) fairly minor. This would let you "tune" the WU run length to match your preferred balance of bandwidth and turnaround time, given the speed of your host(s). Of course, to reduce the bandwidth spent on downloading multiple versions of the application, it's nice to have several fixes batched up before a new version is actually released; if "real bug" fixes are done first and need to be shipped out quickly, the variable-length code may be delayed a bit.

As for the science, David Baker may have to correct me on this, but my understanding is that for each WU "type", the goal is to collect something like 10,000 different random-seed runs. That can be 1,000 WUs at 10 structures each, or 100 at 100 structures each (or 2,000 at 5 structures each), or any combination of the above. There is some advantage to assigning the seeds on the server side, because then there is no duplication, but that adds having to track results that were never returned, resend them, and so forth; I'm still not sure it isn't better to let the hosts generate the random numbers and just send out enough "extras" that the duplicates aren't a problem. (Which also makes variable-length WUs much easier.)
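The "extras" approach is cheaper than it might sound. If each of k runs drew an independent random 32-bit seed, the expected number of colliding pairs would be about k^2 / (2 * 2^32); for k = 10,000 that is ~0.01, i.e. essentially no duplicates. The ~5% duplication reported below arises because seeding from the wall-clock second at process start squeezes many runs into a far smaller effective seed space.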

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 8904 - Posted: 13 Jan 2006, 5:14:11 UTC - in response to Message 8866.  

What would be best for the project?

Is it "better" to have -nstruct 50, or is -nstruct 10 enough. What about -nstruct 2 or -nstruct 100, 200, 500 etc?



As Moderator7 says, what we need for each protocol (work unit) we are testing is a total of about 10,000 structures, i.e. nstruct 10,000. Each one of these is made starting from a different random-number seed.
So it helps us equally whether you do 20 jobs with nstruct 5, 5 jobs with nstruct 20, or one job with nstruct 100, AS LONG AS we get the results back within a week or so. This is because we use the results to decide on our next step in solving the structure prediction problem, and if they come in too late they aren't as useful.

Currently the random-number seed is determined on your computers from the time at which the process starts. However, there is a chance that the clock time is identical when two of your jobs start, and we are in fact seeing a low level (~5%) of duplication. (For people who have worried about the reliability of the results, I can confirm that when the starting seeds are identical, the results are also identical!) We will switch soon to sending out the seeds with each WU to make sure every run is unique.
(We tried this a few weeks ago, but it caused problems due to a bug that we hadn't caught at the time.)
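
For illustration only (this is not the Rosetta code, just the standard trick): mixing the start time with a per-process value such as the PID makes same-second starts yield different seeds.

// Sketch: wall-clock seconds alone collide when two jobs start in
// the same second; mixing in the process ID breaks the tie.
#include <ctime>
#include <unistd.h>   // getpid(), POSIX

unsigned int make_seed() {
    unsigned int t = static_cast<unsigned int>(std::time(0)); // 1-second resolution
    unsigned int p = static_cast<unsigned int>(getpid());     // unique per live process
    return (t * 2654435761u) ^ p;  // spread the time bits, then fold in the PID
}

Even this only avoids collisions on a single host; server-assigned seeds, as planned, are the only way to guarantee uniqueness across the whole project.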

Andrew

Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 8925 - Posted: 13 Jan 2006, 11:35:03 UTC - in response to Message 8904.  
Last modified: 13 Jan 2006, 11:35:42 UTC

As Moderator7 says, what we need for each protocol (work unit) we are testing is a total of about 10,000 structures, i.e. nstruct 10,000. Each one of these is made starting from a different random-number seed.
So it helps us equally whether you do 20 jobs with nstruct 5, 5 jobs with nstruct 20, or one job with nstruct 100, AS LONG AS we get the results back within a week or so. This is because we use the results to decide on our next step in solving the structure prediction problem, and if they come in too late they aren't as useful.


If there's a one-week deadline, why not have a command-line flag saying "run for a week"? The Rosetta client would then try to do as many structures as it can in that time (while still playing nice with the BOINC client, of course :P).
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 8927 - Posted: 13 Jan 2006, 12:00:12 UTC
Last modified: 13 Jan 2006, 12:00:41 UTC

Y'all may want to talk to Carl Christionsen at CPDN. He is working on the next generation of CPDN, and they will be doing long work units, returning intermediate results, AND killing unproductive models mid-run as needed. Or at least that is the plan Les mentions.

From Les, on the CPDN NC forum: "Carl has said elsewhere that he is changing the way data is stored on computers, and also when it is sent.
This already applies to spinup, and is apparently going to apply to coupled ocean.
With spinup, the trickles are more frequent, and larger, than in slab/sulphur. They now contain science data, and there is also a BIG trickle at each 50-year point. These 'trickles' need to be sent back when they are created, both so that the researchers can keep an eye on models to make sure they are stable, and so that there isn't a HUGE lot of upload traffic at the end.
And, at least with spinup, the researchers can send a 'kill' message to abort an unstable model before it wastes too much computer time."

Maybe they have some ideas that might help. Carl had been having an issue mentioned on the dev forum, and I BCC'd an email to David Kim (though it seems that Carl has since sorted out his troubles). Anyway, just a thought...
SwZ
Joined: 1 Jan 06
Posts: 37
Credit: 169,775
RAC: 0
Message 8945 - Posted: 13 Jan 2006, 15:27:32 UTC - in response to Message 8904.  

Currently the random-number seed is determined on your computers from the time at which the process starts. However, there is a chance that the clock time is identical when two of your jobs start, and we are in fact seeing a low level (~5%) of duplication.

Such a high probability is very strange!
I use the CPU clock tick for seed numbers in my programs; it increments about every nanosecond, so two processes cannot sample identical clock-tick values.
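
For illustration, a seed taken from the x86 time-stamp counter SwZ describes; the __rdtsc intrinsic is compiler- and x86-specific, not standard C++:

#include <stdint.h>
#include <x86intrin.h>   // __rdtsc() on GCC/Clang; MSVC has it in <intrin.h>

uint32_t make_tsc_seed() {
    // The TSC increments every CPU cycle, so two processes reading it
    // at "the same time" still see different values.
    uint64_t tsc = __rdtsc();
    return static_cast<uint32_t>(tsc ^ (tsc >> 32));  // fold 64 bits down to 32
}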
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 9005 - Posted: 14 Jan 2006, 12:17:16 UTC

Rather than selecting a number of structures at the outset, would it be possible to select a preferred CPU runtime at the outset?

At the end of structure N (where N does not include any structures that were terminated very early), the program calculates:

struct_cpu = cpu / N

est_cpu = cpu + 1.5 * struct_cpu

top_cpu = cpu + 3 * struct_cpu

It stops if top_cpu is more than the enforced maximum CPU time for the result, meaning that we should see an end to the "time exceeded" problem. The safety margin of 3x the average structure time seen so far can be tweaked as needed, as can the enforced maximum time for the result. The intent is to stop cleanly at the end of a structure rather than be aborted during the following one.

It also stops if est_cpu is more than a preferred CPU time specified by the user. The factor of 1.5 instead of a factor of 1 means that about as many runs will stop half a structure short as will stop half a structure over.

The beauty of this is that it addresses two or three different issues at once: it offers a way of preventing overrun, it achieves a more uniform run length when some structures have to be abandoned, and it gives users who want very long runs the chance to ask for them.

Two new parameters would need to be passed into the result at startup: the preferred and maximum CPU times. The default for the preferred CPU time would be the predicted time for 10 structures, but in practice more than 10 would sometimes be tried when there were some truncated structures.

The maximum CPU time should be scaled to an appropriate excess over the preferred time.

In reporting % done, the client would report cpu / est_cpu (using the est_cpu calculated at the end of the previous structure, or the preferred CPU time during the first structure). In practice % done would jump up and down a little at each new structure, but it would still be a lot more useful than at present.

Jack & DavidK: I don't know your code well enough to say whether this is feasible, but if it is, it might be a better way forward than trying to guesstimate sensible ranges for nstruct.


River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 9009 - Posted: 14 Jan 2006, 15:00:16 UTC - in response to Message 9005.  
Last modified: 14 Jan 2006, 15:24:45 UTC

Sorry, it is too late to edit, but

est_cpu = cpu + 1.5 * struct_cpu


should read

est_cpu = cpu + 0.5 * struct_cpu

of course. We press on only if there is more than half a structure's worth of time left before the preferred run time, so that we overrun by at most half a structure.

AND a different estimate

est_cpu_report = cpu + struct_cpu

should be used later on when calculating & reporting the %complete

R~~
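
Putting messages 9005 and 9009 together, the decision logic might look like the sketch below. The names are invented; preferred_cpu and max_cpu are the two new parameters River~~ proposes, and the 0.5 factor is the corrected one:

// Called at the end of each completed structure. cpu = CPU seconds
// used so far; n = structures that ran to completion (early-terminated
// ones are excluded, so struct_cpu reflects full structures only).
bool start_another_structure(double cpu, int n,
                             double preferred_cpu, double max_cpu) {
    double struct_cpu = cpu / n;                 // average cost of one structure
    double est_cpu    = cpu + 0.5 * struct_cpu;  // corrected estimate (msg 9009)
    double top_cpu    = cpu + 3.0 * struct_cpu;  // 3x safety margin (msg 9005)
    if (top_cpu > max_cpu)       return false;   // would risk the enforced CPU limit
    if (est_cpu > preferred_cpu) return false;   // past the user's preferred runtime
    return true;
}

// Percent-done for the client, per msg 9009: assume one more full
// structure beyond the CPU time already spent.
double fraction_done(double cpu, double struct_cpu) {
    return cpu / (cpu + struct_cpu);  // est_cpu_report = cpu + struct_cpu
}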