Finish a workunit in < 1 minute: What would it take?

Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29510 - Posted: 17 Oct 2006, 12:22:57 UTC

A workunit is really just the initial model with some settings[1]; the client then creates as many decoys as it can in the time allowed. If we use David Baker's analogy of "searching for the lowest point on a planet", each decoy is the measurement of a particular point. For each decoy, Rosetta randomly moves (twists or folds) the atoms in the model, then calculates the RMSD. After each decoy, a decision is made whether to start over from scratch with a different set of changes (if it's getting worse and worse) or to continue from the previous result.

The end result of the model is the "lowest energy level for this workunit", and the positions of all the parts of the protein when it was at that lowest level. (That is, we report the best result from the whole sequence.)

[1] The settings influence how long it takes to create a decoy by deciding how detailed each modeling step should be. There are shortcuts to calculate "roughly" which way to fold the protein and get a closer model. So early attempts will be "rough", a more precise model will be produced once the rough results are back, and a "refinement" may be sent out as a third phase.
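In rough code terms, the decoy loop described above might look something like this - a toy Python sketch with a made-up energy function and move set, not the actual Rosetta code:

    import math
    import random

    def score(model):
        # Toy "energy": a bumpy function standing in for Rosetta's real scoring.
        return sum(math.sin(3 * x) + 0.1 * x * x for x in model)

    def random_move(model, rng):
        # Twist one randomly chosen "angle" by a small random amount.
        i = rng.randrange(len(model))
        moved = list(model)
        moved[i] += rng.uniform(-0.5, 0.5)
        return moved

    def make_decoy(start, rng, steps=500):
        # One decoy: a short stochastic search downhill from the starting model.
        model, best = list(start), score(start)
        for _ in range(steps):
            candidate = random_move(model, rng)
            s = score(candidate)
            if s < best:                 # keep the change if it improves things...
                model, best = candidate, s
            elif rng.random() < 0.05:    # ...otherwise occasionally restart from scratch
                model = list(start)
        return model, best

    def run_workunit(start, seed, n_decoys):
        rng = random.Random(seed)        # the seed fixes the whole random sequence
        return min((make_decoy(start, rng) for _ in range(n_decoys)),
                   key=lambda d: d[1])   # report only the lowest-energy decoy

    best_model, best_energy = run_workunit([0.0] * 10, seed=42, n_decoys=20)
    print("best energy:", round(best_energy, 3))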

--
Mats


ID: 29510
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29536 - Posted: 17 Oct 2006, 19:57:22 UTC - in response to Message 29508.  

It sounds like the basic unit is the decoy, and work units are made up of decoys. Are there other work unit "ingredients?" How does Rosetta decide how many ingredients are in a single work unit? Does it limit work units by the number of ingredients, the number of bytes, or both?

My inquiry is about how to complete an average work unit in under one minute using hardware upgrades and soft division. I am not interested in work units that would only take one minute on current hardware. I want to know what it would take to empty the WU queue.



A work unit (now called a "task" in newer BOINC terminology) is a bad unit to choose, since we select how long to run them.

Simply alter your local settings file to 30 seconds and you'd probably be there, except that we select our own workunit/task runtime via the online preferences, and the lowest we can choose is 1 hour. So in reality you'll never get under 1 minute unless the task errors out.
But as you hinted in the first part, what you should really be asking is how to do an average decoy in under 1 minute.

Again, this is kind of hard, since decoys vary greatly in length; sometimes I can get 40+ decoys done in 3 hours, sometimes just 1. So the average is only an average for the task group (aka workunit group or target type: the collection of specific tasks we are looking at). You can get an idea of the target types here: https://boinc.bakerlab.org/rosetta/rah_results.php?login=1 (it should show only the ones you have done; no idea how to show just the current ones).

If mmcaistro, the man of many charts & graphs, has the patience to work it all out then that's your best bet.

(P.S. Note our credit is actually based on the average decoy time taken for each target type)
Team mauisun.org
ID: 29536
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 29539 - Posted: 17 Oct 2006, 21:18:30 UTC

As you have stated, I think that due to the variance in decoys/WUs etc., only Rosetta could change the program to record and report these values. The best that can be done with what we have is to get the average number of seconds per decoy. Well, we could mess with run times, either manually or via web preferences, to try to do just ONE WU and record it. That still wouldn't give an accurate representation of the time for ONE decoy, since the program performs initialization as well, so that time would be included. Then you have the myriad of different systems and their individual performance issues.

This ball belongs in the project's court.

I'm not playing this game. LOL I can live with just the average.
ID: 29539
Michael

Joined: 12 Oct 06
Posts: 16
Credit: 51,712
RAC: 0
Message 29540 - Posted: 17 Oct 2006, 21:37:35 UTC

How many teraFLOPS would it take to push the queue down to zero?
Michael
Join Team Zenwalk
ID: 29540
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 29544 - Posted: 17 Oct 2006, 22:32:51 UTC
Last modified: 17 Oct 2006, 22:34:28 UTC

Unless one knows the outcome of the analysis beforehand, one can't know the floating-point operations needed for each model. Then you have to know how many models are in each WU type, how much work of each type is submitted for analysis at any given second of each day, and the same for each type already in the database each day.

Basically, this question would have to be answered by the project scientists and they'd have to analyse it on an ongoing basis.
ID: 29544
Michael

Joined: 12 Oct 06
Posts: 16
Credit: 51,712
RAC: 0
Message 29547 - Posted: 18 Oct 2006, 0:09:30 UTC
Last modified: 18 Oct 2006, 0:19:12 UTC

Maybe a 3.6 TeraFLOPS cluster like the BladeCenter JS21 could push the queue toward zero (even though it is almost there: < 1,000). When the queue was at 19,000 and the WUs in progress were at 191,000, I thought that perhaps a 10% boost in TeraFLOPS (3.6 TeraFLOPS) would do the trick.

Let's say I have a 3.6 TeraFLOPS cluster; what kind of Internet connection would this cluster need?
Michael
Join Team Zenwalk
ID: 29547
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 29548 - Posted: 18 Oct 2006, 0:24:00 UTC

In order to push the work queue down, you have to consume work faster than the Rosetta server can generate it... and to generate work, it simply has to assign a random number seed and send the same 3-5MB of stuff it sent to everyone else working on that protein. So, collectively, we'd all have to crunch a work unit in less time than it takes the server to compute the next seed value to assign... in short, you can't get there from here.
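To see why the server's side of this is so cheap, here's a rough sketch of what "generating" work amounts to - hypothetical names, nothing like the real server code:

    import itertools
    import random

    PROTEIN_DATA = b"..."   # stands in for the same 3-5MB sent to every host

    def generate_tasks(protein_id, data):
        # Each "new" workunit is just the same input data plus a fresh
        # random seed - no folding happens on the server at all.
        for n in itertools.count():
            yield {"name": "%s_%d" % (protein_id, n),
                   "seed": random.getrandbits(32),
                   "data": data}

    tasks = generate_tasks("1abc", PROTEIN_DATA)
    print(next(tasks)["name"], next(tasks)["name"])   # endless, nearly free work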

I think the crux of your question is "how much computing power is enough?". As you've noticed, the Rosetta team comprises many individuals, and we haven't seen work from each of them yet. The docking work is an example where one of the Rosetta researchers has some ideas about how to go about studying docking, and builds work units and Rosetta code to test those ideas. Then they will refine them over time, the same as we are still doing for the "old" work of protein folding.

You will note that the R@H logo mentions "Folding", "Design" and "Docking"... so we're still at just 2 out of 3. As we in the user community bring more power to the project, the project can try out more new ideas. Some will be fruitful, others not. But this is how discovery works.

In short, unless there is a problem on the server, you won't empty the queue... but Dr. Baker would probably love it if you have the computing power to make a go of it!
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 29548
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29569 - Posted: 18 Oct 2006, 11:19:45 UTC - in response to Message 29547.  

Maybe a 3.6 TeraFLOPS cluster like the BladeCenter JS21 could push the queue toward zero (even though it is almost there: < 1,000). When the queue was at 19,000 and the WUs in progress were at 191,000, I thought that perhaps a 10% boost in TeraFLOPS (3.6 TeraFLOPS) would do the trick.

Let's say I have a 3.6 TeraFLOPS cluster; what kind of Internet connection would this cluster need?


As Feet1st explained, new work units can be produced at a pretty good rate of knots - until the work on a particular model fulfils some preset criterion (say 100,000 decoys), the server will keep work coming for that model.

Constructing a new model is a bit more work, but there will be LOTS of workunits for each model, so there's no risk of it running down too quickly.

It's probably also better for the project to have a longer calculation time than to run short runs of many workunits.

Also note that you could, in theory, run one workunit for just a single decoy generation, which wouldn't take very long. The runtime per workunit is decided by each user - and many workunits will contain the same starting model but a different random seed (which determines the sequence of random numbers from there on). So for each workunit produced, the number of decoys produced in X hours varies from one to many - working more hours creates more decoys, which improves the prediction. Many computers working on the same model with different random seeds will give many different predictions, and one of that huge number will be the best one - that's the one that really counts, but we had to try all the other ones to know which is the "best" one...
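As a toy illustration of that last point (all the numbers here are invented; real decoy rates and score distributions vary by target):

    import random

    def predict(seed, hours):
        # One host crunching one workunit: the seed fixes the trajectory,
        # the chosen runtime fixes roughly how many decoys get made.
        rng = random.Random(seed)
        n_decoys = max(1, int(hours * rng.uniform(1, 15)))
        return min(rng.gauss(0, 1) for _ in range(n_decoys))

    # Many hosts, same starting model, different seeds:
    results = {seed: predict(seed, hours=3) for seed in range(1000)}
    best = min(results, key=results.get)
    print("best seed:", best, "score:", round(results[best], 3))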

This is different from SETI or Einstein, where a workunit is a piece of collected data that needs to be processed to look for some content, so there's a finite amount of data to process [although there may be enough for many years of work, and new data is being produced by the radio telescopes etc., so there's probably no risk of running out in the short term - and if that were a problem, there's always the possibility of updating the search method to look more carefully!].

--
Mats
ID: 29569
Michael

Joined: 12 Oct 06
Posts: 16
Credit: 51,712
RAC: 0
Message 29572 - Posted: 18 Oct 2006, 12:15:10 UTC

Would it be beneficial to calculate a WU for twice as long as normal on purpose? Or would this cause duplicate results?
Michael
Join Team Zenwalk
ID: 29572
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29577 - Posted: 18 Oct 2006, 13:57:52 UTC - in response to Message 29572.  

Would it be beneficial to calculate a WU for twice as long as normal on purpose? Or would this cause duplicate results?


Calculating for twice as long will generate (roughly) twice the number of decoys, and unless the random number generator is poor [and/or the code in Rosetta that makes use of it], the results should all vary to some extent. There may well be code in there saying "Have we been here before? If so, go some other direction", which would GUARANTEE that your decoys are all unique. Of course, two computers may end up with the same result coming from different directions (all work units have different random seeds, but there's nothing in and of itself preventing one machine from generating the exact same decoy as another - it just won't have got there by the same route).

Since each user (host owner) can decide how much work-time to spend on the units, the folks at bakerlab haven't got much input into how long each work unit is worked on - but you do. Longer isn't better, other than the fact that the number of downloads/uploads from/to the server is reduced, and some of the setup time to configure the workunit is only spent ONCE per work unit, no matter how long it runs... But the credit per hour is very close to the same...
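To put illustrative numbers on that (both figures below are invented, just to show the shape of it):

    SETUP = 300        # assumed one-off setup time per workunit, in seconds
    PER_DECOY = 300    # assumed time per decoy, in seconds

    for hours in (3, 6):
        decoys = (hours * 3600 - SETUP) // PER_DECOY
        print(hours, "h:", decoys, "decoys,", round(decoys / hours, 1), "per hour")
    # 3 h: 35 decoys, 11.7 per hour
    # 6 h: 71 decoys, 11.8 per hour

Doubling the runtime roughly doubles the decoys, but the decoys (and hence credit) per hour barely move.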

--
Mats

ID: 29577
Michael

Joined: 12 Oct 06
Posts: 16
Credit: 51,712
RAC: 0
Message 29578 - Posted: 18 Oct 2006, 14:07:30 UTC

I'm out of questions.
Michael
Join Team Zenwalk
ID: 29578
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29628 - Posted: 19 Oct 2006, 9:57:33 UTC - in response to Message 29578.  
Last modified: 19 Oct 2006, 9:57:54 UTC

I'm out of questions.


We were just getting going as well :-D
Team mauisun.org
ID: 29628
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29638 - Posted: 19 Oct 2006, 14:55:51 UTC - in response to Message 29536.  


A work unit (now called a "task" in newer BOINC terminology)


No, a WU is different from a task.

On Rosetta, a WU consists of all the tasks sent out with the same data and the same random number seed. Usually that is just a single task, so we get sloppy in how we use the terms.

But when a user aborts an unstarted or unfinished task, or when the result comes back with something the project regards as an error, a second task is sent out from the same WU. At one time this project would allow up to 16 retries, but after a nasty experience with some death-WUs that just wasted people's time, the decision was taken to limit the number of attempts to 2.

The reason "task" is new terminology is that they used to be called "results" even before they had run. Now what is sent out from the server is a task and what is sent back is a result. The task / result is identified by a single resultid which is different if an identical task is sent out from the same WU. Both tasks would be associated with the same wuid.

Here is a WU with two tasks. At the time of posting, the second task had been generated by the scheduler but not yet sent out.
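In data-model terms, the relationship might be sketched like this (the field names are illustrative, not the real BOINC schema):

    from dataclasses import dataclass, field

    @dataclass
    class Task:                # a "result" in oldspeak; what the server sends out
        resultid: int
        outcome: str = "unsent"

    @dataclass
    class WorkUnit:            # same data + same random seed
        wuid: int
        seed: int
        tasks: list = field(default_factory=list)

    MAX_ATTEMPTS = 2           # was 16, cut after the death-WU episode

    def issue_task(wu, next_resultid):
        # A further task goes out from the same WU only after an abort or
        # error, and only while the attempt limit allows it.
        if len(wu.tasks) < MAX_ATTEMPTS:
            wu.tasks.append(Task(resultid=next_resultid))
            return wu.tasks[-1]
        return None

    wu = WorkUnit(wuid=12345, seed=67890)
    issue_task(wu, 1)          # the usual case: one task per WU
    issue_task(wu, 2)          # the retry case described above
    print(wu.wuid, [t.resultid for t in wu.tasks])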
ID: 29638
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29641 - Posted: 19 Oct 2006, 15:31:48 UTC - in response to Message 29638.  
Last modified: 19 Oct 2006, 15:39:22 UTC


No, a WU is different from a task.
...


Third time trying to type this (stupid trackpad zones).

Actually it's down to the fact that the server side uses the old naming and the client side uses the new naming.

Task = BOINC Manager (what we users see).
Work = server naming.

Hence a "work unit" should really be called a "task unit" under the new naming scheme.

A similar inconsistency is credit:
Credit = server naming,
where it should actually be called 'work done' now, as in the BOINC Manager.

Most of the name changing was driven by the BBC's usability survey, though it's now being driven by WCG. But as WCG don't use the server interface for their users, I doubt it's a priority to standardise the naming across all components.

EDIT to add: all this happened in the 5.3 series.
Team mauisun.org
ID: 29641
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29642 - Posted: 19 Oct 2006, 16:02:25 UTC - in response to Message 29641.  

...

Task = BOINC Manager (what we users see).
Work = server naming.

Hence a "work unit" should really be called a "task unit" under the new naming scheme.
...

Doh!!

I thought I'd understood the change of terms, but now I am not so sure.

On the page I linked to before, the page as a whole described a WU in old terminology, and each row in the table described a result in old terminology.

In newspeak, is it

1) a task unit with two tasks?
2) a task unit with two results?
3) a work unit with two tasks?

The distinction "what we see / what the user sees" is not actually a helpful one, as a user only ever sees one result from a given WU (in oldspeak), so there have always been endless confusions between the row and the table.

This is not just pedantry - it matters as soon as participants start to discuss workloads on a project scale. For example, on LHC, postings about workloads quite often assume 1000 WU =(?!) 1000 results, whereas on that project 1000 WU = 5000 results in terms of how much work is released to users.

And because participants migrate from project to project it is helpful for everyone to be using the same terms.

And embarrassing to realise it might be me adding to the confusion... :-(

R~~
ID: 29642
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29652 - Posted: 19 Oct 2006, 19:54:10 UTC - in response to Message 29642.  



And embarrassing to realise it might be me adding to the confusion... :-(

R~~


Not really you, it's all BOINC's fault :-D

I did ask it as a question in Rom's Q&A (if you don't follow his weekly Q&A you should, it's on his personal blog); strangely, it's never been answered.
I might have to bring it up again :-)

Team mauisun.org
ID: 29652
Michael

Joined: 12 Oct 06
Posts: 16
Credit: 51,712
RAC: 0
Message 29737 - Posted: 21 Oct 2006, 8:50:57 UTC

Which processor offers the largest L2 cache? L3?

Allow me to suggest, respectively, the Next Gen Opteron with 2MB of L2, and the Xeon with 16MB of on-die L3.

Besides my first question, which type and amount of cache would work best for Rosetta?
Michael
Join Team Zenwalk
ID: 29737
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29757 - Posted: 21 Oct 2006, 14:09:55 UTC - in response to Message 29737.  

Which processor offers the largest L2 cache? L3?

Allow me to suggest, respectively, the Next Gen Opteron with 2MB of L2, and the Xeon with 16MB of on-die L3.

Besides my first question, which type and amount of cache would work best for Rosetta?


In general (not knowing the answer for the current Rosetta app) I'd go for the best on-die cache.

And given that Rosetta is constantly developing new software and testing larger, more advanced algorithms, I'd say that is the way to go whatever the answer would be for today's app. This time next year we will be running algorithms that David, David and Jack haven't thought of yet, and none of us knows for sure which CPU will be best.

As a very, very rough rule of thumb (correct me on this, hardware tekkies), I'd reckon doubling the on-die cache is worth ~10% to ~30% on the clock speed.

Larger on-die caches make more of a difference with more cores too, as they cut down the times when two cores both want the RAM at once.

R~~
ID: 29757
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29759 - Posted: 21 Oct 2006, 14:25:21 UTC - in response to Message 29475.  
Last modified: 21 Oct 2006, 14:27:49 UTC

I have a hyperthreaded CPU and Rosetta automatically runs two jobs, one on each "processor". Couldn't I simply request 32,000 jobs if I had the Roadrunner?

yeah - ...


er, no.

The quota on Rosetta is 100 WUs/day/CPU, but the catch is that the algorithm treats 4 as the maximum possible number of CPUs. One less than a rabbit, for Watership Down fans ;-)

This is intended to make sure that a bug (or a deliberate attack from a client_state manipulator) cannot suck too many WUs out of the database.

So the Roadrunner would only have 400 of its cores active, and once those tasks were returned it would need to wait till tomorrow.
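A sketch of that capping logic (the parameter names are invented):

    def daily_quota(ncpus, per_cpu=100, cpu_cap=4):
        # The scheduler credits you with at most 4 CPUs,
        # however many you really have.
        return per_cpu * min(ncpus, cpu_cap)

    print(daily_quota(2))       # 200 WUs/day for a dual-core box
    print(daily_quota(32000))   # still only 400 WUs/day for a Roadrunner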

Presumably by the time boxes like that become available to typical BOINCers, the BOINC folk will have tweaked the code a little. Until they do, running BOINC would not be the best use of that hardware!

R~~
ID: 29759
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29763 - Posted: 21 Oct 2006, 14:48:59 UTC - in response to Message 29759.  

The quota on Rosetta is 100 WUs/day/CPU, but the catch is that the algorithm treats 4 as the maximum possible number of CPUs.
...
So the Roadrunner would only have 400 of its cores active, and once those tasks were returned it would need to wait till tomorrow.


That can be overcome: just add a CPU selection criterion in the database, like they do for RAM, and alter the max WUs/day accordingly.

Team mauisun.org
ID: 29763