Dr. Baker's journal archive 2006

Author	Message
David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16427 - Posted: 17 May 2006, 4:32:09 UTC Rhiju and I had fun looking at the lowest energy structures returned for T283 thus far. They are very similar in the first two thirds of the protein, but for the last third we see several distinct solutions. Based on our experience with the test problems over the past three months, we expect that with more sampling one of the solutions will clearly win out, and this should be (we hope!) the correct structure. Thus far we have about 100,000 structures returned; we hope to have 10x more sampling before submitting predictions. We made another step forward today in reducing the rosetta memory footprint. For a 175 amino acid residue protein, the standard ab relax protocol we are using for CASP took 222MB of virtual memory three weeks ago, and is now down to 108MB! Now a computer with only 256MB of memory should be able to comfortably process rosetta@home jobs even for larger proteins. The major memory hog now is the boinc graphics, which can add on another 100MB or more--any experts out there who might be able to help with this? In any event, you should be able to run rosetta@home on low memory machines as long as you turn the graphics off. A side benefit of some of the memory use reductions is that it should be relatively easy to reduce the sizes of some of the input files we send out with each work unit. Would a 30% reduction make a significant difference to dialup users? Seven targets have been released thus far in CASP7. The list, which is updated daily, is at http://predictioncenter.gc.ucdavis.edu/casp7/targets/cgi/casp7-view.cgi?loc=predictioncenter.org;page=casp7/. ID: 16427 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16517 - Posted: 18 May 2006, 6:19:44 UTC I was asked how it was possible to reduce the memory usage on rosetta@home so dramatically. The answer has to do with the extremely rapid pace at which the rosetta code base is evolving and the very large number of developers. I have encouraged all of the researchers leaving my group for faculty positions at other universities to continue working with and developing rosetta, and now there are six research groups in addition to mine actively developing the code. We share all of our advances through a common code repository (SVN) into which developers in all of the groups are encouraged to incorporate their changes/improvements. On a typical day, there can be as many as eight different people committing their changes into the repository. Nightly automatic benchmarks are run on a wide variety of test problems which cover the wide range of applications currently being pursued with rosetta (you can read about some of them in the links from the home page). As you can imagine, the results of these benchmarks are scrutinized carefully, and if there is anything amiss a flurry of emails goes around all the groups until any problems are resolved. You could imagine a more conservative approach to code evolution, but my philosophy is that there are so many important hard problems to solve in biology that we all benefit from incorporating all advances as soon as possible. Now, because of all the new areas being pursued in the different groups, and the very large number of developers, the code base is constantly growing. This is occurring even as we try to make rosetta as suitable as possible for distributed computing. 
Up until recently, as you are all too aware, there were a number of problems with the rosetta-boinc interaction and with distributed computing with rosetta generally which occupied all of our efforts. Due to the work of Rhiju, David K., Bin and Rom, these problems have been largely solved. This has given us time to try to make rosetta even better for distributed computing--because the problems we are trying to solve are so big, we hope to ultimately reach the size of seti@home. Early on, users told us the memory footprint was a significant problem. With all the fires we were trying to put out, we didn't have time to deal with this until recently. I had time over the past three weeks to compile and pore over a list of all the arrays in rosetta that are larger than 1MB. With help from quite a number of developers, we systematically went through the list, starting at the top, and tried to reduce each as much as possible. Many of the arrays were mode specific, and could be dynamically allocated only when needed, and others could be replaced by more efficient containers. It was actually kind of fun; every few days we had cut the memory use down by a significant percentage. It can't go all that much lower, but as I mentioned yesterday, we can now cut down the size of the largest datafile we currently send out with each work unit. I hope the reductions in memory use are helping some of you; they have already made it possible for us to efficiently utilize Blue Gene processors--this was not possible a month ago because of the small amount of memory associated with each processor. ID: 16517 · Rating: 1 · rate: /
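The idea of allocating mode-specific arrays only when they are needed can be sketched as follows. This is a minimal illustration in Python, not Rosetta's actual code: the class name ModeTables, the mode names, and the table sizes are all hypothetical, chosen only to show the pattern of deferring a large allocation until a mode first asks for it and of using a packed container instead of a general-purpose one.

```python
from array import array


class ModeTables:
    """Holds large lookup tables, building each one only on first use.

    Hypothetical illustration: mode names and sizes are made up.
    """

    # Number of double-precision entries each mode-specific table needs.
    _SIZES = {"abinitio": 250_000, "docking": 500_000, "design": 1_000_000}

    def __init__(self):
        # Nothing is allocated up front; tables appear here on demand.
        self._tables = {}

    def get(self, mode):
        """Return the table for `mode`, allocating it on the first request."""
        if mode not in self._SIZES:
            raise KeyError(f"unknown mode: {mode}")
        if mode not in self._tables:
            # array('d') stores packed doubles rather than a list of
            # separate Python float objects.
            self._tables[mode] = array("d", [0.0]) * self._SIZES[mode]
        return self._tables[mode]

    def allocated_bytes(self):
        """Total bytes held by the tables allocated so far."""
        return sum(t.itemsize * len(t) for t in self._tables.values())


if __name__ == "__main__":
    tables = ModeTables()
    print("before any mode is used:", tables.allocated_bytes(), "bytes")
    tables.get("abinitio")
    print("after requesting one mode:", tables.allocated_bytes(), "bytes")
```

A run that only ever uses one mode pays for one table; the tables for the other modes are never created.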

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16611 - Posted: 19 May 2006, 4:24:21 UTC Good news for dial up users--starting tomorrow, the data files sent out with each work unit will be MUCH smaller. David Kim found that for the work unit t287_HOMOLOG_ABRELAX_hom001_ the input files have dropped in size from 6.6MB to 2.3MB. I hope this helps!! For people who would like to learn more about our research, but don't want to deal with the umm-filled long video, there is an article which is basically a transcript of a talk I gave at the Royal Society in London last year at http://depts.washington.edu/bakerpg/: click on "publications" and then on "2006"; it is called "Prediction and design of macromolecular structures and interactions". Also, Divya has fixed the silly problem with the text on the screensaver; the workunits going out tomorrow will have this fixed. ID: 16611 · Rating: 1 · rate: /
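To put the figures quoted in these posts in perspective (222MB to 108MB of memory, and 6.6MB to 2.3MB of input files), here is a small worked calculation. The sizes come from the posts; the 56 kbit/s dial-up speed and the convention of 1 MB = 1,000,000 bytes are assumptions made for illustration, and real transfers would also be affected by protocol overhead and compression.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def download_minutes(size_mb, kbit_per_s=56.0):
    """Minutes to transfer `size_mb` megabytes at `kbit_per_s`.

    Assumes 1 MB = 1,000,000 bytes and an ideal link with no overhead.
    """
    bits = size_mb * 8_000_000
    return bits / (kbit_per_s * 1000) / 60


if __name__ == "__main__":
    print(f"memory footprint: {percent_reduction(222, 108):.1f}% smaller")
    print(f"input files:      {percent_reduction(6.6, 2.3):.1f}% smaller")
    print(f"6.6MB download:   {download_minutes(6.6):.1f} min")
    print(f"2.3MB download:   {download_minutes(2.3):.1f} min")
```

Under these assumptions the memory footprint shrank by about 51% and the input files by about 65%, and the per-work-unit download drops from roughly 16 minutes to roughly 5.5 minutes.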

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16714 - Posted: 20 May 2006, 16:47:43 UTC In response to some of the recent discussions on the science boards, today I'd like to tell you about how Rosetta is being used to help understand diseases caused by protein misfolding. A significant fraction of human diseases are caused by proteins misfolding to form long "amyloid fibrils". These diseases range from Alzheimer's disease to infectious diseases from amyloid forming prion proteins. A huge breakthrough in the understanding of the process of protein misfolding to form amyloid fibrils was published in Nature last year from David Eisenberg's research group at UCLA. They reported the first high resolution structure of an amyloid forming peptide. It revealed a set of interactions which seem very likely to be general to most if not all amyloid structures. We have been collaborating with Eisenberg's group to try to predict the portions of proteins known to form amyloid structures responsible for amyloid fiber formation. We use the rosetta-design method to identify sequences compatible with a generalized model of their amyloid structure. You can read about the promising results of this work in the collaborative paper with Eisenberg's group that is posted on the "2006" portion of our home page publication list mentioned in my previous post. The next challenge which we are collaborating on is to design "caps" that will add on to fibers and prevent them from growing further. This is a good example of how basic research development can have applications to pressing medical problems that were entirely unanticipated. 
On a different note, Rhiju asked me to request that users with computers with remaining problems sign up for ralph@home--the error rate is significantly lower on ralph, perhaps because there is a larger fraction of high end machines, and this makes it harder to track down the remaining issues on rosetta@home. ID: 16714 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16814 - Posted: 22 May 2006, 5:49:54 UTC Donna from the AP sent me a draft of her article; it is great! Thanks to all who helped her with it! She says the AP reaches half a billion people each day--hard to beat that for publicity! On the CASP front, the lowest energy structures for Target 283 are quite similar to one another, which makes us very excited as this kind of convergence in our tests the last few months has been a pretty good indicator that predictions are correct. We will submit these lowest energy structures for T283 this week, and then focus on the harder problems presented by T287 and T285, and the new targets likely to be released in the next few days. ID: 16814 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 16956 - Posted: 24 May 2006, 6:00:10 UTC A quick update on CASP7: Many of the targets are very closely related to proteins of already known structure; in fact I'm not sure why the experimentalists bothered to determine their structures! The search is pretty easy in this case, and we are not putting too much effort into these predictions (they are not so exciting). There are four or five targets which do not appear to be related to any protein of known structure. For two of these we feel confident that we are zeroing in on the correct structure (of course we won't know for sure for a few months!). But target T296 released today was quite humbling--it has 445 amino acids! This is a dramatically bigger search problem than any we have done tests on, and it may be more a problem for the rosetta@home of next year than this year. But we are going to give it our best shot! It has been wonderful to see the compute power increase over the past weeks. rosetta@home according to boincstats is now above 31Tflops. We hope this continues and we will do everything we can to make this possible. If it does continue, solving problems like T296 will move more and more into the range of possibility. ID: 16956 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 17292 - Posted: 29 May 2006, 6:23:07 UTC I wrote an internal benchmark for Rosetta last week, and Rom now has a version that uses this to compute credits. Rom suggests however that we wait until after CASP to deploy it because it may take a few iterations to make it acceptable to everybody. I don't know how difficult it will be to "get it right", but I'd like to start testing it on Ralph soon. The new version soon to appear on ralph will also have a fix Rom put in for graphics problems; as reported on the boards, a good fraction of the errors seem to be associated with the graphics (I suspect the fact that they consume lots of memory is part of the problem), and in the new versions graphics related errors should abort the graphics but not disrupt completion of the Rosetta calculation. ID: 17292 · Rating: 2 · rate: /
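The behavior described above, where a graphics failure shuts off the graphics but lets the calculation finish, can be sketched as follows. This is a generic fault-isolation pattern written in Python for illustration only; it is not the actual fix in Rosetta or BOINC, and the function and parameter names (run_work_unit, compute_step, draw_frame) are hypothetical.

```python
def run_work_unit(steps, compute_step, draw_frame):
    """Run `steps` compute steps, drawing each result while graphics work.

    If draw_frame ever raises, graphics are switched off for the rest of
    the run and the computation carries on to completion.
    Returns (results, graphics_still_enabled).
    """
    graphics_enabled = True
    results = []
    for i in range(steps):
        results.append(compute_step(i))
        if graphics_enabled:
            try:
                draw_frame(results[-1])
            except Exception:
                # Abort the graphics only; the science result is unaffected.
                graphics_enabled = False
    return results, graphics_enabled


if __name__ == "__main__":
    def failing_draw(value):
        if value > 3:
            raise MemoryError("graphics ran out of memory")

    results, gfx = run_work_unit(5, lambda i: i * i, failing_draw)
    print(results, "graphics still on:", gfx)
```

The design choice is simply that the drawing call sits inside its own error boundary, so nothing it does can propagate into the loop that produces the result.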

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 17335 - Posted: 30 May 2006, 4:52:53 UTC The AP article on rosetta@home is out! See Ethan's post on the boards today. I think it turned out very well--what do you think? Let's hope lots of people see it. ID: 17335 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 17497 - Posted: 1 Jun 2006, 5:40:43 UTC Here is a press release from the scientific journal Nature on an article of ours that is appearing in the June 1 issue. I'll explain a bit more about the applications of this new Rosetta methodology in future posts; these jobs should start running on rosetta@home after CASP is completed in early August. Featured press release entry: Protein engineering: OK Computer (pp 656-659) One of the great remaining problems in computational protein design involves the redesign of a DNA-modifying protein so that it recognizes, and alters, a new DNA sequence. For example, changing the specificity of a nuclease (a protein that cuts DNA at a specific site) could be beneficial for a range of biotechnological and medical applications. In this week's Nature, David Baker and colleagues have shown that it is possible to modify the sequence specificity of a "homing endonuclease" called I-MsoI. They used a computational approach to screen a virtual library of mutant proteins and predicted which amino acids needed to be changed to re-engineer this enzyme so that it recognized, and cleaved, a new DNA sequence. The mutant protein was highly active and was able to cleave the new DNA sequence, but did not modify the original sequence. The authors hope to redesign this and other DNA-modifying enzymes to alter a range of DNA sequences, so that they could specifically target almost any sequence in the genome. These computationally designed proteins may be useful in a range of medical and biotechnological applications, including gene therapeutic and other targeted genomics applications. ID: 17497 · Rating: 3 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 17566 - Posted: 3 Jun 2006, 6:06:13 UTC CASP7 is really heating up! You can see the list of targets at http://predictioncenter.gc.ucdavis.edu/casp7/targets/cgi/casp7-view.cgi?loc=predictioncenter.org;page=casp7/ We made submissions for the first two that were due yesterday (targets 284 and 287). If you have 287 work units still remaining on your computer you can delete them. Please keep all others running! CASP7 is turning out to be an even more extensive test of rosetta@home than we expected! A much larger fraction of the proteins than we expected based on previous CASPs are both relatively small and completely unrelated to any protein of known structure. These targets are perfect for the methodology we have been developing at rosetta@home since last September when the project began. Things are exciting now, but imagine what it will be like in a couple of months when the true structures are released and rosetta developers, rosetta@home participants, and the whole world can see how good (hopefully!) the predictions are. We will resume our user feedback by acknowledging on the home page the users who find the several lowest energy structures for each of the targets. (We can't show structure comparisons as on the "top predictions" page because we don't know the true structure!) ID: 17566 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 18014 - Posted: 7 Jun 2006, 22:01:22 UTC Welcome to all of our new participants! I have only very sporadic internet access as I'm out of town this next week, but I look forward to interacting with all of you here when I return. I was absolutely delighted to see the large increases of the last few days; they will really help accomplish the goals of the project! Thanks again to all of you, David ID: 18014 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 18761 - Posted: 16 Jun 2006, 4:19:17 UTC I was back at work today and Rhiju and Bin showed me the current sets of CASP7 targets they are working on. Most of them are much bigger than the proteins we tested on during the spring, and almost certainly require more computer time. So please recruit all of your friends and relations for the next month and a half--we are going to need every spare cycle! ID: 18761 · Rating: 0.99999999999999 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 18854 - Posted: 17 Jun 2006, 14:52:08 UTC Rosetta@Home is now resuming feedback on "top predictions". Every few days we will be acknowledging the person who found the lowest energy structure for one of the CASP targets that has run thus far. In most cases, we have tested several different approaches, which have different work unit names, and in these cases we will be highlighting the person who found the lowest energy structure for each approach. So keep a lookout for your name in the limelight! I think this is a nice addition to the credit system for following contributions as anybody can win, big or small; like a lottery if you buy more tickets you have a better chance, but the small guy can still have the magic entry. Are there other suggestions for feedback we could give? Certificates, etc. we could think about if people would like this, but we would certainly need this to be at least in part handled by a volunteer group as we are swamped with CASP. ID: 18854 · Rating: 0 · rate: /
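The lottery analogy above can be made concrete with a short simulation. This is a sketch under a simplifying assumption that is not stated in the post: every returned structure is treated as an independent draw from the same energy distribution, so a participant who returned k of the N structures finds the lowest energy one with probability k/N. The participant names and counts are invented.

```python
import random


def win_probability(contributed, total):
    """Chance of holding the lowest energy structure, assuming every
    structure is equally likely to be the lowest."""
    return contributed / total


def find_winner(counts, rng):
    """Draw one random energy per returned structure and report who found
    the lowest. `counts` maps participant name -> structures returned.
    Returns None if nobody returned any structures."""
    best_name, best_energy = None, float("inf")
    for name, n in counts.items():
        for _ in range(n):
            energy = rng.gauss(0.0, 1.0)  # stand-in for a Rosetta energy
            if energy < best_energy:
                best_name, best_energy = name, energy
    return best_name


if __name__ == "__main__":
    counts = {"big_farm": 900, "laptop": 90, "old_desktop": 10}
    rng = random.Random(2006)
    wins = {name: 0 for name in counts}
    for _ in range(2000):
        wins[find_winner(counts, rng)] += 1
    total = sum(counts.values())
    for name in counts:
        print(name, "expected", win_probability(counts[name], total),
              "observed", wins[name] / 2000)
```

Over many simulated targets the observed win fractions should sit near the ticket fractions, which is the sense in which more tickets help but the small contributor can still win.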

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 19061 - Posted: 21 Jun 2006, 14:40:59 UTC CASP targets are continuing to come in, and we have more to do than ever--our "CASP control room" with Rhiju, Bin and others furiously going through targets, pieces of paper with various information on each of the targets floating around, etc. is quite a sight! The structures of a few of the targets have been published now, but these are all in the "comparative modeling" category where copying a known structure gives a good solution already (we are trying to refine these starting models using the high resolution part of the protocol running on rosetta@home). Calculations for these proteins didn't use rosetta@home as they are less time consuming. Our results are good compared to the automatic servers, but we won't know how they stack up compared to other participants until the meeting in November. We did get some exciting news yesterday from an analogous prediction experiment/competition called CAPRI on protein-protein docking. For this problem, which consists of finding the lowest energy docked arrangement of two protein structures given the coordinates of the isolated proteins, our approach is very similar to that running on rosetta@home--there is an initial low resolution search followed by full atom refinement. Chu Wang, a graduate student in the group, made predictions for the most recent round of CAPRI, and they turn out to be the best made by any group: http://capri.ebi.ac.uk/round10/R10_T26/ (scroll down to "medium predictions"; we are group 80). Finally, to answer a question on the discussion boards, many proteins consist of multiple independently folded "domains". 
In many cases, it is possible to recognize from the amino acid sequence roughly where the boundaries between the domains are, and in these cases we carry out folding calculations separately on each domain. This in the end produces models for different parts of an amino acid sequence, and we then need to assemble these into one coherent structure. For this we use a protocol again very similar to what you have been running, except that the only variation allowed is in the linker between the domains, typically around 10 residues, while the intradomain structure is kept fixed (this is quite analogous to the docking problem I mentioned above). ID: 19061 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 19416 - Posted: 28 Jun 2006, 14:12:31 UTC CASP targets are continuing to come in and we have our hands totally full. We have less than a day for each target. This compares with the well over a year it takes to solve a structure using xray crystallography or NMR, often with considerable application of human intuition. Perhaps predictions would be closer in quality to experimentally determined structures if there were less of a difference in time investment! This question came up on the number crunching boards: With the new methodologies being developed, will there be a point at which we go beyond the needle-in-the-haystack decoys and start clustering around the actual structures of proteins? Answer: correct models will always be a very small fraction of the structures generated just because there are so many alternative conformations for a protein chain. But to have confidence in a prediction, there must be convergence of the lowest energy conformations on a single structure. As our methods improve and sampling (cpu power) increases, correct models will remain "needle in a haystack" in the overall population, but dominant in the population of lowest energy models. And is this the goal before (from what I understand) the project moves into the design/docking phase? Answer: No, while this is the solution to the structure prediction problem, it is not necessary for successful design and docking (certainly, though, more accurate prediction methods would impact both areas). We have had considerable success with both design and docking already. After CASP we will start running both docking and design calculations on rosetta@home, as well as continuing to improve our structure prediction methods. ID: 19416 · Rating: 0.99999999999999 · rate: /
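The convergence criterion described in the answer above, that the lowest energy conformations should agree with one another, can be sketched as a simple check. This is an illustrative Python example, not the analysis actually used for CASP: the function names, the choice of comparing the top few models to the single lowest energy one, and the 2.0 cutoff are all assumptions, and the RMSD here assumes the models have already been superimposed.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates. Assumes the two models are already
    superimposed; no alignment is performed here."""
    if len(a) != len(b):
        raise ValueError("coordinate lists differ in length")
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(a, b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(a))


def lowest_energy_converged(decoys, top_n=5, cutoff=2.0):
    """Return True if the `top_n` lowest energy decoys all lie within
    `cutoff` RMSD of the single lowest energy decoy.

    `decoys` is a non-empty list of (energy, coords) pairs.
    """
    ranked = sorted(decoys, key=lambda d: d[0])[:top_n]
    best_coords = ranked[0][1]
    return all(rmsd(best_coords, coords) <= cutoff
               for _, coords in ranked[1:])


if __name__ == "__main__":
    base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    near = [(x, y, z + 0.1) for x, y, z in base]
    far = [(x, y, z + 5.0) for x, y, z in base]
    # Most decoys are "far" (the haystack), yet the lowest energies agree.
    decoys = [(-100.0, base), (-99.0, near), (-98.0, near)] + [(-50.0, far)] * 20
    print(lowest_energy_converged(decoys, top_n=3))
```

In the toy data most decoys are far from the lowest energy model, yet the check passes because only the lowest energy subset is examined, which mirrors the "needle in a haystack overall, dominant among the lowest energies" point.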

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 19455 - Posted: 29 Jun 2006, 6:51:38 UTC Today I met with the people who design the science curriculum for Seattle Public School middle and high schools to discuss incorporating rosetta@home into middle and high school science classes. I think that participating in a real research project could be more inspiring than just learning a set of facts; I certainly never found science classes very fun or interesting--the exciting part is discovering new things more than learning about discoveries made long ago. Anyway, they were very interested and we should have some pilot projects in schools this fall. These message boards were what gave me the idea for this--it has been really fun and rewarding to try to explain our research and answer all of your questions. As part of making the project more educational, we are working, with help from a Microsoft expert, to increase the amount of feedback participants can get on the results their computer produces. Hopefully you will see this here in not too long. ID: 19455 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 19869 - Posted: 7 Jul 2006, 7:06:04 UTC David Kim just put a link on the home page to an article just out in "The Scientist" which describes my group's research work and features rosetta@home. If you are interested in what is coming down the road for rosetta@home you might take a look at it. The last CASP targets are going to be released in a couple of weeks; it has been so much work that we are all ready for CASP to be over so we can start pursuing the new ideas that have come up as we work on these concrete problems. Also, of course, we are very eager to see the actual structures, and learn what we need to work on most to improve Rosetta. ID: 19869 · Rating: 1 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 20187 - Posted: 14 Jul 2006, 15:45:46 UTC We desperately need as much CPU power as possible for the next two weeks--there are more than 25 CASP targets due, including some that are our best shots at really high resolution models. Frustratingly, we won't be able to do anywhere near as much sampling as we had planned for these proteins as there are so many coming due near the same time, and thus can't really expect the accuracy we had hoped for. So if it is at all possible for you to increase your rosetta@home cpu time for the next two weeks please do--it will make a huge difference for our collective efforts! ID: 20187 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 20573 - Posted: 19 Jul 2006, 4:57:38 UTC Several things tonight: (1) A reporter is doing a story on rosetta@home participants. If you are interested, please respond to her at Message boards : Cafe Rosetta : Reporter ISO of Interviewees. (2) The first actual structure for an ab initio CASP7 target was released today. Our top prediction is very close, but not perfect. The error is a shift in register of the last beta strand. This is a problem that we saw in a number of cases in the tests we ran in the spring, and it will be high up on the list of methods improvements to be tackled in August when CASP is over. (3) David Kim has put together instructions on how to save and view the predictions your computer is making. I think that many of you will find this very interesting--give it a try! (4) Thank you all for your response to our plea for more computing power--I think we are seeing an increase even in the face of the summer heat. (5) Please contact Jose on the message boards if you would like to know what is being done about high credit claims. ID: 20573 · Rating: 0.99999999999999 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 21367 - Posted: 29 Jul 2006, 5:40:58 UTC I just returned from our annual Rosetta developers meeting. It was a tremendous success! There were 70 people attending from all over the country and some came all the way from Europe to the conference center in the Washington Cascades. We discussed improvements in the basic methodology in Rosetta made in the different research groups, and some of the exciting scientific advances as well. It was a terrific opportunity to get caught up and to learn about all the new capabilities created by the extended Rosetta developers community. We also discussed how to continue to keep the program intact and cohesive with all of the changes being made in so many different places all of the time (those of you who are programmers will certainly appreciate this challenge). The meeting was also a great opportunity for beginning students in the different research groups from different institutions to meet each other and the people who have been working on developing rosetta for several years. And for those of us who have been around for a bit longer, it was a great opportunity to see old friends! My only disappointment was that I had to skip the traditional hike/climb following the meeting because of a knee still not fully recovered from a previous trip. ID: 21367 · Rating: 0 · rate: /