We have a series of threads over time where you can participate with your questions and comments. Look for a thread in the Science forum with a sticky entitled "DISCUSSION of Rosetta@home Journal". At present, discussion 4 is the active discussion thread.
____________ Rosetta Informational Moderator: Mod.Zilla
ID: 26522 | Rating: 2
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The presentations made by the CASP8 assessors are now available at http://predictioncenter.org/casp8/docs.cgi?view=presentations.
We apologize for the outages over the past several weeks. The very good news is that Mike Tyka, Andrew Leaver-Fay and David Kim, in an all-out effort over the past two weeks, have identified the causes of many of the remaining errors some of you have been getting with the new rosetta code, and we hope that the error rate will now be much lower. I am very excited to see the results of the next weeks of calculations on rosetta@home--we are investigating some very fundamental issues. I'll give you a full report when your results are back in and we have had a chance to think about their implications.
____________
ID: 58751 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We are very excited--the new, improved mini Rosetta code should now be running on your computers, hopefully very smoothly, and we have really exciting fundamental science questions that you will be helping us tackle this week. High on the list are the comparison of Rosetta low energy structures to native structures I mentioned recently (might the Rosetta models in some cases be more accurate?), and a comprehensive test of new comparative modeling methodologies we have developed since CASP8 last summer, based on what we learned from your CASP8 results. We have developed several different approaches which use the information from already solved related structures to different extents, and we are excited to learn how the new methods compare to our CASP8 approach and which of the new methods is the best.
____________
ID: 59031 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Mike has worked wonders! Here is his summary of what he and others have done to make mini Rosetta run smoothly on your computers:
Hello All!
We're ready for a new update. I want to thank all of you who have helped over the last months to find and fix errors in minirosetta. A particular thank you goes to those who have donated their time over on RALPH and helped with their active feedback - we managed to find a number of difficult and rare bugs and put some new features into minirosetta that should help conserve computer time. Read about it here: http://ralph.bakerlab.org/forum_thread.php?id=431
and here: http://ralph.bakerlab.org/forum_thread.php?id=432
I should add that work over there will continue, but now supplemented with information from Rosetta@HOME.
This update is highly focused on bugfixing and stability issues - we have virtually no new science in it, but: We will hopefully now be able to run the science projects that have been in the pipeline waiting for BOINC - we're expecting quite a bit of work to go out very soon indeed. See Dr. Baker's journal for more details.
Features/Fixes:
1.54 Release CHANGELOG
Faster loop closing in FoldCST/Abinitio (affects cc_* cc2_* cs_* WUs), should help with overrunning WUs.
Bug fix concerning intermittent crashes in relax benchmark (_rlbd_) jobs - caused by a buggy input file reader.
Bug fix for a potential instability in handling text files (affects all types of WUs).
Bug fix in checkpointing machinery, states were not being correctly restored, probably contributing to long runtimes. (affects cc_* cc2_* cs_* WUs)
Increased the density of checkpoints to lose less time on restarts and address the weird "backjumping" of the time reported in this thread. This will still happen, but the jumps should be much smaller (at most about as long as the time between checkpoints).
Added checkpointing to Loopclosing part of FoldCST. (affects cc_* cc2_* cs_* WUs)
Added checkpointing to Looprelax.
The Watchdog has been checked and improved, now returning information on the aborted jobs to help us figure out how the remaining long running models come about. The watchdog will now abort if the runtime exceeds your preferred runtime + 4 hours. In other words, the WUs should not overrun by more than around 4 hours. If they do, please let us know!
Added a limit on the number of decoys per WU: 99. The WU will end gracefully after that and give full credit. This should address the problems with excessive uploads.
Fixed a bug in the BOINC API concerned with unzipping the input data. (I will let the BOINC guys know about this)
Fixed a strange problem in the options system leading to early crashes on some systems.
Two nasty instabilities fixed deep in the FoldConstraints/abinitio protocol (cc_* tasks and other homology modelling tasks)
Generally implemented much better error reporting - many potential problems will now show up as meaningful error messages and not random segmentation faults.
NOTE: This new version still contains a lot of debug output. You will see that the stderr fills up with stuff - that is ok. It does not slow down the program nor cause much extra upload - but it tells us a lot about where things can still go wrong.
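To make the checkpointing item above concrete, here is a minimal sketch of why denser checkpoints shrink the "backjump" in reported time after a restart. It assumes checkpoints are written at a fixed interval, which is a simplification for illustration only (the changelog does not say how minirosetta schedules its checkpoints), and the function names are hypothetical.

```python
def restored_seconds(elapsed_s: int, checkpoint_interval_s: int) -> int:
    """Run time recovered on restart: the time of the last completed checkpoint.

    Assumption: checkpoints are written every `checkpoint_interval_s` seconds.
    """
    completed_checkpoints = elapsed_s // checkpoint_interval_s
    return completed_checkpoints * checkpoint_interval_s


def backjump_seconds(elapsed_s: int, checkpoint_interval_s: int) -> int:
    """Work lost (and the apparent jump back in time) when a task restarts."""
    return elapsed_s - restored_seconds(elapsed_s, checkpoint_interval_s)


# A task interrupted after 5000 s of work:
sparse = backjump_seconds(5000, 3600)  # one checkpoint per hour
dense = backjump_seconds(5000, 600)    # one checkpoint every ten minutes
print(sparse, dense)
```

Under this assumption the backjump can never be as large as the checkpoint interval itself, which matches the "at most as long as the time between checkpoints" description.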
Despite all these fixes there are, I'm sure, many problems left. Most of them occur extremely rarely now though, or are highly specific to particular machines. Thus we have decided to move the current version over from RALPH to Rosetta@HOME and give it a go on a much larger scale. Our efforts to keep the failure rate down will continue, and your time donations over on RALPH as well as error reports are still highly appreciated.
Please let us know how things work out there. Particularly I'd like to know about
Stuck workunits
Overrunning workunits (WUs should now, due to the new watchdog, never run more than 4 hours longer than the preferred user time)
Problems with checkpointing.
Any other strange behaviour.
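For anyone checking whether a workunit is "overrunning", the two limits from the changelog can be sketched as follows. The 4 hour grace period and the 99 decoy limit are taken from the changelog; the function names and the exact comparisons are assumptions for illustration, not the actual watchdog code.

```python
WATCHDOG_GRACE_HOURS = 4   # from the changelog: preferred runtime + 4 hours
MAX_DECOYS_PER_WU = 99     # from the changelog: decoy limit per workunit


def watchdog_should_abort(runtime_hours: float, preferred_runtime_hours: float) -> bool:
    """True once a task has run longer than the preferred runtime plus the grace period."""
    return runtime_hours > preferred_runtime_hours + WATCHDOG_GRACE_HOURS


def should_end_gracefully(decoys_completed: int) -> bool:
    """True once the workunit has produced the maximum number of decoys."""
    return decoys_completed >= MAX_DECOYS_PER_WU


# With a preferred runtime of 3 hours, a task still running at 7.5 hours
# is past the limit and worth reporting.
print(watchdog_should_abort(7.5, 3))
```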
Happy crunching - I'm very excited to see how this new version will pan out.
Mike
____________
ID: 59045 | Rating: 0
____________
ID: 59083 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Following up on my last post, here is an email I was very happy to get today!:
Wow!! I don't think I've ever seen a new release go for 24hrs without a single problem reported in the "problems with..." thread. Congratulations, what a great turn around!
Not only no problems reported, but not even a "...but what about the problem I had, that no one else ever saw before or since?" post.
3 cheers for the R@h team! Kudos all 'round.
100 TFLOPS here we come!!!
____________
ID: 59084 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The wonderful work that all of you are doing is keeping me very busy! We are now writing up manuscripts describing the many exciting results that you have obtained. In my next few posts I'll give you a brief overview of each of these.
The first manuscript is called "Simultaneous prediction of protein folding and docking at high resolution". It reports your exciting results with the "fold and dock" runs set up by Rhiju and Ingemar several months ago. Here is the abstract:
Despite recent successes in high resolution de novo modeling, computational methods have yet to achieve blind predictions of proteins in their most commonly occurring functional forms, symmetric homomultimers. Building on the Rosetta framework, we present a general method to simultaneously model the folding and docking arrangements of multiple chains. A benchmark study on large alpha-helical bundles, interlocking beta sandwiches, and interleaved alpha/beta motifs demonstrates the method’s generality, near-atomic accuracy, and potential use in molecular replacement phasing. Further, we present blind tests on a crystallized coiled-coil as well as two dimers with more complex geometries solved by NMR. These results indicate that high-resolution modeling of multimers is within the reach of the structure prediction community and may have immediate practical use for crystallographic phasing and the rapid structure determination of multimers with limited NMR information.
____________
ID: 60010 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The second manuscript I am working on is titled "Alteration of Enzyme Specificity by Computational Loop Remodeling and Design". It describes a new approach using Rosetta to redesign enzymes in the human body to catalyze new reactions. Graduate student Paul Murphy not only developed the new method, but also in the manuscript shows how it can be used to create a new enzyme for gene therapy.
Here is the idea. Suppose a patient needs a transfusion of cells from another person to recover from a disease. There is a small chance that these cells will, rather than helping the patient, actually cause some new problem. In this case, it is important to be able to selectively kill these cells. For this purpose, special drugs have been developed which are not themselves toxic, but become toxic when broken down by a particular enzyme. If this enzyme is put into the introduced cells, then they can be killed if necessary by giving the drug to the patient.
The problem is: where does this enzyme come from? If it is a human enzyme, then the patient will convert the drug to the toxic compound in his/her normal tissue, which would be very bad. If it is an enzyme that humans don't have, from bacteria for example, the patient will be safe from the drug. However, our bodies are made to destroy anything that looks foreign, and this bacterial protein certainly will look foreign.
Our solution is to take a human enzyme, and keep the outside the same, so the patient's immune system thinks it is a human protein and doesn't destroy it, but change the catalytic site on the inside so that it converts the drug to the toxic compound. In the manuscript, Paul shows how a human enzyme that deaminates guanine can be redesigned by remodeling not only the sidechains but also the protein backbone to deaminate cytosine. While not quite ready for prime time, with some optimization this enzyme could be used in gene therapy as described above.
In this case I don't have much more work to do--the manuscript is already accepted for publication in the Proceedings of the National Academy of Sciences, but it is a bit over their length limit so we have to figure out how to cut out a few words.
____________
ID: 60012 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The third manuscript I am working on is called "Blind docking of pharmaceutically relevant compounds using RosettaLigand". This describes the current state of our efforts towards improving methods for designing new small molecule drugs to combat diseases. The challenge addressed by this paper is to predict how drug molecules bind to proteins--determining how they bind is necessary to improve their efficacy and to identify good candidate molecules in the first place. In two earlier papers we had described the incorporation of small molecule modeling into Rosetta to create the new RosettaLigand drug docking methods, and tested the method on publicly available data sets. However, many of you will not be surprised that much of current drug discovery is being carried out in pharmaceutical companies, and the compounds they are testing are generally not made public (they are akin to trade secrets). We wanted to learn whether RosettaLigand was successful in docking drug compounds in actual drug discovery efforts, and were very fortunate when a large drug company agreed to let Ian Davis, who is working on the project, visit one of their company sites and analyze the results they had gotten with RosettaLigand on their private set of compounds. The results are described in this paper and are very encouraging: not only does RosettaLigand appear to be one of the best current methods for this crucial step in drug design, but there are also some clear avenues for improvement (which weren't evident with the public datasets) which we are now starting to work on.
____________
ID: 60027 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The fourth manuscript describes rosetta@home predictions in CASP8. As you know, with your great help we were again the top ranked group in CASP this year. The manuscript begins:
The CASP8 experiment provided an invaluable opportunity to extensively benchmark the Rosetta protein structure prediction method on a wide range of comparative modeling and free modeling targets. For the targets for which a sequence-detectable structural template existed, target-template sequence alignments were generated, and the Rosetta rebuild and refine protocol was used to generate low energy models. For the small number of targets for which a reliable template could not be identified modeling was carried out using the Rosetta de novo modeling protocol. All targets were subjected to extensive high resolution refinement with the physically realistic Rosetta all-atom forcefield using Rosetta@home.
This one has a due date--March 15--so we have to hurry up a bit!
____________
ID: 60035 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The fifth manuscript describes a very exciting recent discovery made in collaboration with another research group. As many of you know, a large number of diseases, including Alzheimer's, are associated with the folding of a protein not into its normal active state but instead into long repeating fibrils. These are called "amyloid fibrils" and the associated diseases are referred to as "amyloid" diseases. We used Rosetta to design molecules predicted to block the formation of the fibrils, and, strikingly, when these molecules are added to the disease state protein under conditions where it normally forms fibrils, none are formed. We are very excited about this amyloid blocker, but as always I must stress that there are a lot of steps before a finding like this leads to an actually distributed drug.
____________
ID: 60053 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The sixth manuscript describes our recent results on designing new DNA cutting enzymes that can cut the DNA double helix at very specific target sites. The enzymes we design only cut at one particular 20 base sequence. Remember that DNA is made of four different bases, A, T, G and C, so a particular 20 base site might for example be AGAATAGGATCCAGATCGTC. A sequence this long is extremely unlikely to occur more than once in the human genome, so if we design an enzyme to cut at a particular place in the genome it is likely to cut only at that site (of course we can check this because the sequence of the human genome is known).
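The arithmetic behind "extremely unlikely to occur more than once" can be sketched in a few lines. This assumes, purely for illustration, that bases are independent and equally likely, which real genomes are not (they contain repeats), and it uses the approximate human genome size of 3 billion bases. The helper names are hypothetical; the second function shows the kind of direct check that a known genome sequence makes possible.

```python
SITE = "AGAATAGGATCCAGATCGTC"      # the example 20 base site from the post
GENOME_LENGTH = 3_000_000_000      # approximate size of the human genome


def expected_random_hits(site_length: int, genome_length: int) -> float:
    """Expected chance occurrences of one specific site on one strand,
    assuming independent, equally likely bases (a simplifying assumption)."""
    positions = genome_length - site_length + 1
    return positions / 4 ** site_length


def count_sites(genome: str, site: str) -> int:
    """Count occurrences of `site` in `genome`, including overlapping ones."""
    count = 0
    start = genome.find(site)
    while start != -1:
        count += 1
        start = genome.find(site, start + 1)
    return count


print(4 ** len(SITE))                                  # number of distinct 20 base sequences
print(expected_random_hits(len(SITE), GENOME_LENGTH))  # well below one expected chance match
```

There are about a trillion possible 20 base sequences, so under these assumptions a given site is expected to turn up by chance far less than once in a genome of this size.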
Why would we want to do this? Suppose you have a mutation in a gene and it is making you sick. If we can make an enzyme which cuts near the site of mutation and introduce this enzyme and a "correct" copy of the gene into your cells, the enzyme will cut the DNA and the cut will likely be repaired by copying the correct version we have added (cells repair breaks in DNA by copying the most closely related piece of DNA around). Thus, with such enzymes we can potentially cure diseases by "gene therapy".
The manuscript describes progress towards this long range goal. We show we can design new enzymes that cut at new sites, and by examining how they work in detail we reveal some pretty big surprises in how the naturally occurring enzyme works, which are fascinating in their own right and provide considerable insight into how to achieve the longer term goals of using redesigned enzymes for gene therapy.
____________
ID: 60070 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The seventh paper is also related to your CASP8 predictions. The assessors asked us and a couple of other groups who made models that were geometrically and energetically nearly indistinguishable from native structures to write a paper describing how we were able to consistently generate physically plausible structure models. This was much less work for me than the preceding six manuscripts--I had only to describe the key concepts behind rosetta@home and the search for very low energy structures that are familiar to those of you who follow the screen saver from time to time.
____________
ID: 60083 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Not all the writing I do is research papers; I have to write grants to get research funding as well (unfortunately). Today I'm putting together a proposal to use Rosetta to computationally design compounds which block the formation of the amyloid fibrils which accumulate in a number of different diseases (this is a followup on the fifth manuscript described below). My collaborator and I will send this proposal to the recently announced NIH "Challenge Grant" program, part of the federal stimulus package.
____________
ID: 60156 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Rosetta@home has received a substantial monetary contribution from an anonymous donor! Following the suggestion of the donor, the University of Washington has used the money to start a special “Rosetta@home fund” that will be used to pay part of David Kim’s salary (David is the architect of Rosetta@home and the person who keeps the project running), upgrade the servers as needed, and allow us to make more rapid progress on the disease-related research Rosetta@home is carrying out. If you would like to make a (tax-deductible) contribution to the project, the link is Rosetta@home fund. David will be adding a link to this from the Rosetta@home home page in the next day or two. Thank you for your contributions to the project!
____________
ID: 60267 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
I'd like to thank rosetta@home participant Michael G R for finding and posting on this article on protein design and rosetta@home:
I talk to reporters fairly often, but I usually don't see the results (sometimes I don't want to!). This article I think is pretty good.
____________
ID: 60609 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
WIRED has a feature article on fold.it this week. It was originally a 3500 word piece that had much more about rosetta@home and CASP, but unfortunately most of this got cut when they shrank it to 2500 words at the end.
____________
ID: 60783 | Rating: 0
Mod.Sense Forum moderator Project administrator Joined: Aug 22 06 Posts: 2391 ID: 106194 Credit: 0 RAC: 0
Baker was the Most Valuable Player in the protein chemistry world's biennial World Series, a competition to see who can predict the shape a protein will fold into, knowing nothing more than the sequence of its constituent parts. It's called the Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
Of the 15 Foldit solutions that Baker submitted to CASP, seven had finished in the money...One of their solutions even took first place. A band of gamer nonscientists had beaten the best biochemists...when they turned to Cheese and asked him how he knew the way to tweak the proteins—for example, by orienting hydrophobic sidechains toward the protein core—he shrugged and said, "It just looks right."
He (Baker) and Popović have given the players a challenge: Design a new protein...These proteins could actually have therapeutic value in the real world, outside the game. And if they do, the Foldit players will share the credit. It might be the first time that a computer game's high score is a Nobel Prize.
____________ Rosetta Moderator: Mod.Sense
ID: 60786 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Baker was the Most Valuable Player in the protein chemistry world's biennial World Series, a competition to see who can predict the shape a protein will fold into, knowing nothing more than the sequence of its constituent parts. It's called the Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.
"Most valuable player" is definitely a misnomer! Much more accurate (given that I did virtually nothing directly myself) would be coach or general manager of the winning team, the key players of which include the students and postdocs in my group who did lots of hard work, and all of you for crunching during the summer on the CASP targets!
____________
ID: 60827 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We just found a source of flu surface protein closely related to that on the strain making headlines in the last few days, and are starting to design tight binding inhibitors that target a site where the flu virus can't change. We will keep you posted as these design work units are sent out on rosetta@home. Fortunately, it seems that there may have been a bit of a false alarm with the flu in this case, but it is a good test case for us as we ultimately aim, with your help, to be able to design blockers to new pathogens within a relatively short time.
____________
ID: 60978 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Tonight I will describe how we are going about designing inhibitors for the flu surface protein.
We start from the crystal structure of the key flu virus surface protein and focus on a particular area of the surface of this protein that does not change from strain to strain (as you know, we keep getting the flu because it changes rapidly and new versions can slip by our immune systems because they look different from the versions we fought off in the past). By focusing on an invariant region of the surface we hope to generate broadly neutralizing inhibitors.
The next step is to use Rosetta to dock disembodied amino acid sidechains onto this portion of the virus surface to identify particularly tight low energy interactions. For example, we might find that a tryptophan residue fits snugly into one pocket on the virus surface, and a tyrosine residue into another. Given these maps of tight binding interactions of isolated amino acids with the virus surface, we then design a protein scaffold which can simultaneously position as many of these binding "hotspot" residues as possible with orientations correct for binding the virus. The next step is to optimize the remaining residues of the protein we are designing to interact as tightly as possible with the virus.
We plan to experimentally test around 20 of the novel proteins generated using the above procedure. We will of course pick the designs which have the strongest computed interactions with the virus. We will experimentally test more than one design because our computational design methods, while powerful, are still not perfect, so we don't expect every design predicted to bind tightly to actually bind tightly.
To experimentally test the designed inhibitors, we begin by synthesizing artificial genes encoding the new proteins. We simply use the genetic code in reverse: for each amino acid in the computer generated protein sequence we insert the appropriate DNA base "codon" until we have covered the whole sequence. Once we have the synthetic genes, we carry out two types of experimental tests. I'll describe these in my next post.
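"Using the genetic code in reverse" can be sketched as a simple lookup. The table below picks one valid codon per amino acid from the standard genetic code; which codon to use for each amino acid is an arbitrary choice made here for illustration (in practice the choice is usually tuned to the organism that will express the gene), and the function name and the short example sequence are hypothetical.

```python
# One codon per amino acid from the standard genetic code.
# Several amino acids have multiple valid codons; the picks here are arbitrary.
CODON_FOR = {
    "A": "GCC", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def reverse_translate(protein: str) -> str:
    """Return a DNA sequence encoding `protein` (one-letter amino acid codes),
    followed by a stop codon."""
    codons = []
    for residue in protein.upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        codons.append(CODON_FOR[residue])
    return "".join(codons) + STOP_CODON


# Methionine, tryptophan, tyrosine (a made-up three residue example).
print(reverse_translate("MWY"))
```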
____________
ID: 61374 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Manuel asked in the discussion thread about the manuscripts I described a couple of months ago. The second manuscript has already been published in a journal with the excellent policy of making all articles free to the public; you can take a look at
http://www.pnas.org/content/106/23/9215.long
The first manuscript was just accepted for publication in the same journal (Proceedings of the National Academy of Sciences). The third and fourth manuscripts have also been accepted but not yet published.
The sixth manuscript got very enthusiastic reviews in the widely read journal Nature and hopefully you will be able to read about it there in not too long.
While we are all most interested in developing cures for diseases, aging and other problems, scientific papers are still a good way of benchmarking progress towards these goals. Your contributions to rosetta@home are certainly having a big impact on this short term measure of progress, and we will keep working together towards the longer term goals!
____________
ID: 62045 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Many of you know that David Kim is the architect of Rosetta@home and spends much of his time maintaining and improving rosetta@home. He also has had time to carry out some very interesting research. Some time ago (before the papers I described in my posts below were submitted), David wrote a paper describing his results on the search problem in protein structure prediction, and submitted it for publication. In contrast to much of our work, which is on biomedical problems such as aging as described below, David's work described in the paper is on the fundamental question of what limits our ability to predict structures accurately for larger proteins. We received the reviews from anonymous referees recently, and to give you some insight into what the process of scientific publishing is like, I'm pasting first the abstract and then the reviews here:
abstract of David Kim's paper:
The primary obstacle to de novo protein structure prediction is conformational sampling: the native state generally has lower free energy than non-native structures but is exceedingly difficult to locate. Structure predictions with atomic level accuracy have been made for small proteins using the Rosetta structure prediction methodology, but for larger and more complex proteins, the native state is virtually never sampled and it has been unclear how much of an increase in computing power would be required to successfully predict the structures of such proteins. In this paper we develop an approach to determining how much computer power is required to accurately predict the structure of a protein, based on a reformulation of the conformational search problem as a combinatorial sampling problem in a discrete feature space. We find that conformational sampling for many proteins is limited by critical “linchpin” features, often the backbone torsion angles of individual residues, which are sampled very rarely in unbiased trajectories and when constrained dramatically increase the sampling of the native state. These critical features frequently occur in less regular and likely strained regions of proteins that contribute to protein function. In a number of proteins, the linchpin features are in regions found experimentally to form late in folding, suggesting a correspondence between folding in silico and in reality.
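To give a feel for why a single rarely sampled "linchpin" feature can dominate the computing cost, here is a toy calculation. It is not the method from the paper: it assumes each native feature is sampled independently with a fixed per-trajectory frequency, and the feature names and frequencies are invented for illustration.

```python
import math

# Hypothetical per-trajectory frequencies of sampling each native feature.
feature_frequency = {
    "helix_1": 0.8,
    "strand_pairing": 0.5,
    "loop_torsion": 0.001,   # the rarely sampled "linchpin" feature
}


def native_probability(frequencies: dict) -> float:
    """Chance that one trajectory samples every native feature at once,
    assuming the features are independent (a strong simplification)."""
    return math.prod(frequencies.values())


def expected_trajectories(frequencies: dict) -> float:
    """Average number of trajectories needed to hit the native combination once."""
    return 1.0 / native_probability(frequencies)


def constrain(frequencies: dict, feature: str) -> dict:
    """Return a copy in which `feature` is fixed to its native value."""
    fixed = dict(frequencies)
    fixed[feature] = 1.0
    return fixed


print(expected_trajectories(feature_frequency))                             # unbiased sampling
print(expected_trajectories(constrain(feature_frequency, "loop_torsion")))  # linchpin constrained
```

In this toy setup, constraining the one rare feature cuts the expected number of trajectories by a factor of a thousand, which is the qualitative effect the abstract describes.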
Anonymous reviews of David's paper (we got these about three months after the paper was submitted to the journal).
Reviewer #1: It's an excellent and interesting paper. I certainly recommend publication. The novelty here is the idea of feature strings from native proteins and the identification by Bayesian analysis of "linchpin features" that are hard to find computationally and may be slow to develop physically.
I recommend publication, as is, but I would encourage the authors to comment on how to understand their linchpin features. For example, why should they reside near functional sites? One might have expected biological function not to have a role in determining the folding difficulty and kinetics. It would be interesting to hear the authors' thoughts about why these particular linchpins?
Reviewer #2: Ref: JMB-D-09-00307
Title: Sampling bottlenecks in de novo protein structure prediction
Authors: David E Kim; Ben Blum; Philip Bradley; David Baker
This is a provocative and important paper, and overall is handled very nicely with helpfully detailed examples. Quantifying an estimate of required sampling times and demonstrating that a feature list is adequate are both valuable, and the identification of unfavorable "linchpin" features that stall the predictions seems likely to be a very pivotal observation. It's also a nice name for them. I definitely recommend publication.
************* end of reviews
The paper has now been accepted, and will appear in the Journal of Molecular Biology, which I believe can be freely accessed by anyone. Now you know what David was doing in his spare time last year; he has now switched to working on another project I will tell you about some other time.
____________
ID: 62409 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We got very good news today--the paper I described to you several months ago was finally accepted for publication in Nature, one of the most widely read science journals.
The manuscript describes the design of new enzymes for use ultimately in gene repair. Suppose you have a mutation in your genome that causes you to have a disease. If you know where the mutation is, in principle it can be repaired by "cutting out" the bad information (the mutation) and replacing it with the correct DNA sequence in this region. To do the cutting out is tricky because you only want to cut right near the mutation, and not anywhere else within your 3 billion base genome.
We are developing computational methods using Rosetta to design new enzymes to do this very highly specific "cutting". In this manuscript, we show that we can now design new specific cutting enzymes and that we can control independently how fast they cut and how tightly they bind to the target sequences we design them to cut.
Hopefully you will be able to read about this soon at your local newsstand!
____________
ID: 62898 | Rating: 0
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We just learned that a model we had generated using the "fold and dock" method on your computers allowed the solution of the phase problem for structural biologists who had grown crystals of the (dimeric) protein but not been able to solve its structure--this result led to a huge flurry of email today as the approach could be very useful for a broad class of problems.
As I mentioned earlier, the paper on the "fold and dock" protocol (which you can recognize by the symmetric dancing of molecules on your screensaver) was enthusiastically accepted for publication in the Proceedings of the National Academy of Sciences and will be free online soon. Here is one of the anonymous reviews of the manuscript:
"The authors introduce a simultaneous "fold and dock" routine for the prediction of symmetric systems with impressive results. The method combines the Rosetta folding algorithm and the lab's more recent symmetric docking protocol. Here, both the backbone moves and relative positioning of the components are accomplished in the same Monte Carlo routine. Undoubtedly, this feature is necessary for most of the systems, particularly those which are intertwined. The authors both benchmark their results and even include a few predictions (using structures that came out after simulations). In addition they test the utility of the method for ab initio phasing and structure prediction incorporating some NMR-based constraints. This is an extremely nice piece of work which addresses an outstanding challenge in structural biology and protein structure prediction."
____________
ID: 62924 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
As Mod Sense very nicely described in the discussion thread, with rosetta@home we are tackling basic research problems in addition to disease related research such as the gene therapy efforts I described in my last post and the amyloid fiber blocking and other projects described in the "disease related research" section on our home page. One of the "basic research" problems we are tackling with rosetta@home is the development of more accurate energy functions. This is a critical problem for design of new drugs to cure diseases. Here is why:
Almost all drugs work by binding tightly to a protein structure and blocking or modulating the function of the protein. To be produced cheaply, and to be able to access the inside and outside of your cells, it is usually desirable for a drug to be a not too large molecule. Chemists have put together large collections of potential drug molecules both in computer databases and in actuality. Big pharmaceutical companies have on the order of millions of such compounds they can test.
So--you would think that to find a new drug, given the structure of a protein target, one could simply screen on the computer for the drug that binds the tightest to the target. For each potential drug compound, we can determine its lowest energy binding mode to the target using Rosetta docking--you have probably seen small molecules docking into proteins on your screensavers. There are other programs for small molecule docking as well which are used in other distributed computing projects.
We and others have been pretty successful at predicting how a specific small molecule binds to a protein. We can test this by comparing the lowest energy structure you find on rosetta@home with the crystal structure of the small molecule bound to the protein, if there is one. The big advantage of Rosetta compared to other approaches is that the protein can flex during the docking process; this is important because many experimentally determined structures have shown that considerable movements take place in proteins when they bind small molecules.
However, for drug discovery, the problem is not just to dock one small molecule into a protein, but to dock millions of small molecules into proteins, and find the one that binds the tightest. The amount of computer time required to accurately dock millions of small molecules into a protein target with a detailed physically realistic model like Rosetta is too large for rosetta@home currently. However, this is not the only problem. Different small molecules have different numbers and different types of atoms, and determining which of these has the tightest binding energy to the protein is very challenging. We and others have had considerable difficulty in ranking sets of compounds known to bind a target, because errors in energy computation have big effects. We can predict protein structures accurately because the correct structure has very much lower energy than any other structure, but relatively small errors in energy computation can drastically change the ranking of different small molecules bound to a protein.
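To make the contrast in the last two sentences concrete, here is a minimal sketch in Python. It is not Rosetta code: the energies, the noise level and the helper name `top_pick_accuracy` are all made up for illustration. It adds the same random error to two sets of "true" energies, one with a single structure far below everything else and one with many compounds that differ only slightly, and counts how often the lowest computed energy still points at the true best.

```python
import random

def top_pick_accuracy(true_energies, noise_sd, trials, rng):
    """Fraction of trials in which the lowest *computed* energy
    belongs to the item with the lowest *true* energy."""
    best = min(range(len(true_energies)), key=true_energies.__getitem__)
    hits = 0
    for _ in range(trials):
        # Computed energy = true energy + a random error (assumed Gaussian).
        noisy = [e + rng.gauss(0.0, noise_sd) for e in true_energies]
        picked = min(range(len(noisy)), key=noisy.__getitem__)
        hits += picked == best
    return hits / trials

rng = random.Random(0)

# Hypothetical numbers, in arbitrary energy units.
# Structure prediction: one native-like structure far below 999 decoys.
structures = [-20.0] + [rng.uniform(-5.0, 0.0) for _ in range(999)]
# Compound ranking: 1000 binders whose true energies span only a few units.
compounds = [rng.uniform(-12.0, -8.0) for _ in range(1000)]

noise_sd = 1.0  # same size of energy error in both cases
acc_structures = top_pick_accuracy(structures, noise_sd, 200, rng)
acc_compounds = top_pick_accuracy(compounds, noise_sd, 200, rng)
print(f"native structure picked: {acc_structures:.2f}")
print(f"tightest binder picked:  {acc_compounds:.2f}")
```

With a large energy gap the same error barely matters, while for closely spaced compounds it scrambles the ranking, which is the point made above.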
Because of these problems, large pharmaceutical companies surprisingly enough rely as much or more on brute force experimental screening to see which of their millions of compounds bind most tightly to a target as they do on screening on the computer. Of course, screening through millions of compounds for the tightest binder experimentally is very slow and hugely expensive--this is one of the reasons why drug discovery is so expensive.
So if we can improve our ability to calculate energies accurately, it would have a huge effect on many important applications, including drug discovery and design. In my next post, I'll explain how we are using rosetta@home to increase the accuracy of energy calculations.
____________
ID: 62939 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
As I described in my last post, increasing the accuracy of the energy function (Rosetta@home's description of reality) is critical to all of our efforts, including designing new drug molecules that bind tightly and specifically to target proteins.
What information can serve as a guide to improve the energy function? In some cases, we actually know the energy difference between two conformations, or more often, between two very closely related sequences which have pretty much the same structure. Graduate student Liz Kellogg is working to use this type of information to test the energy function.
A particularly useful source of information is the set of protein structures determined experimentally by X-ray crystallography. There are many thousands of such structures, and each one we know must be very near the lowest energy structure for the corresponding amino acid sequence. We can test the Rosetta energy function by seeing if it properly assigns lower energy to these structures than to very different conformations that can be generated. Rosetta almost always passes this test as I've described elsewhere--this is why the structure prediction problem is primarily a sampling problem.
For a more sensitive test of energy function accuracy, we compare the distributions of distances between atoms, the distributions of torsion angles, and the distributions of hydrogen bonding geometries in experimentally determined structures to those in low energy Rosetta models. The discrepancies are highlighting internal inconsistencies in the Rosetta energy function, which are straightforward to correct once they are found. Postdoctoral fellow Yifan Song is making great progress in ironing out these last wrinkles in the Rosetta energy function. His approach is to identify discrepancies in the geometries in Rosetta generated structures, tune the energy function to try to eliminate the discrepancy, and then generate new structures using the new energy function to see if we have come closer to the truth. Very encouragingly, as Yifan improves the energy function in this way we are getting better results in other tests; for example, the close to native structures are even lower in energy compared to non-native structures than they were originally.
The origins of these remaining inaccuracies that Yifan is fixing will interest some of you. The Rosetta energy function describes hydrogen bonding and interactions between spatially adjacent atoms very precisely. The energy function also models the energies associated with rotations around backbone and sidechain torsion angles, using data taken from the large number of high resolution structures. It turns out that when these two types of terms are combined, there are subtle double counting effects--some geometries are favored for example both because a very low energy hydrogen bond is formed, and because the backbone geometry is very frequently observed in protein structures--and these together lead to an oversampling of these local geometries in low energy Rosetta structures. Yifan's solution is to modify the torsional potential to make these geometries less highly rewarded since their favorability is already captured by the hydrogen bonding potential. As you can imagine, to get these corrections perfect so that the geometries in Rosetta models precisely match those in native structures requires some iteration, which is why Yifan has been extensively running on rosetta@home in the past week!
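For those who like to see the double counting in numbers, here is a toy sketch in Python. It is not Yifan's actual procedure or Rosetta code: the four torsion bins, their frequencies, the hydrogen bond bonus and the update rule (a simple Boltzmann-inversion style correction, chosen here as a stand-in) are all assumptions for illustration. A torsional term derived directly from native frequencies is added to a hydrogen bond term, the favored bin becomes oversampled, and the torsional term is then adjusted until the model populations match the native ones.

```python
import math

KT = 1.0  # energy unit chosen so that kT = 1 (assumption)

def boltzmann(energies):
    """Normalized Boltzmann populations for a list of bin energies."""
    weights = [math.exp(-e / KT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: four backbone torsion bins.
native = [0.50, 0.30, 0.15, 0.05]   # frequencies seen in native structures
e_hbond = [-1.0, 0.0, 0.0, 0.0]     # bin 0 also forms a good hydrogen bond

# Torsional term taken straight from native statistics: E = -kT ln P.
e_torsion = [-KT * math.log(p) for p in native]

def model_populations(e_tors):
    """Populations the combined energy (torsion + hydrogen bond) produces."""
    return boltzmann([t + h for t, h in zip(e_tors, e_hbond)])

before = model_populations(e_torsion)  # bin 0 is rewarded twice

# Iterate: raise the torsional energy of bins the model oversamples,
# lower it for bins the model undersamples.
for _ in range(5):
    model = model_populations(e_torsion)
    e_torsion = [t + KT * math.log(m / n)
                 for t, m, n in zip(e_torsion, model, native)]

after = model_populations(e_torsion)
print("native:", native)
print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

In this idealized toy the populations are computed exactly, so the correction lands on the native distribution right away. In practice the model populations have to be estimated by generating many new structures after each change, which is where the repeated rounds of computing come in.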
____________
ID: 63383 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Today I was asked this question:
"Hi David, I am a user of the BOINC application and running Rosetta. I have searched the website and can't find any sort of overall status on how the entire mapping project is going.
I, and I think many others, would be very interested to seeing some sort of progress indicator on where the project is and some predictions on when the mapping process will be complete at Rosetta's current research/growth rate. (when every possible protein fold has been completely mapped and cross-checked)
Is this possible? Are we at 5%? 10%? Will the project be complete in 5 years? 10 years? This would be great info for the layman that doesn't know much about this subject but is happy to donate computer time for this research."
I thought I would answer this question here for other participants who might be interested, as it highlights the difference between rosetta@home and most other distributed computing projects.
The answer is that the problems we are tackling with rosetta@home--computing the structures of biological macromolecules and designing new molecules to try to cure diseases and improve human health generally--are long term problems that will not be completely "solved" any time soon. Much of our work is also aimed at improving our methods and algorithms so we can design new molecules and ultimately drugs more accurately.
In most distributed computing projects, a computer program that has been developed is run on large sets of data. The computer program doesn't change over the course of the project, and the progress toward completion can be assessed by determining what fraction of the data set the calculation has been run on and what fraction is left.
This estimate can't be done for rosetta@home because the scope of problems we are trying to solve is much larger, and because we are continually extending rosetta@home to try to solve new problems.
So we can't quantify our progress by giving you a % complete. Instead, the project's contributions and progress can be evaluated by the many scientific publications it has produced, some of which I've tried to summarize in these posts. (The current issue of Nature, for example, has the article I described in an earlier post on designing new enzymes to ultimately repair disease causing mutations.)
____________
ID: 63932 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
I often get asked about the progress we are making with the invaluable contributions all of you are making to our efforts. While we (unfortunately) have not yet succeeded in developing real world therapies for treating diseases, your contributions have been critical for our advances on the basic science side which should ultimately lead to the development of such therapies. These are documented in the scientific publications that have come out of the project; see http://boinc.bakerlab.org/rosetta/rah_publications.php for a recently updated list. Publication lists are one way you can assess the impact of your contributions to distributed computing projects--hopefully in not too long you will be able to see the impact in new disease treatments!
____________
ID: 64060 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Many of you I'm sure remember the protein folding calculations with the zinc atoms. Graduate student Chu Wang wrote a scientific paper describing the method he developed for predicting the structures of zinc containing proteins and the testing of the method with all of your help. The paper was just accepted for publication in the scientific journal Protein Science, and now scientists everywhere will be able to learn about Chu's method so they can predict structures of this important class of proteins also.
____________
ID: 64416 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We got some good news today. A manuscript that many of you contributed to through Rosetta@home was just accepted for publication in Science magazine, perhaps the most widely read scientific journal. The paper shows that accurate structures can be calculated using Rosetta for proteins up to 200 amino acids long if even a small amount of experimental data (from NMR experiments) is available to guide the search. This is an exciting advance because it could make it very much faster and easier to experimentally determine protein structures. Thanks everybody for your contributions to this work, and to our ongoing research efforts!
____________
ID: 64989 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Sarel has collected many very promising potential flu virus inhibitors from your rosetta@home calculations over the last ten days, and will be selecting a number of them for experimental testing--see his postings in the "design of protein-protein interactions" thread.
____________
ID: 65046 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We are entering a very busy and science packed time for Rosetta@home. As described in the "design of protein-protein interfaces" thread, we are now designing proteins to bind to and block several different targets, including the flu virus. At the same time, we are gearing up for CASP9, which will start in May, by testing out both our new structure prediction methodology and the improved energy function which underlies it. The new methodology is quite CPU intensive, and we are hoping for as much user participation as possible once CASP starts; whatever you can spare now as well would be great so we can go the last 9 yards on structure prediction methods development before CASP and at the same time proceed as rapidly as possible on the protein-protein interaction designs. Thanks! David
____________
ID: 65245 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
This survey request is from David Anderson, the creator of BOINC, and Oded Nov from NYU:
Dear Rosetta@home volunteer:
We are conducting a survey of Rosetta@home volunteers in order to
better understand why people participate in volunteer computing and
contribute computer resources.
We would be extremely grateful if you could help us by filling out a
questionnaire. If you are not interested, ignore the rest of this
email.
The survey is at http://boinc.berkeley.edu/survey/ It should take no
more than 10-15 minutes. Your responses will be used for research
purposes and to improve BOINC.
We will be happy to share our findings with you, and they will be made
available once we complete the data collection and analysis.
With many thanks -
Dr. David P. Anderson
Director, BOINC
University of California, Berkeley
email: davea at ssl.berkeley.edu
Prof. Oded Nov
Polytechnic Institute of New York University
email: onov at poly.edu
____________
ID: 65268 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Our paper on solving structures of proteins of up to 200 amino acids using very limited experimental data is in the Feb 19 issue of Science magazine (pg 1014), which is on some newsstands now. This wouldn't have been possible without Rosetta@home--thanks again everybody!
____________
ID: 65483 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
While the results are still preliminary, it appears that Rosetta@home has produced an extremely exciting result! As I described a few posts ago, many of you through rosetta@home contributed to the design of proteins predicted to bind very tightly to the influenza virus. We have now completed the first round of testing of the designed proteins, and one of them in the experiments conducted thus far clearly binds very tightly to the virus. Our data also indicate that the binding is at a site critical to the virus invasion of our cells, and so the protein may be able to neutralize the virus. I will keep you posted over the next couple of months as the picture becomes clearer--but for now--thank you all for making this possible!!
____________
ID: 65709 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
I was asked on the discussion thread about the timescale for learning more about the influenza binding protein I described in my previous post. I'm reposting my answer here:
We are doing a series of tests and control experiments in my lab in the next 2-3 weeks to rule out various possible artifacts. If, as we expect, the design passes with flying colors, we will send it to the Scripps Research Institute, where the ability of the design to neutralize the virus in cell based tests and the extent to which the design neutralizes different strains of virus will be measured. I would expect we would know the results of this in several months. We will also work to solve the crystal structure of the design bound to the virus to confirm the design binding mode. This hopefully will not take more than a few months either.
I will keep all of you posted here about the results from these experiments. I am very optimistic, but one should be cautious about getting too excited too early about results like these--there are very many places where things can go wrong just with the biochemistry, and after this there are very many steps to actually make a protein into a drug--this is why there are so few new drugs for curing diseases being discovered.
For those of you who would like to try your hand at improving designed binders to the influenza virus, we are now posting virus inhibitor design challenges on foldit.
____________
ID: 65729 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
Experiments this past week have made us even more confident that the designed influenza binder is working as in the design model. We used "directed evolution" methods to identify amino acid changes that make the rosetta@home designed protein bind even more tightly to the virus. We found mutations at two positions. First, at an alanine residue in the design, the evolution process found a valine, and inspection of the design model showed some extra space around the alanine that would be filled by the slightly larger valine. The second amino acid change involved a charged aspartate residue in the design that in retrospect was too close to the virus protein--it was changed to a non-charged residue, which is less energetically costly to bury upon binding.
We are now combining these two substitutions, and expect that the combination should bind still more tightly to the virus than any protein we have tested so far. We should know later this week--I'll keep you posted!
____________
ID: 65767 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We got some good news today in an announcement by Vice President Biden:
We were funded to work with three other research groups to develop a completely new pathway for using solar energy to transform CO2 into the large molecules that the world has grown to depend on (fuels, etc)--this if successful could greatly reduce dependence on fossil fuels and contribute to removing CO2 from the atmosphere.
While the large majority of rosetta@home calculations will remain focused on biomedical problems, expect to see from time to time work units relating to design of enzymes for CO2 capture and conversion.
____________
ID: 65902 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We got some more good news today: a manuscript we submitted to Science magazine on Rosetta-based de novo design of a new enzyme which catalyzes the formation of two carbon-carbon bonds between two small molecules was accepted for publication. The work described in the manuscript is a real step forward in designing enzyme catalysts for reactions not catalyzed by naturally occurring enzymes, and could provide new routes to drug molecules which can be hard to synthesize using traditional methods.
____________
ID: 65964 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
CASP9 is now in full swing and we need your help! We are being overwhelmed with targets and need as much CPU power as possible!
I just got this from the organizers:
Subject: CASP update - May 7
First week of CASP9 prediction season is over. We have released 14 targets. The vast majority of them were easy TBM targets. Next week you will find some harder targets in the human prediction category.
As of today, we have 125 groups (predominantly servers at this early stage) contributing models to the Prediction Center. You can always find the latest CASP statistics at http://predictioncenter.org/casp9/numbers.cgi .
If you are interested in following the prediction season as it happens, the above web site is a good source of information.
____________
ID: 66042 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We are absolutely delighted by the recent increase in the total throughput of rosetta@home, which could not come at a more critical time! We are having to make very difficult choices between CASP9 structure prediction calculations and the next generation of pathogen inhibiting proteins building on our success with the flu virus inhibitor, and the new contributions of computer power many of you are making are helping immensely. Thank you very much!
____________
ID: 66165 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
We have now confirmed the tight binding of our designed Spanish Flu inhibitor to the flu virus using multiple different methods (it is always good to be totally certain with exciting results like these!). For those of you with some chemistry background, the binding constant is about 20 nM.
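For readers who want to turn that number into an energy, here is a small Python sketch of the standard conversion. It rests on assumptions not stated in the post: that the 20 nM figure is a dissociation constant, that the temperature is 298.15 K, and that the usual 1 M standard state applies. The function name `binding_free_energy` is just an illustrative choice.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation
    constant in mol/L, using dG = RT ln(Kd / 1 M)."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# 20 nM, read here as a dissociation constant (assumption).
dg = binding_free_energy(20e-9)
print(f"dG = {dg:.1f} kcal/mol")
```

Under those assumptions the result comes out at roughly -10.5 kcal/mol. Each tenfold improvement in the binding constant would add about 1.4 kcal/mol at this temperature.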
With collaborators at the Scripps Research Institute we are now trying to determine the structure of the designed complex between the inhibitor and the virus by X-ray crystallography (to see whether binding is as in the design model). With the tight binding confirmed, we are now starting to investigate whether the designed protein prevents the virus from infecting cells.
____________
ID: 66435 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
A manuscript describing the results on FoldIt, which many of you contributed to, was just accepted for publication in Nature. The idea for FoldIt came from rosetta@home participants who posted on the message boards about wanting to be able to guide the course of the folding trajectory. Please keep letting us know your thoughts and suggestions!
Rosetta@home has now been directly responsible for or closely associated with two papers in Science (one on enzyme design, one on new approaches for structure determination) and two papers in Nature (this one on FoldIt, and one last year on endonuclease design for gene therapy) in the last 9 months. This kind of impact at the forefront of scientific research is, I think, a first for volunteer computing, and perhaps the strongest indication to date of the power and value of volunteer computing for pushing forward the boundaries of scientific understanding.
Thank you all for your invaluable contributions to our collective efforts!
____________
ID: 66617 | Rating: 0 | rate:
/
David Baker Forum moderator Project administrator Project developer Project scientist Joined: Sep 17 05 Posts: 637 ID: 122 Credit: 214,854 RAC: 0
The most recent issue of Science magazine has our paper on the use of Rosetta to design a new carbon-carbon bond forming enzyme, along with a commentary. This paper has attracted a lot of attention in the press. Thank you all for your contributions to this work and to our ability to move forward with designing enzymes and other proteins that will hopefully be of use to society in not too long.
____________