The goal of our current research is to develop an improved model of intra- and intermolecular interactions and to use this model to predict and design macromolecular structures and interactions. Prediction and design applications, which can be of great biological interest in their own right, also provide stringent and objective tests that improve the model and increase fundamental understanding.
We use a computer program called Rosetta to carry out the prediction and design calculations. At the core of Rosetta are potential functions for computing the energies of interactions within and between macromolecules, and methods for finding the lowest energy structure for an amino acid sequence (protein-structure prediction) or a protein-protein complex and for finding the lowest energy amino acid sequence for a protein or protein-protein complex (protein design). Feedback from the prediction and design tests is used continually to improve the potential functions and the search algorithms. Development of one computer program to treat these diverse problems has considerable advantages: first, the different applications provide complementary tests of the underlying physical model (the fundamental physics/physical chemistry is, of course, the same in all cases); second, many problems of current interest, such as flexible backbone protein design and protein-protein docking with backbone flexibility, involve a combination of the different optimization methods.
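One way to see why a single program can serve both purposes is to write the two problems as optimizations of the same energy function. The notation below is an illustrative assumption, not taken from Rosetta itself: E stands for the potential function, x for the conformation (backbone and side-chain coordinates), and s for the amino acid sequence.

```latex
\begin{align}
\text{structure prediction:}\quad & x^{*} = \arg\min_{x} E(x, s) && \text{(sequence $s$ held fixed)} \\
\text{protein design:}\quad & s^{*} = \arg\min_{s} E(x, s) && \text{(target structure $x$ held fixed)}
\end{align}
```

In this simplified picture, flexible backbone design and docking with backbone flexibility correspond to searching over x and s (or over x for both partners) at the same time, which is why they draw on both kinds of optimization method.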
Design of Protein Structure
Over the past several years, we have used our computational protein design method to stabilize dramatically several small proteins by redesigning every residue of their sequences, to redesign protein backbone conformation, to convert a monomeric protein to a strand-swapped dimer, and to thermostabilize an enzyme. A highlight was the redesign of the folding pathway of protein G, a small protein containing two beta-hairpins separated by an alpha-helix. In the naturally occurring protein, the first hairpin is disrupted and the second hairpin is formed at the rate limiting step in folding. In a redesigned variant in which the first hairpin is significantly stabilized and the second hairpin destabilized, the order of events is reversed: the first hairpin is formed and the second hairpin disrupted in the folding transition state. The ability to redesign protein-folding pathways rationally shows that our understanding of the determinants of protein folding has advanced considerably.
Figure 1: Design of proteins and protein-protein interactions with high-resolution accuracy. Comparison of design model and crystal structure of (left) interface of novel designed endonuclease with new DNA cleavage specificity, and (right) the de novo designed protein TOP7.
Left panel, Tanja Kortemme. Right panel, Gautam Dantas.
Particularly exciting recently is the creation of novel proteins with arbitrarily chosen three-dimensional structures. We developed a general computational strategy for creating these protein structures that incorporates full backbone flexibility into rotamer-based sequence optimization. This was accomplished by integrating ab initio protein structure prediction, atomic-level energy refinement, and sequence design in Rosetta. The procedure was used to design a 93-residue protein called TOP7 with a novel sequence and topology. TOP7 was found to be monomeric and folded, and the x-ray crystal structure of TOP7 is strikingly similar (RMSD = 1.2 Å; see right panel of Figure 1) to the design model. The design of a new globular protein fold and the close correspondence of the crystal structure to the design model have broad implications for protein design and protein-structure prediction and open the door to the exploration of the large regions of the protein universe not yet observed in nature.
Design of Protein-Protein Interactions
To extend these methods to protein-protein interactions and particularly to the redesign of interaction specificity, we chose the high-affinity complex between colicin E7 DNase and its cognate inhibitory immunity protein as a model system. We used the physical model described above and a modification of our rotamer search-based computational design strategy to generate novel DNase-inhibitor protein pairs predicted to interact tightly with one another but not with the wild-type proteins. The designed protein complexes have subnanomolar affinities, are functional and specific in vivo, and have more than an order of magnitude affinity difference between cognate and noncognate pairs in vitro. This approach should be applicable to the design of interacting protein pairs with novel specificities for delineating and reengineering protein interaction networks in living cells.
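To put the reported specificity on an energy scale, an affinity ratio can be converted into a difference in binding free energy. The worked example below is a sketch under stated assumptions: a temperature of 298 K (the source does not state one) and a ratio of exactly 10 between the noncognate and cognate dissociation constants, the minimum implied by "more than an order of magnitude".

```latex
\begin{align}
\Delta\Delta G &= RT \ln\!\left(\frac{K_d^{\text{noncognate}}}{K_d^{\text{cognate}}}\right) \\
               &\approx (1.987\times10^{-3}\ \text{kcal mol}^{-1}\,\text{K}^{-1})(298\ \text{K})\,\ln 10 \\
               &\approx 1.4\ \text{kcal mol}^{-1}
\end{align}
```

Under these assumptions, a tenfold preference for the cognate partner corresponds to roughly 1.4 kcal/mol of discrimination, so the designed pairs differ from the noncognate pairs by at least that much.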
In collaboration with the research groups of Barry Stoddard and Ray Monnat (Fred Hutchinson Cancer Research Center), we generated an artificial, highly specific endonuclease by fusing domains of homing endonucleases I-DmoI and I-CreI through computational optimization of a new domain-domain interface between these normally noninteracting proteins. The resulting enzyme, E-DreI (Engineered I-DmoI/I-CreI), binds a long chimeric DNA target site with nanomolar affinity, cleaving it precisely at a rate equivalent to its natural parents. We are currently extending our design methodology to protein-nucleic acid interfaces, with the aim of generating new endonucleases by redesigning the protein-DNA interface.
In both of these systems it has been possible to determine x-ray crystal structures of the designed complexes. As in the TOP7 case, the actual structures are very close to the design models (Figure 1, left panel), which validates the accuracy of our approach to high-resolution modeling.
Prediction of Protein Structure
The picture of protein folding that motivates our approach to ab initio protein tertiary structure prediction is that sequence-dependent local interactions bias segments of the chain to sample distinct sets of local structures, and that nonlocal interactions select the lowest free-energy tertiary structures from the many conformations compatible with these local biases. In implementing the strategy suggested by this picture, we use different models to treat the local and nonlocal interactions. Rather than attempting a physical model for local sequence-structure relationships, we turn to the protein database and take the distribution of local structures adopted by short sequence segments (fewer than 10 residues in length) in known three-dimensional structures as an approximation to the distribution of structures sampled by isolated peptides with the corresponding sequences.
The primary nonlocal interactions considered are hydrophobic burial, electrostatics, main-chain hydrogen bonding, and excluded volume. Structures that are simultaneously consistent with both the local sequence structure biases and the nonlocal interactions are generated by using simulated annealing to minimize the nonlocal interaction energy in the space defined by the local structure distributions.
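The search strategy described above can be sketched in a few lines. This is a toy illustration, not Rosetta's implementation: the chain is a 2D string of beads described by turning angles, the fragment library is randomly generated rather than drawn from known structures, and `toy_energy` (compaction plus a clash penalty) merely stands in for the burial and excluded-volume terms. All function names and parameters are hypothetical.

```python
import math
import random

def build_chain(angles):
    """Convert turning angles (radians) into 2D bead coordinates with unit bonds."""
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for a in angles:
        heading += a
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords

def toy_energy(angles):
    """Toy nonlocal score: squared radius of gyration (stand-in for burial)
    plus a penalty for beads closer than 0.8 (stand-in for excluded volume)."""
    coords = build_chain(angles)
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    rg2 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in coords) / n
    clashes = sum(
        1
        for i in range(n)
        for j in range(i + 2, n)
        if math.dist(coords[i], coords[j]) < 0.8
    )
    return rg2 + 5.0 * clashes

def make_fragment_library(n_res, frag_len, per_window, rng):
    """Hypothetical library: a few candidate local structures per window.
    A real library would come from known structures, not random draws."""
    local_states = [-2.0, -1.0, 0.0, 1.0, 2.0]
    return {
        start: [[rng.choice(local_states) for _ in range(frag_len)]
                for _ in range(per_window)]
        for start in range(n_res - frag_len + 1)
    }

def fragment_assembly(n_res=20, frag_len=3, steps=5000, seed=0):
    """Simulated annealing in the space defined by the fragment library."""
    rng = random.Random(seed)
    library = make_fragment_library(n_res, frag_len, 10, rng)
    angles = [0.0] * n_res  # start from a fully extended chain
    energy = toy_energy(angles)
    best_angles, best_energy = list(angles), energy
    t_start, t_end = 5.0, 0.05
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Move: replace one window of the chain with a library fragment.
        start = rng.randrange(n_res - frag_len + 1)
        trial = list(angles)
        trial[start:start + frag_len] = rng.choice(library[start])
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, energy = trial, trial_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
    return best_angles, best_energy
```

Calling `fragment_assembly()` returns the lowest-scoring set of angles found and its score. The key design point mirrored from the text is that moves never leave the space of library fragments, so local structure is handled by the library while the energy function only has to judge nonlocal interactions.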
Figure 2: Blind protein structure predictions from CASP3 and CASP4.
A: Left, crystal structure of the MarA transcription factor bound to DNA; right, our best submitted model in CASP3. Despite many incorrect details, the overall fold is predicted with sufficient accuracy to allow insights into the mode of DNA binding.
B: Left, the crystal structure of bacteriocin AS-48; middle, our best submitted model in CASP4; right, a structurally and functionally related protein (NK-lysin) identified using this model in a structure-based search of the Protein Data Bank (PDB). The structural and functional similarity is not recognizable using sequence comparison methods (the identity between the two sequences is only 5 percent).
C: Left, crystal structure of the second domain of MutS; middle, our best submitted model for this domain in CASP4; right, a structurally related protein (RuvC) with a related function recognized using the model in a structure-based search of the PDB. The similarity was not recognized using sequence comparison or fold recognition methods.
Image: Rich Bonneau
Rosetta has been tested in the biennial CASP (critical assessment of structure prediction) experiments, in which predictors are challenged to make blind predictions of the structures adopted by protein sequences whose structures have been determined but not yet published. Since CASP3 in 1998, Rosetta has consistently been the top performing method for ab initio prediction, as reported by independent assessors. In the CASP4 experiment, for example, Rosetta was tested on 21 proteins. The predictions for these proteins, which lack detectable sequence similarity to any protein with a previously determined structure, were of unprecedented accuracy and consistency. (Some examples are shown in Figure 2.) Excellent predictions were also made in the CASP5 and CASP6 experiments. Encouraged by these promising results, we generated models for all large protein families whose members are shorter than 150 amino acids.
Figure 3: The first close to atomic-level resolution, blind ab initio structure prediction: CASP6 T281. The high-resolution refinement methodology described in the text produced a model 1.5-Å RMSD from the crystal structure (left panel), with aspects of the native side-chain packing (right panel).
Image: Phil Bradley
A highlight of CASP6 was the first de novo blind prediction that used our high-resolution refinement methodology to achieve close to high-resolution accuracy. The relatively short sequence (76 residues) allowed us to apply our all-atom refinement methodology not only to the native sequence but also to the sequences of many homologs. The center of the lowest energy cluster of structures turned out to be remarkably close to the native structure (1.5 Å; Figure 3). The high-resolution refinement protocol decreased the RMSD from 2.2 Å to 1.5 Å, and the side chains pack in a somewhat native-like manner in the protein core (Figure 3, right panel).
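For readers unfamiliar with the RMSD figures quoted above, the quantity is the root-mean-square distance between corresponding atoms of two structures. The function below is a minimal sketch that assumes the two coordinate sets are already optimally superimposed and listed in the same atom order. The superposition step itself (for example, with the Kabsch algorithm) is omitted, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, assumed to be superimposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD more than many slightly misplaced ones, which is one reason a drop from 2.2 Å to 1.5 Å is a meaningful improvement.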
We have extended the Rosetta ab initio structure prediction strategy to the problem of using limited experimental data to generate models of proteins. By incorporating chemical shift and NOE information and more recently dipolar coupling information into the Rosetta structure generation procedure, we have been able to generate much more accurate models than with ab initio structure prediction alone or when using the same limited data sets with conventional nuclear magnetic resonance (NMR) structure generation methodology. An exciting recent development is that the Rosetta procedure can also take advantage of unassigned NMR data and hence circumvent the difficult and tedious step of assigning NMR spectra.
The Rosetta ab initio structure prediction method, the Rosetta-based NMR structure determination method, and a new method for comparative modeling that uses the Rosetta de novo approach to model the parts of a structure (primarily long loops) that cannot be modeled accurately based on a homologous structure template have all been implemented in a public server called Robetta. This server, which has a constant backlog of users worldwide, was one of the best all-around fully automated structure prediction servers in the CASP5 and CASP6 tests.
Prediction of Protein-Protein Interactions
For a number of years we have worked on protein structure refinement, a challenging problem because of the large number of degrees of freedom. We became interested in protein-protein docking because, with the approximation that the two partners do not undergo significant conformational changes during docking, the space to be searched (the six rigid-body degrees of freedom in addition to the side-chain degrees of freedom) is much smaller. While important in its own right, this problem is a good stepping stone to the harder structure refinement problem.
We developed a new method to predict protein-protein complexes from the coordinates of the unbound monomer components. This method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations with the Monte Carlo minimization procedure and physical model used in our high-resolution structure prediction work. The simultaneous optimization of side-chain and rigid-body degrees of freedom contrasts with most other current approaches, which model protein-protein docking as a rigid-body shape-matching problem, with the side chains kept fixed. We have recently improved the method (RosettaDock) by developing an algorithm that allows efficient sampling of off-rotamer side-chain conformations during docking.
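The Monte Carlo minimization idea used in the high-resolution stage can be illustrated with a toy example: each cycle perturbs the current pose, relaxes it to the nearest local minimum, and applies the Metropolis criterion to the minimized energies. This is a sketch only. The three-number `pose`, the rugged `toy_energy` landscape, and the gradient-descent minimizer are all assumptions made for illustration, and the side-chain optimization that the real method performs alongside the rigid-body moves is left out.

```python
import math
import random

def toy_energy(pose):
    """Rugged toy landscape over a pose vector; NOT a physical potential."""
    return sum(0.1 * p * p + math.sin(3.0 * p) for p in pose)

def toy_gradient(pose):
    """Analytic gradient of toy_energy."""
    return [0.2 * p + 3.0 * math.cos(3.0 * p) for p in pose]

def minimize(pose, step=0.02, iters=200):
    """Plain gradient descent to the nearest local minimum."""
    pose = list(pose)
    for _ in range(iters):
        grad = toy_gradient(pose)
        pose = [p - step * g for p, g in zip(pose, grad)]
    return pose

def monte_carlo_minimization(start, n_cycles=200, kick=1.0,
                             temperature=0.5, seed=0):
    """Perturb, minimize, then accept or reject with the Metropolis criterion."""
    rng = random.Random(seed)
    current = minimize(start)
    e_current = toy_energy(current)
    best, e_best = current, e_current
    for _ in range(n_cycles):
        # Random perturbation of every degree of freedom.
        trial = [p + rng.gauss(0.0, kick) for p in current]
        # Local minimization before the acceptance test is what
        # distinguishes this scheme from plain Monte Carlo.
        trial = minimize(trial)
        e_trial = toy_energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best
```

For example, `monte_carlo_minimization([4.0, -3.0, 2.5])` returns the lowest-energy minimized pose found and its energy. Comparing minimized energies rather than raw ones lets the search hop between basins without being rejected for the small clashes that a random move typically introduces.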
Figure 4: CAPRI (critical assessment of predicted interactions) protein-protein docking results. Superposition of predicted (blue) and x-ray (red and orange) protein complex structures. Green, a side chain whose conformation was correctly predicted to change upon complex formation. Upper panel, whole complex. Lower panel, details of the interface. In addition to the rigid-body orientation, the conformations of most of the side chains are predicted correctly.
Image: Ora Furman
The power of RosettaDock was highlighted in the recent blind CAPRI protein-protein docking challenge that was held in December 2004. In CAPRI, predictors are given the structures of two proteins known to form a complex, and challenged to predict the structure of the complex. RosettaDock predictions for targets without significant backbone conformational changes were striking, as shown in Figure 4. Not only were the rigid-body orientations of the two partners predicted nearly perfectly but also almost all the interface side chains were modeled very accurately. These correct models clearly stood out as lower in energy than all other models we generated, which suggests the potential function is reasonably accurate.
These promising results suggest that the method may soon be useful for generating models of biologically important complexes from the structures of the isolated components, and more generally suggest that high-resolution modeling of structures and interactions is within reach. A clear goal for our monomeric structure prediction work is to approach the level of accuracy of these models.
Improvement of Physical Model
Our current approach to improving energy functions involves a combination of quantum chemistry calculations on simple model compounds, traditional molecular mechanics approaches, and protein structural analysis. We have used such an approach to develop an improved hydrogen-bonding potential. A particularly notable result is that the orientation dependence of the hydrogen bond in quantum chemistry calculations on formamide dimers is remarkably similar to that seen in hydrogen bonds between side chains in protein structures but different from that in current molecular mechanics force fields, which neglect the covalent character of the hydrogen bond. Feedback from the prediction and design calculations has provided continual impetus and guidance for improving the energy function; for example, inadequacies in our treatment of protein-protein interactions have led to the recent development of a rotamer-based model for water-mediated hydrogen bonds.
Plans for the Future
Our prediction and design methods have now reached the point where they can be applied to important biological problems. Particularly encouraging after years of work on high-resolution modeling are the close to atomic resolution predictions of the structures of complexes in CAPRI (Figure 4), the 1.5-Å de novo prediction in CASP6 (Figure 3), and the close agreement of the TOP7 (Figure 1, right) and protein-protein interface design models (Figure 1, left) with the x-ray crystal structures. These results suggest that high-resolution modeling is starting to work.
In the next several years, we aim to improve and extend our methods. We are particularly focused on improving the accuracy of high-resolution structure prediction (which will be required if the models are to be generally useful). To accomplish this, we will work to improve the underlying physical model and the sampling methodology. We are also developing improved methods to predict and redesign protein-DNA interaction specificity, and extending our protein design methodology to the design of enzymes that catalyze chemical reactions not catalyzed by naturally occurring proteins.
Please visit our web site at http://www.bakerlab.org for additional information including a list of our research publications.