Rosetta@home Science FAQ - by Vanita Sood
What is Rosetta?
Rosetta is a protein structure prediction and design program.
What is a protein?
A protein is a polymer of amino acids that is encoded by a gene.
What are amino acids?
Amino acids are chemical moieties that form the basic building blocks of proteins. There are 20 different amino acids that are specified by the genetic code. These 20 amino acids fall into different groups based on their chemical properties: acidic or alkaline, hydrophilic (water-loving) or hydrophobic (greasy).
What do proteins do?
Proteins perform many essential functions in the cells of living organisms. They replicate and maintain the genome (DNA), help cells grow and divide, stop them from growing too much, give a cell its identity (e.g., liver, neuron, or pancreatic cell), and help cells communicate with each other. When mutated or affected by toxins, proteins can also cause diseases such as cancer or Alzheimer's. Bacterial and viral proteins can hijack a cell and kill it. In short, proteins do everything.
How do proteins perform all their different functions?
Each protein folds into a unique 3-dimensional shape, or structure. This structure specifies the function of the protein. For example, a protein that breaks down glucose, so that the cell can use the energy stored in the sugar, will have a shape that recognizes the glucose and binds to it (like a lock and key). It will also have chemically reactive amino acids that react with the glucose and break it down to release the energy.
Why do proteins fold into unique structures?
It has long been recognized that for most proteins the native state is at a thermodynamic minimum. In plain English, that means the unique shape of a protein is the most stable state it can adopt. Picture a ball in a funnel - the ball will always roll down to the bottom of the funnel, because that is the most stable state.
What forces determine the unique native (most stable) structure of a protein?
The sequence of amino acids is sufficient to determine the native state of a protein. By virtue of their different chemical properties, some amino acids are attracted to each other (for example, oppositely charged amino acids) and so will associate; other amino acids will try to avoid water (because they are greasy) and so will drive the protein into a compact shape that excludes water from contacting most of the amino acids that "hide" in the core of this compacted protein.
Why is it so difficult to determine the native structure of a protein?
Even small proteins can consist of 100 amino acids. The number of potential conformations available to even such a (relatively) small protein is astronomical, because there are so many degrees of freedom. To calculate the energy of every possible state (so we can figure out which state is the most stable) is a computationally intractable problem. The problem grows exponentially with the size of a protein. Some human proteins can be huge (1000 amino acids).
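To get a feel for the numbers, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each amino acid can adopt only 3 distinct local conformations; that figure and the function name are assumptions made for this example, not measured values or anything taken from Rosetta.

```python
# Rough count of possible conformations for a protein chain.
# ASSUMPTION (illustrative only): each amino acid can adopt just
# 3 distinct local conformations, independently of its neighbours.
STATES_PER_RESIDUE = 3


def conformation_count(n_residues, states=STATES_PER_RESIDUE):
    """Number of conformations if every residue has `states` choices."""
    return states ** n_residues


if __name__ == "__main__":
    # Each added residue multiplies the total by `states`,
    # which is what exponential growth means here.
    for n in (10, 50, 100):
        print(n, "amino acids:", f"{conformation_count(n):.2e}", "conformations")
```

Even under this very generous simplification, the count for a 100 amino acid protein is far beyond what could be enumerated one state at a time.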
So how does Rosetta approach this problem?
The Rosetta philosophy is to use both an understanding of the physical and chemical properties of different types of amino acid interactions and a knowledge of which local conformations short stretches of amino acids within a protein are likely to adopt. Together, these are used to limit the search space and to evaluate the energy of different possible conformations. By sampling enough conformations, Rosetta can find the lowest energy, most stable native structure of a protein.
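The general idea of "sample conformations, score each one, keep the low energy ones" can be sketched with a toy Monte Carlo search in Python. This is not Rosetta's actual algorithm or energy function: the chain of discrete states, the `toy_energy` score, and all parameter values are hypothetical stand-ins chosen only to keep the example short.

```python
import math
import random


def toy_energy(conf):
    # HYPOTHETICAL score: each position "prefers" state 0.
    # Lower is more stable. A real energy function is far richer.
    return sum(conf)


def monte_carlo(n_positions=20, n_states=3, steps=5000,
                temperature=1.0, seed=0):
    """Toy Metropolis Monte Carlo search over a chain of discrete states."""
    rng = random.Random(seed)
    # Start from a random conformation.
    conf = [rng.randrange(n_states) for _ in range(n_positions)]
    energy = toy_energy(conf)
    best_conf, best_energy = conf[:], energy

    for _ in range(steps):
        # Propose a small local change at one random position.
        trial = conf[:]
        trial[rng.randrange(n_positions)] = rng.randrange(n_states)
        delta = toy_energy(trial) - energy

        # Metropolis rule: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            conf, energy = trial, energy + delta
            if energy < best_energy:
                best_conf, best_energy = conf[:], energy

    return best_conf, best_energy


if __name__ == "__main__":
    conf, energy = monte_carlo()
    print("lowest energy found:", energy)
```

Occasionally accepting uphill moves lets the search escape local traps, and running more steps (or more independent runs with different seeds) samples more of the landscape.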
Why is distributed computing required for structure prediction by Rosetta?
In many cases where the native structure of a protein is already known, we have noticed that Rosetta's energy function can recognize the native state as more stable than any other sampled state. When starting from a random conformation, however, we've observed that the native state is never sampled. By applying more computing power to the problem, we can sample many more conformations, and try different search strategies to see which is the most effective.
How will Rosetta@home benefit medical science?