Rosetta Insight

Volunteer moderator
Project administrator

Joined: 5 Sep 06
Posts: 423
Credit: 6
RAC: 0
Message 74486 - Posted: 20 Nov 2012, 11:00:00 UTC
Last modified: 20 Nov 2012, 13:46:07 UTC

Hi All

I’ve recently been asked to help to get useful information out to the R@H community on a reasonably regular basis about what the folks at the Bakerlab are currently working on. I’ll speak to Prof. David Baker on Skype periodically, and then report back to the community here. The aim is to find an appropriate balance between covering the key points of the many different and highly technical things they’re working on while keeping it relevant to the majority of Rosetta@Home contributors. It would also be great (and very useful for me!) to get your questions to ask too. In turn, hopefully this will also help to attract new users to the project so that the team have more compute resources at their disposal to help speed up and extend their research.

I’m not affiliated with the lab or project in any way, other than having run R@H for quite a few years now. I’ll do my best – there’s going to be a steep learning-curve for me so go easy if I make any mistakes (and feel free to correct me – I’ll update and revise where necessary!).

First up, because it’s central to the project, here’s some info on proteins, which is what R@H and the Bakerlab are all about. Many of you will know this stuff, and some of you will probably have relevant PhDs, so this is for those who want a quick introduction or recap. Feel free to correct/clarify any of this, or provide any appropriate analogies or links that will help. My Biology degree is a decade old now and I work in hydro, so it might show!

A bit of background: Proteins

If you don’t really know what proteins are or do, or want a quick recap, here’s a good site and here’s an excellent animation of protein production in action (apparently that one is approximately in real-time).

Basically, proteins are chains of amino acids. They can be structural or functional; many functional proteins are enzymes, which are essentially tiny machines that perform one or more functions. The sequence of amino acids determines the shape of the protein and therefore its function/structure.


  • Gene: a length of DNA (or RNA) which codes for a protein.

  • DNA: the molecule which stores the code for all of the proteins in an organism. The code is stored as the sequence of bases (there are four). There are ‘start’ and ‘stop’ codes which determine the length of a gene. DNA is transcribed/read (by proteins!) and the appropriate sequence of amino acids is produced (there are intermediate steps in-between).

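  • Amino acids: the group of molecules from which all proteins are made. There are 20 standard ones, or 22 if you count the two rarer ones (selenocysteine and pyrrolysine).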

  • Protein: One or more chains of amino acids. These chains fold up in highly complex, but determinable ways, and the folded shape is critical because that determines the function or structure of the protein. Sometimes other elements (co-factors) are incorporated into the folded structure (e.g. iron in the haemoglobin protein).

  • Bakerlab: The lab at the University of Washington in Seattle headed by Prof. David Baker

  • Rosetta software: The protein modelling software suite developed initially in the Bakerlab, but now developed worldwide through Rosetta Commons. Not to be confused with Rosetta@Home, this is the software package that is used by labs and private organisations around the world for a range of protein modelling tasks. For a summary, see here.

  • Rosetta@Home: The Rosetta Software Suite which has been packaged to allow it to run on the BOINC distributed computing platform.
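
To make the DNA entry in the list above a bit more concrete, here is a minimal Python sketch of reading a gene: find the 'start' code, read three bases at a time, and stop at a 'stop' code. It assumes the standard genetic code and skips the intermediate steps (such as the mRNA copy), so treat it as an illustration only. The function name `translate` is my own and has nothing to do with the Rosetta software.

```python
from itertools import product

# Standard genetic code, written out in the conventional T, C, A, G order.
# One-letter amino acid codes; "*" marks a 'stop' codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Return the amino acid sequence coded by the first gene found in `dna`.

    Simplification (assumption): reads the DNA directly, starting at the
    first ATG ('start') and ending at the first in-frame 'stop' codon.
    Input must contain only the letters A, C, G and T.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no 'start' code, so no protein
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # 'stop' code reached
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CCATGGCTTGGTAAGG"))  # ATG GCT TGG TAA -> MAW
```

Here M, A and W are the one-letter codes for methionine, alanine and tryptophan.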

A bit of background: Rosetta

The Rosetta software has two related functions:

  • Protein modelling: determining the 3D structure of a protein from the data available, such as its amino acid or DNA sequence.

  • Protein design: designing new proteins to perform specific functions.

It is used for research (see the lab’s research papers for examples), while also being constantly developed to improve its accuracy and functionality. There’s a great video on Rosetta here.

The amount of computer power required for protein research can be phenomenal due to the vast number of shapes that even small proteins could (but don’t) take, which makes it an excellent fit for distributed computing.
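
To give a feel for how vast that number of shapes is, here is a rough back-of-the-envelope calculation in Python. All three numbers are illustrative assumptions of mine (3 possible shapes per amino acid, a 100 amino acid protein, and a computer that can try a trillion shapes per second), not figures from the lab.

```python
# Illustrative assumptions, not measured values.
STATES_PER_RESIDUE = 3        # possible local shapes per amino acid
RESIDUES = 100                # length of a small protein
SAMPLES_PER_SECOND = 1e12     # shapes a hypothetical computer tries per second
SECONDS_PER_YEAR = 3600 * 24 * 365

# Every amino acid multiplies the number of possible overall shapes.
conformations = STATES_PER_RESIDUE ** RESIDUES
years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"{conformations:.2e} shapes, about {years:.1e} years to try them all")
```

Under these assumptions there are roughly 5 x 10^47 possible shapes, which is why trying every one is out of the question and the search has to be done cleverly and spread across many computers.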

Without computer modelling, the methods available to determine a protein’s shape are:

    1. X-ray crystallography
    2. Electron microscopy (can determine the protein’s outer shape but cannot see within to see how it is folded)
    3. NMR spectroscopy

These methods are generally expensive (I’ve heard US$100,000 per protein using X-ray crystallography, although it might vary wildly!) and resource-intensive, and are not necessarily conclusive in their results. Software modelling of proteins therefore has a large role to play by reducing the costs and time required, and potentially improving accuracy too. The holy grail is therefore for Rosetta to be able to determine the shape of any protein from its DNA or amino acid sequence. My understanding is that at present it is very good at that for some proteins (again, see the research papers section for evidence), but not all, and improving this is one of the main areas of research.

How can you tell if a model of a protein is right without already knowing the structure and comparing the model against that?
Proteins fold into the lowest energy state that they can – the lower the energy retained within the protein, the more stable it will be. That is because it will then require more energy (heat) to remove it from that state. The amount of energy stored within a model can be calculated from the relative position of the atoms in its folded shape. Essentially, proteins fold into a low energy state, and so that state is what Rosetta is searching for.
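
As a toy illustration of "calculate the energy from the atom positions, then look for the lowest", here is a short Python sketch. It uses a simple Lennard-Jones style pair energy in made-up units, which is an assumption on my part and far cruder than Rosetta's real energy function. The names `toy_energy` and `most_stable` and the three example models are hypothetical.

```python
import math
from itertools import combinations


def toy_energy(coords, sigma=1.0, epsilon=1.0):
    """Sum a Lennard-Jones style energy over every pair of atoms.

    Atoms that are too close are heavily penalised, atoms at a comfortable
    distance are rewarded, and distant atoms contribute almost nothing.
    This is NOT Rosetta's energy function, just a stand-in.
    """
    energy = 0.0
    for a, b in combinations(coords, 2):
        r = math.dist(a, b)  # distance between the two atoms
        energy += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return energy


def most_stable(models):
    """Return the name of the model with the lowest toy energy."""
    return min(models, key=lambda name: toy_energy(models[name]))


# Three hypothetical candidate models of the same three atoms.
models = {
    "compact": [(0, 0, 0), (1.12, 0, 0), (0.56, 0.97, 0)],
    "stretched": [(0, 0, 0), (2, 0, 0), (4, 0, 0)],
    "clashing": [(0, 0, 0), (0.8, 0, 0), (1.6, 0, 0)],
}

for name, coords in models.items():
    print(name, round(toy_energy(coords), 3))
print("lowest energy:", most_stable(models))
```

The compact model wins because all three atoms sit at a comfortable distance from each other, while the clashing model is penalised for squeezing atoms together.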

Protein Design (i.e. creating new proteins that don’t exist in nature):
As well as modelling natural proteins, Rosetta can also be used to create new ones to perform specific functions. The process is something along the lines of:

    1. Start with the target structure’s stable points (for HIV or influenza that is those points which don’t rapidly mutate or get swapped).
    2. Find an amino acid side-chain that will bind to each of those identified points.
    3. Design a protein back-bone to join those side-chains together into a single protein.

Once designed, the amino acid sequence of the protein can then be purchased relatively cheaply, and tested to see if it performs as expected in the lab.

Here’s a very interesting post by Nobuyasu explaining what protein design work he and his colleagues have been working on.

OK, that’s enough background to the project - on to the questions!

Message 74487 - Posted: 20 Nov 2012, 11:14:10 UTC
Last modified: 20 Nov 2012, 15:35:37 UTC

First conversation – post number 1

Bakerlab Papers
I must admit that until this conversation I didn’t realise that all of the papers that the lab produce are available on the website – I have often come up against paywalls when trying to read about things mentioned in posts by members of the lab. There are some really interesting papers on there; although they’re generally very technical, they’re definitely worth a look if you want to understand what your contributions to R@H are used for, even if you only read the abstract.
I can now recommend ‘Computational Design of Self-Assembling Protein Nanomaterials with Atomic Level Accuracy’!

If you just want assurance that your contributions are being put to good use, having read a number of these I can confirm that they certainly are!

Some things I did not know about the Bakerlab:

  • There are around 50 people working within the Bakerlab, with a roughly 50-50 split between PhD students and postdoctoral researchers.
  • It is a diverse group, with a lot of the researchers coming from outside the US. In turn, those researchers are seeding new labs with ties to (and experience from) the Bakerlab.

Current research areas
As mentioned in the intro to this blog, the Rosetta software has two functions: protein modelling and protein design, so unsurprisingly, that’s what the members of the lab are working on! Improvements usually come from individuals in the lab trying to get the software to work better with the particular proteins they are studying. Put another way, Rosetta is developed through being used in research and improved/expanded where necessary.

Incorporating experimental data
Some members of the lab are working on allowing experimental data to be incorporated into Rosetta, such as NMR data which might show that some amino acids are close together, and which can therefore massively reduce the number of configurations that the folded protein can take.
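
Here is a toy Python sketch of why a single "these two amino acids are close together" measurement helps so much. It counts every shape an 8-residue chain can take on a flat square grid, then counts only those where the first and last residues end up next to each other. The grid model and the function name `count_walks` are my own simplifications and are not how Rosetta actually represents proteins or uses NMR data.

```python
def count_walks(n_residues, contact=None):
    """Count the shapes of a chain on a 2D square grid.

    Each residue sits on a grid point next to the previous one, and no two
    residues may share a point. If `contact` is a pair of residue indices,
    only shapes where those two residues are on neighbouring grid points
    are counted (a stand-in for an experimental 'close together' result).
    """
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path, occupied):
        if len(path) == n_residues:
            if contact is None:
                return 1
            (xi, yi), (xj, yj) = path[contact[0]], path[contact[1]]
            return 1 if abs(xi - xj) + abs(yi - yj) == 1 else 0
        x, y = path[-1]
        total = 0
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                total += extend(path, occupied)
                path.pop()
                occupied.remove(nxt)
        return total

    return extend([(0, 0)], {(0, 0)})


all_shapes = count_walks(8)
restrained = count_walks(8, contact=(0, 7))
print(all_shapes, "shapes in total,", restrained, "consistent with the contact")
```

Even in this tiny example, only a small fraction of the possible shapes survive one contact restraint, and the effect grows quickly as the chain gets longer and more restraints are added.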

CASP10 (website)
CASP is a competition held every two years in which the protein modelling community pit their techniques against each other. The entrants have to determine the shape of many different proteins from their amino acid sequences. The structures of all of the proteins in the competition have been determined experimentally but not yet released to the scientific community, so the competition is blind: the labs have no way to tell whether their models are correct until the results are released.

We’re currently in the gap between the competition and the results, although I’m not sure whether any CASP work is being carried out at the Bakerlab between now and the meeting in Italy in December. The Bakerlab has done very well at CASP in the past, and few entrants submit models for as many of the proteins as the Bakerlab does. I believe the previous competition showed that models which incorporated experimental data (specifically NMR data) or used similar proteins as starting points did well, so that has been an area of research for the lab.

Using experimental data helps the computer model by massively reducing the space that Rosetta has to search. Anyone who has played Foldit will understand how useful it is to know that two side chains should be in contact, as that dramatically reduces the potential configurations available for the protein. The results for CASP10 will be released around early December, prior to a meeting on 9th-12th December to discuss the techniques used and their strengths and weaknesses.

That covers everything that’s legible in my notes from our first conversation! Going forwards, I’d like to find out more information about the following subjects:

  • Rosetta Commons – the community that has been seeded by people who have worked in or with the Bakerlab and who now develop the Rosetta software from all over the planet.
  • More information about Nobuyasu’s announcement about their computationally designed de-novo proteins and what this might lead to (that article is here).

I have other questions lined up too, but if there’s something you want me to ask about Rosetta@home or the wider Bakerlab then post in the “Discussion of Rosetta Insight” thread and we can add it to the list.

I hope this is useful and interesting – let me know what you think.

Rosetta Informational Moderator: Mod.Zilla

©2019 University of Washington