Questions about protein folding

Author	Message
itayperl Send message Joined: 20 Mar 06 Posts: 5 Credit: 168,038 RAC: 0	Message 21219 - Posted: 26 Jul 2006, 14:21:43 UTC Hello, I read some info pages in this website and other protein-folding-related websites, and I understand that rosetta is trying to fold proteins in random possible shapes, first the secondary structure then the tertiary, using the amino-acid sequence (is there any other data in WUs?), and finds the most stable (i.e. lowest energy) shape. Did I get it right? Now, assuming that I did, I have a few questions: 1) What does "relax" mean? 2) What is RMSD? 3) What does negative energy mean? 4) What is the difference between "Accepted" and "Low Energy"? Thanks in advance for your time, Itay ID: 21219 · Rating: 0 · rate: / Reply Quote

Keith E. Laidig Volunteer moderator Project developer Send message Joined: 1 Jul 05 Posts: 154 Credit: 117,189,961 RAC: 0	Message 21230 - Posted: 26 Jul 2006, 15:47:08 UTC - in response to Message 21219. Last modified: 26 Jul 2006, 15:47:46 UTC

> Hello, I read some info pages in this website and other protein-folding-related websites, and I understand that rosetta is trying to fold proteins in random possible shapes, first the secondary structure then the tertiary, using the amino-acid sequence (is there any other data in WUs?), and finds the most stable (i.e. lowest energy) shape. Did I get it right? Now, assuming that I did, I have a few questions:
> 1) What does "relax" mean?

The initial searching is done using rigid units to quickly check whether things are reasonable or crazy. If a shape is found to be reasonable, then the rigid units are 'relaxed' (think slowly melted), allowing the component pieces to arrange themselves most favorably.

> 2) What is RMSD?

RMSD = Root Mean Square Deviation. See here for the definition. It is a measure of the difference between the shape you're considering and the desired shape.

> 3) What does negative energy mean?

The sign and magnitude of the Rosetta energy are somewhat arbitrary, but the general rule of thumb is that lower/smaller/more negative is better. So as the shape gets better and better during searching, the energy can go from a positive value to a negative value as things are 'dialed in'.

> 4) What is the difference between "Accepted" and "Low Energy"?

The Monte Carlo searching method used by Rosetta works something like this:
- starting from your present situation, make a random change
- is the new situation 'better'?
  - if yes, start your new search from here, the new "Accepted" shape
  - if no, randomly decide whether we should keep this structure anyway
    - if yes, then start your search from here (rare event)
    - if no, then start over from the previous shape (most common)

So the "Accepted" shapes are the steps you chose to keep during a particular search, while the "Low Energy" shape is the best one you've found in all of your searching on this molecule so far, perhaps over many different searches.

> Thanks in advance for your time, Itay

ID: 21230 · Rating: 0 · rate: / Reply Quote
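The Monte Carlo loop Keith describes can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `toy_energy` is a made-up one-dimensional landscape (not Rosetta's energy function), the "randomly decide" step is assumed to be the standard Metropolis rule, and `step_size`, `temperature` and `seed` are hypothetical parameters chosen for the example.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D energy landscape with several local minima.

    Stand-in for illustration only; it is not Rosetta's energy function.
    """
    return 0.1 * x * x + math.sin(3.0 * x)


def monte_carlo_search(energy, start, steps=5000, step_size=0.5,
                       temperature=0.5, seed=0):
    """Return (accepted, accepted_energy, low, low_energy) after the search."""
    rng = random.Random(seed)
    accepted, accepted_e = start, energy(start)
    low, low_e = accepted, accepted_e  # best shape seen so far ("Low Energy")

    for _ in range(steps):
        # Starting from the present situation, make a random change.
        trial = accepted + rng.uniform(-step_size, step_size)
        trial_e = energy(trial)
        delta = trial_e - accepted_e

        # Better (lower energy): always keep it.
        # Worse: keep it anyway with probability exp(-delta / temperature)
        # (Metropolis rule, assumed here for the "randomly decide" step).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            accepted, accepted_e = trial, trial_e
            if accepted_e < low_e:
                low, low_e = accepted, accepted_e
        # Otherwise the trial is discarded and we continue from the
        # previous accepted shape.

    return accepted, accepted_e, low, low_e


if __name__ == "__main__":
    acc, acc_e, low, low_e = monte_carlo_search(toy_energy, start=4.0)
    print(f"last accepted: x={acc:.3f}, energy={acc_e:.3f}")
    print(f"low energy:    x={low:.3f}, energy={low_e:.3f}")
```

The two values it tracks correspond to the two labels in the question: `accepted` is wherever the search currently stands, and `low` is the best state seen at any point, so `low_e` can never be higher than `accepted_e`.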

Frank Send message Joined: 15 May 06 Posts: 8 Credit: 337,852 RAC: 0	Message 21750 - Posted: 3 Aug 2006, 17:08:31 UTC

I'm a computer guy, but I've become interested in the protein folding problem. A few questions/issues about RMSD have been bothering me, and I haven't found an answer to them.

First, how do you go about orienting your test structure to the known structure in order to achieve the best/lowest RMSD? Here's an example (hopefully the ASCII art will come out ok). If the known structure is: --* and your predicted structure is: * \| * \| * When you calculate the RMSD, it will be high. But if you rotate either one by 90 degrees, they're identical. The problem must be even harder when you add a third dimension! Surely there must be a standard or accepted way to orient the structures to achieve the minimal RMSD--otherwise, how would you (for example) evaluate one CASP submission vs. another?

Secondly, if the known structure is: / V And the prediction is: The RMSD will be high no matter how you orient it, even though the only mistake is a "" instead of a "V". Wouldn't an error calculation based on the difference between the predicted and known _angles_ in the backbone be better than an error function based on the distance?

ID: 21750 · Rating: 0 · rate: / Reply Quote

Frank Send message Joined: 15 May 06 Posts: 8 Credit: 337,852 RAC: 0	Message 21751 - Posted: 3 Aug 2006, 17:13:42 UTC The ASCII art didn't survive. The first set should be clear anyway, but here's another crack at the second set with the BBCodes: / V ID: 21751 · Rating: 0 · rate: / Reply Quote

Christoph Jansen Send message Joined: 6 Jun 06 Posts: 248 Credit: 267,153 RAC: 0	Message 21756 - Posted: 3 Aug 2006, 18:29:57 UTC

Hi Frank, the error between your two patterns shown below is not simply "a instead of a v". The four stars to the right are nowhere near each other if you superimpose the structures.

RMSD means to overlay both structures exactly (or as closely as possible) and then calculate the distance between each atom position in the computed structure and the corresponding position in the original structure. It does not only measure distances from atom to atom along the chain but does so in all three dimensions. This way the angular deviations are already included in the RMSD, as they contribute to the structural discrepancy: you can have a perfect atom-to-atom distance in the whole chain (like in your patterns) and still have the whole structure drift slowly apart as you proceed from atom to atom in the backbone. You may also have one totally different angle right in the middle (again as in your patterns) and the RMSD goes, well, ploink into the cellar, as the second half can by no means be overlaid even nearly.

What basically is done to compare the computed structure with the measured one is to compute the best overlay possible in 3D, which is done by optimization, I think (I am not very familiar with current methods in that field). This means that all distances between the respective atoms in both structures are minimized as far as possible. Only after that is the RMSD calculated. It is a very significant number if done that way because, like I said, the angular discrepancies influence it directly. The frame of reference in three-dimensional space is generated automatically by the minimization of all distances.

Regards, Christoph

ID: 21756 · Rating: 0 · rate: / Reply Quote
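The "overlay first, then measure" procedure Christoph describes can be sketched for Frank's two-dimensional example. This is a minimal illustration, not the method Rosetta or CASP assessors use: it assumes equal-length lists of matched 2D points, and it uses the closed-form best rotation that exists in the plane. In three dimensions the same least-squares fit is commonly done with the Kabsch algorithm. The function names and the coordinates are made up for the example.

```python
import math


def centered(points):
    """Shift points so their centroid sits at the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]


def rmsd(a, b):
    """Root mean square deviation between matched points, no fitting."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(total / len(a))


def superimposed_rmsd(mobile, target):
    """RMSD after the best rigid overlay (translation + rotation) in 2D."""
    m, t = centered(mobile), centered(target)
    # The rotation angle that minimizes the summed squared distances
    # follows directly from these two sums.
    dot = sum(mx * tx + my * ty for (mx, my), (tx, ty) in zip(m, t))
    cross = sum(mx * ty - my * tx for (mx, my), (tx, ty) in zip(m, t))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in m]
    return rmsd(rotated, t)


if __name__ == "__main__":
    known = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]      # horizontal chain
    predicted = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # same chain, rotated 90 degrees
    print("RMSD without overlay:", round(rmsd(known, predicted), 3))
    print("RMSD after overlay:  ", round(superimposed_rmsd(predicted, known), 3))
```

For the rotated-chain case the raw RMSD is large while the superimposed RMSD is essentially zero, which is why the overlay step has to come first. Note that this sketch only allows proper rotations, so a mirror image would still score as different.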

Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 21758 - Posted: 3 Aug 2006, 18:44:44 UTC - in response to Message 21219. Itay I just wanted to add... the reason your RMSD shows a "?" in the graphic, is because the proteins presently being studied are part of CASP7. And dozens of scientific teams are working to devise the best model prediction for how the protein looks. And so the structures of the proteins being studied are not yet known. And so RMSD cannot be computed, because there isn't yet a known structure to compare our prediction with. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ ID: 21758 · Rating: 0 · rate: / Reply Quote

soriak Send message Joined: 25 Oct 05 Posts: 102 Credit: 137,632 RAC: 0	Message 21759 - Posted: 3 Aug 2006, 19:02:20 UTC When not crunching for CASP7, doesn't calculating the RMSD use up processing power? If so, maybe this should only be calculated for the models that end up being submitted and not for every accepted energy? ID: 21759 · Rating: 0 · rate: / Reply Quote

Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 21761 - Posted: 3 Aug 2006, 19:07:26 UTC - in response to Message 21759.

> ...doesn't calculating the RMSD use up processing power? If so, maybe this should only be calculated for the models that end up being submitted and not for every accepted energy?

I've had the same thought... basically to calculate RMSD less frequently. There are points in the middle of the models where they want to study how far off they are, and this is part of what gets reported back. But yes, I'm thinking there might be some room to compute RMSD much less frequently, and perhaps even less when the graphic is not being displayed. It's possible they've got some simple way to know how a given change will affect RMSD, and so it's not as expensive as it sounds. But I'm thinking it's more likely there is room to save some CPU time there when we're back to studying known proteins.

Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/

ID: 21761 · Rating: 0 · rate: / Reply Quote

Frank Send message Joined: 15 May 06 Posts: 8 Credit: 337,852 RAC: 0	Message 21788 - Posted: 4 Aug 2006, 1:01:51 UTC - in response to Message 21756.

Hi Christoph. Thanks for the reply.

> the error between your two patterns shown below is not simply "a instead of a v". The four stars to the right are nowhere near each other if you superimpose the structures. ...

That's kind of my point. Imagine that, instead of four stars, the two areas are complex domains (DomA and DomB) with hundreds of monomers, connected by a short random coil. Imagine that, taken independently, the domains are predicted perfectly but the random coil is not. Overlaying the predicted DomA with the known DomA will move DomB out of alignment and vice versa. The error, as measured by the RMSD of the _distances_ between atoms, may be very high. However, a person looking at the predicted vs. known structures could see that the two structures are very close.

What I was proposing was using the RMSD of the _bond angles_ in the known vs. predicted backbone rather than the RMSD of the distance between known/predicted atoms. In this case, all the angles in the backbone are perfect except in the random coil, so the error measured this way would be small. Further, since you can measure each angle along the chain independently of the larger structure, you don't have to try to overlay the structure multiple ways to minimize the RMSD error calculation (i.e. the error calculation function is computationally a lot cheaper, so you can do more work in the same amount of time).

ID: 21788 · Rating: 0 · rate: / Reply Quote
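Frank's two-domain scenario can be tried out on a toy model. The sketch below is an illustration under heavy assumptions, not a protein model: the "chain" is a 2D polyline with unit-length bonds described only by its turn angles, the two "domains" are straight segments, and the single wrong turn in the middle plays the role of the mispredicted linker. All names (`build_chain`, `angle_rmsd`, `superimposed_rmsd`) and the numbers are hypothetical.

```python
import math


def build_chain(turns_deg):
    """Build 2D points from unit-length bonds and a list of turn angles."""
    heading, x, y = 0.0, 0.0, 0.0
    points = [(x, y)]
    for turn in [0.0] + list(turns_deg):   # first bond has no preceding turn
        heading += math.radians(turn)
        x, y = x + math.cos(heading), y + math.sin(heading)
        points.append((x, y))
    return points


def angle_rmsd(a_deg, b_deg):
    """RMS difference between matched angles, wrapped into [-180, 180)."""
    diffs = [(a - b + 180.0) % 360.0 - 180.0 for a, b in zip(a_deg, b_deg)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def superimposed_rmsd(mobile, target):
    """Coordinate RMSD after the best rigid 2D overlay of matched points."""
    def centered(pts):
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        return [(px - cx, py - cy) for px, py in pts]

    m, t = centered(mobile), centered(target)
    dot = sum(mx * tx + my * ty for (mx, my), (tx, ty) in zip(m, t))
    cross = sum(mx * ty - my * tx for (mx, my), (tx, ty) in zip(m, t))
    c, s = math.cos(math.atan2(cross, dot)), math.sin(math.atan2(cross, dot))
    total = sum((c * mx - s * my - tx) ** 2 + (s * mx + c * my - ty) ** 2
                for (mx, my), (tx, ty) in zip(m, t))
    return math.sqrt(total / len(m))


if __name__ == "__main__":
    known_turns = [0.0] * 9            # straight chain of 11 points
    predicted_turns = [0.0] * 9
    predicted_turns[4] = 90.0          # one wrong "hinge" angle in the middle

    wrong = sum(1 for a, b in zip(known_turns, predicted_turns) if a != b)
    print("angles wrong:", wrong, "of", len(known_turns))
    print("angle RMSD (degrees):", round(angle_rmsd(predicted_turns, known_turns), 2))
    print("coordinate RMSD (bond lengths):",
          round(superimposed_rmsd(build_chain(predicted_turns),
                                  build_chain(known_turns)), 2))
```

In this toy case a single wrong angle out of nine displaces every point on one side of the hinge, so the coordinate RMSD stays well above one bond length even after the best overlay, while the angle-based measure records exactly one error. That is Frank's point; Christoph's counterpoint is that the coordinate measure is the one that reflects where the atoms actually end up.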

Christoph Jansen Send message Joined: 6 Jun 06 Posts: 248 Credit: 267,153 RAC: 0	Message 21848 - Posted: 4 Aug 2006, 16:47:09 UTC ID: 21848 · Rating: 0 · rate: / Reply Quote

Frank Send message Joined: 15 May 06 Posts: 8 Credit: 337,852 RAC: 0	Message 21863 - Posted: 4 Aug 2006, 20:00:55 UTC - in response to Message 21848.

> okay, now I think I understand your point, looks like I didn't get it first time. Unfortunately that is also the reason why I did not get the point of your

We're getting closer, but there's one important part I think I still didn't express well enough. I'll take another crack at it, then I'll read more and talk less. :)

> This means that the only way to "bend" a protein is to make rotations around the bonds. All those different structures possible are only achieved by such rotations, never by bending any part directly (I wish I had a model to show that grumble).

Those rotations are what I meant when I said "bond angles". Put another way, a structure is typically specified in a PDB-type format that gives 3D co-ordinates for each atom in a protein. However, since the protein is a polymer of amino acids, and the only things that vary as you walk from one end to the other are the side chains (which you know from the amino acid name) and the rotations (phi / psi), you should also be able to describe the structure that way.

For example, if I know the structure is this (with VAL as the C-terminus): VAL(phi=100/psi=20) - ALA(phi=300/psi=0) - GLY(phi=0/psi=50) I should be able to generate a PDB-type file with all the atoms. And if I had such a PDB file, I would be able to figure out the phi/psi for each of the monomers. If that's true, the two formats are equivalent (the PDB way just contains redundant information).

Now, the conventional RMSD way of computing the error would involve orienting the structures to the best possible fit, then calculating distances for each atom. Looking at a random PDB file, it looks like VAL has 7 atoms, ALA has 5, and GLY has 4, for a total of 16, each of which has an X,Y,Z value. To do the RMSD calculation, you'll have to figure out the distance between each of these atoms from known to predicted structure.

What I'm suggesting instead is to use the difference in the phi/psi angles between known and predicted structure for the error calculation. That is, two simple subtractions per amino acid. It has the advantage of (1) fewer calculations to do and (2) no need to orient the structures to each other ahead of time.

> ... I still see your point and I agree that it is possible to have a terrible RMSD by missing the fact that two domains are correctly modeled but do not match for sterical reasons. But I do not think that is much of an issue as it happens rather seldom and many of these problems have already been eliminated. And, after all, the angles involved are not affected by that, as I explained.

Maybe this is all a moot point for real-life proteins (or a solution in search of a problem :)--it just bothers me that it's a theoretical problem at least. Thanks again for taking the time to read/reply.

ID: 21863 · Rating: 1 · rate: / Reply Quote
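Frank's "two subtractions per amino acid" can be written down directly. The sketch below is a hedged illustration, not an established scoring function: `torsion_rmsd` and `angular_difference` are hypothetical names, the known values are the phi/psi numbers from Frank's example, and the predicted values are invented. One detail the plain subtraction misses is that angles are periodic, so 350 and 10 degrees are only 20 degrees apart; the sketch wraps each difference accordingly.

```python
import math


def angular_difference(a_deg, b_deg):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def torsion_rmsd(predicted, known):
    """RMS of wrapped phi and psi differences over matched residues.

    Both arguments are lists of (phi, psi) tuples in degrees, one per residue.
    """
    squared = []
    for (phi_p, psi_p), (phi_k, psi_k) in zip(predicted, known):
        squared.append(angular_difference(phi_p, phi_k) ** 2)
        squared.append(angular_difference(psi_p, psi_k) ** 2)
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # VAL, ALA, GLY with the phi/psi values from the example above.
    known = [(100.0, 20.0), (300.0, 0.0), (0.0, 50.0)]
    # Invented prediction: only the GLY phi differs (350 instead of 0).
    predicted = [(100.0, 20.0), (300.0, 0.0), (350.0, 50.0)]
    print("torsion RMSD (degrees):", round(torsion_rmsd(predicted, known), 3))
```

No overlay step is needed here, which is the cost advantage Frank mentions. The comparison is only as complete as the angle list, though: real structures also depend on the omega torsion, side-chain torsions, and small variations in bond lengths and bond angles, none of which this sketch looks at.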

Vanita Send message Joined: 21 Oct 05 Posts: 43 Credit: 0 RAC: 0	Message 21868 - Posted: 4 Aug 2006, 21:08:40 UTC - in response to Message 21863. Frank, you've hit on something that is in fact a big bone of contention among CASP participants (so I've heard, at least, I've never participated myself). RMSD is generally agreed to be a suboptimal measure for determining how close to correct a prediction really is. One example where it would fail is the one you gave below, of two domains with a flexible linker between them, and which in solution may adopt different internal orientations. Another example is that RMSD indiscriminately rewards more compact structures. As far as I know, the debate on the "best" measure of correctness occurs at every CASP, and there has been no consensus so far. ID: 21868 · Rating: 1 · rate: / Reply Quote