Rationale behind protein shape prediction projects

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10468 - Posted: 4 Feb 2006, 20:42:03 UTC

About the predicted structures, could you comment on the issues raised in the "Is HPF (using Rosetta) project totaly worthless?" thread at UD's forums:

TestPilot: Just wondering. There is a PDB database out there. It contains information about the 3D structures of different proteins. So far it has accumulated info on more than 30,000 proteins. Human DNA is believed to contain around 30,000 - 40,000 genes, so there are 30,000 - 40,000 structures we need to know. Currently more than 5,000 structures are deposited a year, and that number grows substantially each year. Check the statistics of that database. Even if only half of the structures in the PDB belong to the human domain, we will know almost all 3D structures a few years from now, and the most important ones must already be in there. And those data are way more accurate than the results of this project. Furthermore, if someone needs a structure that is not in the PDB, he will most likely use a newer version of Rosetta, which should produce more accurate results - Rosetta is under development and the quality of its predictions grows (from CASP to CASP at least ;-) ). So one way or another, the results of this project will be obsolete and outdated in 2-4 years' time. What is the point of the project? Or am I missing something? PDB - Protein Data Bank - http://www.rcsb.org/pdb/

Further debate:

"They (PDB) stopped taking mathematical models (MM) into their database about 10 years ago, and they deleted those models from the database. But they still count MM on the stats page - anyway, not that many MM were submitted."

That's correct, I didn't notice that on the first visit. But there must still be many of the 33,000 known proteins that are counted solely as MMs (just sum up the total number of files transmitted per year). A second point is that the database holds ALL sorts of proteins, not only human ones.

"Nope. After the real shape of a protein has been determined, the calculated shape would be useless."

Again untrue, for two reasons: 1) The true shape is determined from crystalline proteins. This shape might differ from the protein's usual state in liquid solution in the human body, which can only be calculated as far as I know. 2) When you check the deviation of the calculated model, you can improve the software used for the calculation, which gives you better, more correct models and a better understanding of the kinetics and other effects of protein folding.

"To understand that, you need to understand how protein folding prediction works. Basically, Rosetta (or any other protein folding prediction software) generates thousands (millions, billions - depending on the available computing power) of possible 3D structures for the protein. It then applies a "measurement function" to those shapes. The important part of that measurement function is how stable a particular structure is. The structures with the best scores are marked as the prediction."

As far as I understand how Rosetta works, it uses known kinetic and energetic models on an atomistic scale (or, in practice, in an approximate way for whole amino acid functional groups). First an initial "energy field" is created. Then the protein is folded by a small amount in all possible directions, and the difference between the initial and the resulting "energy field", due to the interactions of the various functional groups of the protein, is calculated. The best solution(s) is taken as the new initial field and the process repeats. At some point no further improvement in the "overall Gibbs energy" is possible; it has then been minimized. The resulting structure(s) is taken as the calculated protein structure. At least that is the classical ab initio approach to structure calculations. Rosetta most certainly makes some assumptions and simplifications, which are then partly corrected by the statistical approach of the project.
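The generate, score and keep-the-best loop described in the post above can be illustrated with a toy sketch. This is not Rosetta's actual algorithm or energy function: the conformation is just a list of angles, and `toy_energy`, `minimize`, `predict` and all their parameters are hypothetical names made up for illustration. It only shows the general idea of perturbing a structure a little, keeping changes that lower a score, stopping when nothing improves, and repeating from many random starts.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical score; lower means 'more stable'. Not a physical force field."""
    # Each angle prefers a value near -1.0 rad (assumed, for illustration only).
    local = sum((a + 1.0) ** 2 for a in angles)
    # Neighbouring angles prefer to be similar to each other.
    coupling = sum((a - b) ** 2 for a, b in zip(angles, angles[1:]))
    return local + 0.5 * coupling


def minimize(angles, step=0.1, trials_per_round=50, max_rounds=200, seed=0):
    """Greedy search: keep any small random perturbation that lowers the score."""
    rng = random.Random(seed)
    best = list(angles)
    best_e = toy_energy(best)
    for _ in range(max_rounds):
        improved = False
        for _ in range(trials_per_round):
            candidate = [a + rng.uniform(-step, step) for a in best]
            e = toy_energy(candidate)
            if e < best_e:
                best, best_e = candidate, e
                improved = True
        if not improved:
            # A whole round of trials found nothing better: treat as minimized.
            break
    return best, best_e


def predict(n_angles=8, n_starts=10, seed=0):
    """Run the search from several random starts and return the lowest-scoring result."""
    rng = random.Random(seed)
    results = []
    for i in range(n_starts):
        start = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
        results.append(minimize(start, seed=seed + i + 1))
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    structure, energy = predict()
    print("lowest toy energy found:", energy)
```

Running many independent starts is what makes this kind of search easy to spread across volunteer machines: each start is an independent work unit, and only the lowest-scoring results need to be compared at the end.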

Vanita Joined: 21 Oct 05 Posts: 43 Credit: 0 RAC: 0	Message 10490 - Posted: 5 Feb 2006, 22:32:50 UTC - in response to Message 10468. Last modified: 5 Feb 2006, 22:40:33 UTC

Wow, this post raises a number of excellent and insightful questions. I can't do justice to all of the points raised, but I will try to address some here.

"Just wondering. There is a PDB database out there. It contains information about the 3D structures of different proteins. So far it has accumulated info on more than 30,000 proteins. Human DNA is believed to contain around 30,000 - 40,000 genes, so there are 30,000 - 40,000 structures we need to know. ... the most important ones must already be in there."

Many of the structures in the PDB are "repeats", i.e. multiple structures of the same protein, with variations in origin (which species it comes from) or point mutations (single amino acid changes which do not alter the 3D structure). So we have very good coverage of a subset of human proteins, but large sets of proteins are still missing from the PDB. These under-represented proteins are slowly being filled in by X-ray crystallographers, but they are difficult to study experimentally, and some may never be crystallized or may always be too big to study using NMR. Also, a minor point: although the number of human genes is approximately 30,000, the actual number of proteins (the human "proteome") is larger, due to processes such as alternative splicing.

"And those data are way more accurate than the results of this project."

Currently experimental structure determination is more accurate than Rosetta, but Rosetta is getting closer. See the recent paper by Bradley et al. on progress in high-resolution modelling: Bradley, P., Misura, K. M., Baker, D. (2005). Toward high-resolution de novo structure prediction for small proteins. Science 309, 1868-1871. For some of the proteins, the accuracy obtained matches that of experimental techniques.

"Furthermore, if someone needs a structure that is not in the PDB, he will most likely use a newer version of Rosetta, which should produce more accurate results - Rosetta is under development and the quality of its predictions grows ... So one way or another, the results of this project will be obsolete and outdated in 2-4 years' time. What is the point of the project?"

That is the point of the project: to improve Rosetta and raise the quality of structure prediction to the level obtained for some proteins in the Bradley paper, and to the level of experimental structure determination. Rosetta still needs to be improved in order to accomplish this goal for all proteins.

"Again untrue, for two reasons: 1) The true shape is determined from crystalline proteins. This shape might differ from the protein's usual state in liquid solution in the human body, which can only be calculated as far as I know."

Again, I am impressed by your insight! However, although it is technically true that crystal artifacts can occur, in practice the crystal structure is almost always a biologically relevant structure, and identical (within experimental error) to the structure determined by solution-phase techniques such as NMR.

"As far as I understand how Rosetta works, it uses known kinetic and energetic models ..."

Actually, Rosetta looks only at the energy of the final folded state, and not at the kinetics of folding. Although there is a correlation between some aspects of structure and folding kinetics (you can find papers on that on the bakerlab home page), the actual kinetics of folding are not taken into account during structure prediction by Rosetta.

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10685 - Posted: 12 Feb 2006, 2:27:50 UTC

So, would it be correct to say that in the short term (a couple of years), Rosetta software will be useful, applied via projects like HPF1/2, for determining the 3D structures of existing proteins not yet in the PDB? But once the task of determining 3D structures experimentally (X-ray, NMR) is finished for most proteins existing in nature (i.e. all except those too big / difficult to study experimentally via NMR), Rosetta would basically be a tool for designing "artificial" proteins, as per the DARPA project Protein Design Processes?

"Today what is considered protein design is in reality the redesign of an existing protein. The Protein Design Processes (PDP) Program changes the paradigm by beginning with an understanding of the binding and chemical reaction that is to be expressed; designing an active site that is compatible with the initial, transition, and final state chemistry; and then embedding the resulting structure in a scaffold. To accomplish this, DARPA is investing in the development of new tools in diverse areas such as topology, optimization, the calculation of ab initio potentials, synthetic chemistry, and informatics leading to the ability to design proteins to order. At the end of this program, researchers expect to be able to design a new complex protein, within 24 hours, that will inactivate a pathogenic organism."

Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 10792 - Posted: 15 Feb 2006, 22:27:57 UTC Last modified: 15 Feb 2006, 22:34:16 UTC

Bump :-) Does anyone know the answer to the question in the previous post, i.e. what happens once all (or almost all) proteins' 3D structures are in the PDB?

Once all proteins' shape has been solved experimentally, then there will be no need for projects like HPF, right now using Rosetta sw to determine shape mathematically. So at that point in time, the R sw will mostly (only?) be used to design new proteins, right? Any estimate how long it will take for all proteins to get into PDB experimentally?

Vanita Joined: 21 Oct 05 Posts: 43 Credit: 0 RAC: 0	Message 10800 - Posted: 16 Feb 2006, 5:45:12 UTC - in response to Message 10792.

"Once all proteins' shape has been solved experimentally, then there will be no need for projects like HPF, right now using Rosetta sw to determine shape mathematically. So at that point in time, the R sw will mostly (only?) be used to design new proteins, right? Any estimate how long it will take for all proteins to get into PDB experimentally?"

True, if all proteins of interest are solved experimentally, Rosetta will become redundant. But on the other hand, if Rosetta accurately solves all the remaining protein structures, then experimental techniques will become redundant ;-) Actually, it probably makes more sense to fill in the blanks in the PDB computationally, rather than experimentally, because computational structure prediction takes far less time, money and effort than experimental structure determination.

Protein design is of course an ongoing effort in Bakerlab and other labs; as you point out, even when the structure of all natural proteins is known, there will still be a use for Rosetta in designing novel proteins with specific functions.

Not sure how long it would take to "finish" the PDB experimentally; as mentioned below, it's not clear that all proteins can even be solved experimentally. Hope that helps!

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 10804 - Posted: 16 Feb 2006, 8:25:40 UTC - in response to Message 10800. Last modified: 16 Feb 2006, 8:37:17 UTC

"So at that point in time, the R sw will mostly (only?) be used to design new proteins, right?"

The "Research Overview" page also mentions "protein-protein interactions" as one of Rosetta's capabilities, which could be put to good use once the PDB has been filled. Assuming that there are something like 10^5 human proteins, there would be roughly 10^10/2 potentially interacting protein pairs, not to mention protein-DNA interactions and interactions with other non-protein molecules. Well, I am not sure how many of those 10^10 potentially interacting pairs actually do interact, based on their shapes and chemical properties, but generally speaking, I believe the time when scientists will be out of work because nature (or even just biology, which largely consists of protein interactions) has been "finished" won't come any time soon. ;-)

...and thanks again to Vanita for her Science FAQ, which I just came across today (I usually only go to the homepage to look for news updates)!
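The pair count in the post above follows from the number of unordered pairs among n items, n(n-1)/2, which for n = 10^5 is very close to 10^10/2. A minimal sketch of the arithmetic, taking the 10^5 protein count as the post's own assumption (`unordered_pairs` is a hypothetical helper name):

```python
import math


def unordered_pairs(n):
    """Number of distinct unordered pairs among n items: n * (n - 1) / 2."""
    return math.comb(n, 2)


n_proteins = 10**5  # rough figure assumed in the post, not a measured value
pairs = unordered_pairs(n_proteins)
print(pairs)           # 4999950000
print(pairs / 10**10)  # about 0.5, i.e. roughly 10^10 / 2
```

This counts every possible pairing, so it is an upper bound on the number of pairs to consider, not an estimate of how many pairs actually interact.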

Hoelder1in Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 11372 - Posted: 25 Feb 2006, 6:15:02 UTC - in response to Message 10804. Last modified: 25 Feb 2006, 6:16:41 UTC