difference between robetta and crunchers

Message boards : Number crunching : difference between robetta and crunchers

Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 59861 - Posted: 27 Feb 2009, 20:02:03 UTC - in response to Message 59853.  

I know that without programs like Rosetta a computer can't predict protein structures... and Paul showed quite well that both the testing and Robetta are important.

But you didn't answer my last questions:
if minirosetta is better, why is it not always used in Rosetta@home? And why does Robetta use Rosetta although minirosetta is better?



Those should be the last questions I ask, thanks for your patience


Minirosetta is not always used in R@h because some of the methods in Rosetta have not been ported over yet.

Regarding the second question, minirosetta is a restructured version of Rosetta that allows scientists to develop new protocols and experiments more easily. It's better in that sense. The core of minirosetta is very similar to Rosetta. For the ab initio structure prediction protocol used in Robetta, minirosetta is basically the same as Rosetta, so it is not being used yet. There are, however, current efforts to improve the scoring function in minirosetta, but that is for a method that involves much more computing. For comparative modeling, there are currently millions of R@h minirosetta jobs to run and analyze for testing the various methods that were successful in CASP8. The aim is to identify the best methods and automate them for Robetta, which is no small task. Also, applying any new and improved method to Robetta would still depend on available computing resources.



Profile David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 59862 - Posted: 27 Feb 2009, 20:06:47 UTC - in response to Message 59860.  

What is a "process hour"? An hour of CPU time on an array of CPUs? Or each CPU? How many "process hours" is Rosetta@home using now?


It's an NCSA unit, basically "one CPU-hour (wall-clock time) on a given platform." That can't really be translated to R@h.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 59863 - Posted: 27 Feb 2009, 20:09:55 UTC

Lots of interesting stuff here on NCSA
Including the video on Blue Waters (click link on the right, be patient!)
Rosetta Moderator: Mod.Sense
Profile Jaykay

Joined: 13 Nov 08
Posts: 29
Credit: 1,743,205
RAC: 0
Message 59870 - Posted: 28 Feb 2009, 10:44:30 UTC
Last modified: 28 Feb 2009, 10:52:06 UTC

Many, many thanks to all of you! I think that was it; I'm really running out of questions...

Maybe "Where's the difference between Rosetta and minirosetta?" and "What is Robetta and what does it do?" are candidates for the FAQs...


Edit: one more, sorry... is it right that the results of Rosetta@home are not fully utilised? I mean, if the purpose of Rosetta@home is to improve Rosetta, and Rosetta can't be used as much as it could be because Robetta is so small...

A second try: would Rosetta@home be more useful if Robetta were a better supercomputer?
Murasaki
Avatar

Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 59876 - Posted: 28 Feb 2009, 16:21:47 UTC - in response to Message 59870.  

A second try: would Rosetta@home be more useful if Robetta were a better supercomputer?


While Robetta and all other projects running the Rosetta software would benefit from more computing power (more computing power = more work done), any improvement to the Rosetta software should help all projects no matter how fast their hardware is.

As I understand it there are 4 main ways that Rosetta@home can improve the Rosetta software:

1. Refine the Rosetta algorithms so that the software can produce the same quality of results with less computational power. This would help all projects using Rosetta as they will be able to complete more tasks in less time.

2. Improve the quality of predictions produced by the Rosetta software; this may take more computational power per task but provide better results. This could mean that Robetta and other projects take longer to complete a task but give a better answer.

3. Allow the Rosetta software to make different types of predictions, such as the protein docking predictions mentioned earlier in this thread. Some existing projects may decide that they don't want to get involved in protein docking and just stick to ab initio prediction while other labs working on protein docking may decide to set up a new project using the Rosetta software.

4. Improve the usability of the Rosetta software so it is easier for scientists to set up new experiments. This wouldn't affect the supercomputer side of the project, but it would reduce the amount of time a scientist needs to spend setting up a project before running it on a supercomputer.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 59879 - Posted: 28 Feb 2009, 18:56:20 UTC
Last modified: 1 Mar 2009, 16:25:29 UTC

What makes Rosetta useful is its ability to produce a prediction that is close enough to the reality of the protein's structure that it allows a drug or vaccine to be developed.

Think of launching a nuclear missile at a distant star. When the missile arrives at its destination, you have a large explosion, and thus some room for error in your course. So even if the missile does not plunge directly into the middle of the star, there are degrees to which you would still say your course was accurate enough.

With proteins, it is a little more three-dimensional. If you are making a drug intended to bind at a specific area of the protein's surface, and your prediction of that local area of the surface is close enough, then the drug will work. And even if your prediction of the other side is off, the prediction was still useful.

Think of the protein's surface as being like the surface of the earth. If the drug is targeting Mount Everest, then accuracy about the terrain in and around Antarctica is not as important. The problem is, you really do not know which areas of the surface you have predicted accurately and which you have not. This is why greater accuracy is always important. And yet there is also "close enough". "Close enough" still means you have to go to all the effort of developing the drug and testing it, and your odds of success in the end are directly related to the accuracy of the predictions it was based upon. So more accurate predictions mean fewer fruitless, and expensive, drug studies.

At this point, I believe I am correct to say that what is needed to make Rosetta more useful to the greater scientific community is greater accuracy. I believe that if BakerLab could devise methods that happened to take four times as long to compute but were significantly more accurate, they would do it. And that is what Rosetta@home is working to do: not necessarily make runs take four times longer, but find ways to attain greater accuracy.

Note, for example, the recent increase in memory requirements. It was found that greater accuracy could be attained with a program that requires more memory to run well. The new docking and other techniques require more memory as well. (DK will keep me honest, but I believe that is the main reason for the change in the minimum memory recommendation.)
Rosetta Moderator: Mod.Sense
Profile Jaykay

Joined: 13 Nov 08
Posts: 29
Credit: 1,743,205
RAC: 0
Message 59886 - Posted: 1 Mar 2009, 9:52:49 UTC

Thanks to all, I have no more questions (at last) :)
Profile cenit

Joined: 1 Apr 07
Posts: 13
Credit: 1,630,287
RAC: 0
Message 59910 - Posted: 2 Mar 2009, 8:01:21 UTC

I have one more question, which I asked in another thread without reply:
how can you optimize Rosetta for supercomputers on BOINC, which by definition doesn't permit "p2p" communication? In other words, on a supercomputer you can define "interactions" between software threads running on different processors; with the BOINC client you can't. I'm sure these "interactions" could really help your algorithm, but you're unable to test them on the BOINC platform unless you use a local SMP client. Am I right? Are you planning to distribute an SMP client? Or is it not so important at this point of development?

Thanks
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 59928 - Posted: 2 Mar 2009, 17:58:01 UTC

cenit, you are asking a very specific, low-level question that I do not know the answer to. But let me attempt to address it with a broad, high-level response.

Dr. Baker has said in the past that our contributions via Rosetta@home are far more significant for the continued research efforts of his lab than all of their allocations of time on supercomputers. The main reason is that those allocations are constantly expiring, and you cross your fingers hoping you will get another one, whereas Rosetta@home is consistent and reliable, and its total TFLOPS exceed the granted supercomputer time in a year.

As such, writing a bunch of new code to let tasks in progress interact, just for the few times you do get to run in an environment where you could use it, seems like a lot of work for little payback.

I'm not certain what is behind your assumption that such interactions between active models would help solve the protein structure any faster or more efficiently. As I'm sure you are aware (from the detail in your question), you can't just take a single activity, throw it on a supercomputer, and have it done in hours rather than months. The program has to be coded in a manner that lets it take advantage of the thousands of CPUs available in a supercomputer. Each CPU on its own is not that much different from a home computer.
Rosetta Moderator: Mod.Sense
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 59929 - Posted: 2 Mar 2009, 18:05:38 UTC

As described in the research overview...
...employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations with the Monte Carlo minimization procedure and physical model used in our high-resolution structure prediction work.


You can learn more about Monte Carlo minimization in the wiki. Using this approach, each thread of execution can operate independently.
Rosetta Moderator: Mod.Sense
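As a toy illustration of why the Monte Carlo approach described above parallelizes so well, here is a minimal sketch (illustrative Python, not Rosetta's actual code; the 1-D `toy_energy` function is a hypothetical stand-in for a protein scoring function):

```python
import math
import random

def toy_energy(x):
    # Hypothetical stand-in for a scoring function: a simple
    # 1-D landscape whose minimum is at x = 3.
    return (x - 3.0) ** 2

def monte_carlo_minimize(seed, steps=2000, temperature=1.0):
    # One independent trajectory: it depends only on its random seed,
    # so thousands of these can run on unrelated volunteer machines
    # with no communication between them (the R@h/BOINC model).
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    energy = toy_energy(x)
    best_x, best_e = x, energy
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)
        cand_e = toy_energy(candidate)
        # Metropolis criterion: always accept downhill moves,
        # sometimes accept uphill ones to escape local minima.
        if cand_e < energy or rng.random() < math.exp((energy - cand_e) / temperature):
            x, energy = candidate, cand_e
            if energy < best_e:
                best_x, best_e = x, energy
    return best_x, best_e

# "Project side": gather the independent results and keep the best model.
results = [monte_carlo_minimize(seed) for seed in range(8)]
best_x, best_e = min(results, key=lambda r: r[1])
```

Each trajectory is a separate work unit; the only "communication" happens at the end, when the project compares the returned models.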
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59932 - Posted: 3 Mar 2009, 7:17:19 UTC - in response to Message 59910.  

I have one more question, which I asked in another thread without reply:
how can you optimize Rosetta for supercomputers on BOINC, which by definition doesn't permit "p2p" communication? In other words, on a supercomputer you can define "interactions" between software threads running on different processors; with the BOINC client you can't. I'm sure these "interactions" could really help your algorithm, but you're unable to test them on the BOINC platform unless you use a local SMP client. Am I right? Are you planning to distribute an SMP client? Or is it not so important at this point of development?

Thanks

If you look at this article, the first couple of paragraphs give a very good description of the difference between two classes of supercomputer. BOINC is the grid type, which is loosely coupled; this increases the communication cost between nodes (your computer and mine). Some problems are not sensitive to this, because the idea is to run many simulations with little to no communication needed between them: one simulation has no dependency on any other.

This is not exclusive: GPUGRID and MilkyWay@home use a serialized method, where my task, upon successful completion, becomes the root of the next generation of tasks created, and the same is true of your successful tasks. But once created, the tasks are independent of each other.

In a single-room cluster, the interconnect may be of various types, including hypercubes, NUMA, and other specialized networks. These architectures allow the simulations to pass information between themselves as required. In such models, such as a weather simulation, each "cell" of the weather system is computed individually on a node and then communicates the required "influence" information to adjacent cells.

But there are also problems that are not so easily structured, because of the mass of data and the influence of one data element on others. In these cases, vector processing comes to the rescue. Think CUDA ...

Fundamentally, the Rosetta team has elected to use an architecture and technique where there is no need for the nodes to communicate.
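The weather-cell coupling described above can be sketched with a toy 1-D diffusion stencil (illustrative Python only, not real weather or Rosetta code):

```python
# Toy 1-D "weather" grid: each cell's next value depends on its
# neighbours. If the cells were split across cluster nodes, every
# step would require exchanging boundary ("halo") values -- the
# tight coupling a single-room interconnect provides and a loosely
# coupled BOINC grid cannot.
def diffuse(cells, steps, alpha=0.25):
    for _ in range(steps):
        nxt = cells[:]
        for i in range(1, len(cells) - 1):
            # The update reads the left and right neighbours: on a
            # cluster, cells at a node boundary would have to be
            # fetched from another machine each step.
            nxt[i] = cells[i] + alpha * (cells[i - 1] - 2 * cells[i] + cells[i + 1])
        cells = nxt
    return cells

grid = [0.0] * 9
grid[4] = 1.0               # a single "hot" cell in the middle
out = diffuse(grid, 50)
```

Rosetta's independent-trajectory design sidesteps exactly this per-step neighbour dependency, which is why it fits volunteer computing so well.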
Profile dcdc

Joined: 3 Nov 05
Posts: 1829
Credit: 115,849,475
RAC: 59,455
Message 59934 - Posted: 3 Mar 2009, 9:18:26 UTC - in response to Message 59928.  
Last modified: 3 Mar 2009, 9:20:20 UTC

cenit, you are asking a very specific, low-level question that I do not know the answer to. But let me attempt to address it with a broad, high-level response.

Dr. Baker has said in the past that our contributions via Rosetta@home are far more significant for the continued research efforts of his lab than all of their allocations of time on supercomputers. The main reason is that those allocations are constantly expiring, and you cross your fingers hoping you will get another one, whereas Rosetta@home is consistent and reliable, and its total TFLOPS exceed the granted supercomputer time in a year.

As such, writing a bunch of new code to let tasks in progress interact, just for the few times you do get to run in an environment where you could use it, seems like a lot of work for little payback.

I'm not certain what is behind your assumption that such interactions between active models would help solve the protein structure any faster or more efficiently. As I'm sure you are aware (from the detail in your question), you can't just take a single activity, throw it on a supercomputer, and have it done in hours rather than months. The program has to be coded in a manner that lets it take advantage of the thousands of CPUs available in a supercomputer. Each CPU on its own is not that much different from a home computer.

There was a post quite a while back where one of the team said they'd had to spend a lot of time modifying the code to make use of some supercomputer time they'd got (not sure which machine it was, but I recall it was POWER rather than x86 based), because they were making use of the available inter-thread communication... assuming I remember it right!
Profile bruce boytler
Avatar

Joined: 17 Sep 05
Posts: 68
Credit: 3,565,442
RAC: 0
Message 59944 - Posted: 3 Mar 2009, 13:25:00 UTC - in response to Message 59910.  

I have one more question, which I asked in another thread without reply:
how can you optimize Rosetta for supercomputers on BOINC, which by definition doesn't permit "p2p" communication? In other words, on a supercomputer you can define "interactions" between software threads running on different processors; with the BOINC client you can't. I'm sure these "interactions" could really help your algorithm, but you're unable to test them on the BOINC platform unless you use a local SMP client. Am I right? Are you planning to distribute an SMP client? Or is it not so important at this point of development?

Thanks


Cenit... you are trying to compare apples to oranges with your question.
Using SMP in a multi-core computer would indeed solve a workunit very fast, much faster than applying a single core to a workunit. SMP is actually the preferred method of crunching a workunit at Folding@home. It is their belief that using multiple cores on just one workunit is more efficient than applying, say, 4 workunits to 4 cores.

BOINC does not use SMP. BOINC is the middleware that runs Rosetta@home. It will apply, say, 4 cores to 4 workunits. Rosetta@home has nothing to do with how the processors crunch the workunit.

Profile cenit

Joined: 1 Apr 07
Posts: 13
Credit: 1,630,287
RAC: 0
Message 59945 - Posted: 3 Mar 2009, 14:48:37 UTC - in response to Message 59944.  
Last modified: 3 Mar 2009, 14:50:12 UTC

Cenit... you are trying to compare apples to oranges with your question.
Using SMP in a multi-core computer would indeed solve a workunit very fast, much faster than applying a single core to a workunit. SMP is actually the preferred method of crunching a workunit at Folding@home. It is their belief that using multiple cores on just one workunit is more efficient than applying, say, 4 workunits to 4 cores.

BOINC does not use SMP. BOINC is the middleware that runs Rosetta@home. It will apply, say, 4 cores to 4 workunits. Rosetta@home has nothing to do with how the processors crunch the workunit.


I think you completely missed my question.
SMP is an architecture for multithreading, one of the ways to parallelize work across many CPUs; it is supported on BOINC and, above all, I don't know what F@H has to do with it here.

Fundamentally, the Rosetta team has elected to use an architecture and technique where there is no need for the nodes to communicate.


I'm not a protein expert, but as dcdc says, some time ago there was a post about it. Anyway, as I just said, I'm not an expert, but I'm sure that thread "interactions", even if more complex, would benefit Rosetta. Isn't this (Rosetta works by manipulating many objects) one of the preferred kinds of job for multithreading?
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59977 - Posted: 4 Mar 2009, 21:17:34 UTC

The Rosetta code is not organized to take advantage of parallelization... or it would be a natural for CUDA/GPU processing. But multithreading can and does take place in the BOINC universe, in that running 4 tasks on 4 cores allows the OS to dispatch threads as needed. On my i7, with 4 physical cores and 8 virtual CPUs, I could see the advantage when running GPUGRID tasks that were not using the CPU efficiently: for a loss of 7% per GPU core I was running 4 GPU tasks, a total loss of 21% of processor power. On my 4-core machine I lost 20% just to support one GPU core, a much less efficient use of the processing power, most likely because of the more effective thread management of the i7 CPU.

Each simulation on R@h is stand-alone and does not require information from any other simulation to progress. I have never run F@h, so I don't know how or what they are doing... but different approaches to this general problem area can be seen in the plenitude of projects doing this kind of work...




©2024 University of Washington
https://www.bakerlab.org