Comments/questions on Rosetta@home journal

Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal

Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 12406 - Posted: 21 Mar 2006, 8:15:59 UTC - in response to Message 12343.  

Hi Hoelderlin. I am indeed thinking about some of the things you mentioned. I agree that it's important to look at distributions of scores. This can help to see if a few sub-structures (residues, atoms, etc.) have really bad scores and others are pretty good, or if sub-scores are pretty even overall. One could imagine that if there are a few really bad scores, the structure has some kind of serious flaw. An example of such a local problem which never happens in real proteins is a crack or hole. We haven't had very good luck picking out holes, and this is one case where breaking scores down by residue or even by atom is helpful. Just last Friday I was looking into a hole detector based on scores for individual atoms. Take a look here if you are interested.

http://www.gs.washington.edu/~wsheffle/boinc/
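
A rough sketch of the kind of per-atom outlier check described above (the scores, the cutoff, and the function are purely illustrative, not Rosetta's actual hole detector):

```python
# Flag sub-scores that are outliers relative to the rest of the structure.
# The per-atom scores here are made-up illustrative numbers, not Rosetta output.
from statistics import mean, stdev

def flag_bad_subscores(scores, n_sigma=2.0):
    """Return indices whose score is worse (higher) than mean + n_sigma * sd."""
    mu, sd = mean(scores), stdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + n_sigma * sd]

per_atom_scores = [0.1, 0.2, 0.15, 0.1, 2.5, 0.12, 0.18]  # one conspicuous outlier
print(flag_bad_subscores(per_atom_scores))  # -> [4]
```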

Hi Will, I had a look at your web page. Intriguing! Two things I wanted to mention: Could it be that, despite having a lower total energy, the prediction on the right sits in a shallower local minimum (less energy needed to change the shape) than the more tightly packed structure on the left? Also, I am not sure how the interaction with the surrounding medium (water) is treated in the energy calculation (I seem to remember something about implicit and explicit solvent models). Would the fact that the hole seems just large enough for individual water molecules (about the size of your 2.4 A probe) to fit through have any effect on the energy calculation?
Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 12538 - Posted: 23 Mar 2006, 0:45:57 UTC - in response to Message 12196.  

Hi Hoelderlin! You also had a good question about the striking gaps in the 1elw plot... we've been running Rosetta with "score filters". If a client gets about halfway through a run and doesn't make it below a certain energy, we stop the run -- we'd rather you use your computer cycles to start another simulation instead of spending time on a simulation that will likely not get a very low energy! This happens again one more time during the simulation; hence the two gaps in the plot.
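
A minimal sketch of such a score filter (the checkpoint positions, cutoffs, and the dummy trajectory below are invented for illustration, not the project's actual values):

```python
import random

def run_with_score_filters(sample_energy, n_steps, checkpoints):
    """Abandon a trajectory at fixed checkpoints if its best energy so far
    is still above the cutoff; otherwise return the final best energy.
    checkpoints: dict mapping step index -> energy cutoff."""
    best = float("inf")
    for step in range(n_steps):
        best = min(best, sample_energy(step))
        cutoff = checkpoints.get(step)
        if cutoff is not None and best > cutoff:
            return None          # filtered out: the client starts a new run instead
    return best

# Dummy trajectory: energies drift downward with noise (purely illustrative).
random.seed(0)
def energy(step):
    return 100.0 - 0.8 * step + random.uniform(-5.0, 5.0)

print(run_with_score_filters(energy, n_steps=100, checkpoints={50: 70.0, 80: 40.0}))
```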

I had a good look at the new 1elw plot. Great! The obvious question of course is: what causes the horizontal gaps? Is that because of the peculiar shape of the protein?


will sheffler
Joined: 20 Mar 06
Posts: 3
Credit: 0
RAC: 0
Message 12542 - Posted: 23 Mar 2006, 2:00:23 UTC - in response to Message 12406.  

Hi Will, I had a look at your web page. Intriguing! Two things I wanted to mention: Could it be that, despite having a lower total energy, the prediction on the right sits in a shallower local minimum (less energy needed to change the shape) than the more tightly packed structure on the left? Also, I am not sure how the interaction with the surrounding medium (water) is treated in the energy calculation (I seem to remember something about implicit and explicit solvent models). Would the fact that the hole seems just large enough for individual water molecules (about the size of your 2.4 A probe) to fit through have any effect on the energy calculation?


Thanks for more interesting questions. I'm waiting for a couple of things to run, and this is nice temporary entertainment.

I'm not sure about the shape of the minima, but it's interesting to think about. One thing that's really tough about characterizing our full-atom energy landscape (including things like the depth and breadth of minima) is its roughness. Since all the atoms are explicitly represented and the van der Waals energy goes up really sharply as atoms get closer together, a tiny change in almost any "direction" (in the sense of the energy landscape) can make a big difference in the energy. There are plenty of samples physically similar to the right-hand (bad) one, and they vary in energy quite a bit. This could indicate a broad-ish local energy minimum. Unfortunately, there aren't too many better samples with the right topology like the one on the left, so we can't compare based on a clustering argument. It might be interesting to examine a "time" series of energy values as the structures are folded and refined -- see if one falls off more sharply than another -- but they might not be comparable. Energy points aren't at all evenly distributed over the folding process, and it would be hard to define exactly what the "time interval" between points in the series would be.
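
To make that steepness concrete, here is a tiny numerical example using the generic Lennard-Jones 12-6 form (illustrative parameters; Rosetta's actual van der Waals term differs in detail):

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """Generic 12-6 potential; epsilon (kcal/mol) and sigma (A) are illustrative."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

for r in (2.6, 2.8, 3.0, 3.4, 3.8):
    print(f"r = {r:.1f} A   E = {lennard_jones(r):+8.2f}")
# The r**-12 repulsive wall means a ~0.2 A change near contact distance can
# change the energy several-fold, which is why a tiny move in almost any
# direction on the landscape can matter so much.
```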

We use implicit solvent in all our methods because simulating a whole bunch of water molecules is really, really expensive. This can make some aspects of scoring tricky. The biggest impact is on scoring hydrogen bonding, burial of charged atoms/groups, the hydrophobic effect, etc., but it also seems to have an impact on our packing -- holes, van der Waals forces, etc. For holes, our basic method is to look for regions which are buried AND have surface exposed to various sized spheres. I've been using two different criteria for burial: (1) no exposed surface to a water-sized (~2.8 A diameter) sphere and (2) lots of other atoms within 10 A. This makes sure you aren't too close to the surface -- you might imagine a case where a water can't get at an atom but it's really close to the surface, so the protein might "breathe" a little and expose it. Another point is that most of what's in the core of a protein is hydrophobic, so even if a water could squeeze in, it probably wouldn't be energetically/entropically favorable.
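
A minimal sketch of how those burial tests and the hole check might be combined (the `structure` object and its methods are hypothetical stand-ins, not a real Rosetta interface; combining the two criteria with a logical AND and the particular thresholds are assumptions):

```python
# Hypothetical interface: structure.atoms, structure.neighbors_within(atom, radius),
# structure.probe_accessible_area(atom, probe_radius) -- stand-ins, not Rosetta.
WATER_PROBE_RADIUS = 1.4    # radius of a ~2.8 A diameter water-sized probe
NEIGHBOR_RADIUS = 10.0      # "lots of other atoms within 10 A"
MIN_NEIGHBORS = 16          # illustrative cutoff for "lots"

def is_buried(structure, atom):
    """Criterion (1): no surface exposed to a water-sized probe, AND
    criterion (2): enough neighbors that the atom is not near the surface."""
    no_water_access = structure.probe_accessible_area(atom, WATER_PROBE_RADIUS) == 0.0
    deep_inside = len(structure.neighbors_within(atom, NEIGHBOR_RADIUS)) >= MIN_NEIGHBORS
    return no_water_access and deep_inside

def hole_atoms(structure, small_probe_radius=1.2):
    """Buried atoms that still expose surface to a smaller probe: candidate holes."""
    return [a for a in structure.atoms
            if is_buried(structure, a)
            and structure.probe_accessible_area(a, small_probe_radius) > 0.0]
```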

Here's another, more subtle thing that lack of explicit solvent might cause. We have a funny effect where atoms in our predictions tend to pack a little too close together. Take a look at the top-left plot in the PDF below. It shows a histogram of distances between hydrophobic groups on the ends of the side chains. Black is from a big set of real proteins. Red, blue, and green are successively more optimized predictions with our full-atom energy function. Note that the more optimized the decoys are, the more they tend to pile up the methyls a little closer than in the real proteins. Back to solvation, we think this might have to do with a "finite lattice" type effect. Everything on the inside of the protein is pulling on everything else, so the atoms want to contract. In a real system, this wouldn't matter because there are atoms in all the water around the protein too, so they "pull" out, balancing the intra-protein van der Waals forces.

http://www.gs.washington.edu/~wsheffle/boinc/atype_dist_CH3-CH3_Hapo-Hapo_OH-OOC_OCbb-NH2O.pdf
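
A sketch of the kind of histogram comparison in that PDF (coordinate extraction is left as a stand-in; only the binning of pairwise distances is shown, with made-up coordinates):

```python
import math
from collections import Counter

def pair_distance_histogram(coords, bin_width=0.25, max_dist=8.0):
    """Bin all pairwise distances between the given 3D points."""
    counts = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= max_dist:
                counts[round(d / bin_width) * bin_width] += 1
    return counts

# If a decoy's histogram (e.g. over methyl-carbon coordinates) is shifted toward
# smaller distances than the native histogram, the core is packed too tightly.
native_hist = pair_distance_histogram([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.2, 0.0)])
decoy_hist = pair_distance_histogram([(0.0, 0.0, 0.0), (3.6, 0.0, 0.0), (0.0, 3.7, 0.0)])
print(sorted(native_hist), sorted(decoy_hist))
```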

Sean Kiely
Joined: 31 Jan 06
Posts: 64
Credit: 43,992
RAC: 0
Message 12577 - Posted: 23 Mar 2006, 17:19:14 UTC

Hi Will:

Thank you for the interesting details about the issue of implicit vs. explicit solvation.

Would there be any possible benefit to modeling explicit solvation (very sparsely) during processing? Maybe once or twice partway through especially promising structures? I wonder if it might keep us from wandering onto paths where subtle solvation issues keep us from reaching the best energy minima?

Sort of a "solvation sanity check"?

I recognise, of course, that modeling complexity issues or sheer processing costs may make this unworkable!

Sean
Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 12610 - Posted: 24 Mar 2006, 8:44:04 UTC - in response to Message 12542.  

Thanks for more interesting questions. I'm waiting for a couple of things to run, and this is nice temporary entertainment.

Thanks Will for the detailed answers (and also for the new 'teaser' about the distances that don't come out right). I also had a look at 'The Good, the Bad and the Ugly'. ;-) Cool stuff - though I only understood half of it. But it seems to give a good overview of what you guys at the Baker Lab are up to.

Also thanks to Rhiju for his responses here and in the 'science news' thread (and of course for the new links on the top prediction page).
R/B
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 12774 - Posted: 28 Mar 2006, 23:45:47 UTC

Dr. Baker,

Any word on possible radio/tv interviews yet? Thanks.
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers.


David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12780 - Posted: 29 Mar 2006, 3:23:18 UTC - in response to Message 12774.  

Dr. Baker,

Any word on possible radio/tv interviews yet? Thanks.



Hi Robert, I haven't heard anything yet. David
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 12783 - Posted: 29 Mar 2006, 7:09:27 UTC

Dr. Baker:

While trying to track down a text version of a video news report on a new drug targeting AIDS, I ran across the following article. I wondered if your team working on an HIV vaccine had taken the points raised in the article into account:

http://www.mindfully.org/Health/2006/AIDS-Medical-Corruption1mar06.htm


On another note... I'm looking forward to seeing how the results of your latest test will compare with the patterns we've seen so far. (It'll be a little while before we see a ball flattened against the left side, eh? :)
Profile Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 12959 - Posted: 2 Apr 2006, 21:09:17 UTC
Last modified: 2 Apr 2006, 21:13:13 UTC

A question about the "adjustable run-time" of WUs:

Supposing
1) we don't care at all about extra Internet traffic and
2) we don't encounter problems when running Rosetta WUs longer,

would there be ANY "science" reason not to raise the WU runtime from the default of 2 hr to 8, 12, or 24 hours per WU (or 2-4 days when that becomes available again)? (Depending on one's commitment to other BOINC projects and PC uptime, of course, so that WU deadlines can still be met.)

I assumed every model run is independent. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24 hr and lessen Internet traffic, right?

I'm asking because I've noticed that, e.g., the University of Washington's own "Housing and Food Services" computers are using the default WU runtime of 2 hr (at least those I've checked randomly).


Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12967 - Posted: 3 Apr 2006, 1:35:44 UTC - in response to Message 12959.  

A question about the "adjustable run-time" of WUs:

Supposing
1) we don't care at all about extra Internet traffic and
2) we don't encounter problems when running Rosetta WUs longer,

would there be ANY "science" reason not to raise the WU runtime from the default of 2 hr to 8, 12, or 24 hours per WU (or 2-4 days when that becomes available again)? (Depending on one's commitment to other BOINC projects and PC uptime, of course, so that WU deadlines can still be met.)

I assumed every model run is independent. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24 hr and lessen Internet traffic, right?

I'm asking because I've noticed that, e.g., the University of Washington's own "Housing and Food Services" computers are using the default WU runtime of 2 hr (at least those I've checked randomly).



Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12977 - Posted: 3 Apr 2006, 11:48:58 UTC - in response to Message 12967.  
Last modified: 3 Apr 2006, 11:49:30 UTC

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.
Regards,
Bob P.
Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12983 - Posted: 3 Apr 2006, 15:46:29 UTC - in response to Message 12977.  

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.

The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you?

Regards,
Bob P.
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12985 - Posted: 3 Apr 2006, 15:55:15 UTC - in response to Message 12983.  

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.

The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you?



It is a pretty even tradeoff because while we get the results back more quickly, there is also more network traffic with the shorter WU.
Profile Cureseekers~Kristof
Joined: 5 Nov 05
Posts: 80
Credit: 689,603
RAC: 0
Message 12994 - Posted: 3 Apr 2006, 18:08:05 UTC

So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading)
Keep the default, or set it to a higher number of hours?
Member of Dutch Power Cows
Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12995 - Posted: 3 Apr 2006, 18:22:11 UTC - in response to Message 12994.  
Last modified: 3 Apr 2006, 18:23:45 UTC

So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading)
Keep the default, or set it to a higher number of hours?

I think it used to be 8 hours before the 1% Bug became so noticeable an issue. Now that the 1% Bug seems to have been solved based on early results with the new client, is 8 hours the best to use?

Regards,
Bob P.
Profile Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 13136 - Posted: 6 Apr 2006, 21:37:26 UTC
Last modified: 6 Apr 2006, 21:45:45 UTC

Question 1:
Quoting from the journal (src)
Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question.

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.


Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Question 2:
I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8 hr, a PC which doesn't encounter problems "should" use 8 hr. This means a round-trip time of <1 day even if a PC is running during office hours only.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
Profile bruce boytler
Joined: 17 Sep 05
Posts: 68
Credit: 3,565,442
RAC: 0
Message 13139 - Posted: 6 Apr 2006, 22:20:47 UTC - in response to Message 13136.  

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.

Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

The only ones I have heard about were the ones belonging to someone called Food and Housing Services. In the end, the Baker Lab has made it clear since September 2005 that it is up to us, the "crunchers", to come up with the CPU power to solve this pivotal science problem.

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Blue Gene at Lawrence Livermore is used to predict nuclear weapon yields. This is good, because if this were not done on Blue Gene they would be exploding them for real, like in the 1950s and 1960s out in the Nevada desert.


Hope this helps a little.........Bye All!

Profile TritoneResolver
Joined: 28 Nov 05
Posts: 4
Credit: 57,690
RAC: 0
Message 13144 - Posted: 7 Apr 2006, 0:22:37 UTC

The website of FoxNews has picked up the story that was published on livescience.com today. See here: Link. Hopefully, this will bring more attention and members to Rosetta!
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 13145 - Posted: 7 Apr 2006, 1:26:08 UTC

We're off topic, but here's a link to some of what Blue Gene has been up to. I haven't seen the "gene" aspect of the supercomputer brought up much since its inception and initial press. I'm guessing they went where the money was, rather than for curing most all of the diseases of mankind. Purely a "business decision," you understand.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 13151 - Posted: 7 Apr 2006, 4:10:14 UTC - in response to Message 13136.  

Question 1:
Quoting from the journal (src)
Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question.

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.


Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Question 2:
I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8 hr, a PC which doesn't encounter problems "should" use 8 hr. This means a round-trip time of <1 day even if a PC is running during office hours only.



Question 1: Public computing resources have been allocated to us to try to solve the structure prediction problem, including 5 million hours on Blue Gene through the Department of Energy's INCITE awards in high performance computing. This was the largest award they made this year.

http://72.14.203.104/search?q=cache:dkcheEi0bi8J:nccs.gov/news/pr2006/FY2006_INCITE_Award_Factsheet01312006final.pdf.

The NSF-funded SDSC and NCSA supercomputing centers have also been extremely generous in providing us with computing resources when we were really stuck (in the pre-rosetta@home days, a year ago).

However, to put this into perspective, let us assume that there are 20,000 computers crunching full time for rosetta@home. Then 5 million hours, one of the largest allocations of public computing power ever, corresponds to only 10 days on rosetta@home. Also, since Blue Gene was built for high connectivity, the individual processors are 5x slower than most of your PCs. So we are using Blue Gene to test approaches that take advantage of the high node interconnectivity, and during CASP, which is coming up soon, we will use Blue Gene for larger proteins which require lots of memory -- we are already pushing the limits of public distributed computing with the moderate-size proteins running now.
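
The back-of-the-envelope arithmetic behind that "10 days" figure, using the 20,000-host assumption stated above:

```python
award_cpu_hours = 5_000_000             # INCITE allocation
hosts = 20_000                          # assumed full-time rosetta@home computers
print(award_cpu_hours / (hosts * 24))   # ~10.4 days of rosetta@home throughput
```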

Question 2: From the science point of view, 2-hour and 8-hour work units are equivalent. However, 8-hour work units cut down on network traffic, and so are preferable PROVIDED THAT the computer has a relatively low error rate. We are now cautiously increasing the default work unit length from 2 to 4 hours and we will see how this goes. But users who rarely encounter errors should definitely choose the 8-hour work unit option, all other things being equal.
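
A small sketch of the trade-off being described (the per-hour error rate and the assumption that a failed work unit loses, on average, half its run are illustrative guesses, not measured project numbers):

```python
def daily_summary(wu_hours, error_rate_per_hour=0.002, hours_per_day=24.0):
    """Return (upload/download cycles per day, expected CPU-hours lost to errors)."""
    wus_per_day = hours_per_day / wu_hours
    p_fail = min(1.0, error_rate_per_hour * wu_hours)   # crude per-WU failure chance
    expected_lost = wus_per_day * p_fail * (wu_hours / 2.0)
    return wus_per_day, expected_lost

for h in (2, 4, 8, 24):
    transfers, lost = daily_summary(h)
    print(f"{h:2d} h WUs: {transfers:4.1f} transfers/day, ~{lost:.2f} CPU-hours lost/day")
# Longer work units cut network traffic, but each error costs more lost work --
# hence the advice to pick 8 hours only if your machine rarely errors out.
```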




©2024 University of Washington
https://www.bakerlab.org