Comments/questions on Rosetta@home journal

Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal


Previous · 1 . . . 4 · 5 · 6 · 7 · 8 · 9 · 10 · Next

Profile rbpeake

Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 15710 - Posted: 9 May 2006, 12:27:44 UTC - in response to Message 15708.  
Last modified: 9 May 2006, 12:28:35 UTC

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all.

I agree! I'll bet the vast majority of the computing power comes from the "all others". Don't forget the little guy in your efforts to recruit the big guys! There should be some mechanism by which everyone feels they have a shot at some sort of recognition for their efforts. :)
Regards,
Bob P.
ID: 15710 · Rating: 0
TioSuper

Joined: 2 May 06
Posts: 17
Credit: 164
RAC: 0
Message 15711 - Posted: 9 May 2006, 12:28:17 UTC - in response to Message 15708.  

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all. Perhaps look for increases in models/month; and weigh those that have added an extra machine or 5 a little higher than those that increase Rosetta's Boinc share from 10% to 100%. (i.e. grab a couple of both). Perhaps picking out one at random so a non teamed member that runs a second cpu for all of Casp7 has a chance of being picked.

Hey, something like the NCAA: a competition within the big teams, the middle teams and the small teams, and a competition for the independents.

But let's make this clear: the main goal is to motivate production; the competition cannot be allowed to degenerate into name-calling and all the worst things that competition brings out.

ID: 15711 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15721 - Posted: 9 May 2006, 15:28:52 UTC - in response to Message 15710.  

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all.

I agree! I'll bet the vast majority of the computing power comes from the "all others". Don't forget the little guy in your efforts to recruit the big guys! There should be some mechanism by which everyone feels they have a shot at some sort of recognition for their efforts. :)

Hi Bob, any suggestions? We can also have an award for the lowest-energy model for each target (we can't do this for the low-RMSD model as we are now, because we won't know the true structure!).

Rom is starting now to look at the credits issue--I'll direct him to the discussion here.
ID: 15721 · Rating: 0
Profile rbpeake

Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 15724 - Posted: 9 May 2006, 16:10:34 UTC - in response to Message 15721.  

Hi Bob, any suggestions? We can also have an award for the lowest-energy model for each target (we can't do this for the low-RMSD model as we are now, because we won't know the true structure!).

Sounds like a good idea!

Thinking out loud, I thought in addition perhaps a lottery would be nice, with each "lottery ticket" a structure prediction, or something like that. You would pick winners at random from the pool of structure predictions (whether correct or not is not the relevant issue, the purpose of this lottery is to obtain as many structure predictions as possible). Thus, like with lottery tickets, the more structure prediction "entries" one has, the greater chance one has of winning!
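A sketch of how such a draw might work (purely illustrative; the function, user names, and numbers are made up, not anything Rosetta implements): each structure prediction counts as one lottery ticket, and the winner is drawn with probability proportional to ticket count.

```python
import random

def draw_lottery_winner(predictions_by_user, rng=None):
    """Each submitted structure prediction counts as one lottery ticket."""
    rng = rng or random.Random()
    users = list(predictions_by_user)
    tickets = [predictions_by_user[u] for u in users]
    # random.choices draws with probability proportional to the weights,
    # so a user with 5,000 predictions is 5,000x likelier than one with 1.
    return rng.choices(users, weights=tickets, k=1)[0]
```

The more prediction "entries" a participant has, the better the odds, but everyone with at least one prediction keeps a nonzero chance of winning.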

Others may have suggestions as well....

Regards,
Bob P.
ID: 15724 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15746 - Posted: 10 May 2006, 2:45:00 UTC - in response to Message 15594.  
Last modified: 10 May 2006, 2:45:44 UTC

If anyone else remembers the article, it talked about an entrepreneur who's working on methods of doing the x-ray crystallography FASTER. And how each protein can cost $100,000s of USD.

OK, I finally found it. I guess it was 10s of thousands, not 100s... either way, multiply it by a billion proteins and it's outta MY budget ;)

This article in Wired discusses what some others are doing in the pursuit of protein folding.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15746 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15750 - Posted: 10 May 2006, 4:06:40 UTC - in response to Message 15739.  

To Dr. Baker: Don't they release the native structure of the protein we're working on - sometime after we're no longer allowed to submit data for that protein to Casp? i.e. Can't we have an RMSD result before the year end comparison data is released? (Just to sate our curiosity?)


Yes, we will probably get the solutions to the prediction problems, the native structures, in September or October, and we can definitely post the winners for each target retrospectively.
ID: 15750 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 15818 - Posted: 10 May 2006, 19:27:16 UTC
Last modified: 10 May 2006, 19:29:02 UTC

A somewhat philosophical question, asked by someone who has in the last week, doubled his Rosetta quota, but is wondering...

The aim of protein structure prediction is to be able to predict protein structure from an AA sequence. CASP is a "competition" to see which group or agent can best predict the structure of a sequence. This is a compute intensive task. No controversy so far.

Is there a danger that a real breakthrough algorithm may be lost because its developers received no ongoing financial backing after not doing well in CASP?

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

... asked by someone who has actually reduced his quota at Predictor@Home in order to increase Rosetta, and has to admit to feeling a bit bad about that.


Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 15818 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15829 - Posted: 10 May 2006, 20:12:39 UTC - in response to Message 15818.  

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

I believe you are asking if the one with the most computing power is going to win CASP, more or less regardless of the merits of their scientific approach. And the answer is no. In fact, if I were CASP, I'd structure the event such that it would take more than that to win. Indeed SOME teams make entries based entirely on human analysis of the AA sequence. And in fact, the Baker team has out-performed all other teams at CASP for some time, and done it without a distributed computing project.

A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power. When Baker's work is "done", you'll be able to enter the AA data and determine the structure on a single computer in less than an afternoon. But at this point, there are too many unknowns to achieve that. And no algorithm exists to take you to the solution in a straight line.

By contributing your time to Rosetta, you are helping prove that their approach to solving the problem is technically superior to approaches that other teams are taking. ...or maybe the other teams prove to have a better approach! Bottom line is that the best predictors will present to the community how they did it! So, everyone wins.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15829 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 15841 - Posted: 10 May 2006, 21:10:12 UTC
Last modified: 10 May 2006, 21:17:26 UTC

A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power.

Forgive me for not being more precise.

My concern was that a small funded research effort may offer a better modelling tool, in terms of less compute time needed, but would never surface because, although it presented a small footprint, it was "swamped" by the massive brute-force crunching available to popular DC projects or well-funded commercial sites.

Simple analogy: if I had a model which worked well with 100 CPU units, but was beaten by Rosetta using 100,000 CPU units of computer time, my model might die because it was simply out-computed. It need not be the case that my competitor was better; I was beaten (and the world loses) because the winner had more CPU time.

I maybe don't explain this so well, but it is an obvious issue.

Note, I raise this issue having read the CASP site in some detail...

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 15841 · Rating: 0
Profile Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 15843 - Posted: 10 May 2006, 21:29:15 UTC - in response to Message 15841.  
Last modified: 10 May 2006, 21:36:02 UTC

My concern was that a small funded research effort may offer a better modelling tool, in terms of less compute time needed, but would never surface because, although it presented a small footprint, it was "swamped" by the massive brute-force crunching available to popular DC projects or well-funded commercial sites.


I've had the same concern, but this is exactly the beauty of BOINC and DC: it offers a "democratisation" of scientific research, i.e. a project with minimal funding and no "public relations" power can tap the huge userbase of the BOINC donor community, IF it can appeal to them.

Until a few years ago, scientific DC projects had to find corporate sponsors like Intel, Google or IBM, able to push press releases to the media, so they could become known to the public.

Right now there are several life-science DC projects. I find Rosetta@home the one most compatible with my own priorities (see my DC-howto doc in my sig), but I would still be happy to see more coming online and "compete" for CPU cycles and I regularly re-evaluate my resource share. And personally, knowing my character, I would tend to favor a smaller "underdog" project.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 15843 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15876 - Posted: 11 May 2006, 3:23:41 UTC - in response to Message 15818.  

A somewhat philosophical question, asked by someone who has in the last week, doubled his Rosetta quota, but is wondering...

The aim of protein structure prediction is to be able to predict protein structure from an AA sequence. CASP is a "competition" to see which group or agent can best predict the structure of a sequence. This is a compute intensive task. No controversy so far.

Is there a danger that a real breakthrough algorithm may be lost because its developers received no ongoing financial backing after not doing well in CASP?

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

... asked by someone who has actually reduced his quota at Predictor@Home in order to increase Rosetta, and has to admit to feeling a bit bad about that.

This is a good question. There are a couple of issues:

First, CASP is really an experiment rather than a competition. The purpose is for researchers to learn what the major bottlenecks to progress are, so that the community as a whole can progress as fast as possible.

Second, the general feeling in the CASP community, and certainly my feeling up until about a year ago, is that computer power is not limiting the quality of the predictions. This is because with the energy functions used previously, it was always possible to rapidly generate models with lower energies than the native structure, and it was not possible to choose the best among these models based on their energies. In our case, we could generate large sets of models quickly with the ab initio part of rosetta, but we did not have any way to pick out the best models from the sets, so more computer power really would not have made a difference in our previous CASP efforts. The new step forward for us a year ago was that with the improved high resolution refinement protocol, we COULD pick out the best structures made for a number of small proteins. Now, the rest of the research community is probably, and rightly so, somewhat skeptical about whether our model of the energetics is really accurate enough to reliably recognize correct models (especially since the dogma in the field is that energy functions are not accurate enough to do this). Hence the importance of CASP--if we can predict accurate structures in a completely blind test, then everybody (including ourselves!) will be convinced that accurate prediction is possible, and researchers everywhere can build on this work.

ID: 15876 · Rating: 1
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 16013 - Posted: 12 May 2006, 7:18:10 UTC


If I understand Casp correctly, you're only allowed to turn in 5 guesses.. and if using lowest energy, then you'll send in the blue spot, and two models on the left and the two on the right that are just a little higher energy than the blue dot. Since you can't graph the CASP7 results like this (not having the native structure to compare RMSD on), then there's no way to select the higher energy models 1 angstrom to the left of the 5 low energy models?

(** .. X ..), i.e. the two asterisks


Here we have a Casp6 entry.. where the lowest RMSD is around 5.2 Angstroms, and the lowest energy models have an RMSD of around 7.0 and 7.1 Angstroms. (eyeballed..) Looking at the Casp6 target model page that is linked for t198, I see 3 results with the name Baker attached to them. The best one is labeled just "Baker" and around 65% of the atoms in the model are less than 5 Angstroms away from where they are in the native structure. Two more are labeled "Baker-Robetta" and only around 15% of the atoms in those models are less than 5 Angstroms away from where they are in the native structure.

How much better is our current best low-energy Casp6 model than the ones that were turned in by your lab for Casp 6? And with a structure this big, what is the RMSD at which the model becomes usable? 2, 3, 4 Angstroms, etc.? How close are we to being able to create models that will be used for something other than a picture book with labels that read "T250 should look like this.."?

ID: 16013 · Rating: 0
hugothehermit

Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 16014 - Posted: 12 May 2006, 7:35:30 UTC

Now that casp7 has started I realise that this may not be the appropriate time but...

After watching the video I have a much better understanding of the problem at hand and would highly recommend it to everybody (it's big: the audio alone, from my poor memory, is about 8 MB and isn't really enough by itself (I tried), and the small video version is about 120 MB, so dial-up/capped users be warned).

I read some time ago that you had a global optimisation problem, not being current in such things I assumed that you meant that you needed to optimise global variables, which I must admit I thought very strange indeed, having educated myself a bit more I can see now what you mean.

I'm unsure what the difference is between heuristic programming and global optimisation, so I'm going to use the terms interchangeably. That being said, I'm sure that you have some heuristics that are better for some types of proteins, so I would think the addition of a sort of heuristic abstraction layer may help, i.e.

heuristic_1 //good for protein type 1
heuristic_2 //good for protein type 2
heuristic_3 //good for protein type 3
...
heuristic_n // etc...

and weighting each of them by protein family, or sequence similarity to a known structure, etc., updating the weights as new information comes in.

So you would have a heuristic search on a heuristic search so to speak.
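A minimal sketch of that abstraction layer (everything here — the class name, the weight-update rule, the protein classes — is invented for illustration, not anything Rosetta actually does): keep a weight per (protein class, heuristic) pair, pick the best-weighted heuristic for each class, and nudge the weights as results come in.

```python
class HeuristicSelector:
    """Choose among several search heuristics per protein class,
    updating a per-class weight as feedback (e.g. model quality) arrives."""

    def __init__(self, heuristics):
        self.heuristics = heuristics   # name -> heuristic (callable, etc.)
        self.weights = {}              # (protein_class, name) -> weight

    def pick(self, protein_class):
        # Highest-weighted heuristic for this class; unseen pairs start at 1.0.
        return max(self.heuristics,
                   key=lambda name: self.weights.get((protein_class, name), 1.0))

    def feedback(self, protein_class, name, score, lr=0.1):
        # Exponential moving average toward the observed score, so the
        # weighting adapts as new information comes in.
        key = (protein_class, name)
        old = self.weights.get(key, 1.0)
        self.weights[key] = (1 - lr) * old + lr * score
```

So the outer "heuristic search on a heuristic search" is just this selection step wrapped around whatever each individual heuristic does internally.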

I would still very much like to know if ran3() is contributing to the clustering. I understand that the most likely and probable reason is your weighting values, but I have run ran2() against ran3() 400,000,000 times; ran3() hit one number 38 times, and hit 8 numbers 0 times. A trial run at Ralph would be able to clear this up.
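The frequency check described above can be sketched like this (using Python's built-in generator as a stand-in, since ran2()/ran3() are Numerical Recipes routines not available here): draw many samples, bucket them, and compare the observed counts against the uniform expectation.

```python
import random

def bucket_counts(rng, n_draws, n_buckets):
    """Count how many draws land in each of n_buckets equal bins of [0, 1)."""
    counts = [0] * n_buckets
    for _ in range(n_draws):
        counts[int(rng.random() * n_buckets)] += 1
    return counts

def max_relative_deviation(counts):
    """Largest fractional deviation of any bucket from the uniform expectation."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) / expected for c in counts)
```

A generator with buckets that sit at zero, as reported for ran3(), would show up immediately here; a chi-square test over the counts would make the comparison rigorous.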

Just some thoughts to bounce off the Rosetta@Home team.
ID: 16014 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 16015 - Posted: 12 May 2006, 8:03:07 UTC

Just read this in todays journal entry...
One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches.

... isn't that a potential trap?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 16015 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 16019 - Posted: 12 May 2006, 10:40:19 UTC - in response to Message 16015.  

Just read this in todays journal entry...
One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches.

... isn't that a potential trap?


I would assume if it was, then the prediction would soon fail or divert to something else showing this.
Team mauisun.org
ID: 16019 · Rating: 0
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 16036 - Posted: 12 May 2006, 13:39:01 UTC - in response to Message 16013.  


If I understand Casp correctly, you're only allowed to turn in 5 guesses.. and if using lowest energy, then you'll send in the blue spot, and two models on the left and the two on the right that are just a little higher energy than the blue dot. Since you can't graph the CASP7 results like this (not having the native structure to compare RMSD on), then there's no way to select the higher energy models 1 angstrom to the left of the 5 low energy models?

One could use the following approach: use the lowest-energy structure as a reference and calculate the RMSD to it. Then submit as guesses 2 to 5 only structures that have an RMSD > x compared to the lowest-energy structure.
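That selection rule can be sketched as follows (a toy illustration; the `rmsd` argument stands in for a real structure-superposition RMSD): rank models by energy, take the lowest-energy one first, then accept further models only if they differ from every already-accepted model by more than the cutoff.

```python
def select_diverse_guesses(models, rmsd, cutoff, k=5):
    """models: list of (energy, structure) pairs.
    rmsd: callable taking two structures and returning their RMSD.
    Greedily pick up to k low-energy models that are mutually > cutoff apart."""
    picked = []
    for energy, s in sorted(models, key=lambda m: m[0]):
        if all(rmsd(s, p) > cutoff for _, p in picked):
            picked.append((energy, s))
        if len(picked) == k:
            break
    return picked
```

This way the 5 submissions sample distinct basins around the energy minimum instead of 5 near-duplicates of the single lowest-energy model.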
ID: 16036 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 16054 - Posted: 12 May 2006, 15:20:49 UTC

I would assume if it was, then the prediction would soon fail or divert to something else showing this.

Not necessarily. If the structure of the similar protein is known, and is presumably its lowest-energy structure, the unknown sequence may well have a deep energy well thereabouts, but there could be a totally different configuration with a deeper well.

I don't know how likely that is however, just an observation.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 16054 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16055 - Posted: 12 May 2006, 15:21:36 UTC

Global optimization vs heuristics:

I can tell you they aren't the same thing. But I'm not certain what global optimization is either. I'm assuming it's a mathematical concept of trying to solve a complex equation.

Heuristics is basically the use of historical statistics. Let's say you want to write a computer program to play chess. If your computer is not powerful enough to analyze ALL of the possible moves that lie ahead, then what you do is cheat. One way to cheat is to only look 5 or 6 moves ahead... you do this because "heuristically" you've found that if your decision still looks like a good one that far down the road, then typically it proves to be a good one through the end of the game... even though you've not looked that far ahead yet.

In evaluating what is a "good move" you have to devise some method of "scoring" the current game board, i.e. you need some means of analyzing proposed move 1 and comparing it with proposed move 2. And so another way to cheat is to look at the field of next possible moves (say there are 20), and throw out the worst-scoring choices... and heuristics (i.e. your past experiences) will determine how many of the poorly scoring choices to throw out and how many to pursue further (i.e. continue looking forward at the moves that would follow those).

Say you determine you can throw out 10 of the choices, you then look forward on only 10 possible moves rather than 20. This cuts the computing time downstream from there in half! As you can see, if your scoring mechanism isn't good, you might throw away some moves that prove themselves to be the BEST possible... 3 moves later in the game.
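Both cheats — the fixed look-ahead depth and throwing out the worst-scoring choices — can be sketched together (a toy illustration; the move generator, apply function, and scoring function are placeholders, not a real chess engine or Rosetta's actual search):

```python
def best_move(state, moves_fn, apply_fn, score_fn, depth=5, beam=10):
    """Depth-limited search that keeps only the `beam` best-scoring
    moves at each level instead of expanding all of them."""
    def search(s, d):
        if d == 0:
            return score_fn(s)          # stop looking ahead: trust the score
        # Prune: sort candidate moves by score and keep only the top `beam`.
        candidates = sorted(moves_fn(s),
                            key=lambda m: score_fn(apply_fn(s, m)),
                            reverse=True)[:beam]
        if not candidates:
            return score_fn(s)
        return max(search(apply_fn(s, m), d - 1) for m in candidates)

    return max(moves_fn(state),
               key=lambda m: search(apply_fn(state, m), depth - 1))
```

Halving `beam` roughly halves the work at every level below it, which is exactly why a bad scoring function is so costly: the pruning step may discard the move that would have proven BEST a few levels deeper.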

Playing chess is similar in many ways to what Rosetta is doing with atoms and molecules. The key is in the "Rosetta score". If the scoring mechanism is perfect (and it's not), you can dramatically narrow your field of possible "moves" (i.e. rotational possibilities) to pursue further.

Some words to Google if you are interested: game theory, game tree, depth-first search, breadth-first search, backtracking, traveling salesman problem, knapsack algorithm.

So, the team has devised several different approaches to solving protein structures. As you've read in Dr. Baker's posts, they are finding creative ways of combining the approaches, and finding heuristically that these combination approaches are yielding better results. As they devise new approaches, they should expect to find some work better for some situations than others. For example, a given approach or scoring method might work great for proteins less than 100 amino acids long, but not for a 200 AA protein. Also, as they find proteins where their approach fails to produce a viable structure... they look for more new approaches :)

Since this is all still a new science, and a blind study where you don't KNOW the right answer, I would expect them to try all of their approaches on each protein to some extent. And they'll have to gauge which looks most likely to produce the best prediction before bringing work out to R@H. You have to look at the protein you are handed and determine first whether it is a "screw" or a "nail" or a "bolt", and then decide whether you should reach for your screwdriver, your hammer or your wrench to work with it.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16055 · Rating: 1
Robert Everly

Joined: 8 Oct 05
Posts: 27
Credit: 665,094
RAC: 0
Message 16249 - Posted: 14 May 2006, 13:38:37 UTC

Today (Friday) we have closed accepting server predictions for the first
target, T0283.


What should be done with T0283 WUs that are still on our machines?

Also, should the deadlines for the CASP7 WUs be just a bit earlier than the real deadlines?

Back to T0283: one that I still have running has a deadline of May 27. But if the CASP deadline was May 12, will those results be of any benefit?
ID: 16249 · Rating: 0
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 16252 - Posted: 14 May 2006, 14:45:55 UTC - in response to Message 16249.  
Last modified: 14 May 2006, 14:47:33 UTC

Today (Friday) we have closed accepting server predictions for the first
target, T0283.


What should be done with T0283 WUs that are still on our machines?

Also, should the deadlines for the CASP7 WUs be just a bit earlier than the real deadlines?

Back to T0283: one that I still have running has a deadline of May 27. But if the CASP deadline was May 12, will those results be of any benefit?


Does Rosetta belong to the group 'server prediction' or 'human expert prediction'? If the latter, then it should be called 'human expert prediction (computer assisted)'.
ID: 16252 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org