Comments/questions on Rosetta@home journal

Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal


Previous · 1 . . . 4 · 5 · 6 · 7 · 8 · 9 · 10 · Next

Profile rbpeake

Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 15710 - Posted: 9 May 2006, 12:27:44 UTC - in response to Message 15708.  
Last modified: 9 May 2006, 12:28:35 UTC

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all.

I agree! I'll bet the vast majority of the computing power comes from the "all others". Don't forget the little guy in your efforts to recruit the big guys! There should be some mechanism by which everyone feels they have a shot at some sort of recognition for their efforts. :)
Regards,
Bob P.
ID: 15710 · Rating: 0
TioSuper

Joined: 2 May 06
Posts: 17
Credit: 164
RAC: 0
Message 15711 - Posted: 9 May 2006, 12:28:17 UTC - in response to Message 15708.  

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all. Perhaps look for increases in models/month; and weigh those that have added an extra machine or 5 a little higher than those that increase Rosetta's Boinc share from 10% to 100%. (i.e. grab a couple of both). Perhaps picking out one at random so a non teamed member that runs a second cpu for all of Casp7 has a chance of being picked.

Hey, something like the NCAA: a competition within the big teams, the middle teams and the small teams, and a competition for the independents.

But let's make this clear: the main goal is to motivate production; the competition cannot be allowed to degenerate into name-calling and all the worst things that competition brings out.

ID: 15711 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15721 - Posted: 9 May 2006, 15:28:52 UTC - in response to Message 15710.  

You might also want to think up a way to encourage a production increase in the smaller teams, and the vast horde of those who don't belong to a team at all.

I agree! I'll bet the vast majority of the computing power comes from the "all others". Don't forget the little guy in your efforts to recruit the big guys! There should be some mechanism by which everyone feels they have a shot at some sort of recognition for their efforts. :)

Hi Bob, any suggestions? We can also have an award for the lowest-energy model for each target (we can't do this for the low-RMSD model as we are now, because we won't know the true structure!).

Rom is starting now to look at the credits issue--I'll direct him to the discussion here.
ID: 15721 · Rating: 0
Profile rbpeake

Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 15724 - Posted: 9 May 2006, 16:10:34 UTC - in response to Message 15721.  

Hi Bob, any suggestions? We can also have an award for the lowest-energy model for each target (we can't do this for the low-RMSD model as we are now, because we won't know the true structure!).

Sounds like a good idea!

Thinking out loud, I thought in addition perhaps a lottery would be nice, with each "lottery ticket" a structure prediction, or something like that. You would pick winners at random from the pool of structure predictions (whether correct or not is not the relevant issue, the purpose of this lottery is to obtain as many structure predictions as possible). Thus, like with lottery tickets, the more structure prediction "entries" one has, the greater chance one has of winning!
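A sketch of how such a draw might work (purely illustrative; the function, user names, and numbers are made up, not anything Rosetta implements): each structure prediction counts as one lottery ticket, and the winner is drawn with probability proportional to ticket count.

```python
import random

def draw_lottery_winner(predictions_by_user, rng=None):
    """Each submitted structure prediction counts as one lottery ticket."""
    rng = rng or random.Random()
    users = list(predictions_by_user)
    tickets = [predictions_by_user[u] for u in users]
    # random.choices draws with probability proportional to the weights,
    # so a user with 5,000 predictions is 5,000x likelier than one with 1.
    return rng.choices(users, weights=tickets, k=1)[0]
```

The more prediction "entries" a participant has, the better the odds, but everyone with at least one prediction keeps a nonzero chance of winning.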

Others may have suggestions as well....

Regards,
Bob P.
ID: 15724 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15746 - Posted: 10 May 2006, 2:45:00 UTC - in response to Message 15594.  
Last modified: 10 May 2006, 2:45:44 UTC

If anyone else remembers the article, it talked about an entrepreneur who's working on methods of doing the x-ray crystallography FASTER. And how each protein can cost $100,000s of USD.

OK, I finally found it. I guess it was 10s of thousands, not 100s... either way, multiply it by a billion proteins and it's outta MY budget ;)

This article in Wired discusses what some others are doing in the pursuit of protein folding.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15746 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15750 - Posted: 10 May 2006, 4:06:40 UTC - in response to Message 15739.  

To Dr. Baker: Don't they release the native structure of the protein we're working on - sometime after we're no longer allowed to submit data for that protein to Casp? i.e. Can't we have an RMSD result before the year end comparison data is released? (Just to sate our curiosity?)


Yes, we will probably get the solutions to the prediction problems, the native structures, in September or October, and we can definitely post the winners for each target retrospectively.
ID: 15750 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 15818 - Posted: 10 May 2006, 19:27:16 UTC
Last modified: 10 May 2006, 19:29:02 UTC

A somewhat philosophical question, asked by someone who has in the last week, doubled his Rosetta quota, but is wondering...

The aim of protein structure prediction is to be able to predict protein structure from an AA sequence. CASP is a "competition" to see which group or agent can best predict the structure of a sequence. This is a compute intensive task. No controversy so far.

Is there a danger that a real breakthrough algorithm may be lost because its developers received no ongoing financial backing after not doing well in CASP?

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

... asked by someone who has actually reduced his quota at Predictor@Home in order to increase Rosetta, and has to admit to feeling a bit bad about that.


Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 15818 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15829 - Posted: 10 May 2006, 20:12:39 UTC - in response to Message 15818.  

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

I believe you are asking if the one with the most computing power is going to win CASP, more or less regardless of the merits of their scientific approach. And the answer is no. In fact, if I were CASP, I'd structure the event such that it would take more than that to win. Indeed SOME teams make entries based entirely on human analysis of the AA sequence. And in fact, the Baker team has out-performed all other teams at CASP for some time, and done it without a distributed computing project.

A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power. When Baker's work is "done", you'll be able to enter the AA data and determine the structure on a single computer in less than an afternoon. But at this point, there are too many unknowns to achieve that. And no algorithm exists to take you to the solution in a straight line.

By contributing your time to Rosetta, you are helping prove that their approach to solving the problem is technically superior to approaches that other teams are taking. ...or maybe the other teams prove to have a better approach! Bottom line is that the best predictors will present to the community how they did it! So, everyone wins.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15829 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 15841 - Posted: 10 May 2006, 21:10:12 UTC
Last modified: 10 May 2006, 21:17:26 UTC

A real breakthrough algorithm would be one that can produce a more accurate model and do it with less computing power.

Forgive me for not being more precise.

My concern was that a small funded research effort may offer a better modelling tool, in terms of less compute time needed, but would never surface because, although it presented a small footprint, it was "swamped" by the massive brute-force crunching available to popular DC projects or well-funded commercial sites.

Simple analogy: if I had a model which worked well with 100 CPU units, but was beaten by Rosetta using 100,000 CPU units of computer time, my model might die because it was simply out-computed. It need not be the case that my competitor was better; I was beaten (and the world loses) because the winner had more CPU time.

I maybe don't explain this so well, but it is an obvious issue.

Note, I raise this issue having read the CASP site in some detail...

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 15841 · Rating: 0
Profile Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 15843 - Posted: 10 May 2006, 21:29:15 UTC - in response to Message 15841.  
Last modified: 10 May 2006, 21:36:02 UTC

My concern was that a small funded research effort may offer a better modelling tool, in terms of less compute time needed, but would never surface because, although it presented a small footprint, it was "swamped" by the massive brute-force crunching available to popular DC projects or well-funded commercial sites.


I've had the same concern, but this is exactly the beauty of BOINC and DC: it offers a "democratisation" of scientific research, i.e. a project with minimal funding and no "public relations" power can tap the huge userbase of the BOINC donor community, IF it can appeal to them.

Until a few years ago, scientific DC projects had to find corporate sponsors like Intel, Google or IBM, able to push press releases to the media, so they could become known to the public.

Right now there are several life-science DC projects. I find Rosetta@home the one most compatible with my own priorities (see my DC-howto doc in my sig), but I would still be happy to see more coming online and "compete" for CPU cycles and I regularly re-evaluate my resource share. And personally, knowing my character, I would tend to favor a smaller "underdog" project.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 15843 · Rating: 0
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 15876 - Posted: 11 May 2006, 3:23:41 UTC - in response to Message 15818.  

A somewhat philosophical question, asked by someone who has in the last week, doubled his Rosetta quota, but is wondering...

The aim of protein structure prediction is to be able to predict protein structure from an AA sequence. CASP is a "competition" to see which group or agent can best predict the structure of a sequence. This is a compute intensive task. No controversy so far.

Is there a danger that a real breakthrough algorithm may be lost because its developers received no ongoing financial backing after not doing well in CASP?

Is it possible that inferior methods may get funded and progress simply because they had more computer power available to them?

... asked by someone who has actually reduced his quota at Predictor@Home in order to increase Rosetta, and has to admit to feeling a bit bad about that.

This is a good question. There are a couple of issues:

First, CASP is really an experiment rather than a competition. The purpose is for researchers to learn what the major bottlenecks to progress are, so that the community as a whole can progress as fast as possible.

Second, the general feeling in the CASP community, and certainly my feeling up until about a year ago, is that computer power is not limiting the quality of the predictions. This is because with the energy functions used previously, it was always possible to rapidly generate models with lower energies than the native structure, and it was not possible to choose the best among these models based on their energies. In our case, we could generate large sets of models quickly with the ab initio part of rosetta, but we did not have any way to pick out the best models from the sets, so more computer power really would not have made a difference in our previous CASP efforts. The new step forward for us a year ago was that with the improved high resolution refinement protocol, we COULD pick out the best structures made for a number of small proteins. Now, the rest of the research community is probably, and rightly so, somewhat skeptical about whether our model of the energetics is really accurate enough to reliably recognize correct models (especially since the dogma in the field is that energy functions are not accurate enough to do this). Hence the importance of CASP--if we can predict accurate structures in a completely blind test, then everybody (including ourselves!) will be convinced that accurate prediction is possible, and researchers everywhere can build on this work.

ID: 15876 · Rating: 1
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 16013 - Posted: 12 May 2006, 7:18:10 UTC


If I understand Casp correctly, you're only allowed to turn in 5 guesses.. and if using lowest energy, then you'll send in the blue spot, and two models on the left and the two on the right that are just a little higher energy than the blue dot. Since you can't graph the CASP7 results like this (not having the native structure to compare RMSD on), then there's no way to select the higher energy models 1 angstrom to the left of the 5 low energy models?

(** .. X ..), i.e. the two asterisks


Here we have a Casp6 entry.. where the lowest RMSD is around 5.2 Angstroms, and the lowest energy models have an RMSD of around 7.0 and 7.1 Angstroms. (eyeballed..) Looking at the Casp6 target model page that is linked for t198, I see 3 results with the name Baker attached to them. The best one is labeled just "Baker" and around 65% of the atoms in the model are less than 5 Angstroms away from where they are in the native structure. Two more are labeled "Baker-Robetta" and only around 15% of the atoms in those models are less than 5 Angstroms away from where they are in the native structure.

How much better is our current best low-energy Casp6 model than the ones that were turned in by your lab for Casp 6? And with a structure this big, what is the RMSD at which the model becomes usable? 2, 3, 4 Angstroms, etc.? How close are we to being able to create models that will be used for something other than a picture book with labels that read "T250 should look like this.."?

ID: 16013 · Rating: 0
hugothehermit

Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 16014 - Posted: 12 May 2006, 7:35:30 UTC

Now that casp7 has started I realise that this may not be the appropriate time but...

After watching the video I have a much better understanding of the problem at hand and would highly recommend it to everybody (it's big: the audio alone, from my poor memory, is about 8 MB and isn't really enough by itself (I tried), and the small video version is about 120 MB, so dial-up/capped users be warned).

I read some time ago that you had a global optimisation problem, not being current in such things I assumed that you meant that you needed to optimise global variables, which I must admit I thought very strange indeed, having educated myself a bit more I can see now what you mean.

I'm unsure what the difference is between heuristic programming and global optimisation, so I'm going to use the terms interchangeably. That being said, I'm sure that you have some heuristics that are better for some types of proteins, so I would think the addition of a sort of heuristic abstraction layer may help, i.e.

heuristic_1 //good for protein type 1
heuristic_2 //good for protein type 2
heuristic_3 //good for protein type 3
...
heuristic_n // etc...

and weighting each of them by protein family, or sequence similarity to a known structure, etc., updating the weights as new information comes in.

So you would have a heuristic search on a heuristic search so to speak.
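A minimal sketch of that abstraction layer (everything here — the class name, the weight-update rule, the protein classes — is invented for illustration, not anything Rosetta actually does): keep a weight per (protein class, heuristic) pair, pick the best-weighted heuristic for each class, and nudge the weights as results come in.

```python
class HeuristicSelector:
    """Choose among several search heuristics per protein class,
    updating a per-class weight as feedback (e.g. model quality) arrives."""

    def __init__(self, heuristics):
        self.heuristics = heuristics   # name -> heuristic (callable, etc.)
        self.weights = {}              # (protein_class, name) -> weight

    def pick(self, protein_class):
        # Highest-weighted heuristic for this class; unseen pairs start at 1.0.
        return max(self.heuristics,
                   key=lambda name: self.weights.get((protein_class, name), 1.0))

    def feedback(self, protein_class, name, score, lr=0.1):
        # Exponential moving average toward the observed score, so the
        # weighting adapts as new information comes in.
        key = (protein_class, name)
        old = self.weights.get(key, 1.0)
        self.weights[key] = (1 - lr) * old + lr * score
```

So the outer "heuristic search on a heuristic search" is just this selection step wrapped around whatever each individual heuristic does internally.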

I would still very much like to know if ran3() is contributing to the clustering. I understand that the most likely and probable reason is your weighting values, but I have run ran2() against ran3() 400,000,000 times; ran3() hit one number 38 times, and hit 8 numbers 0 times. A trial run at Ralph would be able to clear this up.
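The frequency check described above can be sketched like this (using Python's built-in generator as a stand-in, since ran2()/ran3() are Numerical Recipes routines not available here): draw many samples, bucket them, and compare the observed counts against the uniform expectation.

```python
import random

def bucket_counts(rng, n_draws, n_buckets):
    """Count how many draws land in each of n_buckets equal bins of [0, 1)."""
    counts = [0] * n_buckets
    for _ in range(n_draws):
        counts[int(rng.random() * n_buckets)] += 1
    return counts

def max_relative_deviation(counts):
    """Largest fractional deviation of any bucket from the uniform expectation."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) / expected for c in counts)
```

A generator with buckets that sit at zero, as reported for ran3(), would show up immediately here; a chi-square test over the counts would make the comparison rigorous.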

Just some thoughts to bounce off the Rosetta@Home team.
ID: 16014 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 16015 - Posted: 12 May 2006, 8:03:07 UTC

Just read this in todays journal entry...
One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches.

... isn't that a potential trap?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 16015 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 16019 - Posted: 12 May 2006, 10:40:19 UTC - in response to Message 16015.  

Just read this in todays journal entry...
One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches.

... isn't that a potential trap?


I would assume if it was, then the prediction would soon fail or divert to something else showing this.
Team mauisun.org
ID: 16019 · Rating: 0
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 16036 - Posted: 12 May 2006, 13:39:01 UTC - in response to Message 16013.  


If I understand Casp correctly, you're only allowed to turn in 5 guesses.. and if using lowest energy, then you'll send in the blue spot, and two models on the left and the two on the right that are just a little higher energy than the blue dot. Since you can't graph the CASP7 results like this (not having the native structure to compare RMSD on), then there's no way to select the higher energy models 1 angstrom to the left of the 5 low energy models?

One could use the following approach: use the lowest-energy structure as a reference and calculate the RMSD to it. Then submit as guesses 2 to 5 only structures that have an RMSD > x compared to the lowest-energy structure.
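That selection rule can be sketched as follows (a toy illustration; the `rmsd` argument stands in for a real structure-superposition RMSD): rank models by energy, take the lowest-energy one first, then accept further models only if they differ from every already-accepted model by more than the cutoff.

```python
def select_diverse_guesses(models, rmsd, cutoff, k=5):
    """models: list of (energy, structure) pairs.
    rmsd: callable taking two structures and returning their RMSD.
    Greedily pick up to k low-energy models that are mutually > cutoff apart."""
    picked = []
    for energy, s in sorted(models, key=lambda m: m[0]):
        if all(rmsd(s, p) > cutoff for _, p in picked):
            picked.append((energy, s))
        if len(picked) == k:
            break
    return picked
```

This way the 5 submissions sample distinct basins around the energy minimum instead of 5 near-duplicates of the single lowest-energy model.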
ID: 16036 · Rating: 0
Profile adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,678,748
RAC: 1,060
Message 16054 - Posted: 12 May 2006, 15:20:49 UTC

I would assume if it was, then the prediction would soon fail or divert to something else showing this.

Not necessarily. If the structure of the similar protein is known, and is presumably its lowest-energy structure, the unknown sequence may well have a deep energy well thereabouts, but there could be a totally different configuration with a deeper well.

I don't know how likely that is however, just an observation.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 16054 · Rating: 0
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16055 - Posted: 12 May 2006, 15:21:36 UTC

Global optimization vs heuristics:

I can tell you they aren't the same thing. But I'm not certain what global optimization is either. I'm assuming it's a mathematical concept of trying to solve a complex equation.

Heuristics is basically the use of historical statistics. Let's say you want to write a computer program to play chess. If your computer is not powerful enough to analyze ALL of the possible moves that lie ahead, then what you do is cheat. One way to cheat is to only look 5 or 6 moves ahead... you do this because "heuristically" you've found that if your decision still looks like a good one that far down the road, then typically it proves to be a good one through the end of the game... even though you've not looked that far ahead yet.

In evaluating what is a "good move" you have to devise some method of "scoring" the current game board, i.e. you need some means of analyzing proposed move 1 and comparing it with proposed move 2. And so another way to cheat is to look at the field of next possible moves (say there are 20), and throw out the worst-scoring choices... and heuristics (i.e. your past experiences) will determine how many of the poorly scoring choices to throw out and how many to pursue further (i.e. continue looking forward at the moves that would follow those).

Say you determine you can throw out 10 of the choices, you then look forward on only 10 possible moves rather than 20. This cuts the computing time downstream from there in half! As you can see, if your scoring mechanism isn't good, you might throw away some moves that prove themselves to be the BEST possible... 3 moves later in the game.
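Both cheats — the fixed look-ahead depth and throwing out the worst-scoring choices — can be sketched together (a toy illustration; the move generator, apply function, and scoring function are placeholders, not a real chess engine or Rosetta's actual search):

```python
def best_move(state, moves_fn, apply_fn, score_fn, depth=5, beam=10):
    """Depth-limited search that keeps only the `beam` best-scoring
    moves at each level instead of expanding all of them."""
    def search(s, d):
        if d == 0:
            return score_fn(s)          # stop looking ahead: trust the score
        # Prune: sort candidate moves by score and keep only the top `beam`.
        candidates = sorted(moves_fn(s),
                            key=lambda m: score_fn(apply_fn(s, m)),
                            reverse=True)[:beam]
        if not candidates:
            return score_fn(s)
        return max(search(apply_fn(s, m), d - 1) for m in candidates)

    return max(moves_fn(state),
               key=lambda m: search(apply_fn(state, m), depth - 1))
```

Halving `beam` roughly halves the work at every level below it, which is exactly why a bad scoring function is so costly: the pruning step may discard the move that would have proven BEST a few levels deeper.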

Playing chess is similar in many ways to what Rosetta is doing with atoms and molecules. The key is in the "Rosetta score". If the scoring mechanism is perfect (and it's not), you can dramatically narrow your field of possible "moves" (i.e. rotational possibilities) to pursue further.

Some words to Google if you are interested: game theory, game tree, depth-first search, breadth-first search, backtracking, traveling salesman problem, knapsack algorithm.

So, the team has devised several different approaches to solving protein structures. As you've read in Dr. Baker's posts, they are finding creative ways of combining the approaches, and finding heuristically that these combination approaches are yielding better results. As they devise new approaches, they should expect to find some work better for some situations than others. For example, a given approach or scoring method might work great for proteins less than 100 amino acids long, but not for a 200 AA protein. Also, as they find proteins where their approach fails to produce a viable structure... they look for more new approaches :)

Since this is all still a new science, and a blind study where you don't KNOW the right answer, I would expect them to try all of their approaches on each protein to some extent. And they'll have to gauge which looks most likely to produce the best prediction before bringing work out to R@H. You have to look at the protein you are handed and determine first whether it is a "screw" or a "nail" or a "bolt", and then decide whether you should reach for your screwdriver, your hammer or your wrench to work with it.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16055 · Rating: 1
Robert Everly

Joined: 8 Oct 05
Posts: 27
Credit: 665,094
RAC: 0
Message 16249 - Posted: 14 May 2006, 13:38:37 UTC

Today (Friday) we have closed accepting server predictions for the first
target, T0283.


What should be done with T0283 WUs that are still on our machines?

Also, should the deadlines for the CASP7 WUs be just a bit earlier than the real deadlines?

Back to T0283: one that I still have running has a deadline of May 27. But if the CASP deadline was May 12, will those results be of any benefit?
ID: 16249 · Rating: 0
Rollo

Joined: 2 Jan 06
Posts: 21
Credit: 106,369
RAC: 0
Message 16252 - Posted: 14 May 2006, 14:45:55 UTC - in response to Message 16249.  
Last modified: 14 May 2006, 14:47:33 UTC

Today (Friday) we have closed accepting server predictions for the first
target, T0283.


What should be done with T0283 WUs that are still on our machines?

Also, should the deadlines for the CASP7 WUs be just a bit earlier than the real deadlines?

Back to T0283: one that I still have running has a deadline of May 27. But if the CASP deadline was May 12, will those results be of any benefit?


Does Rosetta belong to the group 'server prediction' or 'human expert prediction'? If the latter, then it should be called 'human expert prediction (computer assisted)'.
ID: 16252 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org