Comments/questions on Rosetta@home journal

Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal

Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 12406 - Posted: 21 Mar 2006, 8:15:59 UTC - in response to Message 12343.  

Hi Hoelderlin. I am indeed thinking about some of the things you mentioned. I agree that it's important to look at distributions of scores. This can help to see if a few sub-structures (residues, atoms, etc.) have really bad scores and others are pretty good, or if sub-scores are pretty even overall. One could imagine that if there are a few really bad scores, the structure has some kind of serious flaw. An example of such a local problem which never happens in real proteins is a crack or hole. We haven't had very good luck picking out holes, and this is one case where breaking scores down by residue or even by atom is helpful. Just last Friday I was looking into a hole detector based on scores for individual atoms. Take a look here if you are interested.

http://www.gs.washington.edu/~wsheffle/boinc/
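
A rough sketch of the kind of per-atom outlier check described above (the scores, the cutoff, and the function are purely illustrative, not Rosetta's actual hole detector):

```python
# Flag sub-scores that are outliers relative to the rest of the structure.
# The per-atom scores here are made-up illustrative numbers, not Rosetta output.
from statistics import mean, stdev

def flag_bad_subscores(scores, n_sigma=2.0):
    """Return indices whose score is worse (higher) than mean + n_sigma * sd."""
    mu, sd = mean(scores), stdev(scores)
    return [i for i, s in enumerate(scores) if s > mu + n_sigma * sd]

per_atom_scores = [0.1, 0.2, 0.15, 0.1, 2.5, 0.12, 0.18]  # one conspicuous outlier
print(flag_bad_subscores(per_atom_scores))  # -> [4]
```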

Hi Will, I had a look at your web page. Intriguing! Two things I wanted to mention: Could it be that, despite having a lower total energy, the prediction on the right sits in a shallower local minimum (less energy needed to change the shape) than the more tightly packed structure on the left? Also, I am not sure how the interaction with the surrounding medium (water) is treated in the energy calculation (I seem to remember something about implicit and explicit solvent models). Would the fact that the hole seems just large enough for individual water molecules (about the size of your 2.4 A probe) to fit through have any effect on the energy calculation?
Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 12538 - Posted: 23 Mar 2006, 0:45:57 UTC - in response to Message 12196.  

Hi Hoelderlin! You also had a good question about the striking gaps in the 1elw plot... we've been running Rosetta with "score filters". If a client gets about halfway through a run and doesn't make it below a certain energy, we stop the run -- we'd rather you use your computer cycles to start another simulation instead of spending time on a simulation that will likely not get a very low energy! This happens again one more time during the simulation; hence the two gaps in the plot.
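
A minimal sketch of such a score filter (the checkpoint positions, cutoffs, and the dummy trajectory below are invented for illustration, not the project's actual values):

```python
import random

def run_with_score_filters(sample_energy, n_steps, checkpoints):
    """Abandon a trajectory at fixed checkpoints if its best energy so far
    is still above the cutoff; otherwise return the final best energy.
    checkpoints: dict mapping step index -> energy cutoff."""
    best = float("inf")
    for step in range(n_steps):
        best = min(best, sample_energy(step))
        cutoff = checkpoints.get(step)
        if cutoff is not None and best > cutoff:
            return None          # filtered out: the client starts a new run instead
    return best

# Dummy trajectory: energies drift downward with noise (purely illustrative).
random.seed(0)
def energy(step):
    return 100.0 - 0.8 * step + random.uniform(-5.0, 5.0)

print(run_with_score_filters(energy, n_steps=100, checkpoints={50: 70.0, 80: 40.0}))
```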

I had a good look at the new 1elw plot. Great! The obvious question of course is: what causes the horizontal gaps? Is that because of the peculiar shape of the protein?


will sheffler
Joined: 20 Mar 06
Posts: 3
Credit: 0
RAC: 0
Message 12542 - Posted: 23 Mar 2006, 2:00:23 UTC - in response to Message 12406.  

Hi Will, I had a look at your web page. Intriguing! Two things I wanted to mention: Could it be that, despite having a lower total energy, the prediction on the right sits in a shallower local minimum (less energy needed to change the shape) than the more tightly packed structure on the left? Also, I am not sure how the interaction with the surrounding medium (water) is treated in the energy calculation (I seem to remember something about implicit and explicit solvent models). Would the fact that the hole seems just large enough for individual water molecules (about the size of your 2.4 A probe) to fit through have any effect on the energy calculation?


Thanks for more interesting questions. I'm waiting for a couple of things to run, and this is nice temporary entertainment.

I'm not sure about the shape of the minima, but it's interesting to think about. One thing that's really tough about characterizing our full-atom energy landscape (including things like the depth and breadth of minima) is its roughness. Since all the atoms are explicitly represented and the van der Waals energy goes up really sharply as atoms get closer together, a tiny change in almost any "direction" (in the sense of the energy landscape) can make a big difference in the energy. There are plenty of samples physically similar to the right-hand (bad) one, and they vary in energy quite a bit. This could indicate a broad-ish local energy minimum. Unfortunately, there aren't too many better samples with the right topology like the one on the left, so we can't compare based on a clustering argument. It might be interesting to examine a "time" series of energy values as the structures are folded and refined -- see if one falls off more sharply than another -- but they might not be comparable. Energy points aren't at all evenly distributed over the folding process, and it would be hard to define exactly what the "time interval" between points in the series would be.
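
To make that steepness concrete, here is a tiny numerical example using the generic Lennard-Jones 12-6 form (illustrative parameters; Rosetta's actual van der Waals term differs in detail):

```python
def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """Generic 12-6 potential; epsilon (kcal/mol) and sigma (A) are illustrative."""
    return 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

for r in (2.6, 2.8, 3.0, 3.4, 3.8):
    print(f"r = {r:.1f} A   E = {lennard_jones(r):+8.2f}")
# The r**-12 repulsive wall means a ~0.2 A change near contact distance can
# change the energy several-fold, which is why a tiny move in almost any
# direction on the landscape can matter so much.
```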

We use implicit solvent in all our methods because simulating a whole bunch of water molecules is really, really expensive. This can make some aspects of scoring tricky. The biggest impact is on scoring hydrogen bonding, burial of charged atoms/groups, the hydrophobic effect, etc., but it also seems to have an impact on our packing -- holes, van der Waals forces, etc. For holes, our basic method is to look for regions which are buried AND have surface exposed to various sized spheres. I've been using two different criteria for burial: (1) no exposed surface to a water-sized (~2.8 A diameter) sphere and (2) lots of other atoms within 10 A. This makes sure you aren't too close to the surface -- you might imagine a case where a water can't get at an atom but it's really close to the surface, so the protein might "breathe" a little and expose it. Another point is that most of what's in the core of a protein is hydrophobic, so even if a water could squeeze in, it probably wouldn't be energetically/entropically favorable.
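
A minimal sketch of how those burial tests and the hole check might be combined (the `structure` object and its methods are hypothetical stand-ins, not a real Rosetta interface; combining the two criteria with a logical AND and the particular thresholds are assumptions):

```python
# Hypothetical interface: structure.atoms, structure.neighbors_within(atom, radius),
# structure.probe_accessible_area(atom, probe_radius) -- stand-ins, not Rosetta.
WATER_PROBE_RADIUS = 1.4    # radius of a ~2.8 A diameter water-sized probe
NEIGHBOR_RADIUS = 10.0      # "lots of other atoms within 10 A"
MIN_NEIGHBORS = 16          # illustrative cutoff for "lots"

def is_buried(structure, atom):
    """Criterion (1): no surface exposed to a water-sized probe, AND
    criterion (2): enough neighbors that the atom is not near the surface."""
    no_water_access = structure.probe_accessible_area(atom, WATER_PROBE_RADIUS) == 0.0
    deep_inside = len(structure.neighbors_within(atom, NEIGHBOR_RADIUS)) >= MIN_NEIGHBORS
    return no_water_access and deep_inside

def hole_atoms(structure, small_probe_radius=1.2):
    """Buried atoms that still expose surface to a smaller probe: candidate holes."""
    return [a for a in structure.atoms
            if is_buried(structure, a)
            and structure.probe_accessible_area(a, small_probe_radius) > 0.0]
```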

Here's another, more subtle thing that lack of explicit solvent might cause. We have a funny effect where atoms in our predictions tend to pack a little too close together. Take a look at the top-left plot in the PDF below. It shows a histogram of distances between hydrophobic groups on the ends of the side chains. Black is from a big set of real proteins. Red, blue, and green are successively more optimized predictions with our full-atom energy function. Note that the more optimized the decoys are, the more they tend to pile up the methyls a little closer than in the real proteins. Back to solvation, we think this might have to do with a "finite lattice" type effect. Everything on the inside of the protein is pulling on everything else, so the atoms want to contract. In a real system, this wouldn't matter because there are atoms in all the water around the protein too, so they "pull" out, balancing the intra-protein van der Waals forces.

http://www.gs.washington.edu/~wsheffle/boinc/atype_dist_CH3-CH3_Hapo-Hapo_OH-OOC_OCbb-NH2O.pdf
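
A sketch of the kind of histogram comparison in that PDF (coordinate extraction is left as a stand-in; only the binning of pairwise distances is shown, with made-up coordinates):

```python
import math
from collections import Counter

def pair_distance_histogram(coords, bin_width=0.25, max_dist=8.0):
    """Bin all pairwise distances between the given 3D points."""
    counts = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= max_dist:
                counts[round(d / bin_width) * bin_width] += 1
    return counts

# If a decoy's histogram (e.g. over methyl-carbon coordinates) is shifted toward
# smaller distances than the native histogram, the core is packed too tightly.
native_hist = pair_distance_histogram([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.2, 0.0)])
decoy_hist = pair_distance_histogram([(0.0, 0.0, 0.0), (3.6, 0.0, 0.0), (0.0, 3.7, 0.0)])
print(sorted(native_hist), sorted(decoy_hist))
```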

Sean Kiely
Joined: 31 Jan 06
Posts: 64
Credit: 43,992
RAC: 0
Message 12577 - Posted: 23 Mar 2006, 17:19:14 UTC

Hi Will:

Thank you for the interesting details about the issue of implicit vs. explicit solvation.

Would there be any possible benefit to modeling explicit solvation (very sparsely) during processing? Maybe once or twice partway through especially promising structures? I wonder if it might keep us from wandering onto paths where subtle solvation issues keep us from reaching the best energy minima?

Sort of a "solvation sanity check"?

I recognise, of course, that modeling complexity issues or sheer processing costs may make this unworkable!

Sean
Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 12610 - Posted: 24 Mar 2006, 8:44:04 UTC - in response to Message 12542.  

Thanks for more interesting questions. I'm waiting for a couple of things to run, and this is nice temporary entertainment.

Thanks Will for the detailed answers (and also for the new 'teaser' about the distances that don't come out right). I also had a look at 'The Good, the Bad and the Ugly'. ;-) Cool stuff - though I only understood half of it. But it seems to give a good overview of what you guys at the Baker Lab are up to.

Also thanks to Rhiju for his responses here and in the 'science news' thread (and of course for the new links on the top prediction page).
R/B
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 12774 - Posted: 28 Mar 2006, 23:45:47 UTC

Dr. Baker,

Any word on possible radio/tv interviews yet? Thanks.
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers.


David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12780 - Posted: 29 Mar 2006, 3:23:18 UTC - in response to Message 12774.  

Dr. Baker,

Any word on possible radio/tv interviews yet? Thanks.



Hi Robert, I haven't heard anything yet. David
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 12783 - Posted: 29 Mar 2006, 7:09:27 UTC

Dr. Baker:

While trying to track down a text version of a video news report on a new drug targeting AIDS, I ran across the following article. I wondered if your team working on an HIV vaccine had taken the points raised in the article into account:

http://www.mindfully.org/Health/2006/AIDS-Medical-Corruption1mar06.htm


On another note... I'm looking forward to seeing how the results of your latest test will compare with the patterns we've seen so far. (It'll be a little while before we see a ball flattened against the left side, eh? :)
Profile Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 12959 - Posted: 2 Apr 2006, 21:09:17 UTC
Last modified: 2 Apr 2006, 21:13:13 UTC

A question about the "adjustable run-time" of WUs:

Supposing
1) we don't care at all about extra Internet traffic and
2) we don't encounter problems when running Rosetta WUs longer,

would there be ANY "science" reason not to raise the WU runtime from the default of 2 hr to 8, 12, or 24 hours per WU (or 2-4 days when that becomes available again)? (Depending on one's commitment to other BOINC projects and PC uptime, of course, so that WU deadlines can still be met.)

I assumed every model run is independent. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24 hr and lessen Internet traffic, right?

I'm asking because I've noticed that, e.g., the University of Washington's own "Housing and Food Services" computers are using the default WU runtime of 2 hr (at least those I've checked randomly).


Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12967 - Posted: 3 Apr 2006, 1:35:44 UTC - in response to Message 12959.  

A question about the "adjustable run-time" of WUs:

Supposing
1) we don't care at all about extra Internet traffic and
2) we don't encounter problems when running Rosetta WUs longer,

would there be ANY "science" reason not to raise the WU runtime from the default of 2 hr to 8, 12, or 24 hours per WU (or 2-4 days when that becomes available again)? (Depending on one's commitment to other BOINC projects and PC uptime, of course, so that WU deadlines can still be met.)

I assumed every model run is independent. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24 hr and lessen Internet traffic, right?

I'm asking because I've noticed that, e.g., the University of Washington's own "Housing and Food Services" computers are using the default WU runtime of 2 hr (at least those I've checked randomly).



Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12977 - Posted: 3 Apr 2006, 11:48:58 UTC - in response to Message 12967.  
Last modified: 3 Apr 2006, 11:49:30 UTC

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.
Regards,
Bob P.
Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12983 - Posted: 3 Apr 2006, 15:46:29 UTC - in response to Message 12977.  

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.

The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you?

Regards,
Bob P.
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 12985 - Posted: 3 Apr 2006, 15:55:15 UTC - in response to Message 12983.  

Yes, that is correct. Each model run is completely independent, so twelve 2-hour work units are equivalent to one 24-hour work unit from our perspective.

So the project has no official point of view on optimal work unit run length because they are equivalent? And as long as the server can handle the load, it makes no difference whether one runs, say, twelve 2-hour units or one 24-hour unit?

Thanks.

The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you?



It is a pretty even tradeoff because while we get the results back more quickly, there is also more network traffic with the shorter WU.
Profile Cureseekers~Kristof
Joined: 5 Nov 05
Posts: 80
Credit: 689,603
RAC: 0
Message 12994 - Posted: 3 Apr 2006, 18:08:05 UTC

So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading)
Keep the default, or set it to a higher number of hours?
Member of Dutch Power Cows
Profile rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 12995 - Posted: 3 Apr 2006, 18:22:11 UTC - in response to Message 12994.  
Last modified: 3 Apr 2006, 18:23:45 UTC

So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading)
Keep the default, or set it to a higher number of hours?

I think it used to be 8 hours before the 1% Bug became so noticeable an issue. Now that the 1% Bug seems to have been solved based on early results with the new client, is 8 hours the best to use?

Regards,
Bob P.
Profile Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 13136 - Posted: 6 Apr 2006, 21:37:26 UTC
Last modified: 6 Apr 2006, 21:45:45 UTC

Question 1:
Quoting from the journal (src)
Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question.

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.


Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Question 2:
I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8 hr, a PC which doesn't encounter problems "should" use 8 hr. This means a round-trip time of <1 day even if a PC is running during office hours only.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
Profile bruce boytler
Joined: 17 Sep 05
Posts: 68
Credit: 3,565,442
RAC: 0
Message 13139 - Posted: 6 Apr 2006, 22:20:47 UTC - in response to Message 13136.  

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.

Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

The only ones I have heard about were the ones belonging to someone called Food and Housing Services. In the end, the Baker Lab has made it clear since September 2005 that it is up to us, the "crunchers", to come up with the CPU power to solve this pivotal science problem.

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Blue Gene at Lawrence Livermore is used to predict nuclear weapon yields. This is good, because if this were not done on Blue Gene they would be exploding them for real, like in the 1950s and 1960s out in the Nevada desert.


Hope this helps a little.........Bye All!

Profile TritoneResolver
Joined: 28 Nov 05
Posts: 4
Credit: 57,690
RAC: 0
Message 13144 - Posted: 7 Apr 2006, 0:22:37 UTC

The website of FoxNews has picked up the story that was published on livescience.com today. See here: Link. Hopefully, this will bring more attention and members to Rosetta!
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 13145 - Posted: 7 Apr 2006, 1:26:08 UTC

We're off topic, but here's a link to some of what Blue Gene has been up to. I haven't seen the "gene" aspect of the supercomputer brought up much since its inception and initial press. I'm guessing they went where the money was, rather than for curing most all of the diseases of mankind. Purely a "business decision," you understand.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 13151 - Posted: 7 Apr 2006, 4:10:14 UTC - in response to Message 13136.  

Question 1:
Quoting from the journal (src)
Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question.

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.


Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered?

Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays?

Question 2:
I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8 hr, a PC which doesn't encounter problems "should" use 8 hr. This means a round-trip time of <1 day even if a PC is running during office hours only.



Question 1: Public computing resources have been allocated to us to try to solve the structure prediction problem, including 5 million hours on Blue Gene through the Department of Energy's INCITE awards in high performance computing. This was the largest award they made this year.

http://72.14.203.104/search?q=cache:dkcheEi0bi8J:nccs.gov/news/pr2006/FY2006_INCITE_Award_Factsheet01312006final.pdf.

The NSF-funded SDSC and NCSA supercomputing centers have also been extremely generous in providing us with computing resources when we were really stuck (in the pre-rosetta@home days, a year ago).

However, to put this into perspective, let us assume that there are 20,000 computers crunching full time for rosetta@home. Then 5 million hours, one of the largest allocations of public computing power ever, corresponds to only 10 days on rosetta@home. Also, since Blue Gene was built for high connectivity, the individual processors are 5x slower than most of your PCs. So we are using Blue Gene to test approaches that take advantage of the high node interconnectivity, and during CASP, which is coming up soon, we will use Blue Gene for larger proteins which require lots of memory -- we are already pushing the limits of public distributed computing with the moderate-size proteins running now.
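
The back-of-the-envelope arithmetic behind that "10 days" figure, using the 20,000-host assumption stated above:

```python
award_cpu_hours = 5_000_000             # INCITE allocation
hosts = 20_000                          # assumed full-time rosetta@home computers
print(award_cpu_hours / (hosts * 24))   # ~10.4 days of rosetta@home throughput
```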

Question 2: From the science point of view, 2-hour and 8-hour work units are equivalent. However, 8-hour work units cut down on network traffic, and so are preferable PROVIDED THAT the computer has a relatively low error rate. We are now cautiously increasing the default work unit length from 2 to 4 hours and we will see how this goes. But users who rarely encounter errors should definitely choose the 8-hour work unit option, all other things being equal.
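
A small sketch of the trade-off being described (the per-hour error rate and the assumption that a failed work unit loses, on average, half its run are illustrative guesses, not measured project numbers):

```python
def daily_summary(wu_hours, error_rate_per_hour=0.002, hours_per_day=24.0):
    """Return (upload/download cycles per day, expected CPU-hours lost to errors)."""
    wus_per_day = hours_per_day / wu_hours
    p_fail = min(1.0, error_rate_per_hour * wu_hours)   # crude per-WU failure chance
    expected_lost = wus_per_day * p_fail * (wu_hours / 2.0)
    return wus_per_day, expected_lost

for h in (2, 4, 8, 24):
    transfers, lost = daily_summary(h)
    print(f"{h:2d} h WUs: {transfers:4.1f} transfers/day, ~{lost:.2f} CPU-hours lost/day")
# Longer work units cut network traffic, but each error costs more lost work --
# hence the advice to pick 8 hours only if your machine rarely errors out.
```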




©2024 University of Washington
https://www.bakerlab.org