Comments/questions on Rosetta@home journal

Author	Message
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0	Message 12610 - Posted: 24 Mar 2006, 8:44:04 UTC - in response to Message 12542. Thanks for more interesting questions. I'm waiting for a couple of things to run, and this is nice temporary entertainment. Thanks Will for the detailed answers (and also for the new 'teaser' about the distances that don't come out right). I also had a look at 'The Good, the Bad and the Ugly'. ;-) Cool stuff - though I only understood half of it. But it seems to give a good overview of what you guys at the Baker Lab are up to. Also thanks to Rhiju for his responses here and in the 'science news' thread (and of course for the new links on the top prediction page). ID: 12610 · Rating: 0 · rate: /

R/B Send message Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0	Message 12774 - Posted: 28 Mar 2006, 23:45:47 UTC Dr. Baker, Any word on possible radio/tv interviews yet? Thanks. Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. ID: 12774 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 12780 - Posted: 29 Mar 2006, 3:23:18 UTC - in response to Message 12774. Dr. Baker, Any word on possible radio/tv interviews yet? Thanks. Hi Robert, I haven't heard anything yet. David ID: 12780 · Rating: 0 · rate: /

BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 12783 - Posted: 29 Mar 2006, 7:09:27 UTC Dr. Baker: While trying to track down a text version of a video news report on a new drug targetting aids, I ran across the following article. I wondered if your team dealing with finding a vaccine for HIV had taken the points raised in the article into account: http://www.mindfully.org/Health/2006/AIDS-Medical-Corruption1mar06.htm On another note.. I'm looking forward to seeing how the results of your latest test will compare with the patterns we've seen so far. (It'll be a little while before we see a ball flattened against the left side, eh? :) ID: 12783 · Rating: 0 · rate: /

Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 12959 - Posted: 2 Apr 2006, 21:09:17 UTC Last modified: 2 Apr 2006, 21:13:13 UTC A question about the "adjustable run-time" of WUs: Supposing 1) we don't care at all about extra Internet traffic and 2) we don't encounter problems when running Rosetta WU longer would there be ANY "science" reason to not raise WU-runtime from the default of 2hr to 8,12,24 hours per WU -or 2-4 days when it becomes available again-? (Depending on one's commitment to other BOINC projects and PC uptime ofcourse, so he can meet WU deadlines). I assumed every model run is independant. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24hr and lessen Internet traffic, right? I'm asking because I've noticed that e.g. Univ.Washington's own "Housing and Food Services" computers are using the default WU runtime of 2hr (at least those I've checked randomly). Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity ID: 12959 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 12967 - Posted: 3 Apr 2006, 1:35:44 UTC - in response to Message 12959. A question about the "adjustable run-time" of WUs: Supposing 1) we don't care at all about extra Internet traffic and 2) we don't encounter problems when running Rosetta WU longer would there be ANY "science" reason to not raise WU-runtime from the default of 2hr to 8,12,24 hours per WU -or 2-4 days when it becomes available again-? (Depending on one's commitment to other BOINC projects and PC uptime ofcourse, so he can meet WU deadlines). I assumed every model run is independant. So, unless the user has a valid "technical" reason (e.g. longer WUs getting hung on occasion on his PCs), there's no "science" reason one shouldn't raise it to the max of 24hr and lessen Internet traffic, right? I'm asking because I've noticed that e.g. Univ.Washington's own "Housing and Food Services" computers are using the default WU runtime of 2hr (at least those I've checked randomly). Yes, that is correct. Each model run is completely independent, so 12 2hr work units are equivalent to 1 24 hr workunit from our perspective. ID: 12967 · Rating: 0 · rate: /

rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 12977 - Posted: 3 Apr 2006, 11:48:58 UTC - in response to Message 12967. Last modified: 3 Apr 2006, 11:49:30 UTC Yes, that is correct. Each model run is completely independent, so 12 2hr work units are equivalent to 1 24 hr workunit from our perspective. So the project has no official point of view on optimal workunit run length because they are equivalent? And as long as the server can handle the load, it makes no difference between running say 12 2 hour units compared with one 24 hour unit? Thanks. Regards, Bob P. ID: 12977 · Rating: 0 · rate: /

rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 12983 - Posted: 3 Apr 2006, 15:46:29 UTC - in response to Message 12977. Yes, that is correct. Each model run is completely independent, so 12 2hr work units are equivalent to 1 24 hr workunit from our perspective. So the project has no official point of view on optimal workunit run length because they are equivalent? And as long as the server can handle the load, it makes no difference between running say 12 2 hour units compared with one 24 hour unit? Thanks. The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you? Regards, Bob P. ID: 12983 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 12985 - Posted: 3 Apr 2006, 15:55:15 UTC - in response to Message 12983. Yes, that is correct. Each model run is completely independent, so 12 2hr work units are equivalent to 1 24 hr workunit from our perspective. So the project has no official point of view on optimal workunit run length because they are equivalent? And as long as the server can handle the load, it makes no difference between running say 12 2 hour units compared with one 24 hour unit? Thanks. The only difference of course is that you would get the 2-hour results back more quickly than a 24-hour result, if that makes a difference for you? It is a pretty even tradeoff because while we get the results back more quickly, there is also more network traffic with the shorter WU. ID: 12985 · Rating: 0 · rate: /

Cureseekers~Kristof Send message Joined: 5 Nov 05 Posts: 80 Credit: 689,603 RAC: 0	Message 12994 - Posted: 3 Apr 2006, 18:08:05 UTC So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading) Keep the default, or set the settings to an higher number of hours? Member of Dutch Power Cows ID: 12994 · Rating: 0 · rate: /

rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 12995 - Posted: 3 Apr 2006, 18:22:11 UTC - in response to Message 12994. Last modified: 3 Apr 2006, 18:23:45 UTC So what's the best for normal users? (24h/day, 7d/week running, downloading, uploading) Keep the default, or set the settings to an higher number of hours? I think it used to be 8 hours before the 1% Bug became so noticeable an issue. Now that the 1% Bug seems to have been solved based on early results with the new client, is 8 hours the best to use? Regards, Bob P. ID: 12995 · Rating: 0 · rate: /

Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0	Message 13136 - Posted: 6 Apr 2006, 21:37:26 UTC Last modified: 6 Apr 2006, 21:45:45 UTC Question 1: Quoting from the journal (src) Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question. With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place. Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered? Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays? Question 2: I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8hr, that a PC which doesn't encounter problems "should" use 8hr. This means a round-trip-time of <1 day even if a PC is running during office hours only. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity ID: 13136 · Rating: 0 · rate: /

bruce boytler Send message Joined: 17 Sep 05 Posts: 68 Credit: 3,565,442 RAC: 0	Message 13139 - Posted: 6 Apr 2006, 22:20:47 UTC - in response to Message 13136. [With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.[/quote] Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered? The only ones I have heard about were the ones belonging to someone called Food and Housing Services. In the end the Baker Lab has made it clear since September 2005 that it is up to us the "crunchers" to come up with the cpu to solve this pivotal science problem. Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays? Blue Gene at Lawrence Livermore is used to predict Nuclear Weapons Bomb yeilds. This is good because if this was not done on Blue Gene they would be exploding them for real. Like in the 1950's and 1960's out in the Nevada deseret. Hope this helps a little.........Bye All! ID: 13139 · Rating: 0 · rate: /

TritoneResolver Send message Joined: 28 Nov 05 Posts: 4 Credit: 57,690 RAC: 0	Message 13144 - Posted: 7 Apr 2006, 0:22:37 UTC The website of FoxNews has picked up the story that was published on livescience.com today. See here: Link. Hopefully, this will bring more attention and members to Rosetta! ID: 13144 · Rating: 0 · rate: /

Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 13145 - Posted: 7 Apr 2006, 1:26:08 UTC We're off topic, but here's a link to some of what Blue Gene has been up to. I haven't seen the "gene" aspect of the supercomputer brought up much since it's inception and intial press. I'm guessing they went where the money was, rather than for curing most all of the diseases of mankind. Purely a "business decision," you understand. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ ID: 13145 · Rating: 0 · rate: /

David Baker Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 17 Sep 05 Posts: 705 Credit: 559,847 RAC: 0	Message 13151 - Posted: 7 Apr 2006, 4:10:14 UTC - in response to Message 13136. Question 1: Quoting from the journal (src) Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question. With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place. Considering the implications of solving the problem of protein structure prediction, if the bottleneck is an issue of CPU power, why aren't more public resources being offered? Also, what are the world's top 2 supercomputers BlueGene and BlueGene/L, supposedly built to tackle the protein folding problem (see Gene Machine article in Wired 2001) used for nowadays? Question 2: I think the question about the "official" suggestion for WU runtime is still unanswered. I assume that since the default in the recent past was 8hr, that a PC which doesn't encounter problems "should" use 8hr. This means a round-trip-time of <1 day even if a PC is running during office hours only. Question 1: Public computing resources have been allocated to us to try to solve the structure prediction problem, including 5 million hours on blue gene through the department of energy's INCITE awards in high performance computing. This was the largest award they made this year. http://72.14.203.104/search?q=cache:dkcheEi0bi8J:nccs.gov/news/pr2006/FY2006_INCITE_Award_Factsheet01312006final.pdf. The NSF funded SDSC and NCSA supercomputing centers have also been extremely generous in providing us computing resources when we were really stuck (in the pre rosetta@home days a year ago). however, to put this into perspective, let us assume that there are 20,000 computers crunching full time for rosetta@home. then 5 million hours, one of the largest allocations of public computing power ever, corresponds to only 10 days on rosetta@home. also, since blue gene was built for high connectivity, the individual processors are 5x slower than most of your PCs. so we are using blue gene to test approaches that take advantage of the high node interconnectivity, and during CASP, which is coming up soon, we will use blue gene for larger proteins which require lots of memory--we are already pushing the limits of public distributed computing with the moderate size proteins running now . Question 2: from the science point of view, 2 hour and 8 hour work units are equivalent. however, 8 hour work units cut down on network traffic, and so are more optimal PROVIDED THAT the computer has a relatively low error rate. we are now cautiously increasing the default work unit length from 2 to 4 hours and we will see how this goes. but users who rarely encounter errors should definitely choose the 8 hour work unit option all other things being equal. ID: 13151 · Rating: 0 · rate: /

rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 13164 - Posted: 7 Apr 2006, 11:18:17 UTC - in response to Message 13151. ...we are already pushing the limits of public distributed computing with the moderate size proteins running now . With new public machines having at least a gig of memory, perhaps you could establish a "Rosetta II" or something like that as well, whose minimum memory requirements are a gig. I don't know if this doubling of the memory requirement helps you, but more and more memory is being packed into the newer machines... Regards, Bob P. ID: 13164 · Rating: 0 · rate: /

Andrew Send message Joined: 19 Sep 05 Posts: 162 Credit: 105,512 RAC: 0	Message 13168 - Posted: 7 Apr 2006, 12:48:37 UTC - in response to Message 13164. Last modified: 7 Apr 2006, 12:49:31 UTC With new public machines having at least a gig of memory, perhaps you could establish a "Rosetta II" or something like that as well, whose minimum memory requirements are a gig. I don't know if this doubling of the memory requirement helps you, but more and more memory is being packed into the newer machines... Seasonal Attribution Project has a requirement of 1G memory, and according to BOINCstats there is about 580 users, and 700 hosts. I think a Rosetta II could be one solution, but another could be an opt-in setting in our preferences for crunching large WU (as discussed else where in the forums) ID: 13168 · Rating: 0 · rate: /

dag Send message Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0	Message 13184 - Posted: 7 Apr 2006, 19:05:38 UTC - in response to Message 13151. [quote]Question 1: Quoting from the journal (src) [quote] Question 2: from the science point of view, 2 hour and 8 hour work units are equivalent. however, 8 hour work units cut down on network traffic, and so are more optimal PROVIDED THAT the computer has a relatively low error rate. we are now cautiously increasing the default work unit length from 2 to 4 hours and we will see how this goes. but users who rarely encounter errors should definitely choose the 8 hour work unit option all other things being equal. If I knew more about the checkpointing alg., I wouldn't have to ask this question. I have a portable that I move a couple of times per work day and I always do a nice suspend and wait until it clears from memory before I turn off the box. Even so, every turn-off probably loses the work that was done since the last checkpoint. Would an 8 hour WU cause more wasted cpu time than a 2 hour WU in this usage model? dag dag --Finding aliens is cool, but understanding the structure of proteins is useful. ID: 13184 · Rating: 0 · rate: /

anders n Send message Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0	Message 13185 - Posted: 7 Apr 2006, 19:42:46 UTC - in response to Message 13184. Would an 8 hour WU cause more wasted cpu time than a 2 hour WU in this usage model? dag No the models has the same length with 8 H as in 2 H . The computer just does alot more of them. Anders n ID: 13185 · Rating: 0 · rate: /