Message boards : Number crunching : What about Docking@home and Proteins@home?
Gerry Rough Send message Joined: 2 Jan 06 Posts: 111 Credit: 1,389,340 RAC: 0 |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I admit I think they probably are. But how so, and will their research also help Rosetta complete the protein prediction picture? AFAIK Docking is a kind of follow-up of Predictor. At least some of the staff are the same: Team Docking vs. Team Predictor. I'm no protein scientist, so it's up to them to decide about the sameness of the research, or better, the (important) differences. |
Michael G.R. Send message Joined: 11 Nov 05 Posts: 264 Credit: 11,247,510 RAC: 0 |
Docking seems to be invitation-only right now (unless I'm missing something). Do you know how to get one of these invitations? |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Saenger, I believe he was talking about Proteins at home, a French project from École Polytechnique. It too takes an invite code; see sig below. They do have some issues to solve at the moment. It's also WinXP-only, though there seems to be some progress in getting Win9x to work; I'm not sure of that. This is very alpha and was supposed to be kept quiet, but now that they're exporting stats, I suppose they won't mind. |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Docking seems to be invitation-only right now (unless I'm missing something). Write an email to the devs (the address is available somewhere on their pages), and if you own a Mac or a Linux machine, you might get one. Saenger, I believe he was talking about Proteins at home. A French project from École Polytechnique. It too takes an invite code. See sig below. I know, but I don't know anything about Proteins besides that. That's why I didn't say anything about them. But I do crunch for Docking, and I know that M. Taufer and A. Kerstens were members of the Predictor team. |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?) At that time there were several different projects, and while they all looked at proteins they were either looking for different things or were trying different techniques. The impression I took away from the posting was that all the different approaches had some value, and that I personally was not equipped to make a call between them on the basis of the science. So my guess is that in the big picture both those new projects will be helping to complete the picture. To some extent too, I'd guess they are in competition with each other in the race to get the best prediction technique etc. River~~ |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
... Proteins at home. A French project from École Polytechnique. ... When President de Gaulle was around they'd never have got away with that name - it would have had to be ProteinsChezNous or suchlike ;-) They do have some issues to solve at the moment. It's also WinXP only, but there seems to be some progress in getting Win9x to work. If XP works then it is possible that 2k will too; 2k is much more like XP than the 9x versions of Windows.
Many new projects have enjoyed the fantasy that they can export stats and still keep hush. It is a fallacy of course - once credits are released they find their way into people's sigs, and then others want to find out more. But I'd suggest that it is not safe to assume that this project is free from that fantasy. R~~ |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?) I see they have a much nicer forum layout as well :-) Rosetta should take note ;-) If anyone can post there, could you ask how it compares to the THINK virtual screening/docking developed by www.treweren.com? I'm sure there are a few people who would like to know; some of the HIV protease inhibition work can be found here. Since it's invite-only I cannot post there (afaik) Team mauisun.org |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Since it's invite only I cannot post there (afaik) But they can post here ;) I'll post a link to this thread in the forum. |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?) A useful link, thanks zombie. But actually I was thinking of this page, which has a comparison of lots of projects that were around when the page was written -- BOINC and other DC projects. In the Life Sciences part of the page is an explanation from WCG about the difference between what they do with Rosetta and what this project does with Rosetta, followed by a piece from our David about how Rosetta compares with some of the other protein projects around then. After you have read the Life Sciences part, scroll down to the bottom, where he gives his own personal choice amongst projects. Parts of the page are more up to date than others - it is a one-man effort by Dimitri (apart from the quotes he gets from project scientists), so one can't really fault him on the fact that some points are out of date. And by the way, the physics part of the page is just as useful, as are the comments on a few non-BOINC projects. River~~ |
Dr. Armen Send message Joined: 25 Oct 06 Posts: 4 Credit: 0 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?) Thanks for the link Saenger, and thanks for the question FluffyChicken! My name is Roger S. Armen and I am the primary scientist directing the development and application of molecular docking techniques for the Docking@Home project. I just answered FluffyChicken's question about THINK and virtual screening on the Docking@Home thread, and Saenger asked me to provide a short answer on this thread. The Find-a-Drug project (which to my knowledge is now permanently closed as of Dec 2005) used THINK to do virtual screening/docking for various drug targets. At Docking@Home, we also hope to perform virtual screens in the future using our CHARMM-based docking. This is a similarity between the two projects, in that we are both doing protein-ligand docking. However, a significant difference between Docking@Home and other protein-ligand docking projects is that during the first phase of the project (the first few years) we are going to focus on improving our docking methodology rather than on specific applications. Our early focus is to continue to improve, develop and validate our CHARMM-based protein-ligand docking methods. Another significant difference is that Docking@Home uses quite different protein-ligand docking methodology than THINK. Also, from the computer science point of view, there are several interesting computer science objectives of the Docking@Home project that involve dynamic allocation and scheduling of tasks with differing resource requirements and client-reliability needs. |
Dr. Armen Send message Joined: 25 Oct 06 Posts: 4 Credit: 0 RAC: 0 |
Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I sort of admit I think they probably are. But how so, and will their research also help Rosetta to complete the protein prediction picture? Thanks for the link Saenger. My name is Roger S. Armen and I am the primary scientist directing the development and application of molecular docking techniques for the Docking@Home project. Since I am not part of the Rosetta@Home project, I should not speak for them. However, I can speak for Docking@Home, and one significant difference between these projects is that Docking@Home does not perform any protein folding calculations or any protein structure predictions. Docking@Home uses CHARMM-based molecular docking to predict the conformation of small drug-like molecules (also known as ligands) when they bind to a protein of known structure. We try to predict the structure of a protein-ligand complex. Another difference is that Rosetta@Home also performs some protein-protein docking applications, and Docking@Home does not intend to do protein-protein docking (at least not in the early stages of the project). Also, with regard to Docking and Predictor: Docking is not a "follow-up" of Predictor; they are completely different projects with very different goals and different funding sources. The difference between Docking@Home and Predictor is that Predictor performs protein folding calculations and protein structure predictions. As I outlined above, Docking@Home does not do this. It is true that some of the personnel do overlap. Dr. Michela Taufer helped establish the Predictor project when she was at The Scripps Research Institute (TSRI), and now she has established her own project in her laboratory at the University of Texas at El Paso (UTEP) in collaboration with the Charlie Brooks Laboratory at TSRI. |
[AF>Slappyto] popolito Send message Joined: 8 Mar 06 Posts: 13 Credit: 1,033,737 RAC: 773 |
Proteins@home tries to find the different amino-acid sequences that produce a given fold. The project's goal is to calculate the energy functions for the different sequences. They call it the inverse problem of protein folding. http://biology.polytechnique.fr/proteinsathome/documentation.php I'm sorry for my bad English :) |
hugothehermit Send message Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
If Docking@Home isn't using redundancy, could someone tell Dr Armen to fix up the BOINC credit problem before Docking@Home goes through the Rosetta@Home experience? I'm sure that the Rosetta@Home team would be more than willing to explain why it is very important to fix it properly and early in the project's life, and to explain the credit system they came up with, which is here. If it is using redundancy, less power to them :) sorry about the bad wordplay :p |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
If it is using redundancy, less power to them :) sorry about the bad wordplay :p Every project is using redundancy, or it's plain random bulls*** and not science. Some do it by sending the same work to different participants (like Docking, Einstein, Malaria...), some do it somehow on the server side (like Rosetta, CPDN). How it's done is secondary, but if it's not done, it's not worth any power ;) |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
If it is using redundancy, less power to them :) sorry about the bad wordplay :p

There certainly has to be some protection against rogue results (results that are wrong whether by dishonesty, or over-enthusiastic overclocking, or a faulty computer, etc).

Having agreed thus far, there is an important difference between redundancy and other forms of error control, like building an ensemble (which we are doing here). With redundancy (as the word is used by BOINC) exactly the same task is crunched by two different users. With an ensemble, everyone does different tasks (for example with different starting conditions (CPDN) or a different set of random numbers (Rosetta)) and the end result is derived from the ensemble of all the different results in such a way that a few rogue results are unlikely to make any difference (or the end result can be independently checked).

The crucial difference is that with redundancy, work is lost when two accurate and honest crunchers handle the same data, but the end result is the same whether there are a few rogues or not. (Rogue may mean cheating or may mean an unwitting problem.) With an ensemble, the presence of a small number of rogue results would very slightly lower the overall accuracy, but to an insignificant extent, and every accurate and honest client makes a unique contribution to the end result.

Two other approaches are worth noting. Leiden is using both redundancy and an ensemble - each WU that goes into the ensemble is double-crunched. This is undoubtedly the most accurate way of generating the end result given infinite resources, but whether they'd do better to have an ensemble twice as wide and no redundancy is another question. Some maths projects have the ability to check a result more easily than calculating it. For example, it is easier to check that a square root is correct than to calculate it. So a project to calculate square roots could run every WU just once, and the validator would just square the delivered answer to see if it was right. (This is an imaginary project - but it illustrates the point.)

On Rosetta, the lowest-energy result is taken from the ensemble - and after all that crunching, that is the one that matters. It is then easy for the team to check that that result is correct; if not, they'd disqualify it and go to the next lowest. If a million decoys were run, this means that typically each result is only run 1.000001 times, or 1.000002 times on the rare occasion that some error is found in the first. And less still if there is a "short-cut" test like the one for square roots.

Finally, not possible within BOINC, but popular in some other grid projects, is the idea of "random redundancy". Only some WUs are double-crunched, but the users are not told which ones. This means that deliberate cheats have to stay honest. When a discrepancy is found, several more WUs from both users are checked. If one user is found to be generating more than a very, very small number of errors, all their work (and all their credit) is discarded and those WUs re-worked.

In my opinion the biggest weakness in BOINC was the decision to force the same degree of redundancy on every WU of a type. This is a reflection of the fact that when BOINC was first designed, SETI had more computing power than they needed. Subsequent projects (including SETI when they get access to more input data) suffer from the fact that there is no degree of redundancy possible between 1 and 2. (btw - I like BOINC overall or I would not be here. I also don't think it is perfect!)

For example, Zetagrid double-crunched about 10% of all work, and less than 0.01% was found to be problematic, those runs then maybe generating 10x the work. This meant that each piece of work was run only 1.101 times on average, compared to Einstein where the figure is over 2 (2 initial runs and more when needed) or LHC where it is over 5. This mattered on Zetagrid. We missed out on being first to a trillion "zeroes" by less than 10% - we were up to 910 billion when a mainframe project got the trillion. Had the project leader gone for redundancy a la BOINC we'd have been only just over half way, a huge difference. Sadly, the Zetagrid project was not suitable for an ensemble approach, or we'd have got there in front of the mainframe.

So yes, error checking of some sort is essential; but it is meaningful to talk of methods of error checking that do, or don't, involve redundancy. And it is certainly meaningful to talk of the degree of redundancy as a measure of what proportion of the work is devoted to error control. River~~ |
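The imaginary square-root project and the runs-per-WU arithmetic above can be sketched in a few lines of Python. This is only an illustration of the ideas in the post; the function names, tolerance, and figures plugged in below are assumptions, not code from any BOINC project:

```python
import math

def crunch_sqrt(x):
    """Simulated work unit: compute a square root (the 'expensive' step)."""
    return math.sqrt(x)

def validate(x, claimed_root, tol=1e-9):
    """Cheap server-side check: squaring the delivered answer is far
    cheaper than recomputing the root, so no redundancy is needed."""
    return abs(claimed_root * claimed_root - x) <= tol * max(1.0, x)

def average_runs_per_wu(spot_check_fraction, error_rate, reruns_on_error):
    """Expected number of times each work unit is crunched under random
    spot-check redundancy (using the Zetagrid-style figures from the post)."""
    return 1.0 + spot_check_fraction + error_rate * reruns_on_error

# 10% double-crunched, <0.01% problematic, problem cases redone ~10x:
print(average_runs_per_wu(0.10, 0.0001, 10))  # ≈ 1.101, as in the post
```

The same arithmetic with quorum-of-2 redundancy gives at least 2.0 runs per WU, which is the gap the post is pointing at.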
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
You must not confuse redundancy and validation. You have to validate the results, otherwise they are totally worthless, and so you have to have redundancy on some level or other. If you have projects like CPDN or Rosetta, where a random seed starts a simulation, no real validation on the WU level is needed. The whole process is extremely redundant, as you crunch the same simulation thousands upon thousands of times with just wee alterations. In projects with a vast amount of data to be searched for something meaningful, like the needle in the haystack (Einstein: gravitational waves, SETI: radio signals), that's simply impossible; here the better solution is validation on a per-WU level. There are probably projects that belong in both categories somehow. I don't know, but perhaps Leiden is one of those. The stupid accusation of "wasted CPU time" for a needed validation is just that: stupid! It can be discussed how the validation can be performed without too much redundancy, but it's better to have more redundancy than too little. |
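The ensemble-style, server-side validation discussed in this thread (take the claimed lowest-energy result, re-check only that one, and fall back to the next lowest if it fails) can be sketched like this. It is an illustrative toy, not Rosetta's actual validator; the energy function, tolerance, and data are made up:

```python
def pick_validated_minimum(decoys, recompute_energy, tol=1e-6):
    """Toy ensemble validation: `decoys` is a list of (structure,
    reported_energy) pairs from many clients.  Only the best remaining
    candidate is re-evaluated server-side; rogue results are skipped."""
    for structure, reported in sorted(decoys, key=lambda d: d[1]):
        if abs(recompute_energy(structure) - reported) <= tol:
            return structure, reported  # verified lowest energy
    return None  # every claimed result failed the re-check

# Toy energy function: sum of coordinates; one rogue client under-reports.
energy = lambda s: float(sum(s))
decoys = [((1, 2), 3.0), ((4, 5), 9.0), ((0, 1), -99.0)]  # last is rogue
print(pick_validated_minimum(decoys, energy))  # rogue skipped → ((1, 2), 3.0)
```

Note that with a million honest decoys, only one re-evaluation is ever needed, which is where the "each result is run about 1.000001 times" figure comes from.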
Dr. Armen Send message Joined: 25 Oct 06 Posts: 4 Credit: 0 RAC: 0 |
If Docking@Home isn't using redundancy, could someone tell Dr Armen to fix up the BOINC credit problem before Docking@Home goes through the Rosetta@Home experience. Thanks for your suggestion, hugo. The members of the Docking@Home team are aware that this is a very important problem. I talked to our project administrator Andre, and he says that we are using redundancy: we require a quorum of 3 valid results before we assign credit. Because of the credit disparities in BOINC, we are planning to do something similar to Rosetta, in which we assign credit for valid results or based on some other appropriate measure (independent of the BOINC credit system). |
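A quorum of 3 works roughly like the sketch below. This is a hypothetical illustration, not Docking@Home's code; a real BOINC validator compares numerical results within a tolerance rather than testing exact equality as done here:

```python
from collections import Counter

def quorum_validate(results, quorum=3):
    """Wait until at least `quorum` results arrive, then grant credit
    only if a strict majority agree; returns the canonical result,
    or None if there is no consensus yet."""
    if len(results) < quorum:
        return None  # still waiting for more clients to report
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None

print(quorum_validate(["abc", "abc", "xyz"]))  # majority of 3 agree → abc
print(quorum_validate(["abc", "xyz"]))         # below quorum → None
```

The cost is visible in the arithmetic earlier in the thread: a quorum of 3 means every work unit is crunched at least three times before any credit flows.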
thom217 Send message Joined: 29 Oct 05 Posts: 12 Credit: 182 RAC: 0 |
I remember there was a gentleman who was in touch with Keith Davis, the head of the Find-a-Drug project, at the time of the project's closure. He is one of the people responsible for running Chmoogle, now called the eMolecules search engine. http://www.emolecules.com/ http://usefulchem.blogspot.com/2005/11/chmoogle.html Jean-Claude Bradley http://www.blogger.com/profile/6833158 He might be able to contribute to the Docking@Home database. |
©2024 University of Washington
https://www.bakerlab.org