What about Docking@home and Proteins@home?

Profile Gerry Rough
Joined: 2 Jan 06
Posts: 111
Credit: 1,389,340
RAC: 0
Message 29832 - Posted: 22 Oct 2006, 20:42:38 UTC

Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I sort of admit I think they probably are. But how so, and will their research also help Rosetta to complete the protein prediction picture?

Profile Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 29835 - Posted: 22 Oct 2006, 21:42:38 UTC - in response to Message 29832.  

Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I sort of admit I think they probably are. But how so, and will their research also help Rosetta to complete the protein prediction picture?

AFAIK Docking is a kind of follow up of Predictor. At least some of the staff are the same: Team Docking vs. Team Predictor.

I'm no protein scientist, so it's up to those to decide about the sameness of the research, or better the (important) differences.
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 29842 - Posted: 23 Oct 2006, 2:46:05 UTC

Docking seems to be invitation-only right now (unless I'm missing something).

Do you know how to get one of these invitations?
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 29844 - Posted: 23 Oct 2006, 3:33:51 UTC
Last modified: 23 Oct 2006, 3:36:21 UTC

Saenger, I believe he was talking about Proteins at home. A French project from École Polytechnique. It too takes an invite code. See sig below.

They do have some issues to solve at the moment. It's also WinXP only; there seems to be some progress in getting Win9x to work, but I'm not sure of that. This is very alpha and was supposed to be kept quiet, but now that they're exporting stats, I suppose they won't mind.




Profile Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 29855 - Posted: 23 Oct 2006, 7:15:50 UTC - in response to Message 29842.  

Docking seems to be invitation-only right now (unless I'm missing something).

Do you know how to get one of these invitations?

Write an email to the devs (email available somewhere on their pages), and if you own a Mac or a Linux machine, you might get one.

Saenger, I believe he was talking about Proteins at home. A French project from Ecole Polytechnique. It too takes an invite code. See sig below.


I know, but I don't know about Proteins besides that. That's why I didn't say anything about them.
But I do crunch for Docking, and I know that M.Taufer and A.Kerstens were members of the Predictor Team.
Profile River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29867 - Posted: 23 Oct 2006, 11:09:28 UTC

Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)

At that time there were several different projects, and while they all looked at proteins they were either looking for different things or were trying different techniques. The impression I took away from the posting was that all the different approaches had some value and that I personally was not equipped to make a call between them on the basis of the science.

So my guess is that in the big picture both those new projects will be helping to complete the picture. To some extent too I'd guess they are in competition with each other in the race to get the best prediction technique etc.

River~~
Profile River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29868 - Posted: 23 Oct 2006, 11:18:36 UTC - in response to Message 29844.  

... Proteins at home. A French project from École Polytechnique. ...

When President de Gaulle was around they'd never have got away with that name - it would have had to be ProteinsChezNous or suchlike ;-)

They do have some issues to solve at the moment. It's also Winxp only, but there seems to be some progress in getting win9x to work,

If XP works then it is possible that 2k will too. 2k is much more like XP than the 9x versions of Windows.


This is very alpha and was supposed to be kept quiet, but now that they're exporting stats, I suppose they won't mind.

Many new projects have enjoyed the fantasy that they can export stats and still keep hush. It is a fallacy of course - once credits are released they find their way into people's sigs and then others want to find out more. But I'd suggest that it is not safe to assume that this project is free from that fantasy.

R~~
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 60
Message 29897 - Posted: 23 Oct 2006, 21:23:19 UTC - in response to Message 29867.  

Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)


See message 248 here.
Reno, NV
Team: SETI.USA
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 29924 - Posted: 24 Oct 2006, 8:43:48 UTC - in response to Message 29897.  

Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)


See message 248 here.


I see they have a much nicer forum layout as well :-) Rosetta should take note ;-)


If anyone can post there, could you ask how it compares to the THINK virtual screening/docking software developed by www.treweren.com? I'm sure there are a few people who would like to know. Some of its HIV protease inhibition results can be found here.

Since it's invite only I cannot post there (afaik)
Team mauisun.org
Profile Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 29935 - Posted: 24 Oct 2006, 13:30:43 UTC - in response to Message 29924.  

Since it's invite only I cannot post there (afaik)

But they can post here ;)
I'll post a link to this thread in the forum.
Profile River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29968 - Posted: 24 Oct 2006, 19:25:04 UTC - in response to Message 29897.  
Last modified: 24 Oct 2006, 19:34:41 UTC

Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)


See message 248 here.


A useful link, thanks zombie.

But actually I was thinking of this page which has a comparison of lots of projects that were around when the page was written -- BOINC and other dc projects.

In the Life sciences part of the page is an explanation from WCG about the difference between what they do with Rosetta and what this project does with Rosetta, followed by a piece from our David about how Rosetta compares with some of the other protein projects around then.

After you have read the life sciences part, scroll down to the bottom where he gives his own personal choice amongst projects.

Parts of the page are more up to date than others - it is a one-man effort by Dimitri (apart from the quotes he gets from project scientists) so can't really fault him on the fact that some points are out of date.

And by the way the physics part of the page is just as useful, as are the comments on a few non-BOINC projects.

River~~
Profile Dr. Armen
Joined: 25 Oct 06
Posts: 4
Credit: 0
RAC: 0
Message 30017 - Posted: 25 Oct 2006, 22:53:32 UTC - in response to Message 29924.  

Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)


See message 248 here.


I see they have a much nicer forum layout as well :-) Rosetta should take note ;-)


If anyone can post there, could you ask how it compares to the THINK virtual screening/docking software developed by www.treweren.com? I'm sure there are a few people who would like to know. Some of its HIV protease inhibition results can be found here.

Since it's invite only I cannot post there (afaik)


Thanks for the link, Saenger. Thanks for the question, Fluffy Chicken!

My name is Roger S. Armen and I am the primary scientist directing the development and application of molecular docking techniques for the Docking@Home project. I just answered Fluffy Chicken's question about THINK and virtual screening on the Docking@Home thread, and Saenger asked me to provide a short answer on this thread. The Find-a-Drug project (which to my knowledge is now permanently closed as of Dec 2005) used THINK to do virtual screening/docking for various drug targets. At Docking@Home, we also hope to perform virtual screens in the future using our CHARMM-based docking. This is a similarity between these two projects - in that we are both doing protein-ligand docking. However, a significant difference between Docking@home and other protein-ligand docking projects is that during the first phase of the project (first few years) we are going to focus on improving our docking methodology rather than focusing on specific applications. Our early focus is to continue to improve, develop and validate our CHARMM-based protein-ligand docking methods. Another significant difference is that Docking@Home uses quite different protein-ligand docking methodology than THINK. Also, from the computer science point of view, there are several interesting computer science objectives of the Docking@home project that involve dynamic allocation and scheduling of tasks that have differing needs for resource requirements and client reliability.
Profile Dr. Armen
Joined: 25 Oct 06
Posts: 4
Credit: 0
RAC: 0
Message 30018 - Posted: 25 Oct 2006, 23:24:17 UTC - in response to Message 29835.  

Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I sort of admit I think they probably are. But how so, and will their research also help Rosetta to complete the protein prediction picture?

AFAIK Docking is a kind of follow up of Predictor. At least some of the staff are the same: Team Docking vs. Team Predictor.

I'm no protein scientist, so it's up to those to decide about the sameness of the research, or better the (important) differences.


Thanks for the link Saenger,

My name is Roger S. Armen and I am the primary scientist directing the development and application of molecular docking techniques for the Docking@Home project. Since I am not part of the Rosetta@Home project, I should not speak for them. However, I can speak for Docking@Home, and one significant difference between these projects is that Docking@Home does not perform any protein folding calculations or any protein structure predictions. Docking@Home uses CHARMM-based molecular docking to predict the conformation of small drug-like molecules (also known as ligands) when they bind to a protein of known structure. We try to predict the structure of a protein-ligand complex. Another difference is that Rosetta@Home also performs some protein-protein docking applications, and Docking@Home does not intend to do protein-protein docking (at least not for the early stages of the project).

Also, with regard to Docking and Predictor: Docking is not a "follow up" of Predictor; they are completely different projects with very different goals and different funding sources. The difference between Docking@Home and Predictor is that Predictor performs protein folding calculations and protein structure predictions. As I outlined above, Docking@Home does not do this. It is true that some of the personnel do overlap. Dr. Michela Taufer helped establish the Predictor project when she was at The Scripps Research Institute (TSRI), and now she has established her own project in her laboratory at the University of Texas at El Paso (UTEP) in collaboration with the Charlie Brooks Laboratory at TSRI.
Profile [AF>Slappyto] popolito

Joined: 8 Mar 06
Posts: 13
Credit: 1,000,696
RAC: 475
Message 30070 - Posted: 26 Oct 2006, 20:50:45 UTC

Proteins@home makes it possible to find the different amino acid sequences that are compatible with a given fold. The project's goal is to calculate the energy functions for the different sequences.
They call it the inverse folding problem.
http://biology.polytechnique.fr/proteinsathome/documentation.php

I'm sorry for my bad English :)
hugothehermit

Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 30084 - Posted: 27 Oct 2006, 3:13:27 UTC
Last modified: 27 Oct 2006, 3:18:04 UTC

If Docking@Home isn't using redundancy, could someone tell Dr Armen to fix up the BOINC credit problem before Docking@Home goes through the Rosetta@Home experience.

I'm sure that the Rosetta@Home team would be more than willing to explain why it is very important to fix it properly and early in the project's life, and to explain the credit system they came up with, which is described here.

If it is using redundancy, less power to them :) sorry about the bad wordplay :p
Profile Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 30241 - Posted: 29 Oct 2006, 19:29:55 UTC - in response to Message 30084.  

If it is using redundancy, less power to them :) sorry about the bad wordplay :p

Every project is using redundancy, or it's plain random bills*** and not science.
Some do it by sending the same work to different participants (like Docking, Einstein, Malaria...), some do it somehow on the serverside (like Rosetta, CPDN). How it's done is secondary, but if it's not done it's not worth any power ;)
Profile River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 30243 - Posted: 29 Oct 2006, 20:08:52 UTC - in response to Message 30241.  
Last modified: 29 Oct 2006, 20:24:41 UTC

If it is using redundancy, less power to them :) sorry about the bad wordplay :p

Every project is using redundancy, or it's plain random bills*** and not science.
Some do it by sending the same work to different participants (like Docking, Einstein, Malaria...), some do it somehow on the serverside (like Rosetta, CPDN). How it's done is secondary, but if it's not done it's not worth any power ;)


There certainly has to be some protection against rogue results (results that are wrong whether by dishonesty, or over-enthusiastic overclocking, or a faulty computer, etc).

Having agreed thus far, there is an important difference between redundancy and other forms of error control like building an ensemble (which we are doing here).

With redundancy (as the word is used by BOINC) exactly the same task is crunched by two different users.

With an ensemble, everyone does different tasks (for example with different starting conditions (CPDN) or a different set of random numbers (Rosetta)) and the end result is derived from the ensemble of all the different results in such a way that a few rogue results are unlikely to make any difference (or the end result can be independently checked).

The crucial difference is that with redundancy work is lost when two accurate and honest crunchers handle the same data; but the end result is the same whether there are a few rogues or not. (Rogue may mean cheating or may mean an unwitting problem).

With an ensemble the presence of a small number of rogue results would very slightly lower the overall accuracy, though only to an insignificant extent; but every accurate & honest client makes a unique contribution to the end result.
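
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `validate_by_redundancy` and `best_of_ensemble`, the toy numbers, and the validity check are invented for the example and are not taken from BOINC or from any project's real code.

```python
from collections import Counter

def validate_by_redundancy(results, quorum=2):
    """Same task crunched by several hosts: accept the answer that
    at least `quorum` hosts agree on, otherwise return None."""
    if not results:
        return None
    answer, votes = Counter(results).most_common(1)[0]
    return answer if votes >= quorum else None

def best_of_ensemble(energies, is_valid):
    """Every host ran a different task: walk the results from the lowest
    energy upward and return the first one that passes an independent check."""
    for energy in sorted(energies):
        if is_valid(energy):
            return energy
    return None

# Redundancy: two honest hosts agree, one rogue host is outvoted.
print(validate_by_redundancy([42, 42, 17]))   # 42

# Ensemble: a rogue result claims an implausibly low energy and is rejected
# by the (hypothetical) check, so the next lowest result is used.
energies = [-101.3, -99.8, -250.0, -100.6]
print(best_of_ensemble(energies, is_valid=lambda e: e > -200.0))  # -101.3
```

In the first case two of the three runs contribute nothing new; in the second every honest run adds a distinct candidate.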

Two other approaches are worth noting.

Leiden is using both redundancy and an ensemble - each wu that goes into the ensemble is double crunched. This is undoubtedly the most accurate way of generating the end result given infinite resources, but whether they'd do better to have an ensemble twice as wide and no redundancy is another question.

Some maths projects have the ability to check a result more easily than calculating it. For example it is easier to check that a square root is correct than to calculate it. So a project to calculate square roots could run every WU just once, and the validator would just square the delivered answer to see if it was right. (This is an imaginary project - but it illustrates the point).
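
A minimal sketch of such a validator for the imaginary square-root project, assuming results arrive as floating-point numbers. The name `validate_sqrt` and the tolerance are choices made for this example only.

```python
import math

def validate_sqrt(n, claimed_root, rel_tol=1e-9):
    """Check a claimed square root by squaring it, which is cheaper
    than recomputing the root from scratch."""
    return claimed_root >= 0 and math.isclose(claimed_root * claimed_root, n, rel_tol=rel_tol)

print(validate_sqrt(2.0, 1.4142135623730951))  # True
print(validate_sqrt(2.0, 1.5))                 # False
```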

On Rosetta, the lowest energy result is taken from the ensemble - and after all that crunching that is the one that matters. It is then easy for the team to check that that result is correct; if not they'd disqualify it and go to the next lowest. If a million decoys were run, this means that typically each result is only run 1.000001 times, or 1.000002 times on the rare event that some error is found in the first. And less still if there is a "short-cut" test like for the square roots.
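
The figures above follow from dividing the total number of runs by the number of decoys. A short worked example, assuming each check of a winning candidate costs one extra full run (the helper name `average_runs` is made up here):

```python
from fractions import Fraction

def average_runs(decoys, rechecks):
    """Average number of times a result is computed when only the
    winning candidates are re-run for checking."""
    return Fraction(decoys + rechecks, decoys)

print(float(average_runs(1_000_000, 1)))  # 1.000001
print(float(average_runs(1_000_000, 2)))  # 1.000002
```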

Finally, not possible within BOINC, but popular in some other grid projects, is the idea of "random redundancy". Only some WU are double crunched, but the users are not told which ones. This means that deliberate cheats have to stay honest. When a discrepancy is found, several more wu from both users are checked. If one user is found to be generating more than a very very small number of errors, all their work (and all their credit) is discarded and those WU re-worked.
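
The "random redundancy" idea can be sketched as follows. This is a hypothetical illustration, not the code of any real grid project: the function names, the data layout (work unit id mapped to a user and a result), and the 10% rate are all assumptions.

```python
import random

def pick_spot_checks(workunit_ids, fraction, rng):
    """Secretly choose a subset of work units to be crunched twice."""
    k = round(len(workunit_ids) * fraction)
    return set(rng.sample(workunit_ids, k))

def users_to_audit(first, second):
    """Compare the two results of each double-crunched unit and return
    the users involved in any disagreement, for further checking.
    `first` and `second` map workunit id -> (user, result)."""
    suspects = set()
    for wu, (user_a, result_a) in first.items():
        if wu in second:
            user_b, result_b = second[wu]
            if result_a != result_b:
                suspects.update({user_a, user_b})
    return suspects

rng = random.Random(0)                        # fixed seed so the sketch is repeatable
checked = pick_spot_checks(list(range(100)), 0.10, rng)
print(len(checked))                           # 10

first = {1: ("alice", 7), 2: ("bob", 9)}
second = {2: ("carol", 8)}                    # only unit 2 was double crunched
print(sorted(users_to_audit(first, second)))  # ['bob', 'carol']
```

Both users in a mismatch are flagged because a single disagreement does not say which side is wrong; that is decided by checking more of each user's work.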

In my opinion the biggest weakness in BOINC was the decision to force the same degree of redundancy on every WU of a type. This is a reflection of the fact that on SETI at the time BOINC was first designed they had more computing power than they needed. Subsequent projects (including SETI when they get access to more input data) suffer from the fact that there is no degree of redundancy possible between 1 and 2.

(btw - I like BOINC overall or I would not be here. I also don't think it is perfect!)

For example, Zetagrid double crunched about 10% of all work, and less than 0.01% was found to be problematic, those runs then maybe generating 10x the work. This meant that each piece of work was run only 1.101 times, on average; compared to Einstein where the figure is over 2 (2 initial runs and more when needed) or LHC where it is over 5.
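
The 1.101 figure can be reproduced from the numbers in the paragraph above: one run for every work unit, plus the 10% that are double crunched, plus 0.01% of units each costing roughly ten extra runs. The function name is invented for this worked example.

```python
def average_runs_per_workunit(spot_check_rate, problem_rate, rework_factor):
    """One run for everything, plus the spot-checked share, plus the
    extra work triggered by the rare discrepancies."""
    return 1 + spot_check_rate + problem_rate * rework_factor

zetagrid = average_runs_per_workunit(0.10, 0.0001, 10)
print(round(zetagrid, 3))   # 1.101
```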

This mattered on Zetagrid. We missed out on being first to a trillion "zeroes" by less than 10% - we were up to 910 billion when a mainframe project got the trillion. Had the project leader gone for redundancy a la BOINC we'd have been only just over half way, a huge difference. Sadly, the Zetagrid project was not suitable for an ensemble approach or we'd have got there in front of the mainframe.
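
The "just over half way" estimate can be checked with the same numbers, assuming the total computing effort stays fixed and that BOINC-style redundancy means exactly two runs per work unit (in practice it would be slightly more). The function name is hypothetical.

```python
def zeros_with_full_redundancy(zeros_reached, actual_runs_per_wu, redundant_runs_per_wu=2):
    """Same total computing effort, but every work unit crunched twice."""
    return zeros_reached * actual_runs_per_wu / redundant_runs_per_wu

estimate = zeros_with_full_redundancy(910e9, 1.101)
print(f"{estimate / 1e9:.0f} billion")   # 501 billion
```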

So yes, error checking of some sort is essential; but it is meaningful to talk of methods of error checking that do, or don't, involve redundancy. And it is certainly meaningful to talk of the degree of redundancy as a measure of what proportion of the work is devoted to error control.

River~~
Profile Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 30247 - Posted: 29 Oct 2006, 20:43:25 UTC

You must not mix redundancy and validation. You have to validate the results, otherwise they are totally worthless, and so you have to have redundancy on some level or the other.

If you have projects like CPDN or Rosetta, where a random seed starts a simulation, no real validation on the WU-level is needed. The whole process is extremely redundant as you crunch the same simulation thousands of thousands of times with just wee alterations.

In projects with a vast amount of data to be searched for something meaningful like the needle in the haystack (Einstein: gravitational waves, Seti: radio signals) that's simply impossible, here the better solution has to be validation on a per WU level.

There are probably projects that belong in both categories somehow. I don't know, but perhaps Leiden is one of those.

The stupid accusation of "wasted CPU-time" for a needed validation is just that: stupid! It can be discussed how the validation can be performed without too much redundancy, but it's better to have more redundancy than to have too little.
Profile Dr. Armen
Joined: 25 Oct 06
Posts: 4
Credit: 0
RAC: 0
Message 30297 - Posted: 30 Oct 2006, 18:25:43 UTC - in response to Message 30084.  

If Docking@Home isn't using redundancy, could someone tell Dr Armen to fix up the BOINC credit problem before Docking@Home goes through the Rosetta@Home experience.

I'm sure that the Rosetta@Home team would be more than willing to explain why it is very important to fix it properly and early in the project's life, and to explain the credit system they came up with, which is described here.

If it is using redundancy, less power to them :) sorry about the bad wordplay :p


Thanks for your suggestion hugo.

The members of the Docking@Home team are aware that this is a very important problem. I talked to our project administrator Andre, and he says that we are using redundancy. We require a quorum of 3 valid results before we assign credit. Because of the credit disparities in BOINC, we are planning to do something similar to Rosetta in which we assign credit for valid results or based on some other appropriate measure (independent of the BOINC credit system).
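
For readers unfamiliar with quorums, the rule "3 valid results before credit is assigned" could look roughly like the sketch below. It is an illustration only, not Docking@Home's or BOINC's actual validator: the function name `grant_credit`, the data layout, and the use of exact equality to compare results are all assumptions.

```python
from collections import Counter

def grant_credit(results, quorum=3):
    """Return the hosts that earn credit once `quorum` results agree.
    `results` maps host name -> reported result."""
    if not results:
        return []
    canonical, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return []            # quorum not reached yet; keep waiting
    return sorted(h for h, r in results.items() if r == canonical)

print(grant_credit({"a": 1.5, "b": 1.5}))                       # []
print(grant_credit({"a": 1.5, "b": 1.5, "c": 1.5, "d": 9.9}))   # ['a', 'b', 'c']
```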
thom217

Joined: 29 Oct 05
Posts: 12
Credit: 182
RAC: 0
Message 30472 - Posted: 1 Nov 2006, 21:41:39 UTC
Last modified: 1 Nov 2006, 21:43:29 UTC

I remember there was a gentleman who was in touch with Keith Davis, the head of the Find-a-Drug project, at the time of the project closure. He is one of the people responsible for running Chmoogle now called eMolecules search engine.

http://www.emolecules.com/
http://usefulchem.blogspot.com/2005/11/chmoogle.html

Jean-Claude Bradley
http://www.blogger.com/profile/6833158

He might be able to contribute to the Docking@Home database.



©2024 University of Washington
https://www.bakerlab.org