Posts by divyab

1) Message boards : Rosetta@home Science : Rosetta@home Active WorkUnit(s) Log (Message 16119)
Posted 13 May 2006 by divyab
Post:
I have just added two new workunits to the queue:

CASP_HOMOLOG_ABRELAX_hom001_t287_ and
HOMOLOG_ABRELAX_hom0xx_t283_

These workunits are both for CASP, the competition Rhiju mentioned below. t278 and t283 are both sequences for which we are trying the "ab initio" approach, meaning that they will not be based on existing structures (unlike the case Bin described below). The workunits do, however, use homologs of the sequence (other proteins with similar amino acid sequences) to help in the prediction! Here's how:

We take the target sequence (given by CASP) and make the best possible predictions we can. We then find sequence homologs in the database, and make the best possible predictions we can for those too. The basis for this is that sequences with high homology can be expected to have the same structure, so if we find a good prediction among any of the homologs, we have solved the structure for our target sequence!

These workunits are specifically crunching away at all the homologs, folding them independently and trying to find the best structure for each. The next step will be to map our target sequence back onto all these different structures, rebuild gaps where the two sequences differ in length, and do some simple tweaking of the sidechains that are different. This will yield our final predictions.
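
To make the mapping step a bit more concrete, here is a minimal sketch in Python. It is not Rosetta code: the function name `plan_threading`, the action labels, and the toy alignment are all made up for illustration, and it assumes a gapped pairwise alignment of target and homolog is already available. It only decides, for each target residue, whether the homolog structure can be kept, needs a sidechain tweak, or needs the gap rebuilt.

```python
def plan_threading(target_aln, homolog_aln):
    """Classify each target residue given a gapped pairwise alignment.

    Hypothetical helper for illustration only. Both inputs are aligned
    strings of equal length that use '-' for gaps.
    """
    if len(target_aln) != len(homolog_aln):
        raise ValueError("aligned strings must have equal length")
    plan = []
    target_pos = 0
    for t, h in zip(target_aln, homolog_aln):
        if t == "-":
            continue  # homolog residue with no target counterpart: dropped
        if h == "-":
            action = "rebuild"  # no homolog coordinates here: rebuild the gap
        elif t == h:
            action = "keep"     # same residue: reuse the homolog structure
        else:
            action = "repack"   # same backbone, different sidechain
        plan.append((target_pos, action))
        target_pos += 1
    return plan


# Toy alignment (invented sequences, not a real CASP target).
target_aln = "MKT-LLVAG"
homolog_aln = "MRTAL--AG"
plan = plan_threading(target_aln, homolog_aln)
for pos, action in plan:
    print(pos, action)
```

In this toy case, target positions 4 and 5 have no counterpart in the homolog and would need rebuilding, while position 1 (K in the target, R in the homolog) would only need its sidechain replaced.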
2) Message boards : Rosetta@home Science : Comments/questions on Rosetta@home journal (Message 12011)
Posted 14 Mar 2006 by divyab
Post:


A rotating daily Top Prediction of the Day (TPD), similar to user of the day, would probably be neat. (I would not be opposed to replacing user of the day with top prediction of the day, but I am not sure how others would feel about that)



We are now posting these in the "news" section of the homepage! As David mentioned in his journal, we are looking into printable certificates. We are also looking into adding "top predictions" to people's profiles. I'd appreciate any feedback on the format of the daily top predictions posts!
3) Message boards : Number crunching : Miscellaneous Work Unit Errors (Message 11804)
Posted 9 Mar 2006 by divyab
Post:
We have found the problem, and are resubmitting the jobs with a fix. There are still a few workunits with the following prefix out there that you can expect to fail very quickly:

HOMSdt_homDB0??_1dtj

This should not happen with the next batch.
4) Message boards : Number crunching : Please abort WUs with (Message 7503)
Posted 24 Dec 2005 by divyab
Post:

Edit: And this ... rather silly but I was wondering ...
Q6) I have one WU with the name ... BARCODE_FRAG_30_1n0u_221_42_0 ...,
is it related to the 3-dimensional shapes of proteins research to find cures for some major human diseases?



(In the future, science questions like this will probably be addressed more promptly on one of the science threads... but since it seems like the WUs are stabilizing, I'll answer here...)

Barcode refers to a particular method we use when we try to accurately predict the protein's structure, as you guessed above. Basically, we use this as a way to make sure that we are not missing some particular "features" when we are searching for the correct structure. A "barcode" might be for some particular feature (let's say, a kink in the chain), and has different "flavors" (kink at the beginning, kink in the middle, kink at the end, all 3, etc.). In the runs that say "barcode", we spread our search so that all the different flavors of certain features are evaluated before making our predictions.
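
As a rough illustration of "spreading the search over all flavors", here is a small Python sketch. It is not how Rosetta implements barcodes: the `barcode_schedule` function, the second feature (`strand_pairing`), and its flavors are assumptions invented for the example. The idea shown is simply to enumerate every combination of flavors and hand them out to runs in turn, so no combination is skipped.

```python
from itertools import cycle, islice, product

# Hypothetical features and flavors. Only the "kink" example comes from
# the post above; "strand_pairing" is made up for illustration.
features = {
    "kink": ["beginning", "middle", "end"],
    "strand_pairing": ["parallel", "antiparallel"],
}


def barcode_schedule(features, n_runs):
    """Assign one flavor combination to each run, round-robin."""
    names = sorted(features)
    combos = [
        dict(zip(names, flavors))
        for flavors in product(*(features[name] for name in names))
    ]
    # Cycle through the combinations so every one is covered evenly.
    return list(islice(cycle(combos), n_runs))


schedule = barcode_schedule(features, 12)
for run_id, barcode in enumerate(schedule):
    print(run_id, barcode)
```

With 3 x 2 = 6 combinations and 12 runs, each combination is searched twice.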

5) Message boards : Rosetta@home Science : What we have learned thus far (Message 1554)
Posted 20 Oct 2005 by divyab
Post:
In addition to the CSA method Mike described below, we are also developing other algorithms that address some of the sampling concerns you mentioned below.

Currently, if Rosetta is to generate 10,000 structures, each of those 10,000 runs starts at a random point in the conformational space. A Monte Carlo search followed by some local optimization is performed, and the structure at the local minimum is reported. Instead, we would like to use the information from the first 5,000 runs to guide the sampling for the next 5,000 runs. Thus we will use the information in the local minima to help guide us toward a global minimum, or at the very least, find areas that either appear promising or are undersampled.
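
The two-phase idea can be sketched on a toy problem. Everything below is an assumption made for illustration: the 1-D `energy` function stands in for a real energy function, the greedy accept-if-lower search stands in for Monte Carlo plus local optimization, and the batch size and "best 10%" cutoff are arbitrary. It is not the algorithm under development, just the shape of it.

```python
import math
import random


def energy(x):
    """Toy rugged 1-D landscape standing in for a real energy function."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x, rng, steps=200, step_size=0.1):
    """Greedy search: accept a random move only if it lowers the energy."""
    e = energy(x)
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial
    return x, e


def run_batch(starts, rng):
    return [local_minimize(s, rng) for s in starts]


rng = random.Random(0)
n = 50

# Phase 1: every run starts from an independent random point.
first = run_batch([rng.uniform(-10.0, 10.0) for _ in range(n)], rng)

# Phase 2: start near the lowest-energy 10% of the phase-1 minima.
best = sorted(first, key=lambda xe: xe[1])[: n // 10]
starts = [rng.choice(best)[0] + rng.gauss(0.0, 0.5) for _ in range(n)]
second = run_batch(starts, rng)

print("best after phase 1:", min(e for _, e in first))
print("best after phase 2:", min(e for _, e in second))
```

Seeding only around the best minima is the simplest possible guide. It ignores the "undersampled areas" part of the idea, which needs some notion of where samples are sparse.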

The protein conformational space is extremely high dimensional, too high for global optimization to be feasible. We are thus reducing the dimensionality using Principal Component Analysis, and attempting to fit simple energy surfaces (parametric and non-parametric) to this reduced dimensional space. Minimizing these fitted functions gives us new areas to sample. Additionally, we can look at this reduced dimensional space for areas in which sampling is very low, and identify those areas for further sampling.
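
Here is a small, self-contained Python sketch of that pipeline on synthetic data. All of it is assumed for illustration: the 5-D "conformations", the toy energy, the use of power iteration to get only the first principal component, a parabola as the "simple parametric surface", and a 10-bin histogram as the measure of sampling density. The real method works in far more dimensions and with richer surfaces.

```python
import math
import random


def top_component(rows, iters=200):
    """Mean and first principal component, found by power iteration."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = X^T (X v), without forming the covariance matrix explicitly.
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(zs, es):
    """Least-squares fit of e = a*z^2 + b*z + c via the normal equations."""
    s = [sum(z ** k for z in zs) for k in range(5)]
    t = [sum(e * z ** k for z, e in zip(zs, es)) for k in range(3)]
    mat = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(mat)
    coeffs = []
    for col in range(3):  # Cramer's rule, one column at a time
        m = [row[:] for row in mat]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return coeffs


def least_sampled_bin(coords, n_bins=10):
    """Return the (low, high) edges of the emptiest histogram bin."""
    lo, hi = min(coords), max(coords)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for z in coords:
        counts[min(int((z - lo) / width), n_bins - 1)] += 1
    k = counts.index(min(counts))
    return lo + k * width, lo + (k + 1) * width


# Synthetic data: points spread mainly along one hidden direction d.
rng = random.Random(1)
d = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, 0.0]
rows = []
for _ in range(300):
    t = rng.gauss(0.0, 3.0)
    rows.append([t * dj + rng.gauss(0.0, 0.1) for dj in d])
# Toy energy with its minimum where the coordinate along d equals 2.
energies = [(sum(x * dj for x, dj in zip(r, d)) - 2.0) ** 2 for r in rows]

mean, v = top_component(rows)
zs = [sum((x - m) * c for x, m, c in zip(r, mean, v)) for r in rows]
a, b, c = fit_parabola(zs, energies)
z_star = -b / (2.0 * a)  # minimum of the fitted surface
new_point = [m + z_star * vj for m, vj in zip(mean, v)]
sparse_lo, sparse_hi = least_sampled_bin(zs)
print("sample next near:", new_point)
print("undersampled range along PC1:", sparse_lo, sparse_hi)
```

The minimum of the fitted parabola is mapped back to the full space to suggest where to sample next, and the emptiest bin along the first component flags a region that has barely been explored.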





