Rosetta@home Active WorkUnit(s) Log

Author	Message
Rhiju · Volunteer moderator · Joined: 8 Jan 06 · Posts: 223 · Credit: 3,546 · RAC: 0	Message 14335 - Posted: 22 Apr 2006, 2:43:22 UTC · Last modified: 22 Apr 2006, 2:45:26 UTC
I'm going to start this log with a pretty complete list of proteins that we're looking at! First, there is a set of eleven "small" proteins (<80 residues) that David Baker and Divya have focused on for calibrating the Rosetta energy function. The information on all the proteins is extracted from the very useful PDBsum database at the European Bioinformatics Institute. You can access that database directly by clicking on each protein's code!
1b72 Protein/DNA Pbx1, homeobox protein hox-b1/DNA ternary complex Homeobox protein hox-b1. Homo sapiens. Human. Gene: hoxb-1.
1dcj Structural genomics, unknown function Solution structure of yhhp, a novel escherichia coli protein implicated in the cell division Yhhp protein. Escherichia coli. Bacteria.
1di2 RNA binding protein/RNA Crystal structure of a dsrna-binding domain complexed with dsrna: molecular basis of double-stranded RNA-protein interactions Double stranded RNA binding protein a. Xenopus laevis.
1dtj Immune system Crystal structure of nova-2 kh3 k-homology RNA-binding domain RNA-binding neurooncological ventral antigen 2. Homo sapiens. Human. Organ: brain. Cell: neuron.
1hz6 Protein binding Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution Protein l. Peptostreptococcus magnus. Bacteria.
1mky Ligand binding protein Structural analysis of the domain interactions in der, a switch protein containing two gtpase domains Probable gtp-binding protein enga. Thermotoga maritima. Archebacteria.
1n0u Translation Crystal structure of yeast elongation factor 2 in complex with sordarin Elongation factor 2. Saccharomyces cerevisiae. Yeast.
1ogw Chromosomal protein Synthetic ubiquitin with fluoro-leu at 50 and 67 Ubiquitin. Homo sapiens. Human.
1r69 Gene regulating protein 434 repressor (amino-terminal domain) (r1-69) Phage 434
1tif Ribosome binding factor Translation initiation factor 3 n-terminal domain Translation initiation factor 3. Bacillus stearothermophilus.
2reb DNA binding protein The structure of the E. coli reca protein monomer and polymer Reca protein (Escherichia coli)
ID: 14335 · Rating: 0

Rhiju · Volunteer moderator · Joined: 8 Jan 06 · Posts: 223 · Credit: 3,546 · RAC: 0	Message 14337 - Posted: 22 Apr 2006, 2:52:23 UTC · Last modified: 22 Apr 2006, 6:00:07 UTC
Around January 2006, David Kim and I began testing the Rosetta@home strategy on a diverse set of 62 proteins, whose crystal structures are well known. These protein sequences have been extensively used in Baker lab benchmarks for several years -- it has been gratifying to see Rosetta@home working on many of these sequences that we found challenging or even impossible with our in-house resources! See our Top Prediction page and David Baker's journal for examples.
One more note: we've also been testing, on a grand scale, whether we can use information from homologs for each of these sequences. For example, in the 1fna protein fibronectin, our main sequence listed below comes from the famous bacterium E. coli. But there's a closely related sequence in chickens (amazing, huh?) that happens to fold better (i.e., to lower energies) in Rosetta. That's the one we've posted in the top predictions.
We've been looking at a few other proteins not in these two sets, and we'll describe them -- along with their workunits -- above.
1a19 Ribonuclease inhibitor Barstar (free), c82a mutant Barstar. Chain: a, b. Engineered: yes. Mutation: c82a. Biological_unit: monomer Bacillus amyloliquefaciens. Expressed in: escherichia coli
1a32 Ribosomal protein Ribosomal protein s15 from bacillus stearothermophilus Ribosomal protein s15. Chain: null. Engineered: yes Bacillus stearothermophilus. Cellular_location: ribosome. Expressed in: escherichia coli.
1a68 Potassium channels Crystal structure of the tetramerization domain of the shaker potassium channel Potassium channel kv1.1. Chain: null. Fragment: tetramerization domain. Engineered: yes. Mutation: inserted met at n-terminus. Biological_unit: tetramer Aplysia californica. California sea hare. Tissue: central nervous system. Cellular_location: cytoplasm. Expressed in: escherichia coli.
1acf Contractile protein Acanthamoeba castellanii profilin ib Profilin i (Acanthamoeba castellanii)
1ail RNA-binding protein N-terminal fragment of ns1 protein from influenza a virus Nonstructural protein ns1. Chain: null. Fragment: RNA-binding domain Influenza a virus
1aiu Electron transport Human thioredoxin (d60n mutant, reduced form) Thioredoxin. Chain: null. Engineered: yes. Mutation: d60n. Other_details: active site cysteines 32 and 35 in the reduced form Homo sapiens. Human. Expressed in: escherichia coli.
1b3a Anti-HIV protein Total chemical synthesis and high-resolution crystal structure of the potent anti-HIV protein aop-rantes Rantes. Chain: a, b. Engineered: yes. Other_details: oxime link between aop group and pro3 Synthetic: yes
1bgf Transcription factor Stat-4 n-domain Stat-4. Chain: null. Fragment: n-terminal domain. Engineered: yes. Biological_unit: dimer Mus musculus. Mouse. Expressed in: escherichia coli.
1bk2 Sh3-domain A-spectrin sh3 domain d48g mutant A-spectrin. Chain: null. Fragment: sh3-domain. Engineered: yes. Mutation: d48g Gallus gallus. Chicken. Plasmid: pbat4. Expressed in: escherichia coli.
1bkr Actin-binding Calponin homology (ch) domain from human beta-spectrin at 1.1 angstrom resolution Spectrin beta chain. Chain: a. Fragment: f-actin binding domain residues 173 - 281. Synonym: calponin homology (ch) domain. Engineered: yes. Biological_unit: heterotetramer Homo sapiens. Human. Cell: non-erythrocyte. Cellular_location: cytoskeletal protein. Expressed in: escherichia coli.
1bm8 Cell cycle DNA-binding domain of mbp1 Transcription factor mbp1. Chain: null. Fragment: n-terminal DNA-binding domain. Synonym: mcb (mlui cell-cycle box) binding protein. Engineered: yes Saccharomyces cerevisiae. Baker's yeast. Plasmid: pet43 (c2681). Expressed in: escherichia coli.
1bq9 Metal binding protein Rubredoxin (formyl methionine mutant) from pyrococcus furiosus Rubredoxin. Chain: a. Synonym: pf rd. Engineered: yes. Mutation: yes Pyrococcus furiosus. Variant: fmet. Expressed in: E. coli. Other_details: product of a synthetic pf rd gene
1c8c DNA binding protein/DNA Crystal structures of the chromosomal proteins sso7d/sac7d bound to DNA containing t-g mismatched base pairs DNA-binding protein 7a. Chain: a. Synonym: sso7d, 7 kda DNA-binding protein d. 5'-d( Gp Tp Gp Ap Tp Cp Gp C)-3'. Chain: b, c. Engineered: yes Sulfolobus solfataricus. Collection: dsm 1617. Other_details: german collection of microorganisms (dsm) 1617, #1616. Synthetic: yes
1c9o Transcription Crystal structure analysis of the bacillus caldolyticus cold shock protein bc-csp Cold-shock protein. Chain: a, b. Synonym: cspb Bacillus caldolyticus
1cc8 Metal transport Crystal structure of the atx1 metallochaperone protein Metallochaperone atx1. Chain: a. Engineered: yes Saccharomyces cerevisiae. Baker's yeast. Expressed in: escherichia coli.
1cei Antibacterial protein Structure determination of the colicin e7 immunity protein (imme7) that binds specifically to the dnase-type colicin e7 and inhibits its bacteriocidal activity Colicin e7 immunity protein. Chain: null. Synonym: imme7 Escherichia coli
1cg5 Oxygen transport Deoxy form hemoglobin from dasyatis akajei Hemoglobin. Chain: a. Hemoglobin. Chain: b Dasyatis akajei. Akaei. Cell: erythrocyte. Cell: erythrocyte
1ctf Ribosomal protein 1.65 angstrom resolution structure of 7,8-dihydroneopterin aldolase from staphylococcus aureus 7,8-dihydroneopterin aldolase. Chain: null. Synonym: dhna. Engineered: yes. Other_details: octamer is crystallographic Staphylococcus aureus. Collection: atcc 25923. Gene: dhna. Expressed in: escherichia coli.
1e6i Gene regulation Bromodomain from gcn5 complexed with acetylated h4 peptide Transcriptional activator gcn5. Chain: a. Fragment: bromodomain. Engineered: yes. H4 peptide. Chain: p. Fragment: acetylated tail. Other_details: lysine 16 is acetylated on nz Saccharomyces cerevisiae. Gene: gcn5. Expressed in: escherichia coli. Synthetic: corresponds to residues 15-29 of histone h4.
1elw Chaperone Crystal structure of the tpr1 domain of hop in complex with a hsc70 peptide Tpr1-domain of hop. Chain: a, b. Fragment: n-terminal domain. Engineered: yes. Hsc70-peptide. Chain: c, d. Engineered: yes Homo sapiens. Human. Expressed in: escherichia coli. Synthetic: yes. Other_details: this sequence occurs naturally in humans
1enh DNA-binding protein Crystal structure of escherichia coli cyay protein reveals a novel fold for the frataxin family Cyay protein. Chain: a. Engineered: yes Escherichia coli. Bacteria. Expressed in: escherichia coli.
1eyv Transcription The crystal structure of nusb from mycobacterium tuberculosis N-utilizing substance protein b homolog. Chain: a, b. Synonym: nusb protein. Engineered: yes Mycobacterium tuberculosis. Bacteria. Expressed in: escherichia coli.
1fkb Isomerase Atomic structure of the rapamycin human immunophilin fkbp-12 complex Fk506 binding protein (fkbp) complex with immunosuppressant rapamycin Human (homo sapiens) recombinant form expressed in (escherichia coli)
1fna Cell adhesion protein Gene v protein (single-stranded DNA binding protein) Gene v protein. Chain: null. Engineered: yes. Biological_unit: active as a dimer Escherichia coli. Strain: k561. Plasmid: ptt2. Gene: gen v in bacteriophage f1. Expressed in: escherichia coli.
1hz6 Protein binding Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution Protein l. Chain: a, b, c. Fragment: b1 domain. Synonym: ig kappa light chain-binding protein. Engineered: yes. Mutation: yes Peptostreptococcus magnus. Bacteria. Expressed in: escherichia coli.
1ig5 Metal binding protein Bovine calbindin d9k binding mg2+ Vitamin d-dependent calcium-binding protein, intestinal. Chain: a. Synonym: calbindin d9k. Engineered: yes Bos taurus. Bovine. Gene: synthetic gene. Expressed in: escherichia coli.
1iib Phosphotransferase Crystal structure of iibcellobiose from escherichia coli Enzyme iib of the cellobiose-specific phosphotransferase system. Chain: a, b. Fragment: enzyme iib. Engineered: yes. Mutation: c10s. Biological_unit: monomer Escherichia coli. Strain: k12. Cellular_location: cytoplasm. Plasmid: pjl503. Gene: cela. Expressed in: escherichia coli.
1kpe Protein kinase inhibitor Pkci-transition state analog Protein kinase C interacting protein. Chain: a, b. Synonym: pkci-1, protein kinase C inhibitor 1, hint protein, hit protein. Engineered: yes. Biological_unit: dimer Homo sapiens. Human. Plasmid: phil-d5. Gene: hpkci-1. Expressed in: pichia pastoris.
1lis Fertilization protein Ribosomal protein s6 Ribosomal protein s6. Chain: a. Engineered: yes. Mutation: yes Thermus thermophilus. Bacteria. Expressed in: escherichia coli.
1nps Signaling protein Crystal structure of n-terminal domain of protein s Development-specific protein s. Chain: a. Fragment: n-terminal domain: motifs 1-2 and linker. Synonym: spore coat protein s. Engineered: yes Myxococcus xanthus. Bacteria. Cellular_location: cytosol. Expressed in: escherichia coli.
1opd Phosphotransferase Histidine-containing protein (hpr), mutant with ser 46 replaced by asp (s46d) Histidine-containing protein. Chain: null. Synonym: hpr. Engineered: yes. Mutation: s46d Escherichia coli. Strain: esk108. Expressed in: escherichia coli.
1pgx Immunoglobulin binding protein Protein kinase C delta cys2 domain Protein kinase C delta type. Domain: cys2. Heterogen: zinc Mus musculus mouse. Expressed in: escherichia coli.
1r69 Gene regulating protein Crystal structure of subtilisin-propeptide complex Subtilisin e. Chain: a, b. Synonym: serine protease. Engineered: yes. Mutation: s221c Bacillus subtilis. Strain: 168. Expressed in: escherichia coli.
1shf Phosphotransferase Translation initiation factor 3 n-terminal domain Translation initiation factor 3. Chain: null. Domain: n-terminal residues 1 - 78. Synonym: if3-n. Engineered: yes. Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).
1tig Ribosome binding factor Translation initiation factor 3 c-terminal domain Translation initiation factor 3. Chain: null. Domain: c-terminal residues 79 - 172. Synonym: if3-c. Engineered: yes Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).
1tit Immunoglobulin-like domain Titin, ig repeat 27, nmr, minimized average structure Titin, i27. Chain: null. Synonym: connectin i27, titin ig repeat 27. Engineered: yes. Other_details: solution structure, t=308k, ph 4.5, 10mm acetate buffer Homo sapiens. Human. Organ: heart. Tissue: muscle. Organelle: sarcomere. Expressed in: escherichia coli.
1tul Telokin-like protein Structure of tlp20 Tlp20. Chain: null Autographa californica nuclear polyhedrosis virus, acmnpv. Baculovirus
1ubi Chromosomal protein Synthetic, structural and biological studies of the ubiquitin system: chemically synthesized and native ubiquitin fold into identical three-dimensional structures. Ubiquitin Solid-phase chemical synthesis
1ugh Glycosylase Crystal structure of human uracil-DNA glycosylase in complex with a protein inhibitor: protein mimicry of DNA Uracil-DNA glycosylase. Chain: e. Synonym: udg. Engineered: yes. Mutation: p82m, v83e, g84f. Biological_unit: monomer. Uracil-DNA glycosylase inhibitor. Chain: i. Synonym: ugi. Homo sapiens. Human. Expressed in: escherichia coli. Bacteriophage pbs2. Expressed in: escherichia coli
1urn Complex (ribonucleoprotein/RNA) U1a/RNA complex U1a spliceosomal protein. Chain: a, b, c. Fragment: residues 2 - 98. Engineered: yes. Mutation: y31h, q36r. RNA 21mer hairpin (5'-(ap Ap Up Cp Cp Ap Up Up Gp Cp Ap Cp Up Cp Cp Gp Gp Ap Up Up U)-3'). Chain: p, q, r other_details: u1a is a protein from u1 small Homo sapiens. Human. Cell_line: fetal brain cdna library. Expressed in: escherichia coli. Other_details: induction by t7 phi10 promoter. Synthetic: yes. Other_details: sequence based on hairpin ii of u1 RNA
1utg Steroid binding Amino terminal 9kda domain of vaccinia virus DNA topoisomerase i residues 1-77, experimental electron density for residues 1-77 DNA topoisomerase i. Chain: null. Fragment: amino terminal 9kda, residues 1 - 77. Engineered: yes. Other_details: domain generated by mild proteolysis of the intact 36kda vaccinia virus DNA topoisomerase i, a member of the eukaryotic-like type i DNA topoisomerases Vaccinia virus. Strain: wr. Expressed in: escherichia coli (active form)
1vie Oxidoreductase Structure of dihydrofolate reductase Dihydrofolate reductase. Chain: null. Synonym: r67 dhfr. Ec: 1.5.1.3 Escherichia coli. Strain: tmp-resistant, containing r67 dhfr overproducing plasmid plz1
1vls Chemotaxis Ligand binding domain of the wild-type aspartate receptor Aspartate receptor. Chain: null. Synonym: tar. Biological_unit: dimer Salmonella typhimurium
1who Allergen Allergen phl p 2 Allergen phl p 2. Chain: null. Synonym: phl p ii. Engineered: yes Phleum pratense. Timothy grass. Expressed in: escherichia coli
1wit Muscle protein Twitchin immunoglobulin superfamily domain (igsf module) (ig 18'), nmr, minimized average structure Twitchin 18th igsf module. Chain: null. Engineered: yes Caenorhabditis elegans. Nematode. Organ: body wall muscle. Tissue: muscle a-band. Cell: muscle cell. Expressed in: escherichia coli.
256b Electron transport Acyl-phosphatase (common type) from bovine testis Acylphosphatase. Chain: null. Synonym: acp. Biological_unit: monomer Bos taurus. Bovine. Organ: testis. Cellular_location: cytoplasm
2chf Signal transduction protein Refined structure of the actin-severing domain villin 14t, determined by solution nmr, minimized average structure Villin 14t. Chain: null. Fragment: residues 1 - 126. Synonym: villin domain 1, villin segment 1. Engineered: yes Gallus gallus. Chicken. Organ: intestine. Cell: epithelial cells. Expressed in: escherichia coli.
4ubp Hydrolase Structure of bacillus pasteurii urease inhibited with acetohydroxamic acid at 1.55 a resolution Urease (chain a). Chain: a. Synonym: urea aminohydrolase. Urease (chain b). Chain: b. Synonym: urea aminohydrolase. Urease (chain c). Chain: c. Synonym: urea aminohydrolase. Bacillus pasteurii. Strain: dsm 33. Cellular_location: cytoplasm. Cellular_location: cytoplasm
5cro Gene regulating protein Refined structure of cro repressor protein from bacteriophage lambda Cro repressor protein. Chain: o, a, b, c. Biological_unit: dimer. Other_details: water molecules and two phosphate radicals Bacteriophage lambda
ID: 14337 · Rating: 2

Rhiju · Volunteer moderator · Joined: 8 Jan 06 · Posts: 223 · Credit: 3,546 · RAC: 0	Message 14346 - Posted: 22 Apr 2006, 6:20:22 UTC · Last modified: 22 Apr 2006, 6:33:18 UTC
We know that some of the workunits that show up on your computer have cryptic names -- all the jobs are testing some interesting scientific tricks, and we'd like to tell you what they are! We're going to make an effort to post information in this thread about every workunit that we send out. Then you'll hear about results on our Top Prediction page and in David Baker's journal. Here are explanations of some of the workunits that may be showing up on your clients:
1. Testing a smart resampling strategy:
FARELAX_NOFILTERS_xxxx
FACONTACTS_RECENTER_NOFILTERS_xxxx
Have you ever wondered whether there's a better way to fold proteins than having 100,000 clients do completely independent runs, without talking to each other? These workunits are testing a new strategy that may be smarter. We look at the full-atom scores and conformations from the first 10,000 models (that's the first workunit). For generating the second round of 10,000 models we then adjust the initial stages of the search (which uses a low-resolution score function) to favor contacts that gave low energies in the first round.
2. The hardest protein.
Within our benchmark set, there's one protein, 1tul_, with a particularly complicated fold, called the telokin-like fold. It has contacts from very distant parts of the protein chain, which arise infrequently in Rosetta (and presumably in Nature, too, but that's another story). We're trying to concentrate the resources of Rosetta@home to see whether we can get anywhere close to the right answer with massive sampling: about 10 million conformations compared to the traditional ~1000 conformations we would run in-house.
PROD_ABINITIO_FAST_1tul_
PROD_ABINITIO_1tul_
These jobs use the standard ab initio protocol. The first one uses a tenth the number of moves -- we're trying to see if we can get away with doing less per simulation, and instead running more simulations.
PROD_ABINITIO_ALPHABETABAR_1tul_
We're testing a strategy developed by Phil Bradley, the folding guru in our lab, where Rosetta avoids certain over-common motifs (beta hairpins). This is an effort to explore more extreme parts of the conformational space.
PROD_ABINITIO_9FULLSTRANDBAR_1tul_
PROD_ABINITIO_9STRANDBAR_1tul_
PROD_ABINITIO_7STRANDBAR_1tul_
These runs have different "barcodes" that specify non-native protein contacts that shouldn't be made during the simulation. The question is: how much does the extra information help Rosetta find the native conformation?
3. More aggressive full atom sampling:
HBLR_1.0_xxxx_ROT_TRIALS_TRIE
The final stage of Rosetta's folding strategy consists of fine movements that try to fit the protein pieces together in atomic detail (the "fullatom" stage, often abbreviated FA). These simulations use David Baker's latest energy terms (the "HBLR_1.0" refers to the weight on long-range hydrogen bonding) with an aggressive minimization protocol ("rotamer trials") that is made efficient by a neat graph representation within Rosetta (the "trie").
4. Helical proteins from viruses
VP_TEST_core_vp26_
VP_TEST_truncate_termini_vp26_
VP_TEST_vp26_
VP_TEST_truncate_termini_1qgtA
VP_TEST_1qgtA
We're always trying to find new ways to couple Rosetta with experimental data. We're beginning a project to use low-resolution images made by cryo-electron microscopy to constrain Rosetta's search. Our collaborators in Wah Chiu's lab happen to be studying the proteins that form virus coats ("VP" stands for virus proteins), and these are some early jobs to begin testing the protocol. You'll see more of these in the fall, after we've been through the CASP structure prediction trials.
ID: 14346 · Rating: 1
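Editorial illustration for message 14346, item 1: the resampling idea (favor, in round two, the contacts that showed up in low-energy round-one models) can be sketched as below. This is only a minimal sketch under stated assumptions, not Rosetta's actual code: the function names, the `top_fraction` cutoff, and the linear bonus are all hypothetical choices that the post does not specify.

```python
from collections import Counter

def contact_bonuses(models, top_fraction=0.1, max_bonus=1.0):
    """Derive per-contact bonuses from a first round of models.

    models: list of (energy, contacts), where contacts is a set of
    (i, j) residue-index pairs. All names here are hypothetical.
    """
    if not models:
        return {}
    ranked = sorted(models, key=lambda m: m[0])      # lowest energy first
    n_top = max(1, int(len(ranked) * top_fraction))  # keep the best slice
    counts = Counter()
    for _, contacts in ranked[:n_top]:
        counts.update(contacts)
    # Bonus grows with how often a contact appears among the best models.
    return {pair: max_bonus * c / n_top for pair, c in counts.items()}

def biased_low_res_score(base_score, contacts, bonuses):
    """Lower (improve) the low-resolution score for favored contacts."""
    return base_score - sum(bonuses.get(pair, 0.0) for pair in contacts)
```

For example, `contact_bonuses(round_one_models, top_fraction=0.1)` would be computed once on the server side, and the resulting table would then bias the early, low-resolution stage of every second-round run.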

Rhiju · Volunteer moderator · Joined: 8 Jan 06 · Posts: 223 · Credit: 3,546 · RAC: 0	Message 14722 - Posted: 27 Apr 2006, 4:38:16 UTC · Last modified: 27 Apr 2006, 4:39:11 UTC
We're beginning to run some new jobs. As some of you know, the CASP7 blind trials are coming up, from May to August 2006. These are the Critical Assessment of Structure Prediction experiments, held every two years. Many of the protein folding groups around the world test their methods on sequences whose experimental crystal structures have recently been solved (often with arduous effort!) but will be kept hidden by the experimenters until the fall. So this is a true blind trial.
As part of our preparation, we're running some of the targets from the previous CASP6 trials (held in 2004). Look for workunits with names like the following:
AB_CASP6_t242_
AB_CASP6_t272_
...
We're excited to see whether the massive computational power of Rosetta@home will give us an advantage over our methodology from 2004. We're strongly betting that it will!
ID: 14722 · Rating: 0

Rhiju · Volunteer moderator · Joined: 8 Jan 06 · Posts: 223 · Credit: 3,546 · RAC: 0	Message 15060 - Posted: 30 Apr 2006, 2:37:14 UTC
I'm about to post some results in top predictions from our attempts to solve a very difficult protein ("1tul_") that I described before. It turns out that even after well over one million simulations we're not quite close enough -- even if we cheat and disallow non-native strand pairings! That's very good to know.
So we're trying a new idea, invented by Phil Bradley in our group, called "jumping". In this method, we assume that we know two parts of the protein chain are in contact, maybe as a guess or maybe because we have some outside source of information. These two parts of the chain are stuck together for the whole simulation. To allow some mobility in the chain between the contact points, we then put a cut in the chain. (Throughout the simulation, we try not to let this cut widen too much by penalizing the score for large chainbreaks.) This ends up being a pretty efficient way to search for low-energy protein conformations with fairly complex topologies.
We'll be running two kinds of workunits:
JUMPTEST_1tul_
JUMP_ALLBARCODEXX_1tul_
The first kind uses information on 7 known strand pairings in 1tul_ -- in other words, it's a cheat. In the second kind of run, there's a comprehensive list of all possible topologies of strand pairings (about 100,000 for this protein!), and every client chooses one topology to test per simulation. So some clients will investigate topologies that look like "sandwiches", others will try to make "barrels", and so forth. Thanks to Rosetta@home, we can sample all these topologies comprehensively for the first time!
ID: 15060 · Rating: 0
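Editorial illustration for message 15060: the chainbreak penalty mentioned there (keeping the cut from widening during a "jumping" run) might look roughly like the sketch below. It is a hedged sketch only: the quadratic form, the `weight` parameter, and the 1.33 Å ideal gap (approximately a peptide C-N bond length) are assumptions made for illustration, and the function names are hypothetical rather than Rosetta's.

```python
import math

def chainbreak_penalty(end_a, end_b, ideal_gap=1.33, weight=1.0):
    """Penalize a cut whose two ends have drifted apart.

    end_a, end_b: (x, y, z) coordinates, in angstroms, of the atoms on
    either side of the cut. No penalty applies while the gap is at or
    below ideal_gap; beyond that it grows quadratically (an assumption).
    """
    gap = math.dist(end_a, end_b)
    excess = max(0.0, gap - ideal_gap)
    return weight * excess ** 2

def total_score(energy, end_a, end_b, weight=1.0):
    """Energy plus the chainbreak term, as used to rank conformations."""
    return energy + chainbreak_penalty(end_a, end_b, weight=weight)
```

A conformation with a closed cut keeps its raw energy, while one whose cut has opened by several angstroms is pushed up the ranking, which is the qualitative behavior the post describes.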

Bin Qian · Joined: 13 Jul 05 · Posts: 33 · Credit: 36,897 · RAC: 0	Message 15302 - Posted: 2 May 2006, 15:32:46 UTC - in response to Message 15060 · Last modified: 2 May 2006, 15:35:15 UTC
I've just sent out some work units with names starting with:
HOMO_xxxx_h0xx_1_LOOPRLX_
You all have been well informed by David(s) and Rhiju about the ab-initio approach they are using to fold a protein from its amino-acid sequence. I'm working on another category of the protein structure prediction problem, namely the "comparative modeling" approach.
Here is how this approach works: when we want to predict the structure of a protein, we usually first do a database search on the available protein structures. Often we find that the protein we want to predict has a brother/sister (called a homologous protein) whose structure has been solved by one of the accurate but time-consuming experimental methods. It is well known that homologous proteins share more or less similar shapes. This important information can give us a "short cut" towards solving our target protein's structure, because now we can jump-start our coarse-grained conformational search with the homologous protein's structure, and only search parts of the structure that are either missing or likely to be different from the homologous protein structure. This coarse-grained search is followed by a fine-grained search in which we try to locate the precise positions of a protein's hundreds to thousands of atoms (called the fullatom relax stage) based on its rough shape determined by the coarse-grained search. This is the same fullatom relax stage as the second step in David(s) and Rhiju's approach.
On the screen saver you will likely see the WU start with a compact protein structure (the homologous structure), then some parts of the structure start to move around. After they settle down, the whole protein will start to wiggle with smaller-scale motions.
ID: 15302 · Rating: 0

divyab · Joined: 20 Oct 05 · Posts: 6 · Credit: 0 · RAC: 0	Message 16119 - Posted: 13 May 2006, 0:59:37 UTC · Last modified: 13 May 2006, 1:00:02 UTC
I have just added two new workunits to the queue:
CASP_HOMOLOG_ABRELAX_hom001_t287_
HOMOLOG_ABRELAX_hom0xx_t283_
These workunits are both for CASP, the competition Rhiju mentioned below. t278 and t283 are both sequences for which we are trying the "ab initio" approach, meaning that they will not be based on existing structures (unlike the case Bin described below). The workunits do, however, use homologs of the sequence (other proteins with similar amino acid sequences) to help in the prediction!
Here's how: we take the target sequence (given by CASP) and make the best possible predictions we can. We then find sequence homologs in the database, and make the best possible predictions we can for those also. The basis for this is that sequences with high homology can be expected to have the same structure, so if we find a good prediction among any of the homologs, we have solved the structure for our target sequence!
These workunits are specifically crunching away at all the homologs, folding them independently and trying to find the best structure for each. The next step will be to map our target sequence back onto all these different structures, rebuild gaps where the two sequences may be different sizes, and do some simple tweaking of the sidechains that are different. This will yield our final predictions.
ID: 16119 · Rating: 1
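Editorial illustration for message 16119: the final mapping step (thread the target sequence onto a homolog's predicted structure, rebuild gaps, and swap differing sidechains) can be sketched as bookkeeping over a pairwise alignment. This is a minimal sketch under assumptions, not the lab's pipeline: the alignment is taken as given, `-` marks a gap, and the function name and return shape are hypothetical.

```python
def map_target_onto_homolog(target_aln, homolog_aln):
    """Classify target residues using a pairwise alignment.

    target_aln, homolog_aln: equal-length aligned strings, '-' for gaps.
    Returns (copied, rebuild):
      copied  - list of (target_index, homolog_index, needs_sidechain_swap)
      rebuild - target indices with no homolog counterpart (gap regions)
    Indices are 0-based positions in the ungapped sequences.
    """
    if len(target_aln) != len(homolog_aln):
        raise ValueError("aligned sequences must have equal length")
    copied, rebuild = [], []
    t_idx = h_idx = 0
    for t, h in zip(target_aln, homolog_aln):
        if t != "-" and h != "-":
            # Backbone can be copied; sidechain is swapped if residues differ.
            copied.append((t_idx, h_idx, t != h))
        elif t != "-":
            # Target residue aligned to a gap: this stretch must be rebuilt.
            rebuild.append(t_idx)
        if t != "-":
            t_idx += 1
        if h != "-":
            h_idx += 1
    return copied, rebuild
```

With the made-up alignment `"MKV-LA"` against `"MR-TLA"`, residue 1 of the target would be flagged for a sidechain swap and residue 2 for rebuilding.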

mnb · Joined: 15 Dec 05 · Posts: 51 · Credit: 69,458 · RAC: 0	Message 22267 - Posted: 11 Aug 2006, 12:43:02 UTC
1yrf Structural protein Chicken villin subdomain hp-35, n68h, ph6.7 Villin. Chain: a. Fragment: vhp. Engineered: yes. Mutation: yes Synthetic: yes. Other_details: sequence corresponds to chicken villin residues 792-826
ID: 22267 · Rating: 0

bblum · Joined: 15 Aug 06 · Posts: 6 · Credit: 4,077 · RAC: 0	Message 22497 - Posted: 15 Aug 2006, 21:15:33 UTC · Last modified: 15 Aug 2006, 21:16:12 UTC
I've added three new jobs to the queue:
1di2__LARS_ABRELAX_SAVE_ALL_OUT_BARCODE_
1di2__CHEAT_ABRELAX_SAVE_ALL_OUT_BARCODE_
1di2__CONTROL_ABRELAX_SAVE_ALL_OUT_
All are "abrelax" runs (ab initio prediction followed by relaxation, as described below by Bin and Rhiju -- graphically speaking, big wiggles followed by small wiggles). The protein is 1di2A. We've already generated a population of decoys for 1di2A, and now we're attempting to use what we've learned from that population to generate a much better population.
It's frequently clear from an initial sampling run that certain decoy features are always good -- they tend, if present, to give rise to low energy. If we can identify these features, then we can concentrate the next round of sampling on them. We've generally found that features which correlate with low energy in the initial sampling round tend to be present in the native. Features can include both low-level ones (e.g. torsion angles or rotamers for specific residues) and high-level ones (e.g. beta strand pairings). In these jobs, we're only enriching for low-level features.
The job I'm most excited about is the LARS job; here we've used a sparse linear model to identify a few features that correlated very strongly with the Rosetta energy, and I've enriched for all of them in sampling. The CHEAT job fixes a single linchpin native feature that we hope will result in sampling much closer to the native; it will allow us to test a hypothesis we have about the joint distribution of features in Rosetta sampling. The CONTROL job fixes no features and will serve as a baseline.
ID: 22497 · Rating: 1
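Editorial illustration for message 22497: the idea of finding decoy features that go with low energy can be sketched with a much simpler stand-in than the sparse linear model (LARS) the post mentions. The sketch below is explicitly not LARS; it just ranks each binary feature by the difference in mean energy between decoys that have it and decoys that lack it. The feature labels and function name are hypothetical.

```python
from statistics import mean

def rank_features(decoys):
    """Rank binary decoy features by their association with low energy.

    decoys: list of (energy, features), where features is a set of
    hashable labels (e.g. a torsion bin at one residue).
    Returns [(feature, delta)] sorted with the most favorable first,
    where delta = mean energy with the feature - mean energy without.
    """
    all_features = set().union(*(f for _, f in decoys))
    ranked = []
    for feat in all_features:
        with_f = [e for e, f in decoys if feat in f]
        without_f = [e for e, f in decoys if feat not in f]
        if not with_f or not without_f:
            continue  # present in all or none: no contrast to measure
        ranked.append((feat, mean(with_f) - mean(without_f)))
    return sorted(ranked, key=lambda item: item[1])
```

The features at the top of the ranking (most negative `delta`) would be the candidates to enrich in the next sampling round; a real analysis would also need to account for correlated features, which is part of what a sparse linear model addresses.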

James Thompson · Joined: 13 Oct 05 · Posts: 46 · Credit: 186,109 · RAC: 0	Message 22568 - Posted: 16 Aug 2006, 18:43:00 UTC - in response to Message 22497.
I've added 58 small jobs to the queue named <protein_name>__CASP7_ABINITIO_SAVE_ALL_OUT_flatss0.33_. These jobs are designed to investigate how problems with secondary structure prediction affect our predictions. Secondary structure prediction is an initial step in our methodology that attempts to predict the local arrangement of small pieces of the protein structure, which Rosetta then assembles into an overall tertiary structure. My impression from CASP is that lack of knowledge of secondary structure makes prediction very difficult, and these workunits will give a better idea of the problem's difficulty.
ID: 22568 · Rating: 1

James Thompson · Joined: 13 Oct 05 · Posts: 46 · Credit: 186,109 · RAC: 0	Message 26411 - Posted: 9 Sep 2006, 0:17:07 UTC
Over the next two days I'll be adding some new jobs to the Rosetta@Home queue that will attempt to use a slightly modified methodology for ab initio prediction. The jobs look like this:
2vik__BARCODE_SEARCH_BARCODE_FROM_FRAGS_ABINITIO_barcode_from_frags
That batch of workunits (and three others) are currently running on Ralph. We use a method called a barcode to constrain certain residues of a protein to adopt a specific conformation. This barcode ensures that the conformation stays fixed throughout an entire run. The barcode_from_frags method tries to infer which residues to barcode by examining the distribution of conformations for analogous sections of proteins of known structure.
And yes, I know that barcode_from_frags is in there twice and I'll try to fix that before my runs on Rosetta@Home commence. :)
ID: 26411 · Rating: 3
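Editorial illustration for message 26411: a barcode, as described there, pins chosen residues to a specific conformation for the whole run. One way to picture that is as a check applied to every candidate conformation, sketched below. This is a hedged sketch only: the three coarse torsion bins and their boundaries are assumptions chosen for illustration, and the function names are hypothetical, not Rosetta's barcode format.

```python
def torsion_bin(phi, psi):
    """Assign backbone torsions (degrees) to a very coarse bin.

    The bin boundaries are illustrative assumptions only.
    """
    if phi < 0 and -120 <= psi <= 50:
        return "A"  # helix-like region
    if phi < 0:
        return "B"  # strand-like region
    return "L"      # positive phi (left-handed or other)

def satisfies_barcode(torsions, barcode):
    """Return True if every barcoded residue sits in its required bin.

    torsions: {residue_number: (phi, psi)}
    barcode:  {residue_number: required_bin}
    """
    return all(torsion_bin(*torsions[res]) == required
               for res, required in barcode.items())
```

In a sampling loop, a move that takes a barcoded residue out of its required bin would simply be rejected, so the constrained residues keep their conformation from start to finish.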

Possu · Volunteer moderator · Project developer · Project scientist · Joined: 20 Sep 06 · Posts: 4 · Credit: 27,635 · RAC: 0	Message 27652 - Posted: 20 Sep 2006, 6:06:37 UTC
We are now using Rosetta@Home to screen our HIV vaccine designs. We are remodeling the protein called GP120, which is responsible for binding with human proteins and initiating the invasion of host cells. The remodeled/redesigned GP120 can potentially be used as a vaccine. (Please see David's journal for more info.) The jobs submitted for this will have a prefix "PSH", and future jobs will also carry the name(s) GP120 and OD1 (which stands for the outer domain of GP120) for easy identification.
We are using the jobs on Rosetta@Home to refold and refine the sequences that came from design runs. We first remodel the backbone using Rosetta (~2000+ different backbone conformations), design suitable sequences for each of the conformations, and then score all these different sequences to find the best ones to test experimentally. The scoring process is very time consuming, as each of the 2000+ starting conformations has to be sampled thoroughly in its local conformational space before we can say definitively which is the best designed sequence. That's when Rosetta@Home comes to the rescue...
ID: 27652 · Rating: 1

Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0	Message 29252 - Posted: 12 Oct 2006, 22:28:34 UTC Over the next week or so, David Baker and I will be testing out a new protocol for full atom refinement that should more efficiently hit lower energies! The workunits in this new study will have names like: 2tif__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_ 2tif__BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_ ID: 29252 · Rating: 0 · rate: /
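
For anyone sorting their own results, the two name patterns above can be told apart with a short script. This is only a sketch: the assumption that a double underscore separates the protein code from the protocol description is inferred from the example names in this thread, and the function names are made up.

```python
def split_workunit_name(name):
    """Split a workunit name into (protein_code, protocol_tokens).

    Assumes the part before the first double underscore is the protein code
    and the rest is a list of underscore-separated tokens.
    """
    target, _, rest = name.partition("__")
    tokens = [token for token in rest.split("_") if token]
    return target, tokens


def relax_variant(name):
    """Return "new", "old", or None depending on the relax flags token."""
    _, tokens = split_workunit_name(name)
    if "NEWRELAXFLAGS" in tokens:
        return "new"
    if "OLDRELAXFLAGS" in tokens:
        return "old"
    return None


new_name = "2tif__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_"
old_name = "2tif__BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_"
parsed = split_workunit_name(new_name)
```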

Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0	Message 29262 - Posted: 13 Oct 2006, 1:42:36 UTC All the docking WUs will have names like "DOC_", and the screensaver will show two proteins colored in blue and red. Compared to "RELAX" WUs, docking WUs normally have a smaller number of steps, and each step takes longer. Also, since RMSD is calculated differently in docking, the RMSD values may be considerably larger. ID: 29262 · Rating: 0 · rate: /
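
A rough illustration of why docking RMSD values can be large. The sketch assumes the common docking convention of keeping one partner fixed and measuring RMSD only over the other partner without re-superimposing it; whether the Rosetta@home client computes it exactly this way is an assumption, and the coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points, computed in place with no superposition step."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Three atoms of the mobile partner in the native complex (made-up coordinates).
native_partner = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]

# The same partner placed 10 Angstroms away along x: its internal structure is
# unchanged, but every atom is 10 Angstroms from where it sits in the native complex.
docked_partner = [(x + 10.0, y, z) for x, y, z in native_partner]

docking_rmsd = rmsd(native_partner, docked_partner)
```

Because the partner is not re-aligned before the comparison, a perfectly folded protein sitting at the wrong spot on its partner still scores a large RMSD, which is not the case when a single chain is superimposed onto its native structure.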

James Thompson Send message Joined: 13 Oct 05 Posts: 46 Credit: 186,109 RAC: 0	Message 30090 - Posted: 27 Oct 2006, 9:07:44 UTC Starting tomorrow we will be sending out the first workunits for our newest ab initio structure prediction project. These workunits will look like this: s001__BOINC_ABRELAX_SAVE_ALL_OUT_hom001_ We are collaborating with a number of structural genomics centers so that we can attempt to predict proteins whose structures will be solved within the next six months. This project is very obviously inspired by the CASP competitions, and some of us in the lab have started calling this "CASP all-the-time." It will be very useful for us to have a running benchmark of our methods on absolutely new crystal structures. ID: 30090 · Rating: 0 · rate: /

Brian Kidd Send message Joined: 9 Dec 06 Posts: 5 Credit: 327 RAC: 0	Message 32315 - Posted: 9 Dec 2006, 9:33:06 UTC I recently submitted a job (BAK_1avs_TnC_loop_model) to predict the structural changes in an allosteric protein called Troponin C. This protein regulates muscle contraction and has important health consequences for cardiovascular disease. In the near future, I will be submitting more workunits related to predicting conformational changes in allosteric proteins. For each submission, I'll describe a little about the protein's function and its relevance to medicine or technology. Allosteric proteins are particularly interesting cases for structure prediction because their function is regulated by structural changes that take place far from the active site. As a consequence, these proteins have multiple functional states, and we are attempting to predict the structures associated with these states. Our basic approach is to exhaustively sample the conformational space away from the starting structure and look for additional energy minima. Thanks for your help with this important problem. ID: 32315 · Rating: 0 · rate: /
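
The search for additional energy minima can be pictured as a filter over the returned models. This is a minimal sketch under assumptions, not the lab's analysis code: the (RMSD to start, energy) pairs are made up, and the 3.0 Angstrom cutoff and 5.0 energy-unit window are arbitrary illustrative values.

```python
def alternative_minima(models, min_rmsd=3.0, energy_window=5.0):
    """Pick models that are far from the starting structure yet low in energy.

    models is a list of (rmsd_to_start, energy) pairs (hypothetical layout).
    A model qualifies when its RMSD to the start is at least min_rmsd and its
    energy is within energy_window of the best energy seen in the whole set.
    """
    if not models:
        return []
    best_energy = min(energy for _, energy in models)
    far_and_low = [
        (dist, energy)
        for dist, energy in models
        if dist >= min_rmsd and energy <= best_energy + energy_window
    ]
    return sorted(far_and_low, key=lambda model: model[1])


# Made-up results: two models near the start, three far from it.
results = [
    (0.5, -120.0),
    (1.0, -118.0),
    (4.2, -117.5),
    (6.0, -100.0),
    (5.1, -116.0),
]
candidates = alternative_minima(results)
```

In this toy set the models at 4.2 and 5.1 Angstroms survive the filter, while the one at 6.0 Angstroms is dropped because its energy is too high to be a plausible second state.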

Brian Kidd Send message Joined: 9 Dec 06 Posts: 5 Credit: 327 RAC: 0	Message 32317 - Posted: 9 Dec 2006, 10:54:50 UTC Last modified: 9 Dec 2006, 10:56:47 UTC A new job was just submitted - BAK_1klf_FimH_loop_model. This protein is a bacterial adhesion protein called FimH. The protein resides on the tip of projections that pathogenic E. coli extrude from their cell wall to stick to glycoproteins on the surface of human cells. The neat feature of this protein is that its binding is regulated by force: the bond lifetime actually increases as force increases, a behavior called a "catch bond". This catch-like behavior is counterintuitive and contrary to how most bonds function - most bonds decrease their lifetime if you yank on them with more force. Predicting the structures of both states, active and inactive, would have huge implications for treating bacterial infections and developing novel molecular force sensors. FimH is an allosteric protein and the structure of the unbound - "inactive" - conformation has been technically difficult to solve by experimental methods. However, we ought to be able to predict the inactive structure with the loop modeling and full-atom relax techniques that we've been adapting to look at allosteric proteins. In addition, some recent work by our collaborators has given us reason to believe that there will be an NMR structure in the not too distant future. This experimental data will be a great validation for the predicted structures that you all are solving with Rosetta@home. Thanks so much for your support in predicting this structure. ID: 32317 · Rating: 0 · rate: /

Mod.Zilla Volunteer moderator Send message Joined: 5 Sep 06 Posts: 423 Credit: 6 RAC: 0	Message 38063 - Posted: 20 Mar 2007, 21:31:46 UTC Last modified: 20 Mar 2007, 21:38:57 UTC More details from Chu on Protein-protein docking and CAPRI. More details from Rhiju on RNA folding. Rosetta Informational Moderator: Mod.Zilla ID: 38063 · Rating: 0 · rate: /

Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0	Message 38164 - Posted: 23 Mar 2007, 5:43:40 UTC Last modified: 23 Mar 2007, 5:44:27 UTC The work units labeled RNA_ABINITIO model folding of small chains of RNA, a molecule similar to DNA. They start from extended chains and then try to maximize the pairing and stacking of the RNA sidechains ("bases"). Info on where the sequences come from:
1a4d E-loop from ribosome 5S RNA
1esy HIV psiRNA dimerization signal
1kka tRNA anticodon stem loop
1l2x and 2a43 Viral frameshifting pseudoknots
1qwa Nucleolin binding hairpin
1q9a Sarcin/ricin loop from the large ribosomal subunit
1zih GNRA tetraloop
2f88 Domain I from the self-splicing Group II intron
1xjr Rigorously conserved RNA element from SARS
1ehz tRNA(phe) from yeast
1gid The P4-P6 domain from the self-splicing Tetrahymena group I ribozyme
ID: 38164 · Rating: 0 · rate: /

Ingemar Send message Joined: 28 Feb 06 Posts: 20 Credit: 1,680 RAC: 0	Message 38199 - Posted: 24 Mar 2007, 0:31:21 UTC Jobs named DOCKING_rhj are running protein-protein docking on dimers where the individual monomers are related by symmetry. The structures are coming from ab initio structure prediction. The structure of this protein could not be solved by the standard techniques used for determining crystal structures. Although a crystal could be grown and the experimental data looks good, one of the procedures in crystal structure determination, phasing, could not be carried out successfully starting from similar proteins previously solved. The idea here is to create models with ab initio + docking that can be used as starting points for the phasing procedure. If this works, it could be a way of rescuing X-ray crystallography data from biologically important proteins that cannot currently be converted to 3D structures, which would be a significant breakthrough. Thanks for your help! ID: 38199 · Rating: 1 · rate: /