Rosetta@home Active WorkUnit(s) Log

Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 14309 - Posted: 21 Apr 2006, 22:15:57 UTC
Last modified: 21 Apr 2006, 22:18:09 UTC

This thread is set aside as a place for the project to keep a public log of the Work Units being processed, and to provide some information about each type.

Posts on other topics, or discussion posts, will be moved to other locations.

For information related to the current Rosetta@home application release and a version history please see this thread.

ID: 14309 · Rating: 3
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14335 - Posted: 22 Apr 2006, 2:43:22 UTC
Last modified: 22 Apr 2006, 2:45:26 UTC

I'm going to start this log with a pretty complete list of
proteins that we're looking at! First, there is a set
of eleven "small" proteins (<80 residues) that David Baker
and Divya have focused on for calibrating the Rosetta
energy function.

The information on all the proteins is extracted
from the very useful PDBsum database at the European
Bioinformatics Institute. You can access that
database directly by clicking on each protein's code!

Pbx1, homeobox protein hox-b1/DNA ternary complex
Homeobox protein hox-b1.
Homo sapiens. Human. Gene: hoxb-1.

Structural genomics, unknown function
Solution structure of yhhp, a novel escherichia coli protein implicated in the cell division
Yhhp protein.
Escherichia coli. Bacteria.

RNA binding protein/RNA
Crystal structure of a dsrna-binding domain complexed with dsrna: molecular basis of double-stranded RNA-protein interactions
Double stranded RNA binding protein a.
Xenopus laevis.

Immune system
Crystal structure of nova-2 kh3 k-homology RNA-binding domain
RNA-binding neurooncological ventral antigen 2.
Homo sapiens. Human. Organ: brain. Cell: neuron.

Protein binding
Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution
Protein l.
Peptostreptococcus magnus. Bacteria.

Ligand binding protein
Structural analysis of the domain interactions in der, a switch protein containing two gtpase domains
Probable gtp-binding protein enga.
Thermotoga maritima. Archebacteria.

Crystal structure of yeast elongation factor 2 in complex with sordarin
Elongation factor 2.
Saccharomyces cerevisiae. Yeast

Chromosomal protein
Synthetic ubiquitin with fluoro-leu at 50 and 67
Homo sapiens. Human

Gene regulating protein
434 repressor (amino-terminal domain) (r1-69)
Phage 434

Ribosome binding factor
Translation initiation factor 3 n-terminal domain
Translation initiation factor 3.
Bacillus stearothermophilus.

DNA binding protein
The structure of the E. coli reca protein monomer and polymer
Reca protein
(Escherichia coli)

ID: 14335 · Rating: 0
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14337 - Posted: 22 Apr 2006, 2:52:23 UTC
Last modified: 22 Apr 2006, 6:00:07 UTC

Around January 2006, David Kim and I began testing the Rosetta@home
strategy on a diverse set of 62 proteins, whose crystal structures
are well known. These protein sequences have been extensively
used in Baker lab benchmarks for several years -- it has been
gratifying to see Rosetta@home working on many of these sequences
that we found challenging or even impossible with our in-house
resources! See our Top Prediction page and David Baker's journal for examples.

One more note: we've also been testing, on a grand scale,
whether we can use information from homologs for each of these
sequences. For example, in the 1fna protein fibronectin, our main
sequence listed below comes from the famous bacterium E. coli.
But there's a closely related sequence in chickens (amazing, huh?)
that happens to fold better (i.e., to lower energies) in Rosetta.
That's the one we've posted in the top predictions.

We've been looking at a few other proteins not in these two sets, and
we'll describe them -- along with their workunits -- in later posts.

Ribonuclease inhibitor
Barstar (free), c82a mutant
Barstar. Chain: a, b. Engineered: yes. Mutation: c82a. Biological_unit: monomer
Bacillus amyloliquefaciens. Expressed in: escherichia coli

Ribosomal protein
Ribosomal protein s15 from bacillus stearothermophilus
Ribosomal protein s15. Chain: null. Engineered: yes
Bacillus stearothermophilus. Cellular_location: ribosome. Expressed in: escherichia coli.

Potassium channels
Crystal structure of the tetramerization domain of the shaker potassium channel
Potassium channel kv1.1. Chain: null. Fragment: tetramerization domain. Engineered: yes. Mutation: inserted met at n-terminus. Biological_unit: tetramer
Aplysia californica. California sea hare. Tissue: central nervous system. Cellular_location: cytoplasm. Expressed in: escherichia coli.

Contractile protein
Acanthamoeba castellanii profilin ib
Profilin i
(Acanthamoeba castellanii)

RNA-binding protein
N-terminal fragment of ns1 protein from influenza a virus
Nonstructural protein ns1. Chain: null. Fragment: RNA-binding domain
Influenza a virus

Electron transport
Human thioredoxin (d60n mutant, reduced form)
Thioredoxin. Chain: null. Engineered: yes. Mutation: d60n. Other_details: active site cysteines 32 and 35 in the reduced form
Homo sapiens. Human. Expressed in: escherichia coli.

Anti-HIV protein
Total chemical synthesis and high-resolution crystal structure of the potent anti-HIV protein aop-rantes
Rantes. Chain: a, b. Engineered: yes. Other_details: oxime link between aop group and pro3
Synthetic: yes

Transcription factor
Stat-4 n-domain
Stat-4. Chain: null. Fragment: n-terminal domain. Engineered: yes. Biological_unit: dimer
Mus musculus. Mouse. Expressed in: escherichia coli.

A-spectrin sh3 domain d48g mutant
A-spectrin. Chain: null. Fragment: sh3-domain. Engineered: yes. Mutation: d48g
Gallus gallus. Chicken. Plasmid: pbat4. Expressed in: escherichia coli.

Calponin homology (ch) domain from human beta-spectrin at 1.1 angstrom resolution
Spectrin beta chain. Chain: a. Fragment: f-actin binding domain residues 173 - 281. Synonym: calponin homology (ch) domain. Engineered: yes. Biological_unit: heterotetramer
Homo sapiens. Human. Cell: non-erythrocyte. Cellular_location: cytoskeletal protein. Expressed in: escherichia coli.

Cell cycle
DNA-binding domain of mbp1
Transcription factor mbp1. Chain: null. Fragment: n-terminal DNA-binding domain. Synonym: mcb (mlui cell-cycle box) binding protein. Engineered: yes
Saccharomyces cerevisiae. Baker's yeast. Plasmid: pet43 (c2681). Expressed in: escherichia coli.

Metal binding protein
Rubredoxin (formyl methionine mutant) from pyrococcus furiosus
Rubredoxin. Chain: a. Synonym: pf rd. Engineered: yes. Mutation: yes
Pyrococcus furiosus. Variant: fmet. Expressed in: E. coli. Other_details: product of a synthetic pf rd gene

Metal binding protein
Rubredoxin (formyl methionine mutant) from pyrococcus furiosus
Rubredoxin. Chain: a. Synonym: pf rd. Engineered: yes. Mutation: yes
Pyrococcus furiosus. Variant: fmet. Expressed in: E. coli. Other_details: product of a synthetic pf rd gene

DNA binding protein/DNA
Crystal structures of the chromosomal proteins sso7d/sac7d bound to DNA containing t-g mismatched base pairs
DNA-binding protein 7a. Chain: a. Synonym: sso7d, 7 kda DNA-binding protein d. 5'-d( Gp Tp Gp Ap Tp Cp Gp C)-3'. Chain: b, c. Engineered: yes
Sulfolobus solfataricus. Collection: dsm 1617. Other_details: german collection of microorganisms (dsm) 1617, #1616. Synthetic: yes

Crystal structure analysis of the bacillus caldolyticus cold shock protein bc-csp
Cold-shock protein. Chain: a, b. Synonym: cspb
Bacillus caldolyticus

Metal transport
Crystal structure of the atx1 metallochaperone protein
Metallochaperone atx1. Chain: a. Engineered: yes
Saccharomyces cerevisiae. Baker's yeast. Expressed in: escherichia coli.

Antibacterial protein
Structure determination of the colicin e7 immunity protein (imme7) that binds specifically to the dnase-type colicin e7 and inhibits its bacteriocidal activity
Colicin e7 immunity protein. Chain: null. Synonym: imme7
Escherichia coli

Oxygen transport
Deoxy form hemoglobin from dasyatis akajei
Hemoglobin. Chain: a. Hemoglobin. Chain: b
Dasyatis akajei. Akaei. Cell: erythrocyte. Cell: erythrocyte

Ribosomal protein
1.65 angstrom resolution structure of 7,8-dihydroneopterin aldolase from staphylococcus aureus
7,8-dihydroneopterin aldolase. Chain: null. Synonym: dhna. Engineered: yes. Other_details: octamer is crystallographic
Staphylococcus aureus. Collection: atcc 25923. Gene: dhna. Expressed in: escherichia coli.

Gene regulation
Bromodomain from gcn5 complexed with acetylated h4 peptide
Transcriptional activator gcn5. Chain: a. Fragment: bromodomain. Engineered: yes. H4 peptide. Chain: p. Fragment: acetylated tail. Other_details: lysine 16 is acetylated on nz
Saccharomyces cerevisiae. Gene: gcn5. Expressed in: escherichia coli. Synthetic: corresponds to residues 15-29 of histone h4.

Crystal structure of the tpr1 domain of hop in complex with a hsc70 peptide
Tpr1-domain of hop. Chain: a, b. Fragment: n-terminal domain. Engineered: yes. Hsc70-peptide. Chain: c, d. Engineered: yes
Homo sapiens. Human. Expressed in: escherichia coli. Synthetic: yes. Other_details: this sequence occurs naturally in humans

DNA-binding protein
Crystal structure of escherichia coli cyay protein reveals a novel fold for the frataxin family
Cyay protein. Chain: a. Engineered: yes
Escherichia coli. Bacteria. Expressed in: escherichia coli.

The crystal structure of nusb from mycobacterium tuberculosis
N-utilizing substance protein b homolog. Chain: a, b. Synonym: nusb protein. Engineered: yes
Mycobacterium tuberculosis. Bacteria. Expressed in: escherichia coli.

Atomic structure of the rapamycin human immunophilin fkbp-12 complex
Fk506 binding protein (fkbp) complex with immunosuppressant rapamycin
Human (homo sapiens) recombinant form expressed in (escherichia coli)

Cell adhesion protein
Gene v protein (single-stranded DNA binding protein)
Gene v protein. Chain: null. Engineered: yes. Biological_unit: active as a dimer
Escherichia coli. Strain: k561. Plasmid: ptt2. Gene: gen v in bacteriophage f1. Expressed in: escherichia coli.

Protein binding
Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution
Protein l. Chain: a, b, c. Fragment: b1 domain. Synonym: ig kappa light chain-binding protein. Engineered: yes. Mutation: yes
Peptostreptococcus magnus. Bacteria. Expressed in: escherichia coli.

Metal binding protein
Bovine calbindin d9k binding mg2+
Vitamin d-dependent calcium-binding protein, intestinal. Chain: a. Synonym: calbindin d9k. Engineered: yes
Bos taurus. Bovine. Gene: synthetic gene. Expressed in: escherichia coli.

Crystal structure of iibcellobiose from escherichia coli
Enzyme iib of the cellobiose-specific phosphotransferase system. Chain: a, b. Fragment: enzyme iib. Engineered: yes. Mutation: c10s. Biological_unit: monomer
Escherichia coli. Strain: k12. Cellular_location: cytoplasm. Plasmid: pjl503. Gene: cela. Expressed in: escherichia coli.

Protein kinase inhibitor
Pkci-transition state analog
Protein kinase C interacting protein. Chain: a, b. Synonym: pkci-1, protein kinase C inhibitor 1, hint protein, hit protein. Engineered: yes. Biological_unit: dimer
Homo sapiens. Human. Plasmid: phil-d5. Gene: hpkci-1. Expressed in: pichia pastoris.

Fertilization protein
Ribosomal protein s6
Ribosomal protein s6. Chain: a. Engineered: yes. Mutation: yes
Thermus thermophilus. Bacteria. Expressed in: escherichia coli.

Signaling protein
Crystal structure of n-terminal domain of protein s
Development-specific protein s. Chain: a. Fragment: n-terminal domain: motifs 1-2 and linker. Synonym: spore coat protein s. Engineered: yes
Myxococcus xanthus. Bacteria. Cellular_location: cytosol. Expressed in: escherichia coli.

Histidine-containing protein (hpr), mutant with ser 46 replaced by asp (s46d)
Histidine-containing protein. Chain: null. Synonym: hpr. Engineered: yes. Mutation: s46d
Escherichia coli. Strain: esk108. Expressed in: escherichia coli.

Immunoglobulin binding protein
Protein kinase C delta cys2 domain
Protein kinase C delta type. Domain: cys2. Heterogen: zinc
Mus musculus mouse. Expressed in: escherichia coli.

Gene regulating protein
Crystal structure of subtilisin-propeptide complex
Subtilisin e. Chain: a, b. Synonym: serine protease. Engineered: yes. Mutation: s221c
Bacillus subtilis. Strain: 168. Expressed in: escherichia coli.

Translation initiation factor 3 n-terminal domain
Translation initiation factor 3. Chain: null. Domain: n-terminal residues 1 - 78. Synonym: if3-n. Engineered: yes.
Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).

Ribosome binding factor
Translation initiation factor 3 c-terminal domain
Translation initiation factor 3. Chain: null. Domain: c-terminal residues 79 - 172. Synonym: if3-c. Engineered: yes
Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).

Immunoglobulin-like domain
Titin, ig repeat 27, nmr, minimized average structure
Titin, i27. Chain: null. Synonym: connectin i27, titin ig repeat 27. Engineered: yes. Other_details: solution structure, t=308k, ph 4.5, 10mm acetate buffer
Homo sapiens. Human. Organ: heart. Tissue: muscle. Organelle: sarcomere. Expressed in: escherichia coli.

Telokin-like protein
Structure of tlp20
Tlp20. Chain: null
Autographa californica nuclear polyhedrosis virus, acmnpv. Baculovirus

Chromosomal protein
Synthetic, structural and biological studies of the ubiquitin system: chemically synthesized and native ubiquitin fold into identical three-dimensional structures.
Solid-phase chemical synthesis

Crystal structure of human uracil-DNA glycosylase in complex with a protein inhibitor: protein mimicry of DNA
Uracil-DNA glycosylase. Chain: e. Synonym: udg. Engineered: yes. Mutation: p82m, v83e, g84f. Biological_unit: monomer. Uracil-DNA glycosylase inhibitor. Chain: i. Synonym: ugi.
Homo sapiens. Human. Expressed in: escherichia coli. Bacteriophage pbs2. Expressed in: escherichia coli

Complex (ribonucleoprotein/RNA)
U1a/RNA complex
U1a spliceosomal protein. Chain: a, b, c. Fragment: residues 2 - 98. Engineered: yes. Mutation: y31h, q36r. RNA 21mer hairpin (5'-(ap Ap Up Cp Cp Ap Up Up Gp Cp Ap Cp Up Cp Cp Gp Gp Ap Up Up U)-3'). Chain: p, q, r other_details: u1a is a protein from u1 small
Homo sapiens. Human. Cell_line: fetal brain cdna library. Expressed in: escherichia coli. Other_details: induction by t7 phi10 promoter. Synthetic: yes. Other_details: sequence based on hairpin ii of u1 RNA

Steroid binding
Amino terminal 9kda domain of vaccinia virus DNA topoisomerase i residues 1-77, experimental electron density for residues 1-77
DNA topoisomerase i. Chain: null. Fragment: amino terminal 9kda, residues 1 - 77. Engineered: yes. Other_details: domain generated by mild proteolysis of the intact 36kda vaccinia virus DNA topoisomerase i, a member of the eukaryotic-like type i DNA topoisomerases
Vaccinia virus. Strain: wr. Expressed in: escherichia coli (active form)

Structure of dihydrofolate reductase
Dihydrofolate reductase. Chain: null. Synonym: r67 dhfr. Ec:
Escherichia coli. Strain: tmp-resistant, containing r67 dhfr overproducing plasmid plz1

Ligand binding domain of the wild-type aspartate receptor
Aspartate receptor. Chain: null. Synonym: tar. Biological_unit: dimer
Salmonella typhimurium

Allergen phl p 2
Allergen phl p 2. Chain: null. Synonym: phl p ii. Engineered: yes
Phleum pratense. Timothy grass. Expressed in: escherichia coli

Muscle protein
Twitchin immunoglobulin superfamily domain (igsf module) (ig 18'), nmr, minimized average structure
Twitchin 18th igsf module. Chain: null. Engineered: yes
Caenorhabditis elegans. Nematode. Organ: body wall muscle. Tissue: muscle a-band. Cell: muscle cell. Expressed in: escherichia coli.

Electron transport
Acyl-phosphatase (common type) from bovine testis
Acylphosphatase. Chain: null. Synonym: acp. Biological_unit: monomer
Bos taurus. Bovine. Organ: testis. Cellular_location: cytoplasm

Signal transduction protein
Refined structure of the actin-severing domain villin 14t, determined by solution nmr, minimized average structure
Villin 14t. Chain: null. Fragment: residues 1 - 126. Synonym: villin domain 1, villin segment 1. Engineered: yes
Gallus gallus. Chicken. Organ: intestine. Cell: epithelial cells. Expressed in: escherichia coli.

Structure of bacillus pasteurii urease inhibited with acetohydroxamic acid at 1.55 a resolution
Urease (chain a). Chain: a. Synonym: urea aminohydrolase. Urease (chain b). Chain: b. Synonym: urea aminohydrolase. Urease (chain c). Chain: c. Synonym: urea aminohydrolase.
Bacillus pasteurii. Strain: dsm 33. Cellular_location: cytoplasm. Cellular_location: cytoplasm

Gene regulating protein
Refined structure of cro repressor protein from bacteriophage lambda
Cro repressor protein. Chain: o, a, b, c. Biological_unit: dimer. Other_details: water molecules and two phosphate radicals
Bacteriophage lambda

ID: 14337 · Rating: 2
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14346 - Posted: 22 Apr 2006, 6:20:22 UTC
Last modified: 22 Apr 2006, 6:33:18 UTC

We know that some of the workunits that show up on your computer have cryptic names -- all the jobs are testing some interesting scientific tricks, and we'd like to tell you what they are! We're going to make an effort to post information in this thread about every workunit that we send out. Then you'll hear about results on our Top Prediction page and in David Baker's journal.
Here are explanations of some of the workunits that may be showing up on your clients:

1. Testing a smart resampling strategy:


Have you ever wondered whether there's a better way to fold proteins than having 100,000 clients do completely independent runs, without talking to each other?
These workunits are testing a new strategy that may be smarter. We look at the full-atom scores and conformations from the first 10,000 models (that's the first workunit). For generating the second round of 10,000 models we then adjust the initial stages of the search (which uses a low resolution score function) to favor contacts that gave low energies in the first round.

2. The hardest protein.

Within our benchmark set, there's one protein 1tul_ with a particularly complicated fold, called the telokin-like fold. It has contacts from very distant parts of the protein chain, which arise infrequently in Rosetta (and presumably in Nature, too, but that's another story). We're trying to concentrate the resources of Rosetta@home to see whether we can get anywhere close to the right answer with massive sampling: about 10 million conformations compared to the traditional ~1000 conformations we would run in-house.


These jobs use the standard ab initio protocol. The first one uses a tenth the number of moves -- we're trying to see if we can get away with doing less per simulation, and instead running more simulations.

We're testing a strategy developed by Phil Bradley, the folding guru in our lab, where Rosetta avoids certain over-common motifs (beta hairpins). This is an effort to explore more extreme parts of the conformational space.


These runs have different "barcodes" that specify non-native protein contacts that shouldn't be made during the simulation. The question is: how much does the extra information help Rosetta find the native conformation?

3. More aggressive full atom sampling:

The final stage of Rosetta's folding strategy consists of fine movements that try to fit the protein pieces together in atomic detail (the "fullatom" stage, often abbreviated FA). These simulations use David Baker's latest energy terms (the "HBLR_1.0" refers to the weight on long-range hydrogen bonding) with an aggressive minimization protocol ("rotamer trials") that is made efficient with a neat graph representation within Rosetta (the "trie").

4. Helical proteins from viruses


We're always trying to find new ways to couple Rosetta with experimental data. We're beginning a project to use low-resolution images made by cryo-electron microscopy to constrain Rosetta's search. Our collaborators in Wah Chiu's lab happen to be studying the proteins that form virus coats ("VP" stands for virus proteins), and these are some early jobs to begin testing the protocol. You'll see more of these in the fall, after we've been through the CASP structure prediction trials.
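As an illustration of the resampling idea in item 1 (favoring contacts that gave low energies in the first round), here is a rough Python sketch. It is not the project's code: the `Model` tuple, the `contact_bias` function, and the choice of a top fraction are assumptions invented for this example, and the real workunits adjust Rosetta's low-resolution search rather than a simple table like this.

```python
from collections import Counter
from typing import NamedTuple


class Model(NamedTuple):
    energy: float        # full-atom score of one first-round model; lower is better
    contacts: frozenset  # residue pairs (i, j) that are in contact in this model


def contact_bias(models, top_fraction=0.1):
    """Hypothetical helper: how over-represented is each contact among the
    lowest-energy models, compared with all first-round models?"""
    ranked = sorted(models, key=lambda m: m.energy)
    n_top = max(1, int(len(ranked) * top_fraction))
    top = ranked[:n_top]
    all_counts = Counter(c for m in ranked for c in m.contacts)
    top_counts = Counter(c for m in top for c in m.contacts)
    bias = {}
    for contact, n_all in all_counts.items():
        freq_all = n_all / len(ranked)
        freq_top = top_counts[contact] / len(top)
        bias[contact] = freq_top / freq_all  # > 1 means enriched at low energy
    return bias


# Four toy first-round models (made-up energies and contacts).
round_one = [
    Model(-120.0, frozenset({(3, 40), (10, 25)})),
    Model(-95.0, frozenset({(10, 25)})),
    Model(-60.0, frozenset({(5, 18)})),
    Model(-50.0, frozenset({(5, 18), (10, 25)})),
]
bias = contact_bias(round_one, top_fraction=0.25)
```

A second round could then weight its early search toward contacts with a bias above 1 and away from those below 1; how that weighting is actually done is not described in the post.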

ID: 14346 · Rating: 1
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 14722 - Posted: 27 Apr 2006, 4:38:16 UTC
Last modified: 27 Apr 2006, 4:39:11 UTC

We're beginning to run some new jobs. As some of you know, the CASP7 blind trials are coming up, from May to August 2006. These are the Critical Assessment of Structure Prediction experiments, held every two years. Many of the protein folding groups around the world test their methods on sequences whose experimental crystal structures have recently been solved (often with arduous effort!) but will be kept hidden by the experimenters until the fall. So this is a true blind trial.

As part of our preparation, we're running some of the targets from the previous CASP6 trials (held in 2004). Look for workunits with names like the following:


We're excited to see whether the massive computational power of Rosetta@home will give us an advantage over our methodology from 2004. We're strongly betting that it will!
ID: 14722 · Rating: 0
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 15060 - Posted: 30 Apr 2006, 2:37:14 UTC

I'm about to post some results in top predictions from our attempts to solve a very difficult protein ("1tul_") that I described before. It turns out that even after well over one million simulations we're not quite close enough -- even if we cheat and disallow non-native strand pairings! That's very good to know.

So we're trying a new idea, invented by Phil Bradley in our group, called "jumping". In this method, we assume that we know two parts of the protein chain are in contact, maybe as a guess or maybe because we have some outside source of information. These two parts of the chain are stuck together for the whole simulation. To allow some mobility in the chain between the contact points, we then put a cut in the chain. (Throughout the simulation, we try not to let this cut widen too much by penalizing the score for large chainbreaks.) This ends up being a pretty efficient way to search for low-energy protein conformations with fairly complex topologies.
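To make the chainbreak penalty concrete, here is a minimal sketch. The post does not give the actual functional form, so the quadratic penalty, the `weight` parameter, and the function names below are assumptions for illustration only; the only physical input is the approximate length of a normal peptide bond (about 1.33 angstroms).

```python
import math

IDEAL_BOND = 1.33  # approximate C-N peptide bond length, in angstroms


def chainbreak_penalty(c_atom, n_atom, weight=1.0):
    """Hypothetical penalty: grows quadratically once the two atoms flanking
    the cut are farther apart than a normal peptide bond."""
    gap = math.dist(c_atom, n_atom)
    excess = max(0.0, gap - IDEAL_BOND)
    return weight * excess ** 2


def total_score(energy, c_atom, n_atom, weight=1.0):
    """Energy of the conformation plus the chainbreak penalty (assumed form)."""
    return energy + chainbreak_penalty(c_atom, n_atom, weight)


# Same energy, but the second conformation has let the cut open by 3 angstroms.
closed = total_score(-100.0, (0.0, 0.0, 0.0), (1.33, 0.0, 0.0))
opened = total_score(-100.0, (0.0, 0.0, 0.0), (4.33, 0.0, 0.0))
```

With a penalty of this kind, a conformation that lowers its energy only by pulling the cut apart scores worse than one that keeps the chain nearly continuous.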

We'll be running two kinds of workunits:


The first kind uses information on 7 known strand pairings in 1tul_ -- in other words, it's a cheat. In the second kind of run, there's a comprehensive list of all possible topologies of strand pairings (about 100,000 for this protein!), and every client chooses one topology to test per simulation. So some clients will investigate topologies that look like "sandwiches", others will try to make "barrels", and so forth. Thanks to Rosetta@home, we can sample all these topologies comprehensively for the first time!

ID: 15060 · Rating: 0
Bin Qian

Joined: 13 Jul 05
Posts: 33
Credit: 36,897
RAC: 0
Message 15302 - Posted: 2 May 2006, 15:32:46 UTC - in response to Message 15060.  
Last modified: 2 May 2006, 15:35:15 UTC

I've just sent out some work units with names starting with:


You all have been well informed by David(s) and Rhiju about the ab-initio approach they are using to fold a protein from its amino-acid sequence. I'm working on another category of the protein structure prediction problem, namely the "comparative modeling" approach. Here is how this approach works: when we want to predict the structure of a protein, we usually first do a database search on the available protein structures. Often we find that the protein we want to predict has a brother/sister (called a homologous protein) whose structure has been solved by one of the accurate but time-consuming experimental methods. It is well known that homologous proteins share more or less similar shapes.

This important information can give us a "short cut" towards solving our target protein's structure because now we can jump-start our coarse-grained conformational search with the homologous protein's structure, and only search parts of the structure that are either missing or likely to be different from the homologous protein structure.

This coarse-grained search is followed by a fine-grained search in which we try to locate the precise positions of the protein's hundreds to thousands of atoms (called the fullatom relax stage), based on the rough shape determined by the coarse-grained search. This is the same fullatom relax stage as the second step in David(s) and Rhiju's approach.

On the screen saver you will likely see the WU start with a compact protein structure (the homologous structure); then some parts of the structure start to move around. After they settle down, the whole protein will start to wiggle with smaller-scale motions.
ID: 15302 · Rating: 0

Joined: 20 Oct 05
Posts: 6
Credit: 0
RAC: 0
Message 16119 - Posted: 13 May 2006, 0:59:37 UTC
Last modified: 13 May 2006, 1:00:02 UTC

I have just added two new workunits to the queue:

CASP_HOMOLOG_ABRELAX_hom001_t287_ and

These workunits are both for CASP, the competition Rhiju mentioned earlier in this thread. t278 and t283 are both sequences for which we are trying the "ab initio" approach, meaning that they will not be based on existing structures (unlike the case Bin described earlier). The workunits do, however, use homologs of the sequence (other proteins with similar amino acid sequences) to help in the prediction! Here's how:

We take the target sequence (given by CASP) and make the best possible predictions we can. We then find sequence homologs in the database, and make the best possible predictions we can for those also. The basis for this is that sequences with high homology can be expected to have the same structure, so if we find a good prediction among any of the homologs, we have solved the structure for our target sequence!

These workunits are specifically crunching away at all the homologs, folding them independently and trying to find the best structure for each. The next step will be to map our target sequence back onto all these different structures, rebuild gaps where the two sequences may be different sizes, and do some simple tweaking of the sidechains that are different. This will yield our final predictions.
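The mapping step described above (thread the target onto a homolog model, rebuild gaps, tweak differing sidechains) can be sketched as a bookkeeping pass over a sequence alignment. This is only an illustration under stated assumptions: the `plan_threading` function, its category names, and the toy alignment are made up here, and the post does not say how the alignment itself is produced.

```python
def plan_threading(target_aln, homolog_aln):
    """Hypothetical helper: classify each aligned column when mapping a target
    sequence onto a model built for a homolog. Gaps are written as '-'.
    Positions are 1-based within the ungapped sequence they refer to."""
    if len(target_aln) != len(homolog_aln):
        raise ValueError("aligned sequences must have the same length")
    plan = {"keep": [], "swap_sidechain": [], "rebuild": [], "drop_from_homolog": []}
    t_pos = h_pos = 0
    for t_res, h_res in zip(target_aln, homolog_aln):
        if t_res != "-":
            t_pos += 1
        if h_res != "-":
            h_pos += 1
        if t_res == "-" and h_res == "-":
            continue
        if t_res == "-":
            # Homolog has a residue the target lacks: remove it from the model.
            plan["drop_from_homolog"].append(h_pos)
        elif h_res == "-":
            # Target has a residue with no coordinates in the model: rebuild it.
            plan["rebuild"].append(t_pos)
        elif t_res == h_res:
            plan["keep"].append(t_pos)
        else:
            # Same backbone position, different amino acid: replace the sidechain.
            plan["swap_sidechain"].append(t_pos)
    return plan


# Toy alignment of a 6-residue target against a 6-residue homolog.
plan = plan_threading("MKT-LLV", "MRTGL-V")
```

In this toy case, target residue 2 needs a new sidechain, target residue 5 has to be rebuilt, and homolog residue 4 is dropped from the model.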
ID: 16119 · Rating: 1

Joined: 15 Dec 05
Posts: 51
Credit: 69,458
RAC: 0
Message 22267 - Posted: 11 Aug 2006, 12:43:02 UTC

Structural protein
Chicken villin subdomain hp-35, n68h, ph6.7
Villin. Chain: a. Fragment: vhp. Engineered: yes. Mutation: yes
Synthetic: yes. Other_details: sequence corresponds to chicken villin residues 792-826

ID: 22267 · Rating: 0
bblum

Joined: 15 Aug 06
Posts: 6
Credit: 4,077
RAC: 0
Message 22497 - Posted: 15 Aug 2006, 21:15:33 UTC
Last modified: 15 Aug 2006, 21:16:12 UTC

I've added three new jobs to the queue:

All are "abrelax" runs (ab initio prediction followed by relaxation, as described earlier in this thread by Bin and Rhiju--graphically speaking, big wiggles followed by small wiggles). The protein is 1di2A. We've already generated a population of decoys for 1di2A, and now we're attempting to use what we've learned from that population to generate a much better population. It's frequently clear from an initial sampling run that certain decoy features are always good--they tend, if present, to give rise to low energy. If we can identify these features, then we can concentrate the next round of sampling on them. We've generally found that features which correlate with low energy in the initial sampling round tend to be present in the native.
Features can include both low-level ones (e.g. torsion angles or rotamers for specific residues) and high-level ones (e.g. beta strand pairings). In these jobs, we're only enriching for low-level features.

The job I'm most excited about is the LARS job; here we've used a sparse linear model to identify a few features that correlated very strongly with the Rosetta energy, and I've enriched for all of them in sampling. The CHEAT job fixes a single linchpin native feature that we hope will result in sampling much closer to the native; it will allow us to test a hypothesis we have about the joint distribution of features in Rosetta sampling. The CONTROL job fixes no features and will serve as a baseline.
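As a rough picture of what "features that correlate with low energy" means, here is a small sketch. Note that it is deliberately not LARS or any sparse linear model: it swaps in a much simpler stand-in that ranks each feature by the difference in mean energy between decoys that have it and decoys that do not. The `rank_features` function, the feature names, and the toy energies are all invented for illustration.

```python
from statistics import mean


def rank_features(decoys):
    """Hypothetical stand-in for the feature-selection step.
    decoys: list of (energy, set_of_feature_names).
    Returns (feature, mean_energy_with - mean_energy_without) pairs,
    most negative (most strongly associated with low energy) first."""
    all_features = set().union(*(features for _, features in decoys))
    scores = {}
    for feat in all_features:
        with_f = [e for e, features in decoys if feat in features]
        without_f = [e for e, features in decoys if feat not in features]
        if with_f and without_f:  # need both groups to compare
            scores[feat] = mean(with_f) - mean(without_f)
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy decoy population: energies and low-level features (made-up names).
decoys = [
    (-110.0, {"phi12_alpha", "rot30_a"}),
    (-105.0, {"phi12_alpha"}),
    (-70.0, {"rot30_a"}),
    (-65.0, set()),
]
ranking = rank_features(decoys)
```

Here `phi12_alpha` comes out on top because decoys carrying it average 40 energy units lower than decoys without it; a second sampling round would then be enriched for that feature. A sparse linear model additionally accounts for features that are correlated with each other, which this simple comparison does not.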
ID: 22497 · Rating: 1
James Thompson

Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 22568 - Posted: 16 Aug 2006, 18:43:00 UTC - in response to Message 22497.  

I've added 58 small jobs to the queue named <protein_name>__CASP7_ABINITIO_SAVE_ALL_OUT_flatss0.33_. These jobs are designed to investigate how problems with secondary structure prediction affect our predictions. Secondary structure prediction is an initial step in our methodology that attempts to predict the local arrangement of small pieces of the protein structure, which Rosetta then arranges into an overall tertiary structure. My impression from CASP is that lack of knowledge of secondary structure makes prediction very difficult, and these workunits will give a better idea of the problem's difficulty.
ID: 22568 · Rating: 1
James Thompson

Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 26411 - Posted: 9 Sep 2006, 0:17:07 UTC

Over the next two days I'll be adding some new jobs to the Rosetta@Home queue that will attempt to use a slightly modified methodology for ab initio prediction. The jobs look like this:


That batch of workunits (and three others) are currently running on Ralph.

We use a method called a barcode to constrain certain residues of a protein to adopt a specific conformation. This barcode ensures that the conformation stays fixed throughout an entire run. barcode_from_frags tries to infer which residues to barcode by examining the distribution of conformations for analogous sections of proteins of known structure.

And yes, I know that barcode_from_frags is in there twice and I'll try to fix that before my runs on Rosetta@Home commence. :)
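For readers who want a concrete picture, here is one possible reading of the barcode idea as a sketch. The post does not spell out the selection rule, so everything below is an assumption: that positions whose fragment conformations are split between alternatives are the ones worth barcoding, the 0.7 dominance threshold, the single-letter conformation labels, and both function names.

```python
import random
from collections import Counter


def barcode_candidates(frag_labels, max_dominance=0.7):
    """Hypothetical rule: a residue is a barcode candidate when no single
    conformation label accounts for at least max_dominance of its fragments.
    frag_labels: {residue_number: [conformation label per fragment]}."""
    candidates = {}
    for pos, labels in frag_labels.items():
        if not labels:
            continue
        counts = Counter(labels)
        _, top_count = counts.most_common(1)[0]
        if top_count / len(labels) < max_dominance:
            # Alternatives ordered by frequency, ties broken alphabetically.
            candidates[pos] = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return candidates


def make_barcode(candidates, rng):
    """Pick one conformation per candidate residue; it stays fixed for the run."""
    return {pos: rng.choice(options) for pos, options in sorted(candidates.items())}


# Toy fragment statistics for three residues (made-up labels).
frag_labels = {
    12: ["A"] * 9 + ["B"],        # almost always "A": leave unconstrained
    13: ["A"] * 5 + ["B"] * 5,    # evenly split: candidate
    14: ["B"] * 6 + ["G"] * 4,    # split 60/40: candidate
}
candidates = barcode_candidates(frag_labels)
barcode = make_barcode(candidates, random.Random(0))
```

Under this reading, different runs would draw different barcodes and so be forced to explore different local conformations at the ambiguous positions.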
ID: 26411 · Rating: 3
Volunteer moderator
Project developer
Project scientist

Joined: 20 Sep 06
Posts: 4
Credit: 27,635
RAC: 0
Message 27652 - Posted: 20 Sep 2006, 6:06:37 UTC

We are now using Rosetta@Home to screen our HIV vaccine designs. We are remodeling the protein called GP120, which is responsible for binding to human proteins and initiating the invasion of host cells. The remodeled/redesigned GP120 can potentially be used as a vaccine. (Please see David's journal for more info.)

The jobs submitted for this will have a prefix "PSH" and future jobs will also carry the name(s) GP120 and OD1 (stands for the outer domain of the GP120) for easy identification.

We are using the jobs on Rosetta@Home to refold and refine the sequences that came from design runs. We first remodel the backbone using Rosetta (~2000+ different backbone conformations), design suitable sequences for each of the conformations, and then score all these different sequences to find the best ones to test experimentally. The scoring process is very time consuming, as each of the 2000+ starting conformations has to be sampled thoroughly in its local conformational space before we can say definitively which is the best designed sequence. That's when Rosetta@Home comes to the rescue...
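A small sketch of why thorough sampling matters for this kind of screen. It assumes, purely for illustration, that each design is judged by the lowest energy found across its independent refinement runs; the `rank_designs` function, the design names, and the energies are made up.

```python
def rank_designs(samples):
    """Hypothetical ranking: score each design by the lowest energy seen in
    any of its refinement runs, then sort best (lowest) first.
    samples: {design_name: [energies from independent runs]}."""
    best = {name: min(energies) for name, energies in samples.items() if energies}
    return sorted(best.items(), key=lambda kv: kv[1])


# Toy data: three designs with a few refinement runs each.
samples = {
    "design_A": [-210.5, -215.0, -198.2],
    "design_B": [-190.0, -220.1],
    "design_C": [-205.0],
}

# Ranking after only one run per design versus after all runs.
first_only = rank_designs({name: runs[:1] for name, runs in samples.items()})
ranking = rank_designs(samples)
```

With a single run per design, `design_A` looks best; once all runs are in, `design_B` takes the lead. With 2000+ starting conformations, getting enough runs per design to trust the ranking is what makes the job so large.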

ID: 27652 · Rating: 1
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 29252 - Posted: 12 Oct 2006, 22:28:34 UTC

Over the next week or so, David Baker and I will be testing out a new protocol for full atom refinement that should more efficiently hit lower energies! The workunits in this new study will have names like:


ID: 29252 · Rating: 0

Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 29262 - Posted: 13 Oct 2006, 1:42:36 UTC

All the docking WUs will have names like "DOC_", and the screensaver will show two proteins colored blue and red. Compared to "RELAX" WUs, docking WUs normally have a smaller number of steps, and each step takes longer. Also, since rmsd is calculated differently in docking, the rmsd values may be rather larger.
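
For reference, rmsd is the root-mean-square distance between matching atoms in two structures. The post does not spell out how the docking version differs; one common docking convention (an assumption here) is to superimpose one partner and measure rmsd over the other, so any rigid shift of that partner shows up in full. The sketch below computes a plain rmsd with no superposition, using made-up coordinates.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

# A made-up three-atom "partner" moved rigidly by 10 units along x.
partner = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in partner]
print(rmsd(partner, shifted))
```

The internal shape of the partner is unchanged in this example, yet the rmsd equals the full 10-unit displacement, which is why docking runs can report much larger values than refinement runs.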
ID: 29262 · Rating: 0
James Thompson

Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 30090 - Posted: 27 Oct 2006, 9:07:44 UTC

Starting tomorrow we will be sending out our first workunits for our newest ab-initio structure prediction project. These workunits will look like this:


We are collaborating with a number of structural genomics centers so that we can attempt to predict proteins whose structures will be solved within the next six months. This project is very obviously inspired by the CASP competitions, and some of us in the lab have started calling this "CASP all-the-time." It will be very useful for us to have a running benchmark of our methods on absolutely new crystal structures.
ID: 30090 · Rating: 0
Brian Kidd

Joined: 9 Dec 06
Posts: 5
Credit: 327
RAC: 0
Message 32315 - Posted: 9 Dec 2006, 9:33:06 UTC

I recently submitted a job (BAK_1avs_TnC_loop_model) to predict the structural changes in an allosteric protein called Troponin C. This protein regulates muscle contraction and has important implications for cardiovascular disease.

In the near future, I will be submitting more workunits related to predicting conformational changes in allosteric proteins. For each submission, I'll describe a little about the protein's function and its relevance to medicine or technology. Allosteric proteins are particularly interesting cases for structure prediction because their function is regulated by structural changes that take place far from the active site. As a consequence, these proteins have multiple functional states, and we are attempting to predict the structures associated with these states. Our basic approach is to exhaustively sample the conformational space away from the starting structure and look for additional energy minima. Thanks for your help with this important problem.
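
To give a feel for what "looking for additional energy minima" can mean in practice, here is a minimal sketch. It assumes each sampled structure is summarized by its rmsd from the starting structure and its energy, bins the structures by rmsd, and reports bins whose lowest energy is below both neighboring bins. The function name, bin width, and numbers are invented, and the real analysis is likely more involved.

```python
def energy_minima_by_rmsd(decoys, bin_width=1.0):
    """Find local energy minima along the rmsd-from-start axis.

    decoys: iterable of (rmsd_from_start, energy) pairs, lower energy is better.
    Returns a list of (bin_start_rmsd, lowest_energy) for each local minimum.
    Empty bins are ignored, so neighbors are the nearest populated bins.
    """
    lowest = {}
    for rmsd_value, energy in decoys:
        bin_index = int(rmsd_value // bin_width)
        if bin_index not in lowest or energy < lowest[bin_index]:
            lowest[bin_index] = energy

    bins = sorted(lowest)
    minima = []
    for i, bin_index in enumerate(bins):
        left = lowest[bins[i - 1]] if i > 0 else float("inf")
        right = lowest[bins[i + 1]] if i + 1 < len(bins) else float("inf")
        if lowest[bin_index] < left and lowest[bin_index] < right:
            minima.append((bin_index * bin_width, lowest[bin_index]))
    return minima

# Invented data: a minimum near the start and a second one about 3 units away.
samples = [(0.2, -100.0), (1.5, -90.0), (2.5, -80.0), (3.4, -95.0), (4.1, -85.0)]
print(energy_minima_by_rmsd(samples))
```

A second minimum far from the starting structure is the kind of signal that would suggest an alternative functional state.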
ID: 32315 · Rating: 0
Brian Kidd

Joined: 9 Dec 06
Posts: 5
Credit: 327
RAC: 0
Message 32317 - Posted: 9 Dec 2006, 10:54:50 UTC
Last modified: 9 Dec 2006, 10:56:47 UTC

A new job was just submitted - BAK_1klf_FimH_loop_model.

This protein is a bacterial adhesion protein called FimH. The protein resides on the tip of projections that pathogenic E. coli extrude from their cell wall to stick to glycoproteins on the surface of human cells. The neat feature of this protein is that its binding is regulated by force: the bond lifetime actually increases as force increases, a behavior called a "catch bond". This catch-like behavior is counterintuitive and contrary to how most bonds function, since most bonds decrease their lifetime if you yank on them with more force. Predicting the structures of both states, active and inactive, would have huge implications for treating bacterial infections and developing novel molecular force sensors.
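
The contrast between an ordinary ("slip") bond and a catch bond can be sketched with simple rate models from the general catch-bond literature. This is a toy illustration, not a model of FimH: the slip bond uses a Bell-type rate that grows exponentially with force, and the catch bond uses a two-pathway form in which one unbinding route is suppressed by force. All parameter values are invented for the example.

```python
import math

KT_PN_NM = 4.1  # thermal energy near room temperature, in pN*nm

def slip_lifetime(force_pn, k0=0.1, x_nm=0.2):
    """Bell-type slip bond: the off-rate rises with force, so lifetime falls."""
    return 1.0 / (k0 * math.exp(force_pn * x_nm / KT_PN_NM))

def catch_slip_lifetime(force_pn, k_catch=10.0, x_catch=0.5,
                        k_slip=0.1, x_slip=0.2):
    """Two-pathway bond: force closes the fast route and opens the slow one."""
    rate = (k_catch * math.exp(-force_pn * x_catch / KT_PN_NM)
            + k_slip * math.exp(force_pn * x_slip / KT_PN_NM))
    return 1.0 / rate

for force in (0.0, 20.0, 100.0):
    print(force, slip_lifetime(force), catch_slip_lifetime(force))
```

With these made-up parameters the slip bond's lifetime drops steadily with force, while the two-pathway bond lasts longer at moderate force than at zero force before eventually weakening at high force.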

FimH is an allosteric protein, and the structure of the unbound - "inactive" - conformation has been technically difficult to solve by experimental methods. However, we ought to be able to predict the inactive structure with the loop modeling and full-atom relax techniques that we've been adapting to look at allosteric proteins. In addition, some recent work by our collaborators has given us reason to believe that there will be an NMR structure in the not too distant future. These experimental data will be a great validation for the predicted structures that you all are solving with Rosetta@home. Thanks so much for your support in predicting this structure.
ID: 32317 · Rating: 0
Volunteer moderator

Joined: 5 Sep 06
Posts: 423
Credit: 6
RAC: 0
Message 38063 - Posted: 20 Mar 2007, 21:31:46 UTC
Last modified: 20 Mar 2007, 21:38:57 UTC

More details from Chu on Protein-protein docking and CAPRI.

More details from Rhiju on RNA folding
Rosetta Informational Moderator: Mod.Zilla
ID: 38063 · Rating: 0
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 38164 - Posted: 23 Mar 2007, 5:43:40 UTC
Last modified: 23 Mar 2007, 5:44:27 UTC

The work units labeled RNA_ABINITIO model the folding of small chains of RNA, a molecule similar to DNA. They start from extended chains and then try to maximize the pairing and stacking of the RNA sidechains ("bases").
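
To make "maximizing pairing" concrete, here is a small sketch of a classic Nussinov-style dynamic program that finds the largest number of nested base pairs a sequence can form. This is a textbook secondary-structure method swapped in for illustration; it ignores stacking and 3D geometry and is not what the RNA_ABINITIO work units run. The allowed pairs (including G-U wobble) and the minimum loop size of 3 are assumptions of the example.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least min_loop
    unpaired bases between the two partners of any pair."""
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k far enough away
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

# A made-up hairpin: three G-C pairs closing a four-base loop.
print(max_base_pairs("GGGAAAACCC"))
```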

Info on where the sequences come from:

E-loop from ribosome 5S RNA

HIV psiRNA dimerization signal

tRNA anticodon stem loop

1l2x and 2a43
Viral frameshifting pseudoknots

Nucleolin binding hairpin

Sarcin/ricin loop from the large ribosomal subunit

GNRA tetraloop

Domain I from the self-splicing Group II intron

Rigorously conserved RNA element from SARS

tRNA(phe) from yeast

The P4-P6 domain from the self-splicing Tetrahymena group I ribozyme

ID: 38164 · Rating: 0

©2024 University of Washington