Rosetta@home

Rosetta@home Active WorkUnit(s) Log

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 14309 - Posted 21 Apr 2006 22:15:57 UTC
Last modified: 21 Apr 2006 22:18:09 UTC

This thread is set aside as a place for the project to keep a public log of the Work Units being processed, and to provide some information about each type.

Posts on other topics, or discussion posts, will be moved to other locations.

For information related to the current Rosetta@home application release and a version history please see this thread.

____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 14335 - Posted 22 Apr 2006 2:43:22 UTC
Last modified: 22 Apr 2006 2:45:26 UTC

I'm going to start this log with a pretty complete list of
proteins that we're looking at! First, there is a set
of eleven "small" proteins (<80 residues) that David Baker
and Divya have focused on for calibrating the Rosetta
energy function.

The information on all the proteins is extracted
from the very useful PDBsum database at the European
Bioinformatics Institute. You can access that
database directly by clicking on each protein's code!


1b72
Protein/DNA
Pbx1, homeobox protein hox-b1/DNA ternary complex
Homeobox protein hox-b1.
Homo sapiens. Human. Gene: hoxb-1.

1dcj
Structural genomics, unknown function
Solution structure of yhhp, a novel escherichia coli protein implicated in the cell division
Yhhp protein.
Escherichia coli. Bacteria.

1di2
RNA binding protein/RNA
Crystal structure of a dsrna-binding domain complexed with dsrna: molecular basis of double-stranded RNA-protein interactions
Double stranded RNA binding protein a.
Xenopus laevis.

1dtj
Immune system
Crystal structure of nova-2 kh3 k-homology RNA-binding domain
RNA-binding neurooncological ventral antigen 2.
Homo sapiens. Human. Organ: brain. Cell: neuron.

1hz6
Protein binding
Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution
Protein l.
Peptostreptococcus magnus. Bacteria.

1mky
Ligand binding protein
Structural analysis of the domain interactions in der, a switch protein containing two gtpase domains
Probable gtp-binding protein enga.
Thermotoga maritima. Archebacteria.

1n0u
Translation
Crystal structure of yeast elongation factor 2 in complex with sordarin
Elongation factor 2.
Saccharomyces cerevisiae. Yeast

1ogw
Chromosomal protein
Synthetic ubiquitin with fluoro-leu at 50 and 67
Ubiquitin.
Homo sapiens. Human

1r69
Gene regulating protein
434 repressor (amino-terminal domain) (r1-69)
Phage 434

1tif
Ribosome binding factor
Translation initiation factor 3 n-terminal domain
Translation initiation factor 3.
Bacillus stearothermophilus.

2reb
DNA binding protein
The structure of the E. coli reca protein monomer and polymer
Reca protein
(Escherichia coli)

____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 14337 - Posted 22 Apr 2006 2:52:23 UTC
Last modified: 22 Apr 2006 6:00:07 UTC

Around January 2006, David Kim and I began testing the Rosetta@home
strategy on a diverse set of 62 proteins, whose crystal structures
are well known. These protein sequences have been extensively
used in Baker lab benchmarks for several years -- it has been
gratifying to see Rosetta@home working on many of these sequences
that we found challenging or even impossible with our in-house
resources! See our Top Prediction page and David Baker's journal for examples.

One more note, we've also been testing, on a grand scale,
whether we can use information from homologs for each of these
sequences. For example, in the 1fna protein fibronectin, our main
sequence listed below comes from the famous bacterium E. coli.
But there's a closely related sequence in chickens (amazing, huh?)
that happens to fold better (i.e., to lower energies) in Rosetta.
That's the one we've posted in the top predictions.

We've been looking at a few other proteins not in these two sets, and
we'll describe them -- along with their workunits -- later in this thread.

1a19
Ribonuclease inhibitor
Barstar (free), c82a mutant
Barstar. Chain: a, b. Engineered: yes. Mutation: c82a. Biological_unit: monomer
Bacillus amyloliquefaciens. Expressed in: escherichia coli

1a32
Ribosomal protein
Ribosomal protein s15 from bacillus stearothermophilus
Ribosomal protein s15. Chain: null. Engineered: yes
Bacillus stearothermophilus. Cellular_location: ribosome. Expressed in: escherichia coli.

1a68
Potassium channels
Crystal structure of the tetramerization domain of the shaker potassium channel
Potassium channel kv1.1. Chain: null. Fragment: tetramerization domain. Engineered: yes. Mutation: inserted met at n-terminus. Biological_unit: tetramer
Aplysia californica. California sea hare. Tissue: central nervous system. Cellular_location: cytoplasm. Expressed in: escherichia coli.

1acf
Contractile protein
Acanthamoeba castellanii profilin ib
Profilin i
(Acanthamoeba castellanii)

1ail
RNA-binding protein
N-terminal fragment of ns1 protein from influenza a virus
Nonstructural protein ns1. Chain: null. Fragment: RNA-binding domain
Influenza a virus

1aiu
Electron transport
Human thioredoxin (d60n mutant, reduced form)
Thioredoxin. Chain: null. Engineered: yes. Mutation: d60n. Other_details: active site cysteines 32 and 35 in the reduced form
Homo sapiens. Human. Expressed in: escherichia coli.

1b3a
Anti-HIV protein
Total chemical synthesis and high-resolution crystal structure of the potent anti-HIV protein aop-rantes
Rantes. Chain: a, b. Engineered: yes. Other_details: oxime link between aop group and pro3
Synthetic: yes

1bgf
Transcription factor
Stat-4 n-domain
Stat-4. Chain: null. Fragment: n-terminal domain. Engineered: yes. Biological_unit: dimer
Mus musculus. Mouse. Expressed in: escherichia coli.

1bk2
Sh3-domain
A-spectrin sh3 domain d48g mutant
A-spectrin. Chain: null. Fragment: sh3-domain. Engineered: yes. Mutation: d48g
Gallus gallus. Chicken. Plasmid: pbat4. Expressed in: escherichia coli.

1bkr
Actin-binding
Calponin homology (ch) domain from human beta-spectrin at 1.1 angstrom resolution
Spectrin beta chain. Chain: a. Fragment: f-actin binding domain residues 173 - 281. Synonym: calponin homology (ch) domain. Engineered: yes. Biological_unit: heterotetramer
Homo sapiens. Human. Cell: non-erythrocyte. Cellular_location: cytoskeletal protein. Expressed in: escherichia coli.

1bm8
Cell cycle
DNA-binding domain of mbp1
Transcription factor mbp1. Chain: null. Fragment: n-terminal DNA-binding domain. Synonym: mcb (mlui cell-cycle box) binding protein. Engineered: yes
Saccharomyces cerevisiae. Baker's yeast. Plasmid: pet43 (c2681). Expressed in: escherichia coli.

1bq9
Metal binding protein
Rubredoxin (formyl methionine mutant) from pyrococcus furiosus
Rubredoxin. Chain: a. Synonym: pf rd. Engineered: yes. Mutation: yes
Pyrococcus furiosus. Variant: fmet. Expressed in: E. coli. Other_details: product of a synthetic pf rd gene

1c8c
DNA binding protein/DNA
Crystal structures of the chromosomal proteins sso7d/sac7d bound to DNA containing t-g mismatched base pairs
DNA-binding protein 7a. Chain: a. Synonym: sso7d, 7 kda DNA-binding protein d. 5'-d( Gp Tp Gp Ap Tp Cp Gp C)-3'. Chain: b, c. Engineered: yes
Sulfolobus solfataricus. Collection: dsm 1617. Other_details: german collection of microorganisms (dsm) 1617, #1616. Synthetic: yes

1c9o
Transcription
Crystal structure analysis of the bacillus caldolyticus cold shock protein bc-csp
Cold-shock protein. Chain: a, b. Synonym: cspb
Bacillus caldolyticus

1cc8
Metal transport
Crystal structure of the atx1 metallochaperone protein
Metallochaperone atx1. Chain: a. Engineered: yes
Saccharomyces cerevisiae. Baker's yeast. Expressed in: escherichia coli.

1cei
Antibacterial protein
Structure determination of the colicin e7 immunity protein (imme7) that binds specifically to the dnase-type colicin e7 and inhibits its bacteriocidal activity
Colicin e7 immunity protein. Chain: null. Synonym: imme7
Escherichia coli

1cg5
Oxygen transport
Deoxy form hemoglobin from dasyatis akajei
Hemoglobin. Chain: a. Hemoglobin. Chain: b
Dasyatis akajei. Akaei. Cell: erythrocyte. Cell: erythrocyte

1ctf
Ribosomal protein
1.65 angstrom resolution structure of 7,8-dihydroneopterin aldolase from staphylococcus aureus
7,8-dihydroneopterin aldolase. Chain: null. Synonym: dhna. Engineered: yes. Other_details: octamer is crystallographic
Staphylococcus aureus. Collection: atcc 25923. Gene: dhna. Expressed in: escherichia coli.

1e6i
Gene regulation
Bromodomain from gcn5 complexed with acetylated h4 peptide
Transcriptional activator gcn5. Chain: a. Fragment: bromodomain. Engineered: yes. H4 peptide. Chain: p. Fragment: acetylated tail. Other_details: lysine 16 is acetylated on nz
Saccharomyces cerevisiae. Gene: gcn5. Expressed in: escherichia coli. Synthetic: corresponds to residues 15-29 of histone h4.

1elw
Chaperone
Crystal structure of the tpr1 domain of hop in complex with a hsc70 peptide
Tpr1-domain of hop. Chain: a, b. Fragment: n-terminal domain. Engineered: yes. Hsc70-peptide. Chain: c, d. Engineered: yes
Homo sapiens. Human. Expressed in: escherichia coli. Synthetic: yes. Other_details: this sequence occurs naturally in humans

1enh
DNA-binding protein
Crystal structure of escherichia coli cyay protein reveals a novel fold for the frataxin family
Cyay protein. Chain: a. Engineered: yes
Escherichia coli. Bacteria. Expressed in: escherichia coli.

1eyv
Transcription
The crystal structure of nusb from mycobacterium tuberculosis
N-utilizing substance protein b homolog. Chain: a, b. Synonym: nusb protein. Engineered: yes
Mycobacterium tuberculosis. Bacteria. Expressed in: escherichia coli.

1fkb
Isomerase
Atomic structure of the rapamycin human immunophilin fkbp- 12 complex
Fk506 binding protein (fkbp) complex with immunosuppressant rapamycin
Human (homo sapiens) recombinant form expressed in (escherichia coli)

1fna
Cell adhesion protein
Gene v protein (single-stranded DNA binding protein)
Gene v protein. Chain: null. Engineered: yes. Biological_unit: active as a dimer
Escherichia coli. Strain: k561. Plasmid: ptt2. Gene: gen v in bacteriophage f1. Expressed in: escherichia coli.

1hz6
Protein binding
Crystal structures of the b1 domain of protein l from peptostreptococcus magnus with a tyrosine to tryptophan substitution
Protein l. Chain: a, b, c. Fragment: b1 domain. Synonym: ig kappa light chain-binding protein. Engineered: yes. Mutation: yes
Peptostreptococcus magnus. Bacteria. Expressed in: escherichia coli.

1ig5
Metal binding protein
Bovine calbindin d9k binding mg2+
Vitamin d-dependent calcium-binding protein, intestinal. Chain: a. Synonym: calbindin d9k. Engineered: yes
Bos taurus. Bovine. Gene: synthetic gene. Expressed in: escherichia coli.

1iib
Phosphotransferase
Crystal structure of iibcellobiose from escherichia coli
Enzyme iib of the cellobiose-specific phosphotransferase system. Chain: a, b. Fragment: enzyme iib. Engineered: yes. Mutation: c10s. Biological_unit: monomer
Escherichia coli. Strain: k12. Cellular_location: cytoplasm. Plasmid: pjl503. Gene: cela. Expressed in: escherichia coli.

1kpe
Protein kinase inhibitor
Pkci-transition state analog
Protein kinase C interacting protein. Chain: a, b. Synonym: pkci-1, protein kinase C inhibitor 1, hint protein, hit protein. Engineered: yes. Biological_unit: dimer
Homo sapiens. Human. Plasmid: phil-d5. Gene: hpkci-1. Expressed in: pichia pastoris.

1lis
Fertilization protein
Ribosomal protein s6
Ribosomal protein s6. Chain: a. Engineered: yes. Mutation: yes
Thermus thermophilus. Bacteria. Expressed in: escherichia coli.

1nps
Signaling protein
Crystal structure of n-terminal domain of protein s
Development-specific protein s. Chain: a. Fragment: n-terminal domain: motifs 1-2 and linker. Synonym: spore coat protein s. Engineered: yes
Myxococcus xanthus. Bacteria. Cellular_location: cytosol. Expressed in: escherichia coli.

1opd
Phosphotransferase
Histidine-containing protein (hpr), mutant with ser 46 replaced by asp (s46d)
Histidine-containing protein. Chain: null. Synonym: hpr. Engineered: yes. Mutation: s46d
Escherichia coli. Strain: esk108. Expressed in: escherichia coli.

1pgx
Immunoglobulin binding protein
Protein kinase C delta cys2 domain
Protein kinase C delta type. Domain: cys2. Heterogen: zinc
Mus musculus mouse. Expressed in: escherichia coli.

1r69
Gene regulating protein
Crystal structure of subtilisin-propeptide complex
Subtilisin e. Chain: a, b. Synonym: serine protease. Engineered: yes. Mutation: s221c
Bacillus subtilis. Strain: 168. Expressed in: escherichia coli.

1shf
Phosphotransferase
Translation initiation factor 3 n-terminal domain
Translation initiation factor 3. Chain: null. Domain: n-terminal residues 1 - 78. Synonym: if3-n. Engineered: yes.
Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).

1tig
Ribosome binding factor
Translation initiation factor 3 c-terminal domain
Translation initiation factor 3. Chain: null. Domain: c-terminal residues 79 - 172. Synonym: if3-c. Engineered: yes
Bacillus stearothermophilus. Expressed in: escherichia coli bl21(de3).

1tit
Immunoglobulin-like domain
Titin, ig repeat 27, nmr, minimized average structure
Titin, i27. Chain: null. Synonym: connectin i27, titin ig repeat 27. Engineered: yes. Other_details: solution structure, t=308k, ph 4.5, 10mm acetate buffer
Homo sapiens. Human. Organ: heart. Tissue: muscle. Organelle: sarcomere. Expressed in: escherichia coli.

1tul
Telokin-like protein
Structure of tlp20
Tlp20. Chain: null
Autographa californica nuclear polyhedrosis virus, acmnpv. Baculovirus

1ubi
Chromosomal protein
Synthetic, structural and biological studies of the ubiquitin system: chemically synthesized and native ubiquitin fold into identical three-dimensional structures.
Ubiquitin
Solid-phase chemical synthesis

1ugh
Glycosylase
Crystal structure of human uracil-DNA glycosylase in complex with a protein inhibitor: protein mimicry of DNA
Uracil-DNA glycosylase. Chain: e. Synonym: udg. Engineered: yes. Mutation: p82m, v83e, g84f. Biological_unit: monomer. Uracil-DNA glycosylase inhibitor. Chain: i. Synonym: ugi.
Homo sapiens. Human. Expressed in: escherichia coli. Bacteriophage pbs2. Expressed in: escherichia coli

1urn
Complex (ribonucleoprotein/RNA)
U1a/RNA complex
U1a spliceosomal protein. Chain: a, b, c. Fragment: residues 2 - 98. Engineered: yes. Mutation: y31h, q36r. RNA 21mer hairpin (5'-(ap Ap Up Cp Cp Ap Up Up Gp Cp Ap Cp Up Cp Cp Gp Gp Ap Up Up U)-3'). Chain: p, q, r other_details: u1a is a protein from u1 small
Homo sapiens. Human. Cell_line: fetal brain cdna library. Expressed in: escherichia coli. Other_details: induction by t7 phi10 promoter. Synthetic: yes. Other_details: sequence based on hairpin ii of u1 RNA

1utg
Steroid binding
Amino terminal 9kda domain of vaccinia virus DNA topoisomerase i residues 1-77, experimental electron density for residues 1-77
DNA topoisomerase i. Chain: null. Fragment: amino terminal 9kda, residues 1 - 77. Engineered: yes. Other_details: domain generated by mild proteolysis of the intact 36kda vaccinia virus DNA topoisomerase i, a member of the eukaryotic-like type i DNA topoisomerases
Vaccinia virus. Strain: wr. Expressed in: escherichia coli (active form)

1vie
Oxidoreductase
Structure of dihydrofolate reductase
Dihydrofolate reductase. Chain: null. Synonym: r67 dhfr. Ec: 1.5.1.3
Escherichia coli. Strain: tmp-resistant, containing r67 dhfr overproducing plasmid plz1

1vls
Chemotaxis
Ligand binding domain of the wild-type aspartate receptor
Aspartate receptor. Chain: null. Synonym: tar. Biological_unit: dimer
Salmonella typhimurium

1who
Allergen
Allergen phl p 2
Allergen phl p 2. Chain: null. Synonym: phl p ii. Engineered: yes
Phleum pratense. Timothy grass. Expressed in: escherichia coli

1wit
Muscle protein
Twitchin immunoglobulin superfamily domain (igsf module) (ig 18'), nmr, minimized average structure
Twitchin 18th igsf module. Chain: null. Engineered: yes
Caenorhabditis elegans. Nematode. Organ: body wall muscle. Tissue: muscle a-band. Cell: muscle cell. Expressed in: escherichia coli.

256b
Electron transport
Acyl-phosphatase (common type) from bovine testis
Acylphosphatase. Chain: null. Synonym: acp. Biological_unit: monomer
Bos taurus. Bovine. Organ: testis. Cellular_location: cytoplasm

2chf
Signal transduction protein
Refined structure of the actin-severing domain villin 14t, determined by solution nmr, minimized average structure
Villin 14t. Chain: null. Fragment: residues 1 - 126. Synonym: villin domain 1, villin segment 1. Engineered: yes
Gallus gallus. Chicken. Organ: intestine. Cell: epithelial cells. Expressed in: escherichia coli.

4ubp
Hydrolase
Structure of bacillus pasteurii urease inhibited with acetohydroxamic acid at 1.55 a resolution
Urease (chain a). Chain: a. Synonym: urea aminohydrolase. Urease (chain b). Chain: b. Synonym: urea aminohydrolase. Urease (chain c). Chain: c. Synonym: urea aminohydrolase.
Bacillus pasteurii. Strain: dsm 33. Cellular_location: cytoplasm. Cellular_location: cytoplasm

5cro
Gene regulating protein
Refined structure of cro repressor protein from bacteriophage lambda
Cro repressor protein. Chain: o, a, b, c. Biological_unit: dimer. Other_details: water molecules and two phosphate radicals
Bacteriophage lambda

____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 14346 - Posted 22 Apr 2006 6:20:22 UTC
Last modified: 22 Apr 2006 6:33:18 UTC

We know that some of the workunits that show up on your computer have cryptic names -- all the jobs are testing some interesting scientific tricks, and
we'd like to tell you what they are! We're going to make an effort to post information in this thread about every workunit that we send out. Then you'll hear about results on our Top Prediction page and in David Baker's journal.
Here are explanations of some of the workunits that may be showing up
on your clients:

1. Testing a smart resampling strategy:

FARELAX_NOFILTERS_xxxx
FACONTACTS_RECENTER_NOFILTERS_xxxx

Have you ever wondered whether there's a better way to fold proteins than have 100,000 clients do completely independent runs, without talking to each other?
These workunits are testing a new strategy that may be smarter. We look at the full-atom scores and conformations from the first 10,000 models (that's the first workunit). For generating the second round of 10,000 models we then adjust the initial stages of the search (which uses a low resolution score function) to favor contacts that gave low energies in the first round.
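
As a rough illustration of the idea (not the project's actual code), the sketch below counts how often each residue-residue contact appears among the lowest-energy models of a first round, compared with all models of that round. The data layout, the function name `enriched_contacts`, and the cutoff fraction are all assumptions made for this example.

```python
from collections import Counter

def enriched_contacts(models, top_fraction=0.1):
    """Rank contacts by how over-represented they are among low-energy models.

    models: list of (energy, set_of_contacts); a contact is a
    (residue_i, residue_j) tuple. This layout is hypothetical.
    """
    ranked = sorted(models, key=lambda m: m[0])  # lowest energy first
    n_top = max(1, int(len(ranked) * top_fraction))
    top = ranked[:n_top]

    all_counts = Counter(c for _, contacts in models for c in contacts)
    top_counts = Counter(c for _, contacts in top for c in contacts)

    enrichment = {}
    for contact, n_all in all_counts.items():
        freq_all = n_all / len(models)           # frequency over the whole round
        freq_top = top_counts[contact] / len(top)  # frequency in the best models
        enrichment[contact] = freq_top / freq_all
    # Most enriched contacts first
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)

# Toy first round: five models with made-up energies and contacts
round_one = [
    (-120.0, {(3, 40), (10, 25)}),
    (-118.5, {(3, 40)}),
    (-90.0, {(10, 25)}),
    (-85.0, {(5, 60)}),
    (-80.0, {(10, 25), (5, 60)}),
]
best = enriched_contacts(round_one, top_fraction=0.4)
```

Contacts at the front of `best` are the ones a second round could be biased toward during the low-resolution stage.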

2. The hardest protein.

Within our benchmark set, there's one protein, 1tul_, with a particularly complicated fold, called the telokin-like fold. It has contacts from very distant parts of the protein chain, which arise infrequently in Rosetta (and presumably in Nature, too, but that's another story). We're trying to concentrate the resources of Rosetta@home to see whether we can get anywhere close to the right answer with massive sampling: about 10 million conformations compared to the traditional ~1000 conformations we would run in-house.

PROD_ABINITIO_FAST_1tul_
PROD_ABINITIO_1tul_

These jobs use the standard ab initio protocol. The first one uses a tenth the number of moves -- we're trying to see if we can get away with doing less per simulation, and instead running more simulations.

PROD_ABINITIO_ALPHABETABAR_1tul_
We're testing a strategy developed by Phil Bradley, the folding guru in our lab, where Rosetta avoids certain over-common motifs (beta hairpins). This is an effort to explore more extreme parts of the conformational space.

PROD_ABINITIO_9FULLSTRANDBAR_1tul_
PROD_ABINITIO_9STRANDBAR_1tul_
PROD_ABINITIO_7STRANDBAR_1tul_

These runs have different "barcodes" that specify non-native protein contacts that shouldn't be made during the simulation. The question is: how much does the extra information help Rosetta find the native conformation?
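
A minimal sketch of how a barcode of forbidden contacts could be checked against a conformation. The function names, the contact representation, and the penalty weight are hypothetical and are not taken from Rosetta itself.

```python
def violates_barcode(contacts, forbidden):
    """Return the forbidden contacts that a conformation actually makes."""
    return sorted(set(contacts) & set(forbidden))

def barcode_penalty(contacts, forbidden, weight=10.0):
    """Score penalty proportional to the number of forbidden contacts made.

    The weight is an arbitrary illustrative value.
    """
    return weight * len(violates_barcode(contacts, forbidden))

# Toy barcode: two non-native contacts that should not form
forbidden = {(12, 48), (15, 45)}
conformation = {(12, 48), (20, 70), (3, 9)}
penalty = barcode_penalty(conformation, forbidden)
```

A longer barcode rules out more of the search space, which is one way to picture the difference between the 7- and 9-strand variants.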

3. More aggressive full atom sampling:

HBLR_1.0_xxxx_ROT_TRIALS_TRIE
The final stage of Rosetta's folding strategy consists of fine movements that try to fit the protein pieces together in atomic detail (the "fullatom" stage, often abbreviated FA). These simulations use David Baker's latest energy terms (the "HBLR_1.0" refers to the weight on long-range hydrogen bonding) using an aggressive minimization protocol ("rotamer trials") that is made efficient with a neat graph representation within Rosetta (the "trie").

4. Helical proteins from viruses

VP_TEST_core_vp26_
VP_TEST_truncate_termini_vp26_
VP_TEST_vp26_
VP_TEST_truncate_termini_1qgtA
VP_TEST_1qgtA

We're always trying to find new ways to couple Rosetta with experimental data. We're beginning a project to use low-resolution images made by cryo-electron microscopy to constrain Rosetta's search. Our collaborators in Wah Chiu's lab happen to be studying the proteins that form virus coats ("VP" stands for virus proteins), and these are some early jobs to begin testing the protocol. You'll see more of these in the fall, after we've been through the CASP structure prediction trials.






____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 14722 - Posted 27 Apr 2006 4:38:16 UTC
Last modified: 27 Apr 2006 4:39:11 UTC

We're beginning to run some new jobs. As some of you know, the CASP7 blind trials are coming up, from May to August 2006. These are the Critical Assessment of Structure Prediction experiments, held every two years. Many of the protein folding groups around the world test their methods on sequences whose experimental crystal structures have recently been solved (often with arduous effort!) but will be kept hidden by the experimenters until the fall. So this is a true blind trial.

As part of our preparation, we're running some of the targets from the previous CASP6 trials (held in 2004). Look for workunits with names like the following:

AB_CASP6_t242_
AB_CASP6_t272_
...

We're excited to see whether the massive computational power of Rosetta@home will give us an advantage over our methodology from 2004. We're strongly betting that it will!
____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 15060 - Posted 30 Apr 2006 2:37:14 UTC

I'm about to post some results in top predictions from our attempts to solve a very difficult protein ("1tul_") that I described before. It turns out that even after well over one million simulations we're not quite close enough -- even if we cheat and disallow non-native strand pairings! That's very good to know.

So we're trying a new idea, invented by Phil Bradley in our group, called "jumping". In this method, we assume that we know two parts of the protein chain are in contact, maybe as a guess or maybe because we have some outside source of information. These two parts of the chain are stuck together for the whole simulation. To allow some mobility in the chain between the contact points, we then put a cut in the chain. (Throughout the simulation, we try not to let this cut widen too much by penalizing the score for large chainbreaks.) This ends up being a pretty efficient way to search for low-energy protein conformations with fairly complex topologies.
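
The chainbreak penalty can be pictured with a small sketch. The quadratic form, the weight, and the 1.33 Å reference distance (roughly a peptide bond length) are assumptions chosen for illustration; the actual Rosetta scoring term may look different.

```python
import math

def chainbreak_penalty(end_a, end_b, ideal=1.33, weight=1.0):
    """Quadratic penalty on the gap between the two ends of a chain cut.

    end_a, end_b: (x, y, z) coordinates in angstroms of the atoms on either
    side of the cut. No penalty is applied while the gap is within `ideal`.
    """
    gap = math.dist(end_a, end_b)
    excess = max(0.0, gap - ideal)
    return weight * excess ** 2

closed = chainbreak_penalty((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
open_cut = chainbreak_penalty((0.0, 0.0, 0.0), (4.33, 0.0, 0.0))
```

Because the penalty grows with the square of the excess gap, small openings are tolerated while wide ones quickly dominate the score.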

We'll be running two kinds of workunits:

JUMPTEST_1tul_
JUMP_ALLBARCODEXX_1tul_

The first kind uses information on 7 known strand pairings in 1tul_ -- in other words, it's a cheat. In the second kind of runs, there's a comprehensive list of all possible topologies of strand pairings (about 100,000 for this protein!), and every client chooses one topology to test per simulation. So some clients will investigate topologies that look like "sandwiches", others will try to make "barrels", and so forth. Thanks to Rosetta@home, we can sample all these topologies comprehensively for the first time!
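
To make "every client chooses one topology" concrete, here is a toy sketch. It enumerates strand orders and orientations for a single sheet only, which is a much simpler definition of "topology" than the real list of about 100,000 entries; the function names and the enumeration scheme are hypothetical.

```python
import itertools
import random

def sheet_topologies(n_strands):
    """Enumerate toy topologies for one sheet: the order of the strands plus
    the antiparallel ('A') or parallel ('P') orientation of each neighbour pair."""
    topologies = []
    for order in itertools.permutations(range(n_strands)):
        if order[0] > order[-1]:
            continue  # a sheet read backwards is the same sheet
        for orient in itertools.product("AP", repeat=n_strands - 1):
            topologies.append((order, orient))
    return topologies

def pick_topology(topologies, rng):
    """One client picks one topology to test in one simulation."""
    return rng.choice(topologies)

topologies = sheet_topologies(4)   # 12 strand orders x 8 orientation patterns
rng = random.Random(2006)          # fixed seed so the example is repeatable
chosen = pick_topology(topologies, rng)
```

With many clients each drawing independently, every entry in the list gets sampled many times over without any coordination between clients.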


____________

Bin Qian
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 13 05
Posts: 33
ID: 18
Credit: 36,897
RAC: 0
Message 15302 - Posted 2 May 2006 15:32:46 UTC - in response to Message ID 15060.
Last modified: 2 May 2006 15:35:15 UTC

I've just sent out some work units with names starting with:

HOMO_xxxx_h0xx_1_LOOPRLX_

You all have been well informed by David(s) and Rhiju about the ab-initio approach they are using to fold a protein from its amino-acid sequence. I'm working on another category of the protein structure prediction problem, namely the "comparative modeling" approach. Here is how this approach works: when we want to predict the structure of a protein, we usually first do a database search on the available protein structures. Oftentimes we find that the protein we want to predict has a brother/sister (called a homologous protein) whose structure has been solved by one of the accurate but time-consuming experimental methods. It is well known that homologous proteins share more or less similar shapes.

This important information can give us a "short cut" towards solving our target protein's structure because now we can jump-start our coarse-grained conformational search with the homologous protein's structure, and only search parts of the structure that are either missing or likely to be different from the homologous protein structure.
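
A minimal sketch of how the "parts to search" could be read off a sequence alignment between the target and its homolog. The alignment strings, the gap character, and the function name are invented for this example; the real protocol also decides which aligned regions are "likely to be different", which is not modeled here.

```python
def segments_to_rebuild(target_aln, template_aln):
    """Return (start, end) target residue ranges (1-based, inclusive)
    that have no aligned residue in the homologous structure."""
    segments = []
    pos = 0        # number of target residues seen so far
    start = None   # start of the current uncovered stretch, if any
    for t, h in zip(target_aln, template_aln):
        if t == "-":
            continue  # template-only column: no target residue here
        pos += 1
        if h == "-":
            if start is None:
                start = pos
        elif start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, pos))
    return segments

# Made-up alignment: '-' marks positions missing from the homolog
target   = "MKTAYIAKQRQISFVK"
template = "MKT---AKQRQ-SFVK"
loops = segments_to_rebuild(target, template)
```

Only the residues in `loops` would need a conformational search; the rest of the chain starts from the homolog's coordinates.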

This coarse-grained search is followed by a fine-grained search in which we try to locate the precise positions of the protein's hundreds or thousands of atoms (called the fullatom relax stage), based on the rough shape determined by the coarse-grained search. This is the same fullatom relax stage as the second step in David(s) and Rhiju's approach.

On the screen saver you will likely see the WU start with a compact protein structure (the homologous structure), then some parts of the structure start to move around. After they settle down, the whole protein will start to wiggle with smaller scale motions.
____________

divyab
Forum moderator
Project administrator
Project scientist

Joined: Oct 20 05
Posts: 6
ID: 5753
Credit: 0
RAC: 0
Message 16119 - Posted 13 May 2006 0:59:37 UTC
Last modified: 13 May 2006 1:00:02 UTC

I have just added two new workunits to the queue:

CASP_HOMOLOG_ABRELAX_hom001_t287_ and
HOMOLOG_ABRELAX_hom0xx_t283_

These workunits are both for CASP, the competition Rhiju mentioned earlier in this thread. t278 and t283 are both sequences for which we are trying the "ab initio" approach, meaning that they will not be based on existing structures (unlike the case Bin described earlier). The workunits do, however, use homologs of the sequence (other proteins with similar amino acid sequences) to help in the prediction! Here's how:

We take the target sequence (given by CASP) and make the best possible predictions we can. We then find sequence homologs in the database, and make the best possible predictions we can for those also. The basis of this is that sequences with high homology can be expected to have the same structure, so if we find a good prediction among any of the homologs, we have solved the structure for our target sequence!

These workunits are specifically crunching away at all the homologs, folding them independently and trying to find the best structure for each. The next step will be to map our target sequence back onto all these different structures, rebuild gaps where the two sequences may be different sizes, and do some simple tweaking of the sidechains that are different. This will yield our final predictions.
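
The mapping step can be sketched as a per-residue plan derived from a target/homolog alignment. The three labels, the function name, and the toy sequences are hypothetical, and the sketch only decides what to do at each position, not how to do it.

```python
def thread_plan(target_aln, homolog_aln):
    """Classify each target residue (1-based) by what the mapping step needs:
    'keep'    - same amino acid in the homolog, reuse it as is
    'repack'  - different amino acid, tweak the sidechain
    'rebuild' - no homolog residue at this position, rebuild the gap
    """
    plan = {"keep": [], "repack": [], "rebuild": []}
    pos = 0
    for t, h in zip(target_aln, homolog_aln):
        if t == "-":
            continue  # homolog-only column, nothing to place for the target
        pos += 1
        if h == "-":
            plan["rebuild"].append(pos)
        elif h == t:
            plan["keep"].append(pos)
        else:
            plan["repack"].append(pos)
    return plan

# Made-up alignment of a target against one folded homolog
plan = thread_plan("ACDEFGH", "ACN-FGH")
```

Repeating this for every homolog model gives one candidate structure of the target per model.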
____________

mnb

Joined: Dec 15 05
Posts: 51
ID: 38106
Credit: 56,430
RAC: 0
Message 22267 - Posted 11 Aug 2006 12:43:02 UTC

1yrf
Structural protein
Chicken villin subdomain hp-35, n68h, ph6.7
Villin. Chain: a. Fragment: vhp. Engineered: yes. Mutation: yes
Synthetic: yes. Other_details: sequence corresponds to chicken villin residues 792-826

____________
list of my results

bblum
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Aug 15 06
Posts: 6
ID: 104992
Credit: 4,077
RAC: 0
Message 22497 - Posted 15 Aug 2006 21:15:33 UTC
Last modified: 15 Aug 2006 21:16:12 UTC

I've added three new jobs to the queue:
1di2__LARS_ABRELAX_SAVE_ALL_OUT_BARCODE_
1di2__CHEAT_ABRELAX_SAVE_ALL_OUT_BARCODE_
1di2__CONTROL_ABRELAX_SAVE_ALL_OUT_

All are "abrelax" runs (ab initio prediction followed by relaxation, as described earlier in this thread by Bin and Rhiju -- graphically speaking, big wiggles followed by small wiggles). The protein is 1di2A. We've already generated a population of decoys for 1di2A, and now we're attempting to use what we've learned from that population to generate a much better population. It's frequently clear from an initial sampling run that certain decoy features are always good -- they tend, if present, to give rise to low energy. If we can identify these features, then we can concentrate the next round of sampling on them. We've generally found that features which correlate with low energy in the initial sampling round tend to be present in the native.
Features can include both low-level ones (e.g. torsion angles or rotamers for specific residues) and high-level ones (e.g. beta strand pairings). In these jobs, we're only enriching for low-level features.

The job I'm most excited about is the LARS job; here we've used a sparse linear model to identify a few features that correlated very strongly with the Rosetta energy, and I've enriched for all of them in sampling. The CHEAT job fixes a single linchpin native feature that we hope will result in sampling much closer to the native; it will allow us to test a hypothesis we have about the joint distribution of features in Rosetta sampling. The CONTROL job fixes no features and will serve as a baseline.
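
For readers who want a feel for "features that correlate with energy", here is a deliberately simpler stand-in for the sparse linear model: it ranks each feature by the difference in mean energy between decoys that have it and decoys that do not. This is not LARS and not the method used for these jobs; the feature names and energies are made up.

```python
from statistics import mean

def rank_features(decoys):
    """Rank features by (mean energy with feature) - (mean energy without).

    decoys: list of (energy, set_of_feature_names). More negative scores
    mean the feature goes along with lower energy.
    """
    features = set().union(*(f for _, f in decoys))
    scores = {}
    for feat in features:
        with_f = [e for e, f in decoys if feat in f]
        without = [e for e, f in decoys if feat not in f]
        if with_f and without:  # need both groups to compare
            scores[feat] = mean(with_f) - mean(without)
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy population with two low-level features
decoys = [
    (-50.0, {"phi12_alpha", "rot7_a"}),
    (-48.0, {"phi12_alpha"}),
    (-30.0, {"rot7_a"}),
    (-28.0, set()),
]
ranked = rank_features(decoys)
```

A sparse linear model goes further by fitting all features jointly and keeping only a few, so that correlated features do not all get credit for the same effect.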
____________

James Thompson
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Oct 13 05
Posts: 41
ID: 4392
Credit: 178,353
RAC: 97
Message 22568 - Posted 16 Aug 2006 18:43:00 UTC - in response to Message ID 22497.

I've added 58 small jobs to the queue named <protein_name>__CASP7_ABINITIO_SAVE_ALL_OUT_flatss0.33_. These jobs are designed to investigate how problems with secondary structure prediction affect our predictions. Secondary structure prediction is an initial step in our methodology that attempts to predict the local arrangement of small pieces of the protein structure, which Rosetta then arranges into an overall tertiary structure. My impression from CASP is that lack of knowledge of secondary structure makes prediction very difficult, and these workunits will give a better idea of the problem's difficulty.
____________

James Thompson
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Oct 13 05
Posts: 41
ID: 4392
Credit: 178,353
RAC: 97
Message 26411 - Posted 9 Sep 2006 0:17:07 UTC

Over the next two days I'll be adding some new jobs to the Rosetta@Home queue that will attempt to use a slightly modified methodology for ab initio prediction. The jobs look like this:

2vik__BARCODE_SEARCH_BARCODE_FROM_FRAGS_ABINITIO_barcode_from_frags

That batch of workunits (and three others) are currently running on Ralph.

We use a method called a barcode to constrain certain residues of a protein to adopt a specific conformation. The barcode ensures that the conformation stays fixed throughout an entire run. The barcode_from_frags protocol tries to infer which residues to barcode by examining the distribution of conformations for analogous sections of proteins of known structure.
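
Here is a rough sketch of the idea of picking barcode residues from a fragment library. Everything in it is an assumption made for illustration: the single-letter conformation labels, the 80% threshold, and the rule of barcoding positions where one conformation dominates. The real barcode_from_frags selection rule may well be different.

```python
from collections import Counter

def barcode_from_fragments(fragment_bins, min_share=0.8):
    """Pick residues to constrain and the conformation to fix them in.

    fragment_bins: dict mapping residue number -> list of conformation
                   labels seen at that position across the fragments.
    Returns {residue: label} for residues where a single label accounts
    for at least `min_share` of the observations.
    """
    barcode = {}
    for residue, labels in sorted(fragment_bins.items()):
        if not labels:
            continue  # no fragments cover this position
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_share:
            barcode[residue] = label
    return barcode

# Hypothetical fragment observations for three residues.
fragment_bins = {
    10: ["A"] * 9 + ["B"],      # 90% "A": fix this residue
    11: ["A"] * 5 + ["B"] * 5,  # evenly split: leave it free
    12: ["B"] * 8 + ["A"] * 2,  # 80% "B": fix this residue
}
barcode = barcode_from_fragments(fragment_bins)
print(barcode)  # {10: 'A', 12: 'B'}
```

Once chosen, the barcode would be held fixed for the whole run, as described above, while the remaining residues are sampled freely.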

And yes, I know that barcode_from_frags is in there twice and I'll try to fix that before my runs on Rosetta@Home commence. :)
____________

Possu
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Sep 20 06
Posts: 3
ID: 113637
Credit: 27,635
RAC: 0
Message 27652 - Posted 20 Sep 2006 6:06:37 UTC

We are now using Rosetta@Home to screen our HIV vaccine designs. We are remodeling the protein called GP120, which is responsible for binding to human proteins and initiating the invasion of host cells. The remodeled/redesigned GP120 can potentially be used as a vaccine. (Please see David's journal for more info.)

The jobs submitted for this will have the prefix "PSH", and future jobs will also carry the name(s) GP120 and OD1 (which stands for the outer domain of GP120) for easy identification.

We are using the jobs on Rosetta@Home to refold and refine the sequences that came from design runs. We first remodel the backbone using Rosetta (2000+ different backbone conformations), design suitable sequences for each of the conformations, and then score all these different sequences to find the best ones to test experimentally. The scoring process is very time-consuming, as each of the 2000+ starting conformations has to be sampled thoroughly in its local conformational space before we can say definitively which is the best designed sequence. That's when Rosetta@Home comes to the rescue...
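
To show why thorough sampling matters before picking a winner, here is a small sketch. The design names and energies are invented, and ranking each design by the lowest energy found so far is an assumption made for the example, not a description of our actual selection procedure.

```python
def rank_designs(samples, runs=None):
    """Rank designs by the lowest energy found (lower is better).

    samples: dict mapping design name -> list of energies, one per
             independent refinement run, in the order they came back.
    runs:    only look at the first `runs` results per design
             (None means use every result).
    """
    best = {}
    for design, energies in samples.items():
        seen = energies[:runs]  # runs=None keeps every run
        if seen:
            best[design] = min(seen)
    return sorted(best, key=best.get)

# Hypothetical energies for two designed sequences.
samples = {
    "design_A": [-210.0, -215.0, -240.0, -238.0],
    "design_B": [-225.0, -228.0, -226.0, -229.0],
}

print(rank_designs(samples, runs=2))  # ['design_B', 'design_A']
print(rank_designs(samples))          # ['design_A', 'design_B']
```

With only two runs per design the wrong sequence looks best; the ranking flips once design_A's deeper minimum has been found, which is why each starting conformation needs many runs.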


____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 29252 - Posted 12 Oct 2006 22:28:34 UTC

Over the next week or so, David Baker and I will be testing out a new protocol for full atom refinement that should more efficiently hit lower energies! The workunits in this new study will have names like:

2tif__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_
2tif__BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_
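
If you want to check which half of the comparison a workunit on your machine belongs to, something like the following would do it. This is only a sketch based on the two example names above: the function name is hypothetical, and it assumes the target code always comes before the double underscore.

```python
def relax_study_arm(workunit_name):
    """Return (target, arm) for a workunit in this study, else None.

    arm is "new" for NEWRELAXFLAGS names and "old" for OLDRELAXFLAGS names.
    """
    target, sep, rest = workunit_name.partition("__")
    if not sep:
        return None  # no "__" separator, so not named like these jobs
    tokens = rest.split("_")
    if "NEWRELAXFLAGS" in tokens:
        return target, "new"
    if "OLDRELAXFLAGS" in tokens:
        return target, "old"
    return None

print(relax_study_arm("2tif__BOINC_NEWRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_"))
# ('2tif', 'new')
print(relax_study_arm("2tif__BOINC_OLDRELAXFLAGS_ABRELAX_SAVE_ALL_OUT_"))
# ('2tif', 'old')
```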

____________

Chu
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Feb 23 06
Posts: 120
ID: 61076
Credit: 112,439
RAC: 6
Message 29262 - Posted 13 Oct 2006 1:42:36 UTC

All the docking WUs will have names like "DOC_", and the screensaver will show two proteins, colored blue and red. Compared to "RELAX" WUs, docking WUs normally have a smaller number of steps, and each step takes longer. Also, since rmsd is calculated differently in docking, the rmsd values may be considerably larger.
____________

James Thompson
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Oct 13 05
Posts: 41
ID: 4392
Credit: 178,353
RAC: 97
Message 30090 - Posted 27 Oct 2006 9:07:44 UTC

Starting tomorrow we will be sending out our first workunits for our newest ab-initio structure prediction project. These workunits will look like this:

s001__BOINC_ABRELAX_SAVE_ALL_OUT_hom001_

We are collaborating with a number of structural genomics centers so that we can attempt to predict proteins whose structures will be solved within the next six months. This project is very obviously inspired by the CASP competitions, and some of us in the lab have started calling this "CASP all-the-time." It will be very useful for us to have a running benchmark of our methods on absolutely new crystal structures.
____________

Brian Kidd
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Dec 9 06
Posts: 5
ID: 134216
Credit: 327
RAC: 0
Message 32315 - Posted 9 Dec 2006 9:33:06 UTC

I recently submitted a job (BAK_1avs_TnC_loop_model) to predict the structural changes in an allosteric protein called Troponin C. This protein regulates muscle contraction and has important health consequences for cardiovascular disease.

In the near future, I will be submitting more workunits related to predicting conformational changes in allosteric proteins. For each submission, I'll describe a little about the protein's function and its relevance to medicine or technology. Allosteric proteins are particularly interesting cases for structure prediction because their function is regulated by structural changes that take place far from the active site. As a consequence, these proteins have multiple functional states, and we are attempting to predict the structures associated with these states. Our basic approach is to exhaustively sample the conformational space away from the starting structure and look for additional energy minima. Thanks for your help with this important problem.
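
As a toy picture of "looking for additional energy minima", the sketch below scans sampled structures ordered by their distance from the starting structure and reports the ones that are lower in energy than both neighbours. The numbers are invented, and the real landscape is many-dimensional, so treat this one-dimensional version as an illustration only.

```python
def local_minima(samples):
    """Return the (rmsd, energy) samples lower in energy than both neighbours.

    samples: list of (rmsd_from_start, energy) pairs, one per structure.
    """
    ordered = sorted(samples)  # order by distance from the starting structure
    minima = []
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        if cur[1] < prev[1] and cur[1] < nxt[1]:
            minima.append(cur)
    return minima

# Hypothetical sampled structures: (rmsd from start, energy).
samples = [
    (0.0, -95.0), (0.5, -100.0), (1.0, -90.0), (2.0, -80.0),
    (3.0, -85.0), (3.5, -97.0), (4.0, -88.0),
]

print(local_minima(samples))  # [(0.5, -100.0), (3.5, -97.0)]
```

In this made-up data the first minimum is the basin around the starting structure, and the second, about 3.5 away, is the kind of candidate for an alternative functional state we would want to look at more closely.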
____________

Brian Kidd
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Dec 9 06
Posts: 5
ID: 134216
Credit: 327
RAC: 0
Message 32317 - Posted 9 Dec 2006 10:54:50 UTC
Last modified: 9 Dec 2006 10:56:47 UTC

A new job was just submitted - BAK_1klf_FimH_loop_model.

This protein is a bacterial adhesion protein called FimH. The protein resides on the tip of projections that pathogenic E. coli extrude from their cell wall to stick to glycoproteins on the surface of human cells. The neat feature of this protein is that its binding is regulated by force: the bond lifetime actually increases as force increases, a behavior called a "catch bond". This catch-like behavior is counterintuitive and contrary to how most bonds function; most bonds decrease their lifetime if you yank on them with more force. Predicting the structures of both states, active and inactive, would have huge implications for treating bacterial infections and developing novel molecular force sensors.

FimH is an allosteric protein and the structure of the unbound - "inactive" - conformation has been technically difficult to solve by experimental methods. However, we ought to be able to predict the inactive structure with the loop modeling and full-atom relax techniques that we've been adapting to look at allosteric proteins. In addition, some recent work by our collaborators has given us reason to believe that there will be an NMR structure in the not too distant future. This experimental data will be a great validation for the predicted structures that you all are solving with Rosetta@home. Thanks so much for your support in predicting this structure.
____________

Mod.Zilla
Forum moderator
Project administrator

Joined: Sep 5 06
Posts: 416
ID: 110113
Credit: 6
RAC: 0
Message 38063 - Posted 20 Mar 2007 21:31:46 UTC
Last modified: 20 Mar 2007 21:38:57 UTC

More details from Chu on Protein-protein docking and CAPRI.

More details from Rhiju on RNA folding
____________
Rosetta Informational Moderator: Mod.Zilla

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 38164 - Posted 23 Mar 2007 5:43:40 UTC
Last modified: 23 Mar 2007 5:44:27 UTC

The work units labeled RNA_ABINITIO model folding of small chains of RNA, a molecule similar to DNA. They start from extended chains and then try to maximize the pairing and stacking of the RNA sidechains ("bases").
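
To give a feel for what "maximizing the pairing" means, here is a toy calculation. It is not what the RNA_ABINITIO work units do: it swaps in the classic Nussinov-style dynamic program, which only counts the largest number of nested base pairs a sequence could form and ignores stacking and 3D geometry entirely. The example sequence and the minimum loop size of 3 are assumptions for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Largest number of nested base pairs `seq` can form.

    min_loop: minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j]; short spans stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAAUCC"))  # 3
```

For this made-up hairpin the three pairs are G-C, G-C and G-U around an AAA loop. The actual simulations start from an extended chain and have to find such pairings while building a full three-dimensional model, which is a far harder search.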

Info on where the sequences come from:

1a4d
E-loop from ribosome 5S RNA

1esy
HIV psiRNA dimerization signal

1kka
tRNA anticodon stem loop

1l2x and 2a43
Viral frameshifting pseudoknots

1qwa
Nucleolin binding hairpin

1q9a
Sarcin/ricin loop from the large ribosomal subunit

1zih
GNRA tetraloop

2f88
Domain I from the self-splicing Group II intron

1xjr
Rigorously conserved RNA element from SARS

1ehz
tRNA(phe) from yeast

1gid
The P4-P6 domain from the self-splicing Tetrahymena group I ribozyme

____________

Ingemar
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 38199 - Posted 24 Mar 2007 0:31:21 UTC

Jobs with names DOCKING_*rhj* are running protein-protein docking on dimers where the individual monomers are related by symmetry. The monomer structures come from ab initio structure prediction. The structure of this protein could not be solved by the standard techniques used for determining crystal structures. Although a crystal could be grown and the experimental data look good, one of the procedures in crystal structure determination, phasing, could not be carried out successfully starting from similar proteins solved previously. The idea here is to create models with ab initio + docking that can be used as starting points for the phasing procedure. If this works, it could be a way of rescuing x-ray crystallography data from biologically important proteins that could not otherwise be converted to 3D structures, which would be a significant breakthrough.

Thanks for your help!


____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 38959 - Posted 4 Apr 2007 5:17:07 UTC
Last modified: 4 Apr 2007 5:17:25 UTC

We're running a new kind of workunit with the tag SYMM_FOLD_AND_DOCK in the names. These are some pretty crazy jobs -- you'll see two protein chains shaking and dancing around each other. We're trying to predict structures of proteins that form symmetric multimers (a category that includes a great many proteins, including virus coats and proteins that modulate DNA transcription). Previously you've seen straight folding of one chain, and docking of two structured chains, but this is the first time we're simultaneously exploring the fold and the relative orientation of the chains. This is a huge conformational space to explore, and is only possible with Rosetta@home.

A first application of this new protocol is to a structural genomics target (s036) for the purpose of "phasing" crystallographic data -- see the earlier post by Ingemar in this thread.

____________

Ingemar
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 50444 - Posted 7 Jan 2008 22:01:58 UTC

In the previous post Rhiju described a simulation called FOLD_AND_DOCK where we are predicting the structure of protein complexes from sequence alone. In a cell, few proteins carry out their function on their own. Most proteins either form complexes with other proteins a fraction of the time (the structures of these are what we try to predict with protein-protein docking simulations) or form permanent complexes consisting of multiple chains. Predicting the structure of a protein complex from the sequence(s) of the chains involved is something that has never been demonstrated, mostly because of the enormous search space that has to be covered in such a simulation.

A very important class of protein complexes are those that contain multiple copies of the same type of chain; we call them homooligomeric. The vast majority of these homooligomeric proteins contain some type of internal symmetry. For example, the most abundant homooligomers are dimers (two identical chains), and most are symmetrical: if you rotate one chain 180 degrees you get the other partner in the complex.

The fact that they are symmetrical allows us to reduce the search problem (all the chains are internally identical and are related by a set of rotations and translations specified by the type of symmetry). This makes it feasible to try to predict the structure of symmetrical protein complexes from sequence. It's still a huge search problem, and that's why we need your help!
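
Here is a small sketch of how symmetry shrinks the problem: only one chain needs to be modeled, and the others are generated from it by rotation. It assumes a simple cyclic symmetry about the z axis and uses made-up coordinates; it is not the code running in the work units.

```python
from math import cos, sin, pi

def symmetric_copies(chain, n):
    """Generate all n chains of a cyclically symmetric complex from one chain.

    chain: list of (x, y, z) atom coordinates for the modeled chain.
    n:     number of identical chains (2 for a dimer).
    The symmetry axis is assumed to be the z axis through the origin.
    """
    copies = []
    for k in range(n):
        angle = 2 * pi * k / n
        c, s = cos(angle), sin(angle)
        # Rotate every atom by `angle` about the z axis.
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in chain])
    return copies

# Two hypothetical atoms standing in for a whole chain.
chain_a = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.5)]

# For a dimer the second copy is the first one rotated by 180 degrees,
# so chain_b is approximately [(-1, 0, 0), (-2, -1, 0.5)].
first, chain_b = symmetric_copies(chain_a, 2)
```

Whatever the number of chains, the search only has to cover the conformation of one chain plus the few numbers that place it relative to the symmetry axis, instead of every chain independently.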

Right now we are trying to predict the structure of a protein with PDB code 1zpy. The size of this protein is actually around 900 residues, and if we succeed it would be the largest protein structure predicted to atomic accuracy from sequence (about 8 times larger than anything predicted before). We are not actually simulating the whole protein; we can make use of symmetry to reduce the problem, but it's still a large simulation. So if you get a job with 1zpy in the name, be patient!

Why are we doing this? Homooligomeric proteins are a biologically very important class of proteins. They form virus capsids, proteins that cleave DNA, hemoglobin that transports oxygen, actin involved in muscle contraction, channels that transport ions, and secretion systems used by pathogenic bacteria, just to give a few examples.

____________

Copyright © 2010 University of Washington

Last Modified: 3 Dec 2007 20:36:19 UTC