Dr. Baker's journal archive 2006

David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 13035 - Posted: 4 Apr 2006, 6:03:05 UTC

Rhiju has an exciting new approach to the search problem I'd like to tell you about. Tests of his approach have been running for the past several days on your computers and the results are very promising.

Recall that our search for the lowest energy structure occurs in two stages which are clearly distinguishable when you watch the progress of the calculations on your screensaver. In the first "low resolution" stage, the protein explores a wide range of different conformations, often changing quite wildly. In the second "high resolution" stage, the range of motions is much smaller because all the atoms in the protein are represented in detail and almost all large changes would lead to impossible structures with atoms on top of each other.

In the low resolution stage, we can sample broadly and rapidly, but because of the approximate representation of the protein chain, the computed energies are not very reliable. In contrast, in the high resolution stage we can compute energies accurately, but it is very difficult to sample.

Rhiju's idea is to try to combine the best of both worlds: the accuracy of the high resolution energy calculations and the rapid and broad sampling of the low resolution calculations. You can think of the many models returned by your searches as building up a map of the high resolution "energy landscape". What Rhiju does is to take a large set of high resolution structures and their energies returned by your computers, and derive a model of the high resolution energy landscape from them. He then starts a new large scale set of low resolution runs, ADDING this modeled energy landscape to the standard low resolution energy function. These new runs can explore space broadly and rapidly, but will be guided to the regions that are low in energy according to the high resolution energy model.
As I mentioned above, his results over the past few days have been very promising, with good predictions for a number of proteins we were struggling with before.
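
A minimal sketch of the idea above, under loud assumptions: a conformation is reduced to a single toy coordinate, the "model of the high resolution energy landscape" is a Gaussian-kernel weighted average of sampled energies, and the names `landscape_model`, `low_res_energy`, `combined_energy` and `weight` are hypothetical. This is not how Rosetta does it; it only illustrates adding a modeled landscape to a cheap energy function so that a broad search is pulled toward regions the detailed calculations found favorable.

```python
import math

def landscape_model(samples, bandwidth=0.5):
    """Build a smooth energy model from (coordinate, energy) samples.

    Uses a Gaussian-kernel weighted average, which is an assumption made
    for this illustration only.
    """
    def model(x):
        weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi, _ in samples]
        total = sum(weights)
        if total == 0.0:
            return 0.0  # far from every sample: the model has no opinion
        return sum(w * e for w, (_, e) in zip(weights, samples)) / total
    return model

def low_res_energy(x):
    """Toy stand-in for a fast, approximate energy function."""
    return 0.1 * x ** 2

def combined_energy(x, model, weight=1.0):
    """Low resolution energy plus the modeled high resolution landscape."""
    return low_res_energy(x) + weight * model(x)

# Hypothetical high resolution results: a deep minimum near x = 2.
high_res_samples = [(-1.0, 5.0), (0.0, 4.0), (1.0, 1.0), (2.0, -6.0), (3.0, 0.5)]
model = landscape_model(high_res_samples)

# Scan a grid of toy conformations with and without the added landscape term.
grid = [i / 10 for i in range(-20, 41)]
best_plain = min(grid, key=low_res_energy)
best_guided = min(grid, key=lambda x: combined_energy(x, model))
print(best_plain, best_guided)
```

In this toy setup the plain low resolution search settles at the origin, while the guided search moves toward the region where the sampled high resolution energies are lowest.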

Message 13069 - Posted: 5 Apr 2006, 6:32:50 UTC
Last modified: 5 Apr 2006, 6:33:34 UTC

Today I looked again at the results of the HBLR_1.0 runs I sent out a month and a half ago. Quite a few more results have been returned since I last analyzed these, and I was blown away by what I saw.
For example, for 1dtj, for which there is a snapshot of the results as of March 10 on the "top predictions" page, there is now a single point with very low rmsd that is much lower in energy than any other. We now have 1.1 million (!!) results returned for this protein, and one of these 1.1 million runs hit the jackpot. A couple of the other test proteins have similar "one in a million" points with amazingly low rmsd and low energy, and the results for all of the proteins have gotten considerably better with the increased sampling of the energy landscape. Again, at the risk of sounding like a broken record, these results really highlight the absolutely critical role of massive distributed computing in solving the protein folding problem--in house we were able to do ~10,000 independent runs, but 1,000,000 was completely out of the question.
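
To put the "one in a million" remark in numbers, here is a small worked example. It assumes, purely for illustration, that each independent run has a one-in-a-million chance of landing in the correct minimum; the real per-run probability is not known and differs from protein to protein. The function name is hypothetical.

```python
def chance_of_at_least_one_hit(p_per_run, n_runs):
    """Probability that at least one of n independent runs succeeds."""
    return 1.0 - (1.0 - p_per_run) ** n_runs

p = 1e-6  # assumed "one in a million" success rate per run (illustrative)

in_house = chance_of_at_least_one_hit(p, 10_000)        # about 1%
distributed = chance_of_at_least_one_hit(p, 1_100_000)  # about 67%

print(f"in-house: {in_house:.1%}, distributed: {distributed:.1%}")
```

Under that assumption, 10,000 runs would almost never find the jackpot structure, while 1.1 million runs find it about two times out of three.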

With our improvements over the past few months and this big increase in sampling, the prediction failures are becoming far fewer, but for the thornier problems it is clear that we are not sampling enough. With the 150 teraflops the project is aiming for, even these should fall into place.

so everybody please look at the letter to inactive rosetta users in the threads below, and spread the word!

thanks!

David


(on a more technical note, Rom knows what the problem is with the 0 cpu time on the win98 computers, and will have it fixed soon. in the meantime, please continue to crunch with these machines, credits will be awarded and the results are being collected as usual)
Message 13522 - Posted: 12 Apr 2006, 6:09:22 UTC

As you know, I mistakenly sent out a large batch of jobs without properly testing them first on RALPH. I apologize again for the trouble this caused you over the weekend.

I did get enough results back from machines which did not have problems with the jobs to see that the improvement in sidechain sampling does improve the overall search. Particularly dramatic were the 1di2 and 1dtj cases, which with the standard protocol had "1 in a million" low energy low rmsd points, but with the improved sidechain sampling protocol had many more points in these (correct) low energy minima. We are currently working to track down the source of the Windows-specific problem in the sidechain sampling routines, and when this is fixed we will test on ralph and then (after verifying that the error rate is low!) transition on to rosetta.

The frequent updates and experiments with new methods will soon give way to putting together everything we have learned since rosetta@home started in September for the CASP7 structure prediction challenge. Here is an email I recently got from the organizers which includes the URL for the project; the exact starting date hasn't been announced but will appear on their site soon.

From: casp@predictioncenter.org
Subject: CASP7 registration is open
Date: April 4, 2006 4:41:01 PM PDT
To: casp@predictioncenter.org
Reply-To: casp@predictioncenter.org

Dear members of the CASP community,

It is this long-awaited time of the year again!
We are starting CASP7 season. The registration is now open.
We encourage you to register at your earliest opportunity as of
the next week we will stop sending CASP-related emails to our
CASP6 distribution list and will start sending those to the people
that registered for CASP7. The early registration is especially
important to our server curators as we are planning to have server
dry run in the mid-April.

The main CASP7 web page http://predictioncenter.org/casp7/
contains the details for this round of experiment. If you can not find
answers to your questions there - please write us at
casp@predictioncenter.org .

Hope we all will be having a fruitful and enjoyable season! Good luck!

CASP7 organizers
Message 13619 - Posted: 13 Apr 2006, 5:16:28 UTC

Good news today:

first, Rhiju and I found the bug in the rosetta code that caused several of his jobs to get stuck. I'd describe it to you, but it is pretty arcane, and only affected proteins of exactly 44 amino acids so it had not been seen before. Rhiju met up with this bug as he has been following up recent observations that cutting the ends off protein sequences can significantly improve prediction results for the core of the sequence. Rhiju has cancelled the offending jobs, and corrected the problem in the code, so this will not happen again.

second, David Kim has awarded credits to those who lost valuable time during the problems last weekend.

third, I've had excellent discussions with Janet Skeels, in the UW Office of Research. She and her office have done a wonderful job helping with publicity, see the UW home page at:

http://www.washington.edu/
Message 14427 - Posted: 23 Apr 2006, 1:47:59 UTC
Last modified: 24 Apr 2006, 23:05:28 UTC

Hello All,

sorry for not reporting earlier, I was away this past week with several people in my group at a meeting in Florida on designing new enzymes to catalyze any desired chemical reaction. this is a very exciting area and hopefully we will be able to run some of our design calculations on rosetta@home after CASP7 finishes.

Bin and Rhiju have been doing wonderful things to try to keep all of you and your computers happy.
I'm a bit out of the loop on this as I've been away, but I thought I'd share with you some of the correspondence they have been cc'ing me on so you can see what the pace is like here even on a sunny spring Saturday in Seattle. The watchdog thread is Rhiju's solution to stuck jobs--any job that runs for more than a specified length of time gets killed; this was originally suggested by some of you on the message boards.






(Bin to Rhiju)
Date: April 22, 2006 4:50:27 PM PDT

OK. If you have the error info thing, then the only other change is add a
set_pose_flag(false);

before output_decoy() in get_the_hell_out().

Of course if we don't output_decoy then this is not necessary.

Thanks!

Bin

----- Original Message ----- From: "Rhiju Das"
Sent: Saturday, April 22, 2006 4:47 PM
Subject: Re: should we increase the max cpu run time to 4 days


Hi Bin:

I just wrote the same error-info thing but also haven't checked in! I'm also changing a few other things. Maybe instead of checking into SVN, can you point me to your code?

Also, I was thinking that we should not output_decoy, just a blankfile -- that way we won't get confused with incomplete decoys. This is similar behavior to DK's 5 strikes.

Cheers,
Rhiju



Subject: Re: should we increase the max cpu run time to 4 days
Date: April 22, 2006 4:43:36 PM PDT

Sounds good.

While we are at this, I'm changing a couple of things in watchdog too:

in get_the_hell_out(), before it calls output_decoy(), I'm adding a set_pose_flag(false). This is because if the watchdog killed the thread while running in pose mode, output_decoy() will fail since it's not compatible with pose.

also I added different error info for score_not_change killing and twice_cpu_pref_time killing.

I can check in these changes in a minute if you can OK them.

Bin


Sent: Saturday, April 22, 2006 4:32 PM
Subject: Re: should we increase the max cpu run time to 4 days


Hi Bin:

Can you wait until the evening to resubmit your jobs? I'm making some additional changes to the watchdog to make it gracefully exit (so the user automatically gets credit), based on advice from the message boards. I can hopefully get ralph 5.03 up and running by tonight.

Cheers,
Rhiju


On Apr 22, 2006, at 4:26 PM, Bin Qian wrote:


You are right! My outfiles are named wrong!


Sent: Saturday, April 22, 2006 4:23 PM
Subject: Re: should we increase the max cpu run time to 4 days



Hi Bin:

The -161 error looks like a file transfer error ... can you run one of these guys locally and see if the right filename is being outputted? I'm having the same errors with my jumping runs; looking into it.

Thanks,
Rhiju

On Apr 22, 2006, at 4:15 PM, Bin Qian wrote:



Hi Rhiju,

I looked at the results this morning and noticed the failures, but I think they are more than just aborting by watchdog. For example the NO_DOG jobs that had -no_watchdog flag are also failing. Actually they are not really failing - almost all the failing jobs (with or without watchdogs turned on) have the following error information:
5.4.4

# random seed: 3885595
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 5 (nstruct) times
# This process generated 5 decoys from 5 attempts
# 0 starting pdbs were skipped



NO_CHECK_NO_DOG_7486h002_dec129_1.pdb_408_4_0_0
-161



So looks like the WU did make 5 decoys as specified (-nstruct 5), but for some reason returned with an error code -161. Actually the exit status of these WUs are all "0", which I thought meant non-error.

Wait, even the successful WUs have the following error info:
http://ralph.bakerlab.org/queue_ops/db_action.php?table=result&id=92975
5.2.13

# random seed: 3885617
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************



NO_CHECK_7486h002_dec184_1.pdb_407_17_0_0
-161




Is error code -161 returned by the watchdog thread?

Bin

Sent: Saturday, April 22, 2006 12:59 PM
Subject: Re: should we increase the max cpu run time to 4 days


Hi Bin:

The watchdog is working -- maybe too well. I have it shut down Rosetta if it goes for longer than twice the default cpu run time (an hour for ralph). For ralph that time was 1 hour, so all of your jobs are being aborted after two hours!

I just changed the ralph_submit time to be 3 hours; can you send out your jobs again? Also, if you want your jobs to run longer than 6 hours, you can use the flag "-cpu_run_timeout_factor", which is currently set to two, for twice the default run time.

I'm realizing now that I should use the actual cpu_run_time from the boinc api, so I may make that change later today and post ralph 5.03.

Thanks,
Rhiju
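
The watchdog behaviour discussed in this correspondence can be sketched roughly as follows. This is only an illustration of the two stopping rules mentioned in the emails (CPU time exceeding a factor times the preferred run time, and a score that stays the same too long). The class name `Watchdog`, the method `check` and the `max_stalled_checks` threshold are hypothetical; the actual watchdog runs as a separate thread inside Rosetta and its details are not shown in the emails.

```python
class Watchdog:
    """Decides when a stuck or overlong job should be stopped.

    All names and default values here are illustrative assumptions.
    """

    def __init__(self, cpu_run_time_pref, timeout_factor=2.0, max_stalled_checks=3):
        # Longest CPU time (seconds) a job may use before it is stopped.
        self.max_cpu_time = timeout_factor * cpu_run_time_pref
        self.max_stalled_checks = max_stalled_checks
        self.last_score = None
        self.stalled_checks = 0

    def check(self, cpu_time, score):
        """Return a reason string if the job should be stopped, else None."""
        if cpu_time > self.max_cpu_time:
            return "cpu time exceeded timeout_factor * cpu_run_time_pref"
        if score == self.last_score:
            self.stalled_checks += 1
        else:
            self.stalled_checks = 0
            self.last_score = score
        if self.stalled_checks >= self.max_stalled_checks:
            return "score stayed the same too long"
        return None

# With a 1 hour preference and a factor of 2, a job is stopped after 2 hours.
dog = Watchdog(cpu_run_time_pref=3600)
reasons = [dog.check(t, s) for t, s in [(600, -10.0), (1200, -12.5), (7300, -13.0)]]
print(reasons)
```

Keeping the decision separate from the act of stopping the job makes it easy to report a different error message for each rule, which is what Bin describes adding.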

Message 14521 - Posted: 24 Apr 2006, 5:30:45 UTC

Tonight I would just like to recommend to everybody to look at Rhiju's descriptions of current work units in the "Active work units log" on these boards. They are really terrific and will give you a picture of the improvements in the science we are currently working on. On the computing side, Bin made an exciting discovery today--with his new frequent checkpointing during the relax protocol, many more structures seem to be returned than previously; this could reflect work being lost in the earlier runs when rosetta@home is temporarily interrupted. You should see this advance together with Rhiju's new watchdog thread and other improvements along these lines on Rosetta@home by the end of the coming week.
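
For readers curious why frequent checkpointing recovers lost work, here is a generic sketch of the pattern. It is not Rosetta's checkpointing code: the function name, the JSON file format and the toy "work" are all assumptions chosen to keep the example self-contained.

```python
import json
import os
import tempfile

def run_with_checkpoints(total_steps, checkpoint_path, stop_after=None):
    """Run a toy iterative job, saving progress after every step.

    If a checkpoint file exists, the job resumes from it instead of
    starting over. `stop_after` simulates an interruption.
    """
    state = {"step": 0, "value": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # interrupted: progress so far is already on disk
        state["value"] += state["step"]  # stand-in for real work
        state["step"] += 1
        # Write to a temporary file, then rename, so a crash in the middle
        # of a write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
partial = run_with_checkpoints(10, path, stop_after=4)  # interrupted at step 4
finished = run_with_checkpoints(10, path)               # resumes from step 4
print(partial, finished)
```

Without the checkpoint file, the second call would have to redo the first four steps; with it, only the remaining steps are computed.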
Message 14723 - Posted: 27 Apr 2006, 4:56:17 UTC

I'm delighted to see that the rosetta@home throughput has been climbing recently! This is excellent timing as CASP7 is scheduled to start very soon. When it does start, there will be some changes in the screensaver as the true structure will not be known; we will have a question mark and the target number instead of the native structure.

As a warmup, we will be running targets from CASP6 on rosetta@home this week; some of these proteins are larger than what you are used to, but tests on ralph have not shown any problems. For the larger proteins we stop after the low resolution search because of the greater memory requirements of the high resolution search.

We had planned to release the new version of rosetta@home yesterday, but there have been some lingering issues with the watchdog and preemption that have taken a bit longer than we anticipated to resolve. The hope is to send out the new version tomorrow; if not we will wait until monday because many of you dislike weekend releases.
Message 14962 - Posted: 29 Apr 2006, 4:10:09 UTC

Lots of neat stuff coming up this week!

Rhiju, Bin and David K. think they have the watchdog and checkpointing all working properly. This means
(1) No stuck jobs ever!!
(2) Little time wasted when Rosetta is taken out of memory--process resumes in the middle of whatever structure was being calculated.

I'm excited to see the results of jobs running now testing the improved sidechain sampling. The outcome should be clear by next monday/tuesday. Rhiju will then be testing an exciting combination of the "jumping" protocol invented by Phil Bradley here, the star of CASP6, with fullatom refinement.

Will Sheffler, a graduate student here, has developed a "smoother" version of the energy function which we hope will make possible the finding of deep minima from further away. We should be able to start testing this next week as well.

Other good news is that Microsoft has generously agreed to cover Rom Walton's consulting fees. This means that Rom will be back soon to put in more robust backtracing on ralph and rosetta to allow him and us to track down the remaining access violation errors. He will also fix the 0 credit win98 problem as promised some time ago. He is deeply involved in getting the latest boinc release ready for prime time, so it may be a little while before you see him back in action here.
Message 15133 - Posted: 1 May 2006, 6:13:08 UTC

The CASP6 test proteins currently running are larger than many of the proteins we have been testing our methods with thus far, which are small by protein standards. In CASP7, which is about to start, we expect the size distribution to resemble that of the CASP6 target proteins, and that is why we are currently running tests with these proteins. Since these proteins are longer, the calculations take longer and require somewhat more memory. With the safeguards and checkpointing Rhiju and Bin have put in place, we hope these work units do not cause any trouble for you other than taking somewhat longer to complete--please let us know if this is (or is not!) the case.
Message 15248 - Posted: 2 May 2006, 6:43:47 UTC

Good news tonight!

(1) Error rates are lower than ever (2% Linux, 5% Windows, 6% Mac) even though we are currently running calculations on larger proteins which are more computationally demanding. Great job Bin and Rhiju!!

(2) After going through the code today, I think we can reduce the memory requirements for the larger proteins by at least 25%; I hope to make progress on this front this week. In addition to easing the burden on lower memory machines, this may help to reduce some of the remaining low frequency errors.
Message 15600 - Posted: 6 May 2006, 6:05:03 UTC

Today I got several phone messages/questions about things I had said on TV last night, which I found pretty bewildering; at first I thought it must be a different David Baker, but finally tracked it down to a showing on UW TV of a videotape of a lecture I gave in the computer science department here a few months ago. I tried to watch a bit of it to see whether it might be interesting to participants, but was so horrified by the ums and uhs that I had to stop after the first minute. In any event, if you are interested in finding more about our research, and can stand the ums, you can find it at:

http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=449


Other news is that a reporter from the Associated Press is doing a story on rosetta@home and will be contacting some of the volunteers posting in the message boards.

On the science side, we have some new ideas for improving predictions for larger proteins using the low resolution part of the Rosetta folding process which you should see in action soon.

And finally, CASP7 is scheduled to begin on Monday, so look for some exciting prediction challenges coming to your computer very soon!


Message 15673 - Posted: 8 May 2006, 2:51:20 UTC

I just received the following email from the CASP organizers with the latest information on the CASP7 experiment, which begins on May 10:

Dear Predictors,

CASP7 will begin May 10 with a single target trial followed by two
targets on May 11.

Some additional notes on this years’ process:

MODEL ACCURACY
We will pay special attention to model accuracy predictions. These can
be submitted for own predictions in the regular CASP format (PDB
B-factor field) on a per atom basis. These should be error estimates in
Angstroms. In addition an overall score for a given model can be
submitted as follows:
REMARK SCORE 0.7
in prediction header, where SCORE range is 0.0 to 1.0 (1.0 being a
perfect model). Independently, model accuracy predictions can be
submitted on server models, usually available within few days of target
release, or on your own models in the following format:
http://predictioncenter.org/casp7/doc/casp7-format.html#QA
The deadline for accuracy predictions on server models will be the
regular deadline for that target (typically 3 weeks).

In addition, to assess model evaluation methods on all CASP models
(i.e., including human expert predictions) we will be shortly collecting
software for a later assessment. Additional details will be announced
separately.

MODEL REFINEMENT
Special attention will also be paid to model refinement. In some cases
(CM/easy targets of less than 150 residues, preferably less than 100
residues), a single model submitted during the regular prediction window
will be selected for further refinement by others. Refinement window
will then open for additional 3 weeks. The usual refinement of own
models is still encouraged (using the unrefined and regular – for
refined models - model designations).

PREDICTION WINDOWS
Prediction windows will in general be shorter than in previous CASPs
(approximately 3 weeks). This is to adhere more closely to the
target structure release timelines adopted by crystallographers and to
minimize information leaks and subsequent target cancellations. However,
to allow assessment of methods requiring longer computation times, at
least some target deadlines will be extended. In such cases we will
still strongly encourage submitting models within the 3 week prediction
window. If information leak occurs after the initial three weeks but
before the assigned prediction deadline, evaluation of models will be
limited to those submitted within the 3 week window only.

PREDICTION OF FUNCTION
The format for function predictions is as follows:
http://predictioncenter.org/casp7/doc/casp7-format.html#FN
Additional targets will be made available for this category of
prediction (targets for which experimental structures may not be
forthcoming).

MODEL QUALITY FILTERS
Human expert predictions with severely unrealistic geometry will be
rejected outright. The criteria for this are as follows:
More than 5% of CAs taking part in clashes of less than 1.9 Angstrom.
OR
More than 25% of CAs taking part in clashes of less than 3.6 Angstrom.
CA-CA clashes below these percentage values as well as segmented
predictions with more than 4 chain breaks (CAs adjacent in sequence
separated by more than 5 Angstroms) will be flagged (warnings issued).
The model will be accepted, but it might be penalized in the assessment.
Missing loops or other deletions are acceptable.

Server predictions with clashes will be accepted in all cases, but
similarly to the human expert predictions will be issued warnings and
might be penalized in the assessment.

--
CASP7 organizers
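
The geometry filters listed under MODEL QUALITY FILTERS in the email above can be sketched as a small check on CA coordinates. This is one reading of the stated criteria, not the organizers' software: the function name and return format are hypothetical, and it assumes a CA "takes part in a clash" when it lies closer than the cutoff to any other CA.

```python
import math

def ca_geometry_report(ca_coords):
    """Apply the CA clash and chain-break criteria to a list of (x, y, z)
    CA coordinates given in sequence order (Angstroms)."""
    n = len(ca_coords)
    severe, mild = set(), set()  # indices of CAs in < 1.9 A and < 3.6 A clashes
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d < 1.9:
                severe.update((i, j))
            if d < 3.6:
                mild.update((i, j))
    # A chain break: sequence-adjacent CAs separated by more than 5 Angstroms.
    breaks = sum(
        1 for i in range(n - 1)
        if math.dist(ca_coords[i], ca_coords[i + 1]) > 5.0
    )
    # Rejection: more than 5% of CAs in severe clashes OR more than 25% in mild ones.
    rejected = len(severe) > 0.05 * n or len(mild) > 0.25 * n
    # Warning: any clashes below those percentages, or more than 4 chain breaks.
    flagged = not rejected and (bool(mild) or breaks > 4)
    return {"rejected": rejected, "flagged": flagged, "chain_breaks": breaks}

# An idealized extended chain: consecutive CAs 3.8 Angstroms apart.
ideal = [(3.8 * i, 0.0, 0.0) for i in range(20)]
print(ca_geometry_report(ideal))
```

According to the email, rejection applies to human expert predictions; server predictions with clashes are accepted but receive warnings.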


Message 15675 - Posted: 8 May 2006, 5:39:00 UTC


I was just looking at the results for the top teams:

Rank  Name              Members  Recent average credit  Total credit   Country
1     XtremeSystems     246      311,043.65             25,052,975.81  International
2     Dutch Power Cows  1071     261,066.85             24,927,849.52  Netherlands
3     Free-DC           154      174,032.69             28,027,746.46  International

This is serious computing power; I think the recent credit numbers correspond to roughly 3,100, 2,600 and 1,740 computers crunching more or less full time for these top three teams, which is fantastic!
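
The computer counts above appear to follow from dividing each team's recent average credit by about 100 credits per computer per day. That conversion factor is inferred from the numbers in this post, not stated anywhere, so treat it as an assumption.

```python
CREDITS_PER_COMPUTER_PER_DAY = 100  # assumption inferred from the estimates in the post

recent_average_credit = {
    "XtremeSystems": 311_043.65,
    "Dutch Power Cows": 261_066.85,
    "Free-DC": 174_032.69,
}

# Estimated number of computers crunching more or less full time per team.
estimates = {
    team: rac / CREDITS_PER_COMPUTER_PER_DAY
    for team, rac in recent_average_credit.items()
}

for team, computers in estimates.items():
    print(f"{team}: about {computers:,.0f} computers")
```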

It looks like the DPC have nearly caught up to XtremeSystems in total credit, but XtremeSystems is moving ahead faster. Why does Free-DC have the most total credit but considerably less recent credit than the other two teams?

We will have to have prizes when CASP finishes at the end of July for the top overall team and the top team during CASP. Definitely a citation in the science paper on rosetta distributed computing at the minimum!
Message 15705 - Posted: 9 May 2006, 6:14:48 UTC

The lead article in the June issue of Scientific American describes some of our work on engineering new molecules. I've been discussing an article on rosetta@home with the editors. With CASP starting in a couple of days, we will have a parallel competition for the "top 5 teams", just as in CASP we get to submit the "top 5 models". We will be keeping track of the total credits earned by each team from May 10 to Aug 1, when CASP ends, and will describe the winning teams and their contributions to the CASP prediction efforts in the rosetta@home article and in book chapters on distributed computing we will be writing at the end of the summer.
The spirit of friendly competition has made CASP exciting for the past 10 years, and it is great that this can extend now, in a very positive way, to the key problem of computing power!



Message 15754 - Posted: 10 May 2006, 5:20:45 UTC

In anticipation of the upcoming CASP7 targets, we have made a concerted effort to reduce memory use in Rosetta, and I'm optimistic we can get below 100 MB for a reasonably sized (150 amino acid) protein. Rhiju found that on his Mac a run for a large protein used 150 MB with the graphics off, but well over 300 MB with the graphics on. We are working to track down why the graphics are taking so much memory.

Two questions:
(1) What level of memory use are you seeing for rosetta@home (with graphics) on your computers?
(2) Should we disable the graphics in the next release to reduce memory use for larger proteins?
(at ~100 MB per work unit, there should be no problems on most machines).
Message 15757 - Posted: 10 May 2006, 6:58:53 UTC

CASP7 is starting tomorrow, and we are excited both about testing the methods we have been developing over the past 9 months on proteins of unknown structure, and also about testing some new ideas that we think can really improve structure prediction for larger proteins. One of these ideas that seems very promising now is that we may be able to recognize, even at the low resolution level where computing is much faster, some topological features which distinguish native structures from random chain conformations.

One of the great things about CASP is that it inspires all the participants to come up with new ideas and approaches before and during the experiment. So in the next month you can expect both work units for casp targets with unknown structures, and work units for proteins of known structure where we are testing out very recent ideas.
Message 15896 - Posted: 11 May 2006, 6:44:14 UTC

The first CASP7 target was released today!

Here is its amino acid sequence:

MSFIEKMIGSLNDKREWKAMEARAKALPKEYHHAYKAIQKYMWTSGGPTDWQDTKRIFGG
ILDLFEEGAAEGKKVTDLTGEDVAAFCDELMKDTKTWMDKYRTKLNDSIGRD

Can you tell from this what the three dimensional structure and function of this protein are?

The problem with proteins, of course, is that you can't read off directly from the sequence what the structure and function are, although both are completely determined by the sequence (the genetic blueprint, quite literally).

We are excited because this protein looks unrelated to any protein of known structure, and is not too much bigger than most of the proteins we've been running tests on these many months, so it is a perfect challenge for the methods we've been developing. After some quick runs on RALPH to make sure work units behave properly, you should see work units for this protein by the end of tomorrow!




Message 16011 - Posted: 12 May 2006, 5:38:12 UTC

Two more CASP7 targets were released today. One of the sequences is clearly similar to a protein with known structure, and we will use the known structure as a starting point in the searches. The other new sequence, like yesterday's, is not related to any sequence with a known structure, so again we will predict its structure using the methods we've been developing here over the past months. It is a little harder, though: at 200 amino acids it is larger than almost all of the proteins we have been testing on. So we are in somewhat uncharted territory here. This is the great thing about CASP--you have to try to solve problems that you would not have otherwise attempted! By pushing people to try to solve very hard problems, CASP is a great stimulant to progress in the field.

Things are getting very busy already, and we are only on the second day!
Message 16052 - Posted: 12 May 2006, 14:52:29 UTC

I thought I'd answer here two good questions that were just raised in the "Comments" thread:

(1) As Rollo suggested, for our five CASP submissions we will take the five lowest energy structures, ensuring that none are really close to each other (unless all the low energy structures are similar to each other, as we saw in our tests on some of the easiest proteins).

(2) For large proteins, we don't have any data to guide us on what prediction accuracy to expect. We did do tests on CASP6 targets, but it is clear that these were greatly limited by sampling, and for CASP7 we are doing MUCH more sampling for each target than we've done in any of our tests.
So CASP7 is very much an "experiment" for us, as it is supposed to be.
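
The selection rule in answer (1) can be sketched as a greedy pick of low energy models that skips near-duplicates. This is a guess at the general shape of such a procedure, not the actual selection code: the function name, the `min_distance` cutoff and the fallback to plain energy order are assumptions, and structures are reduced to single numbers so the example runs on its own.

```python
def pick_diverse_low_energy(models, distance, n_picks=5, min_distance=2.0):
    """Greedily pick the lowest energy models that are not too similar.

    `models` is a list of (energy, structure) pairs and `distance` is any
    function giving a dissimilarity (for example an rmsd) between two
    structures. `min_distance` is an arbitrary illustrative cutoff.
    """
    ranked = sorted(models, key=lambda m: m[0])
    picks = []
    for model in ranked:
        if len(picks) == n_picks:
            break
        if all(distance(model[1], chosen[1]) >= min_distance for chosen in picks):
            picks.append(model)
    # If the low energy models are all similar to each other, fill the
    # remaining slots in plain energy order.
    for model in ranked:
        if len(picks) == n_picks:
            break
        if model not in picks:
            picks.append(model)
    return picks

# Toy "structures": points on a line, with distance as absolute difference.
toy_models = [(-10, 0.0), (-9.5, 0.1), (-9, 5.0), (-8, 5.2),
              (-7, 10.0), (-6, 15.0), (-5, 20.0)]
chosen = pick_diverse_low_energy(toy_models, lambda a, b: abs(a - b))
print([energy for energy, _ in chosen])
```

In the toy data the second and fourth lowest energy models are skipped because they sit almost on top of models already chosen.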

Also, Moderator9 asked me to remind everybody that since we don't know the true structures of the CASP7 targets, on the screensaver no "native structure" will show up and the rmsd cannot be computed.
Message 16233 - Posted: 14 May 2006, 6:20:09 UTC

I just got an email from CASP which I've copied below; next week is going to be busy!!



Dear CASP7 participants,

Quick update on the experiment progress.

Today (Friday) we have closed accepting server predictions for the first
target, T0283. 80 servers submitted their predictions for this target.
That's already an impressive increase from the total of 62 servers that
participated in CASP6. We are still waiting for several more servers
which curators are still working on their setup. Server predictions
will be made available through our web site as soon as we stop accepting
server corrections (3 days after the target release). Then all the
participants (servers and humans) will have an opportunity to try
themselves in assessing quality of server models (we have announced
earlier about our new QA category).

Next week we plan to release at least 8 new targets starting with two on
Monday.

We are receiving A LOT of emails these days. But amazingly, only one
predictor wrote to us about the gap in target numbering (T0284 and then
T0287). Please, don't be surprised if you see cases like this in the
future. Nothing special about it - I just prepared different targets for
release on Friday and then I had to let other 2 targets go in front of
the prepared ones because of their shorter deadline. The two skipped
targets (T0285 and T0286) will be released next week.

--
Andriy Kryshtafovych
CASP team



©2023 University of Washington
https://www.bakerlab.org