CASP 10

TJ
Volunteer moderator
Project developer
Project scientist

Joined: 22 Oct 10
Posts: 9
Credit: 216,670
RAC: 0
Message 72941 - Posted: 30 Apr 2012, 19:37:35 UTC

Hello everyone!

CASP 10, a community-wide experiment in structure prediction, starts tomorrow, May 1st, and runs to August 1st. During this time we will be using BOINC heavily for structure prediction. If your work unit starts with the label rb, you're running a CASP 10 target! rb is short for Robetta, which is our publicly available server for structure prediction.

CASP
CASP is an international experiment to assess the state of the art of the protein structure prediction field. Sequences whose structures have been solved but not yet published are sent out to participating teams, and we have 3 days to send back predictions. The whole thing is conducted in a double-blind fashion, ensuring fair assessment and truly blind prediction.

Robetta
Structure prediction for the community, by the community. Robetta is a server for protein structure prediction that offers Rosetta's structure prediction capabilities to the scientific community (and to the public). The computation for this will be conducted on BOINC, meaning that you guys will be crunching protein structure prediction jobs for real scientific studies conducted by researchers all over the world.

Improvements since CASP 9
Over the last two years we have extensively modified our structure prediction methodology. Preliminary results indicate that we've made more improvement in the last two years than in the previous 6 years combined. For the first time there is significant doubt whether humans can improve upon the results from computers. So this could be a very exciting CASP.

Thanks again, everyone, for crunching. We wouldn't be able to do this stuff without you!

Excitedly yours,
Chris, Ray, Frank, Yifan, David Baker, David Kim, Hetu and TJ
ID: 72941
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 72949 - Posted: 1 May 2012, 4:52:43 UTC - in response to Message 72941.  

CASP
CASP is an international experiment to assess the state of the art of the protein structure prediction field. Sequences whose structures have been solved but not yet published are sent out to participating teams, and we have 3 days to send back predictions. The whole thing is conducted in a double-blind fashion, ensuring fair assessment and truly blind prediction.

You state you have 3 days to send back predictions. Can I ask a very specific question that I've raised before:

The default work buffer setting is 0.25 days with a 3-hour runtime, but some of us maintain a larger work buffer in order to avoid task outages. I personally use 2.0 days, but others may use a larger amount.

The default settings allow tasks to be returned to you in good time, but is it true to say that if the work buffer+runtime totals more than 3 days, then the work we grab will not be returned to you in sufficient time for the results to count?

I will assume this is the case, so I'm reducing my work-buffer to 1.5 days - plus my 8-hour runtime - to allow a certain leeway for you to receive work back in time. Please confirm so that others can make similar adjustments.

Obviously, with reduced work buffers, there's an equivalent requirement for tasks to be reliably available at your end, so an extra degree of monitoring would be wise.

On the assumption that my guesses are correct, you may see a reduced rate of task downloads while our buffers are run down, though tasks will be returned a certain amount sooner after release. As long as tasks are readily available, there should be no reduction in the results you see back.
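
The buffer-plus-runtime reasoning above can be sketched as a quick calculation. This is only a rough illustration, assuming a simple additive model in which a task may sit in the work buffer for up to the full buffer length before it runs; it ignores multi-project scheduling and machine downtime. The function names are hypothetical, and the 3-day window is the figure quoted from TJ's post.

```python
def worst_case_turnaround_days(buffer_days, runtime_hours):
    """Rough upper bound on turnaround: time waiting in the buffer plus run time.

    Assumption: a freshly downloaded task waits behind a full buffer,
    then runs for the configured target runtime.
    """
    return buffer_days + runtime_hours / 24.0


def fits_window(buffer_days, runtime_hours, window_days=3.0):
    """True if the worst-case turnaround stays within the given window."""
    return worst_case_turnaround_days(buffer_days, runtime_hours) <= window_days


# Settings mentioned in this post: the defaults, and a 1.5-day buffer with an 8-hour runtime.
for buffer_days, runtime_hours in [(0.25, 3), (1.5, 8), (2.0, 8)]:
    days = worst_case_turnaround_days(buffer_days, runtime_hours)
    print(f"buffer={buffer_days}d runtime={runtime_hours}h -> {days:.2f} days")
```

Under this model, any combination whose sum exceeds the window would be flagged as too slow, which is the concern raised in the question.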
ID: 72949
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 72954 - Posted: 1 May 2012, 5:31:17 UTC

Hi TJ.

Quote[ CASP 10, a community wide experiment in structure prediction starts tomorrow on May 1st and runs to August 1st. During this time we will be using BOINC heavily for structure prediction. If your work unit starts with the label rb you're running a CASP 10 target! rb is short for Robetta which is our publicly available server for structure prediction. ]quote.

The only problem is I've been seeing these task names for weeks now. Is there going to be some other way to tell which are really CASP tasks?

Something added to the task naming, maybe.

ID: 72954
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 72956 - Posted: 1 May 2012, 5:52:06 UTC - in response to Message 72949.  

Yes, you are absolutely right. For CASP we need results back within a day or two, as our approach is iterative: we analyze the results after one day and send out another set of WUs based on these results for two days of computing, then collect the results and submit to CASP. So please do set your buffer to a shorter time, and let us know if you are running out of WUs. Thanks!

ID: 72956
Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72962 - Posted: 1 May 2012, 16:23:03 UTC - in response to Message 72954.  

The only problem is I've been seeing these task names for weeks now. Is there going to be some other way to tell which are really CASP tasks?


A large number of those workunits have been pre-CASP testing - that is, running the entries from previous CASPs through the CASP10 structure prediction machinery and checking that everything is working properly. Now that CASP has started, that testing is pretty much over (although there might be occasional tests to double-check something, or to try a last-minute fix).

A small portion of those workunits were for structure prediction jobs which were submitted to Robetta by other research groups. But to conserve resources, that public submission is going to be disabled for the duration of CASP.

So if you see an rb task in the next few months, in all likelihood it is for CASP.
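
As a rough illustration of that rule of thumb, here is a minimal sketch that flags task names carrying the rb label. It assumes the label is the first underscore-separated token of the task name, which is a guess based on the other task names quoted in this thread; the function name and the example rb name are hypothetical. Outside the CASP period the same label also covered testing and public Robetta submissions, so this is a heuristic rather than a definitive test.

```python
def is_probably_casp_task(task_name):
    """Heuristic: during CASP 10, an 'rb' label most likely marks a CASP target.

    Assumption: the label is the first underscore-separated token of the name.
    """
    label = task_name.split("_", 1)[0]
    return label == "rb"


# Names quoted elsewhere in this thread, plus one made-up rb name for illustration.
examples = [
    "ab_centroidAbrelax_cst_3qc7A",
    "CASP9_benchmark",
    "rb_hypothetical_target_0",  # hypothetical name, not a real work unit
]
for name in examples:
    print(name, "->", is_probably_casp_task(name))
```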
ID: 72962
Sean Kiely
Joined: 31 Jan 06
Posts: 64
Credit: 43,992
RAC: 0
Message 72964 - Posted: 1 May 2012, 17:19:26 UTC - in response to Message 72956.  

I would recommend that you post an item under "News" on the homepage (and also a new thread in the number-crunching forum) asking participants to check their work buffer settings and reduce them to no higher than 1.5 days. This might reduce the number of CASP units that are processed but not returned quickly enough to be useful.

ID: 72964
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 72966 - Posted: 1 May 2012, 18:11:49 UTC - in response to Message 72956.  

Thanks for the quick reply. I didn't anticipate that you did post-processing of results. Is ~1.83 days (1.5 days + 8-hour runtime) sufficient for you? What would your ideal maximum turnaround time be?

Another issue that arose last year was the fact that the BOINC manager doesn't help us adhere to a quicker-than-usual turnaround time because of issues like "debt" between projects (I'm not qualified to talk about this tbh but I know there's a factor involved). Personally I'll be setting WCG to "No New Tasks" for the duration as Rosetta is my primary project.

The biggest issue last year, though, was the "Deadline" we see in the BOINC Manager being set at 10 days from download, especially when a contributor runs more than one project (because of the "debt" issue in my second paragraph). Is there any way you can set the deadline for specific CASP10 tasks to your preference, i.e. your ideal maximum turnaround time? That way, the BOINC Manager will ensure that targets are met rather than (effectively) working toward missing them.

I notice this afternoon I've received a non-rb task from Rosetta (ab_centroidAbrelax_cst_3qc7A) after the first rb tasks came down. In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?

I can't think of any other issues that might prevent the CASP exercise from operating successfully.
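
The deadline proposal above amounts to a simple rule, sketched here for illustration only. This is not BOINC server code or configuration; the function and constant names are hypothetical, and the 2-day and 10-day figures are just the values suggested in this post.

```python
from datetime import datetime, timedelta

# Values suggested in the post; both are assumptions about what a project might choose.
CASP_DEADLINE = timedelta(days=2)
DEFAULT_DEADLINE = timedelta(days=10)


def report_deadline(sent_at, is_casp):
    """Return the proposed report deadline for a task sent at `sent_at`."""
    return sent_at + (CASP_DEADLINE if is_casp else DEFAULT_DEADLINE)


sent = datetime(2012, 5, 1, 18, 0)
print("CASP task due: ", report_deadline(sent, is_casp=True))
print("Other task due:", report_deadline(sent, is_casp=False))
```

With a shorter deadline on the urgent tasks, the client's own scheduler would have the information it needs to prioritise them, which is the point of the suggestion.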

ID: 72966
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1847
Credit: 7,987,219
RAC: 8,801
Message 72968 - Posted: 2 May 2012, 15:56:00 UTC

I'm downloading CASP9_benchmark again...
ID: 72968
TJ
Volunteer moderator
Project developer
Project scientist
Joined: 22 Oct 10
Posts: 9
Credit: 216,670
RAC: 0
Message 72969 - Posted: 2 May 2012, 16:28:19 UTC - in response to Message 72968.  

I'm downloading CASP9_benchmark again...

We'll be using CASP 9 to test our system while CASP 10 runs. We know what the correct solutions for CASP 9 are but won't know the solutions for CASP 10 for a couple of weeks.
ID: 72969
Aegis Maelstrom
Joined: 29 Oct 08
Posts: 61
Credit: 2,137,555
RAC: 0
Message 72970 - Posted: 2 May 2012, 16:35:15 UTC - in response to Message 72966.  
Last modified: 2 May 2012, 16:36:12 UTC

Hi, I second Sid Celery in his proposals.

My guess is that a majority of R@h crunchers are not meeting your 1-2 day turnaround requirement for a task. To be effective they need to be forced (or at least informed) to change their behaviour. As CASP10 has already started, we would need to inform them right away.
Furthermore, let's be honest - a vast majority of crunchers do not read information from the projects or their teams on a daily basis. They even lag severely in e-mail communication. Moreover, even if they learn about new requirements, a fair number of participants will forget, be unable, etc. to adjust their crunching pattern.

In this situation changing the deadlines for WUs in addition to the information about the issue (very important for computers without permanent access to the Internet, used for a small amount of time per day, set on longer run times etc.) seems to be the best option.

That, or sending CASP10 WUs strictly based on behavioural patterns (only to "fast" crunchers, if their computing power is big enough).

Best Regards and Happy Crunching.
ID: 72970
EmSti [BlackOps]
Joined: 28 Apr 12
Posts: 1
Credit: 536,791
RAC: 0
Message 72972 - Posted: 2 May 2012, 17:14:16 UTC

Will crunchers still get full credit for tasks successfully run, but not within 3 days?
ID: 72972
Plomos
Joined: 4 Mar 11
Posts: 11
Credit: 439,043
RAC: 0
Message 72973 - Posted: 2 May 2012, 18:12:24 UTC

Can I ask why WUs that seem to be for something other than CASP are being run?

I am seeing names like heterodimer_design_21_pose_B_abinitio_SAVE_ALL_OUT_46202_4097_0

and ab_11_29__optpps_T5611_optpps_03_09_35686_255345_0 in addition to the CASP9 benchmark and rb WUs. Are all of these being used for CASP? If not then which are and why are other things being sent out?
ID: 72973
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 3
Message 72974 - Posted: 2 May 2012, 19:49:29 UTC

The easiest way for Rosetta to get the results back in two days max is to simply give the tasks a 48-hour deadline from the moment the scheduler sends them to us, and BOINC Manager will happily go into panic mode and crunch them in High Priority mode as soon as it downloads them.
No fancy coding needed.
ID: 72974
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 72978 - Posted: 3 May 2012, 2:32:58 UTC - in response to Message 72970.  
Last modified: 3 May 2012, 2:33:50 UTC

Hi, I second Sid Celery in his proposals.

Thanks

My guess is that a majority of R@h crunchers are not meeting your 1-2 day turnaround requirement for a task. To be effective they need to be forced (or at least informed) to change their behaviour.

I would guess this isn't true actually. Most people will work with the defaults of 0.25 days buffer & 3-hour run-times, so in the main everything ought to be fine.

The problem will be that inveterate fiddlers (presumably like you & me) will have tweaked their settings. Hopefully they cast their eye over the forums too & will catch this wrinkle before long. At the same time, these same people will possibly be those who dedicate more resources to Rosetta, so it may make a disproportionate amount of difference. Just speculating, obviously.

As long as we tweak things appropriately, everyone should get what they want.

It's also worth a shout to say when CASP10 is over so we can revert to our individual preferences afterwards.
ID: 72978
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1847
Credit: 7,987,219
RAC: 8,801
Message 72983 - Posted: 3 May 2012, 12:07:24 UTC - in response to Message 72966.  

In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?


This is a GOOD idea!
ID: 72983
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 72990 - Posted: 4 May 2012, 20:13:10 UTC - in response to Message 72983.  
Last modified: 4 May 2012, 20:14:26 UTC

In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?


This is a GOOD idea!



This can be done on our end but we do not want to change things at this point. It's a great suggestion but it may cause errors initially due to past deadlines until the client can adjust to appropriately estimate run times. The majority of users use the default run time setting of 3 hours. Sid Celery is correct.

Things are working well on our end with the current settings, so we do not want to modify things at this point. Please feel free to update your CPU run time preference and buffer time (the defaults are ideal); it will definitely help to get results quickly.
ID: 72990
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 72991 - Posted: 4 May 2012, 20:15:43 UTC - in response to Message 72972.  

Will crunchers still get full credit for tasks successfully run, but not within 3 days?


Yes, of course!

Also, the results may be useful for our human-based predictions.
ID: 72991
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 72994 - Posted: 5 May 2012, 4:15:04 UTC - in response to Message 72990.  

In order to distinguish between urgent and non-urgent tasks, CASP10 tasks should have (say) 2-day deadlines & all others the usual 10-day deadline. Can this be done from your end?


This is a GOOD idea!


This can be done on our end but we do not want to change things at this point. It's a great suggestion but it may cause errors initially due to past deadlines until the client can adjust to appropriately estimate run times. The majority of users use the default run time setting of 3 hours. Sid Celery is correct.

Things are working well on our end with the current settings, so we do not want to modify things at this point. Please feel free to update your CPU run time preference and buffer time (the defaults are ideal); it will definitely help to get results quickly.


I guess if things are running well then it's ok - BOINC Manager's scheduling is flaky at the best of times - but if it could be relied upon, it would cater for all the weird & wonderful settings people use without intervention or having to read the right forum thread at the right time. You'd have the best overview, I'm sure. Worth thinking about for the future though. Maybe a test during a non-critical period is something that can be tried (WCG uses short deadlines when a task has to be reissued and it seems to go through without a hitch).
ID: 72994
LanDroid
Joined: 28 Sep 05
Posts: 3
Credit: 1,387,757
RAC: 1,737
Message 73003 - Posted: 6 May 2012, 3:28:34 UTC

Structure prediction for the community, by the community. ... For the first time there is significant doubt whether humans can improve upon the results from computers. So this could be a very exciting CASP.

So this is not just a competition between BOINC/Rosetta vs. independent labs, it is also BOINC/Rosetta vs. humans!


ID: 73003
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73005 - Posted: 6 May 2012, 16:18:25 UTC
Last modified: 6 May 2012, 16:21:02 UTC

CASP has one side of things for automated structure prediction, but there is another side for human predictions (many of which are done with various degrees of automated tooling behind them).

CASP is for the world's scientific community. All researchers studying the subject are potential participants. It is not truly a competition. It is done more as a measure of the current state of the art. At the end, the "winners" present to the others the methods and ideas to which they attribute their superior predictions. So as the ideas are assimilated into the various other approaches and combined, the next round is always a greater challenge than the last.

When reference is made to Rosetta vs. the rest, it is just people rooting for their "home team" to do well and continue to demonstrate that science is progressing and that we're truly learning how proteins work. This leads to vaccines and treatments for many of life's diseases.

When I started with Rosetta@home, the project was graciously making good use of 15 TeraFlops of computing power. Dr. Baker said it had taken off beyond all expectation (of a campus-wide distributed system). He also said that ten times more computing power would really open new frontiers to the research he and his lab have dedicated their lives to. It is truly rewarding to now see the project churning at over 150 TFlops with people from all over the planet contributing to a common cause for the common good. Keep crunching!
Rosetta Moderator: Mod.Sense
ID: 73005

©2024 University of Washington
https://www.bakerlab.org