Posts by Mod.Zilla

1) Message boards : Rosetta@home Science : DISCUSSION of Rosetta Insight (Message 74511)
Posted 22 Nov 2012 by Mod.Zilla
Post:
I asked Prof. Baker a question a few years ago that he answered. The question might be relevant and have a different answer now, however. So let me suggest you ask him to see his current answer. The question is simple: How much has Rosetta improved over, say, the last 5 years? I am unsure how you would measure improvement, so I am willing to go by whatever metric Prof. Baker might choose to use, be it quantitative or qualitative in nature.

Good question! I'll add that to my list.

Also, I'll mention the twitter account.
2) Message boards : Rosetta@home Science : DISCUSSION of Rosetta Insight (Message 74488)
Posted 20 Nov 2012 by Mod.Zilla
Post:
Please post any questions or comments about the "Rosetta Insight" thread here so that we can keep that thread tidy.

Thanks!
3) Message boards : Rosetta@home Science : Rosetta Insight (Message 74487)
Posted 20 Nov 2012 by Mod.Zilla
Post:
First conversation – post number 1

Bakerlab Papers
I must admit that until this conversation I didn’t realise that all of the papers that the lab produce are available on the website – I have often come up against paywalls when trying to read about things mentioned in posts by members of the lab. There are some really interesting papers on there; although they’re generally very technical, they’re definitely worth a look if you want to understand what your contributions to R@H are used for, even if you only read the abstract.
I can now recommend ‘Computational Design of Self-Assembling Protein Nanomaterials with Atomic Level Accuracy’!

If you just want assurance that your contributions are being put to good use, having read a number of these I can confirm that they certainly are!

Some things I did not know about the Bakerlab:

  • There are around 50 people working within the Bakerlab, with a roughly 50-50 split between PhD students and post-doc. researchers.
  • It is a diverse group, with a lot of the researchers coming from outside of the US, which in turn is resulting in those researchers seeding new labs with ties to (and experience from) the Bakerlab.


Current research areas
As mentioned in the intro to this blog, the Rosetta software has two functions: protein modelling and protein design, so unsurprisingly, that’s what the members of the lab are working on! Improvements to the software are developed through trying to improve the software’s ability to work with particular proteins that the individuals within the lab are working on. Put another way, Rosetta is developed through being used in research and improved/expanded where necessary.

Incorporating experimental data
Some members of the lab are working on allowing experimental data to be incorporated into Rosetta, such as NMR data which might show that some amino acids are close together, and which can therefore massively reduce the number of configurations that the folded protein can take.

CASP10 (website)
CASP is a competition held every two years for the protein modelling community to pit their techniques against each other. It is performed blind – the entrants have to determine the shape of many different proteins from their amino acid sequence. The structures of all of the proteins in the competition have been determined experimentally but not yet released to the scientific community, so the labs have no way to tell whether their models are correct until the results are released.

We’re currently in the gap between the competition and the results, although I’m not sure if any CASP work is being carried out at the Bakerlab between now and the meeting in Italy in December. The Bakerlab has done very well at CASP in the past, and few entrants submit models of as many of the proteins as the Bakerlab does. However, I believe the previous competition showed that models which incorporated experimental data (specifically, experimental data from NMR) or used similar proteins as starting points did well, and so that has been an area of research for the lab.

Using experimental data helps the computer model by massively reducing the space that Rosetta has to search – anyone who has played Fold-It will understand how useful it is to know that two side chains should be in contact, as that dramatically reduces the potential configurations available for the protein. The results for CASP10 will be released around early December, prior to a meeting on 9th-12th December to discuss the techniques used and their strengths and weaknesses.

That covers everything that’s legible in my notes from our first conversation! Going forwards, I’d like to find out more information about the following subjects:


  • Rosetta Commons – the community that has been seeded by people who have worked in or with the Bakerlab and who now develop the Rosetta software from all over the planet.
  • More information about Nobuyasu’s announcement about their computationally designed de-novo proteins and what this might lead to (that article is here).


I have other questions lined up too, but if there’s something you want me to ask about Rosetta@home or the wider Bakerlab then post in the “Discussion of Rosetta Insight" thread and we can add it to the list.

I hope this is useful and interesting – let me know what you think.

4) Message boards : Rosetta@home Science : Rosetta Insight (Message 74486)
Posted 20 Nov 2012 by Mod.Zilla
Post:
Hi All

I’ve recently been asked to help to get useful information out to the R@H community on a reasonably regular basis about what the folks at the Bakerlab are currently working on. I’ll speak to Prof. David Baker on Skype periodically, and then report back to the community here. The aim is to find an appropriate balance between covering the key points of the many different and highly technical things they’re working on while keeping it relevant to the majority of Rosetta@Home contributors. It would also be great (and very useful for me!) to get your questions to ask too. In turn, hopefully this will also help to attract new users to the project so that the team have more compute resources at their disposal to help speed up and extend their research.

I’m not affiliated with the lab or project in any way, other than having run R@H for quite a few years now. I’ll do my best – there’s going to be a steep learning-curve for me so go easy if I make any mistakes (and feel free to correct me – I’ll update and revise where necessary!).

First up, because it’s central to the project, here’s some info on proteins which is what R@H and the Bakerlab are all about. Many of you will know this stuff, and some of you will probably have relevant PhDs, so this is for those who want a quick introduction or recap. Feel free to correct/clarify any of this, or provide any appropriate analogies or links that will help – my Biology degree is a decade-old now and I work in hydro so it might show!:

A bit of background: Proteins

If you don’t really know what proteins are or do, or want a quick recap, here’s a good site and here’s an excellent animation of protein production in action (apparently that one is approximately in real-time).

Basically, proteins are chains of amino acids and they can either be structural or functional (many functional proteins are enzymes; essentially they’re tiny machines that perform one or more functions). The sequence of amino acids determines the shape of the protein and therefore its function/structure.

Glossary

  • Gene: a length of DNA (or RNA) which codes for a protein.

  • DNA: the molecule which stores the code for all of the proteins in an organism. The code is stored as the sequence of bases (there are four). There are ‘start’ and ‘stop’ codes which determine the length of a gene. DNA is transcribed/read (by proteins!) and the appropriate sequence of amino acids is produced (there are intermediate steps in-between).

  • Amino acid: one of a group of 22 molecules from which all proteins are made.

  • Protein: One or more chains of amino acids. These chains fold up in highly complex, but determinable ways, and the folded shape is critical because that determines the function or structure of the protein. Sometimes other elements (co-factors) are incorporated into the folded structure (e.g. iron in the haemoglobin protein).

  • Bakerlab: The lab at the University of Washington in Seattle headed by Prof. David Baker

  • Rosetta software: The protein modelling software suite developed initially in the Bakerlab, but now developed worldwide through Rosetta Commons. Not to be confused with Rosetta@Home, this is the software package that is used by labs and private organisations around the world for a range of protein modelling tasks. For a summary, see here.

  • Rosetta@Home: The Rosetta Software Suite which has been packaged to allow it to run on the BOINC distributed computing platform.

A bit of background: Rosetta

The Rosetta software has two related functions:

  • Protein modelling: determining the 3D structure of a protein from the data available, such as its amino acid or DNA sequence.

  • Protein design: designing new proteins to perform specific functions.

It is used for research (see the lab’s research papers for examples), while also being constantly developed to improve its accuracy and functionality. There’s a great video on Rosetta here.

The amount of computer power required for protein research can be phenomenal due to the vast number of shapes that even small proteins could (but don’t) take, which makes it an excellent fit for distributed computing.


Without computer modelling, the methods available to determine a protein’s shape are:


    1. X-ray crystallography
    2. Electron microscopy (can determine the protein’s outer shape but cannot see within to see how it is folded)
    3. NMR spectroscopy


These methods are generally expensive (I’ve heard $100,000 USD per protein using XRC although it might vary wildly!) and resource-intensive, and are not necessarily conclusive in their results. Software modelling of proteins therefore has a large role to play by reducing the costs and time required, and potentially improving accuracy too. The holy grail is for Rosetta to be able to determine the shape of any protein from its DNA or amino acid sequence. My understanding is that at present it is very good at that for some proteins (again, see the research papers section for evidence), but not all, and improving this is one of the main areas of research.

How can you tell if a model of a protein is right without already knowing the structure and comparing the model against that?
Proteins fold into the lowest energy state that they can – the lower the energy retained within the protein, the more stable it will be. That is because it will then require more energy (heat) to remove it from that state. The amount of energy stored within a model can be calculated from the relative position of the atoms in its folded shape. Essentially, proteins fold into a low energy state, and so that state is what Rosetta is searching for.
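
As a rough illustration of the idea above, here is a minimal Python sketch that scores a few candidate models with a toy pairwise energy and picks the lowest one. The Lennard-Jones-style formula, the parameter values, the coordinates and the function names are all assumptions made for illustration; this is not Rosetta's actual energy function, which is far more detailed.

```python
import itertools
import math


def toy_energy(coords, sigma=1.0, epsilon=1.0):
    """Toy pairwise energy computed only from the relative atom positions.

    Uses a Lennard-Jones-style term as an illustrative stand-in; it is
    NOT the energy function used by Rosetta.
    """
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        r = math.dist(a, b)  # distance between the two atoms
        total += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return total


def lowest_energy_model(models):
    """Return the (name, coords) pair with the lowest toy energy."""
    return min(models.items(), key=lambda item: toy_energy(item[1]))


# Three hypothetical 3-atom "models" of the same chain.
models = {
    "stretched": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "compact": [(0.0, 0.0, 0.0), (1.12, 0.0, 0.0), (0.56, 0.97, 0.0)],
    "clashing": [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (1.6, 0.0, 0.0)],
}

best_name, _ = lowest_energy_model(models)
print(best_name)
```

In this toy setup the compact model wins: its atoms sit close enough to attract each other without overlapping, whereas the stretched model gains almost nothing and the clashing model is heavily penalised.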

Protein Design (i.e. creating new proteins that don’t exist in nature):
As well as modelling natural proteins, Rosetta can also be used to create new ones to perform specific functions. The process is something along the lines of:


    1. Start with the target structure’s stable points (for HIV or influenza that is those points which don’t rapidly mutate or get swapped).
    2. Find an amino acid side-chain that will bind to each of those identified points.
    3. Design a protein back-bone to join those side-chains together into a single protein.

Once designed, the amino acid sequence of the protein can then be purchased relatively cheaply, and tested to see if it performs as expected in the lab.


Here's a very interesting post by Nobuyasu explaining what protein design work he and his colleagues have been working on.

OK, that's enough background to the project - on to the questions!

5) Message boards : Rosetta@home Science : DISCUSSION of Rosetta@home Journal (4) (Message 68392)
Posted 4 Nov 2010 by Mod.Zilla
Post:
THREAD CLOSED!

This seems like a good time to close this growing thread and cut a new one for further discussion. Please place new comments and questions in Discussion 5.
6) Message boards : Number crunching : Discussion of the new credit systen (3) (Message 61306)
Posted 21 May 2009 by Mod.Zilla
Post:
continued from Discussion of the new credit systen (2)
7) Message boards : Number crunching : Account deletion - Not possible, so... read this (Message 59284)
Posted 4 Feb 2009 by Mod.Zilla
Post:
Perhaps revising the thread title will force people to read the basic idea of this thread.
8) Message boards : Rosetta@home Science : DISCUSSION of Rosetta@home Journal (5) (Message 56154)
Posted 1 Oct 2008 by Mod.Zilla
Post:
This thread is the fifth of a series where participants can discuss and ask questions about Dr. Baker's journal entries.

To reference discussions prior to this, see Discussion 4.
9) Message boards : Number crunching : BOINC v6.6.20 scheduler issues (Message 56152)
Posted 1 Oct 2008 by Mod.Zilla
Post:
New thread created and posts moved in as requested.
10) Message boards : Rosetta@home Science : DISCUSSION of Rosetta@home Journal (4) (Message 49908)
Posted 21 Dec 2007 by Mod.Zilla
Post:
This thread is the fourth of a series where participants can discuss and ask questions about Dr. Baker's journal entries.

To reference discussions prior to this, see Discussion 3.
11) Message boards : Number crunching : Rosetta Application Version Release Log (Message 49688)
Posted 14 Dec 2007 by Mod.Zilla
Post:
Version 5.89 includes:

  • the ability to model symmetric complexes with new kinds of symmetry (dihedral with several monomers, e.g., D5),
  • a fix for an occasional crash that occurred for large symmetric complexes,
  • a new option for regular protein structure prediction that prevents large movements during the full atom refinement,
  • and improvements to the RNA energy function.

12) Message boards : Rosetta@home Science : Rosetta@home home game - Fold.it (Message 49356)
Posted 3 Dec 2007 by Mod.Zilla
Post:
Dr. Baker asked that a list of people interested in beta testing the home game of Rosetta@home be compiled. I think it might work best if we do this via the "Informational Moderator" that we've set up.

If you'd like to participate in the beta, send an EMail to:
rosettaNOmod.SPzilla at yaAMhoo.comERS
(and just remove NOSPAMERS from the above and replace the word "at" with the @ sign for an EMail address).

We will create a distribution list under that Yahoo! account for those involved in the testing so everyone stays informed.

This keeps the EMail addresses and so on a little less public. So we will plan to distribute information about the beta from that EMail account.

So, just send an EMail with the following subject line: "Please add me to Rosetta Game beta list" and we'll get you an acknowledgement sent so you can easily add that mod.zilla account to your "not SPAM" list or direct any EMails to a specific folder or whatever you like.

--Mod.Sense

A note for other moderators, the password to the EMail account is the same as the moderator account on GMail.
13) Message boards : Number crunching : Why no R@h work downloading? I also run SETI (Message 41205)
Posted 20 May 2007 by Mod.Zilla
Post:
If you update Rosetta and your messages look like the ones below, in particular the "not requesting new work or reporting results" line,

5/19/2007 10:33:19 PM||request_reschedule_cpus: project op
5/19/2007 10:33:24 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
5/19/2007 10:33:24 PM|rosetta@home|Reason: Requested by user
5/19/2007 10:33:24 PM|rosetta@home|Note: not requesting new work or reporting results

it means that your BOINC Manager has decided that it does not need any Rosetta work at the moment to fulfill its crunching plans for the near future.

BOINC tracks the amount of time spent crunching each project and tracks how this compares to your desired resource shares (as shown in the Projects tab of the advanced view). When no work is available for one project, and it is unable to crunch for that project, it does work for your other project(s). But when this occurs, BOINC keeps track of a "debt". The project that is crunching is incurring a debt to the one that has no work. When work becomes available on the other project, then BOINC will crunch a lot of it to make up the debt.

For example, if you run SETI and Rosetta and have a resource share of 2/3rds to SETI and 1/3rd to Rosetta, and you have been crunching only Rosetta for a week due to a SETI outage, you will need to crunch nothing but SETI for about 2 weeks to get back into balance, i.e. to pay back the debt. After that time, BOINC will resume a mixture of work for each project and begin requesting Rosetta work again.
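
The arithmetic behind that example can be sketched in a few lines of Python. This is a simplified model that only restores the share ratio; the function name is made up for illustration, and BOINC's real debt accounting also has to deal with deadlines and runtimes.

```python
from fractions import Fraction


def catch_up_time(share_owed, share_running, time_running_alone):
    """Time the starved project must run alone to restore the share ratio.

    Simplified illustration only: it ignores deadlines and is not
    BOINC's actual debt calculation.
    """
    return time_running_alone * Fraction(share_owed) / Fraction(share_running)


# SETI has 2/3 of the resource share and Rosetta 1/3.
# Rosetta ran alone for 1 week during the SETI outage.
weeks = catch_up_time(Fraction(2, 3), Fraction(1, 3), 1)
print(weeks)  # 2
```

After those 2 weeks, the 3-week total is split 2 weeks SETI to 1 week Rosetta, which matches the 2/3 to 1/3 resource shares.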

If you have more than two projects, the concepts are the same. Although with a mixture of deadlines and runtimes, it will be more complex to predict exactly when things will come back into balance. Just rest assured that BOINC is keeping track of it all, and trying to keep the resource shares and other configuration parameters you have specified.

As always, if you would LIKE to crunch more Rosetta, you can increase your resource allocation in your Rosetta Preferences. You will still have the "debt" to pay back to SETI first, but if you increase your Rosetta share, the debt will be balanced sooner than otherwise.
14) Message boards : Rosetta@home Science : Rosetta@home Active WorkUnit(s) Log (Message 38063)
Posted 20 Mar 2007 by Mod.Zilla
Post:
More details from Chu on Protein-protein docking and CAPRI.

More details from Rhiju on RNA folding
15) Message boards : Number crunching : All about Rosetta memory requirements (Message 37558)
Posted 7 Mar 2007 by Mod.Zilla
Post:
I (Mod.Sense) wanted to start a thread just for the recent memory issues. There are many. I'm going to start with what I know and try to address as many of the related questions as possible. This information will likely go into the FAQs once we hash out the topic in this thread.

Please post any questions you have which I haven't addressed below.
===================================

Q1: Why am I getting no work? The message says my computer doesn't have enough memory, but I've checked the system requirements page, and my system meets the requirements.

A1: Your system may meet the 256MB minimum requirement, but Rosetta often creates work units that require more memory to run. These involve longer protein strands, and/or more than one protein with DOCking and HINGE tasks.

At present, the BOINC servers do not have a smooth method for Rosetta to organize the work and isolate work units for systems with only the minimum memory from those which require more. The result is that there are times when the server is unable to locate any work units for minimum memory systems.

There are several approaches to addressing this so you can continue to crunch Rosetta. One is for the Project Team to try and keep a mixture of both types of work units available on the server to improve the chances that work is available for minimum memory systems. The Project Team has already taken action to do this.

Another approach is to keep a larger cache of work ready to run on your machine. This is done in the General Preferences with the setting for "connect to network about every ... days". Keeping a larger cache helps assure you have work to crunch, even during server outages, a lack of appropriate work units, or problems with your internet connection. It also gives you more lead time prior to downloading a new Rosetta version, which may let you schedule that to a time of your choosing, rather than when you go to get your next task(s).

A third approach is to attach to other BOINC projects with a small resource share, so your computer can work on other projects during any period when you are unable to get Rosetta work that is appropriate for your machine. There are several protein and medically related projects to choose from. Some even use the Rosetta program being developed and enhanced with your help here at Rosetta@home.
===================================

Q2: Do the large memory work units take longer to run? How does this affect my work unit runtime preference?

A2: It doesn't affect the runtime preference you have established in your Rosetta Preferences. The same rules pertain as to how long to crunch the task, and that you must crunch at least one model. On slower systems, some types of tasks may take longer than the 3 hour default runtime preference to complete that first model. BOINC will continue running the work unit, but it will be a little confused about predicting how much longer it will require to complete. This is because the estimated completion time is really only recomputed at the end of a model, and your machine hasn't reached one yet.
===================================

Q3: My machine runs Linux and can get done in 256MB what takes Windows twice as much. Why can't my 256MB Linux machine get the tasks that require the large memory Windows machines?

A3: You may be correct, but BOINC doesn't work that way. The memory requirement designated on a task pertains to all platforms that might download it. The BOINC server programs do not support designating a different memory requirement for each platform.
===================================

Q4: Rosetta is losing my machines because I don't feel it is fair for them to require more memory.

A4: We certainly don't want to lose your support. In fact, that is the reason we've distinguished two different task sizes. We want to assure that running Rosetta will not interfere with your use of your computer, and not overly work the disk drive.

If we did not separate the high memory tasks, then systems without enough memory would be overly worked in attempting to run the tasks. The only way then to assure smooth operation for all Rosetta tasks would be to increase the minimum memory for the entire project. So creating two distinct types of tasks, based on the memory requirements observed in test runs, is a much better way to assure more people can participate, and do so with minimal disruption to their computer use.

By creating two types of tasks, we're able to keep the minimum memory requirement as low as possible, and yet still pursue the advanced science projects which require more memory to run well.
===================================

Q5: Why don't they fix the Rosetta program so it uses less memory?

A5: Changes have already been made to reduce the amount of memory used. And it is possible that further changes can be made in the future as well. But we need to get the science done first and see if our new approaches are producing the desired results before we go too deep into optimizing how the new science routines run.
===================================

Q6: But I do have 512MB of memory, why am I not able to get the large memory tasks?

A6: There are a couple of things to check. One is whether some of your memory is used by your graphics adapter (can someone post details on how to check this? I'll incorporate them here once I get them).

The other thing to check is your BOINC General Preferences. There are two settings that pertain to physical memory: the maximum percentage you want BOINC to use when the computer is in use, and the maximum percentage you want BOINC to use when the computer is idle.

If your settings only allow BOINC to use 90% of your physical memory, then if your machine is 512MB, you are allowing BOINC to use up to 460MB. The HINGE tasks require slightly more than that 460MB. If you'd like to run them, then BOINC will need your preferences to reflect that you are willing to allow a higher percentage of memory, at least when the computer is idle.
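
The percentage arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and the 470MB figure for a HINGE task is an assumed placeholder (the post only says the requirement is slightly above 460MB).

```python
def boinc_memory_allowance_mb(physical_mb, percent_allowed):
    """Memory BOINC may use under a percentage-of-RAM preference."""
    return physical_mb * percent_allowed / 100


def can_run(task_requirement_mb, physical_mb, percent_allowed):
    """True if the task's stated requirement fits within the allowance."""
    return task_requirement_mb <= boinc_memory_allowance_mb(
        physical_mb, percent_allowed
    )


HINGE_REQUIREMENT_MB = 470  # assumed value, for illustration only

allowance = boinc_memory_allowance_mb(512, 90)
print(allowance)  # 460.8
print(can_run(HINGE_REQUIREMENT_MB, 512, 90))   # False
print(can_run(HINGE_REQUIREMENT_MB, 512, 100))  # True
```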
===================================

Q7: Why doesn't BOINC just use 100% of memory when the computer is idle? What's the point of configuring less?

A7: If you want to leave some room on your machine for your other applications to remain in memory, you might set the idle memory usage to less than 100%. This will help your response time when you go back to using your computer.

If you find that your computer is too slow to activate again once you've been away for a while, you may want to gradually reduce your setting for the percentage of memory to allow BOINC to use while the computer is idle. Reduce it until you feel you've struck a good compromise between getting more BOINC work done, and having your computer ready when you are to get your other work done. This setting is in your General Preferences.
===================================

Q8: But my machine has been running fine for months, why the problem now? Why don't they just fix it?

A8: These memory settings have been in the preferences for some time. They define which tasks your machine will download. But only after upgrading to the 5.8 BOINC clients did these limits actually get enforced as the tasks are running. So, while you may not have changed your settings, if you upgraded to a 5.8 version of BOINC, you went from a version that did not enforce these preferences, to a version that does enforce them. This may be why you see tasks in a status of "waiting for memory" (see Q13 below), when you didn't previously. And sometimes only see one CPU active (see Q12 below).

The other recent change that coincided with the BOINC changes is that the Project Team has been working on a lot of advanced science lately and studying more very large proteins. So, more of the large memory tasks are being created than in the past.
===================================

Q9: Is this a trend towards the large memory tasks?

A9: No. The project will have tasks of both sizes going forward, and do what they can to help assure that minimum memory systems can find minimum memory tasks to download.
===================================

Q10: Are you telling me I have to buy more memory if I want to continue to support Rosetta?

A10: Certainly not. It is up to you how you wish to use your computer. And as discussed above, additional memory beyond the 256MB minimum is not a requirement.
===================================

<added March 7>
Q11: I've watched my machine's memory usage as the normal memory work units run, and they do not use the 256MB that is stated as their requirement. Why state a 256MB requirement when the tasks do not take that much memory to run?

A11: Since each task is exploring an unknown landscape, it is not possible to determine ahead of time exactly how much memory or CPU time will be required to complete a specific model. The 256MB (and higher) "requirement" is in place to allow for the exceptional case where a model may take more memory than is typical for that type of work unit.
===================================

Q12: I've revised my settings to allow BOINC to use more memory and now received two of the HINGE work units. But now BOINC will not run both tasks at the same time. Why not?

A12: BOINC is actually monitoring the memory usage of the tasks as they run. When a Rosetta task starts, it uses a small amount of memory while it initializes the work unit and the first model. Then, as the model computes, memory is used and freed up as the computation progresses. Towards the end of a model, the task is typically using significantly more memory than at the beginning.

If your computer has multiple CPUs, or is hyperthreaded (HT), BOINC will normally run a task on each CPU. The number of CPUs BOINC should use can be controlled via your General Preferences. However, when BOINC notices the configured memory settings are exceeded, it will pause one of the tasks until enough memory is available.

Some of the HINGE tasks can consume nearly 400MB of memory as they run. To run two of them requires twice as much memory for that phase of the computation. So, even if your machine has 512MB of memory and you allow 100% of it to be used by BOINC, you can still run into combinations of work units that are unable to keep all CPUs busy at all times. The tasks will complete, but only when the memory usage of the first task is reduced (which might occur as it starts a new model), or when the computer is idle, if your configuration allows BOINC to use more memory during idle time.
===================================

Q13: Why does BOINC start on one task, then interrupt it saying it is "waiting for memory" and start another?

A13: When BOINC finds itself exceeding your configured memory settings, it will stop work. If your computer has multiple CPUs, it may only have to stop one of the tasks to fall back to within your configured memory guidelines. If another task is available, BOINC begins on that one. If that second task uses less memory, then perhaps both could run within your configured memory usage guidelines. BOINC has no way to know, until it starts running it, how successful it will be in that effort. If it then sees this new task taking more memory, which may take a minute or two of runtime, then it will hold it as well, until memory becomes available (i.e. until another BOINC process frees up memory).
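
The behaviour described in A12 and A13 can be pictured with a small Python sketch. It is a simplified greedy model with a made-up function name, and it assumes each task's memory use is already known; as noted above, BOINC only finds that out by running the task, so this is not BOINC's actual scheduler.

```python
def schedule(tasks, memory_limit_mb, cpus):
    """Greedy sketch: run tasks in order while they fit under the limit.

    tasks: list of (name, memory_in_use_mb) pairs.
    Returns (running, held), where held tasks are not running, either
    because they are waiting for memory or because no CPU is free.
    Simplified illustration only; not BOINC's actual scheduler.
    """
    running, held, used = [], [], 0
    for name, mem in tasks:
        if len(running) < cpus and used + mem <= memory_limit_mb:
            running.append(name)
            used += mem
        else:
            held.append(name)
    return running, held


# Two HINGE tasks near a ~400MB peak, 512MB machine, 2 CPUs, 100% allowed.
print(schedule([("hinge_1", 400), ("hinge_2", 400)], 512, 2))
# → (['hinge_1'], ['hinge_2'])

# A smaller third task can use the second CPU while hinge_2 waits.
print(schedule([("hinge_1", 400), ("hinge_2", 400), ("small", 100)], 512, 2))
# → (['hinge_1', 'small'], ['hinge_2'])
```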
===================================

Q14: Why has my RAC declined since these HINGE tasks started running on Rosetta?

A14: The potential reasons for RAC decline are too numerous to detail all in one place. But the above memory issues may be one cause. Your RAC is based on the rate at which you return completed work, and the credit you receive for that work. If your credit awarded as compared to your credit claimed is in about the same ratio as it has been with other work units, then the RAC decline is likely due to:
1) lack of work for minimum memory machines, as described in Q1
2) not enough memory to keep all CPUs busy, as described in Q12
3) BOINC memory use limited, as described in Q6
===================================

Q15: So why can't they just fix it to use the memory it needs to without all this fussing around with settings?

A15: They did! The BOINC defaults help assure smooth operation. All of the above is just an explanation of how BOINC works, and how you can control it to work differently. Configuring things and changing settings are just options available for those that want to crunch even more.
===================================
16) Message boards : Number crunching : Credit system not fair (Message 36750)
Posted 13 Feb 2007 by Mod.Zilla
Post:
The original post was removed due to swearing. Here is the rest of the post by: schatten1411
Sorry, but i can't understand that. An FX60 with 2800MHz will makes more points (or CPU-power) than my X2-6000 with 3000MHz. Result: over 600pts to my 160 per day !!!! Others will tell me more then 1000 PC's their own !!!!

Thats !@#$%^&

Tell me if u've an fair point-system (give points for steps NOT for time), an fair stat-system that distinguish between privat PC's from organisations.

Now i use my CPU for other projekts.

17) Message boards : Cafe Rosetta : FAQ - Handling error messages (Read Only) (Message 36749)
Posted 13 Feb 2007 by Mod.Zilla
Post:
.
18) Message boards : Cafe Rosetta : FAQ - About Teaming (Read Only) (Message 36742)
Posted 13 Feb 2007 by Mod.Zilla
Post:
.
19) Message boards : Cafe Rosetta : FAQ - About Teaming (Read Only) (Message 36727)
Posted 13 Feb 2007 by Mod.Zilla
Post:
.
20) Message boards : Cafe Rosetta : FAQ - About Teaming (Read Only) (Message 36718)
Posted 13 Feb 2007 by Mod.Zilla
Post:
.





©2024 University of Washington
https://www.bakerlab.org