Question regarding number of WU's and how they are created.

Message boards : Number crunching : Question regarding number of WU's and how they are created.

William_AZ

Joined: 7 Dec 08
Posts: 2
Credit: 47,504
RAC: 0
Message 57836 - Posted: 13 Dec 2008, 1:14:09 UTC

I have a question out of curiosity.

I tried to google it, and have read through some forum threads, but have yet to find the answer.

I am wondering (ballpark figure) how many total work units there are, and basically how they are created.

There are thousands upon thousands of active machines performing the "crunching." I am just curious how there could be so many work units created to keep all of the machines busy.

I do realize that the problems in the work units are complex, and require a great deal of computing power to solve.
Is each work unit created individually (manually) by project members, or is a basic formula for computation simply applied to a range of numbers? (i.e., are the hundreds of thousands of work units computer generated, with a computer producing a sequential series of x+y^2 problems and then packaging them into work units via an automated process?)

Just curious.

Thanks and regards.

AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 57837 - Posted: 13 Dec 2008, 2:30:35 UTC - in response to Message 57836.  

I am wondering (ballpark figure) how many total work units there are,

On the right hand side of the Rosetta home page there are some stats including "Successes last 24h", which currently is 225,051 WUs.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 57839 - Posted: 13 Dec 2008, 4:21:13 UTC
Last modified: 13 Dec 2008, 4:31:20 UTC

You've asked a lot of questions there. Let me try and cover the basics.

Why so much computing power is needed & how many work units are there?
The number of amino acids that make up a protein basically defines how difficult it will be to study. A protein with only 50 amino acids will be much easier to study than one with 200.

For each amino acid, there are conservatively 3 different ways it might be connected to the next. So let's talk about a hypothetical 100 amino acid protein. And let's open the calculator on your desktop and follow along.

The first amino acid can bind to the second in three ways, so the pair has three possibilities to study. For each of those, the second has three ways to bind to the third, so for all three together there are nine possibilities to consider. Adding the fourth to the picture raises three more possible contortions for each of the 9 preceding possibilities; that's 27. Keep going for all 100 amino acids of the protein and you get 3 to the 100th power (you may have to switch to the scientific view of the calculator to do that). Not 100 to the third power. 3 to the 100th power.

So, this 100 amino acid protein can conservatively be estimated to have
5.1537752073201133103646112976562e+47 possible structures. That's a 5 followed by 47 more digits! So, let's say for argument's sake that you can assess each possible structure with just a microsecond of computer time. Dividing the large number by 1,000,000 tells us how many seconds it will take. Dividing by 60 then tells us how many minutes. Dividing by 60 again tells us how many hours. Dividing by 24 tells us how many days. Dividing by 365 tells us how many years it will take. ...my calculator is a bit hard to read through all the scorch marks, but I think it says it will take 1.6342513975520399893342882095561e+34 years!

...that was just for one protein. There are 10s of thousands more!
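
For anyone who would rather not scorch a desktop calculator, the same arithmetic can be reproduced in a few lines of Python. This is only a check of the estimate above, using the post's own assumptions (3 choices per amino acid, 100 amino acids, one microsecond per structure); the variable names are made up for the example. Counting links strictly would give 3 to the 99th power, a third as many, which does not change the conclusion.

```python
# Back-of-the-envelope check of the estimate in the post above.
# Assumptions taken from the post: 3 conformations per amino acid,
# 100 amino acids, and 1 microsecond to assess each structure.
CHOICES_PER_RESIDUE = 3
RESIDUES = 100
STRUCTURES_PER_SECOND = 1_000_000

structures = CHOICES_PER_RESIDUE ** RESIDUES   # exact integer, 48 digits
seconds = structures / STRUCTURES_PER_SECOND
years = seconds / 60 / 60 / 24 / 365           # minutes, hours, days, years

print(f"possible structures: {structures:.4e}")  # 5.1538e+47
print(f"years to test them:  {years:.4e}")       # 1.6343e+34
```

Python integers have no size limit, so `structures` holds every digit of 3 to the 100th power; only the printed values are rounded.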


How the work units are created.
Given the above, we're not going to be able to test each and every possible structure to see how it scores. So, instead, we take more of a shotgun approach. Start with a random assortment of the possible connections between amino acids, and then try shifting, stretching, and settling in from there to get the best score possible with that random starting point. Give each computer a unique starting point (not hard with trillions of possibilities) and see which one finds the best score. If a given computer has established a runtime preference (see Rosetta preferences) that allows more than one model to be studied, a second (and third...) starting point is used and the stretching and settling begins again. The best score for each model is returned when the task is completed. The scores of everyone's tasks are compared to find the best score. The best score is likely the closest to the actual shape of the protein.
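
The shotgun idea can be illustrated with a toy random-restart search. This is a minimal sketch under loud assumptions, not Rosetta's actual algorithm: `toy_score`, `refine`, and `run_task` are hypothetical names, the "protein" is just a list of 10 angles, and the score is a made-up function where lower is better. Each seed stands in for a different volunteer computer.

```python
import random

def toy_score(angles):
    # Hypothetical stand-in for an energy score: lower is better, 0 is ideal.
    return sum(a * a for a in angles)

def refine(start, rng, steps=2000, step_size=0.1):
    """Nudge one angle at a time, keeping only changes that improve the score."""
    best, best_score = list(start), toy_score(start)
    for _ in range(steps):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        score = toy_score(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best_score, best

def run_task(seed, n_angles=10, models=3):
    """One pretend work unit: each model starts from its own random point."""
    rng = random.Random(seed)
    results = []
    for _ in range(models):
        start = [rng.uniform(-3.14, 3.14) for _ in range(n_angles)]
        results.append(refine(start, rng))
    return results

# Five pretend computers, three models each; compare everyone's scores.
all_models = [m for seed in range(5) for m in run_task(seed)]
best_score, best_model = min(all_models, key=lambda m: m[0])
print(f"{len(all_models)} models returned, best score {best_score:.4f}")
```

The point of the sketch is only the shape of the process: unique random starts, local settling from each start, and a final comparison of the returned scores.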

With such a large search space, and numerous approaches to doing the shifting and settling, a variety of types of tasks are created. For each type of task, many work units are created so that enough models are produced to decide whether more models should be created with this type of approach, or whether one of the other types might be more promising.

The process for determining which approaches might tend to be the best to use is just now starting to be automated.

Sometimes one approach feeds into another. Think of one wave of tasks as being a survey crew. Then, based on their findings, another wave of tasks is created to exploit what was found. This second wave will be much larger, doing much more detailed analysis.

With so many models possible, how will we ever find the right one?

That is what makes the problem so difficult. Basically, you have to learn how to eliminate as many of the possible models as you can from your search. If you were searching possible next moves in a chess game, for example, you might stop looking after a move that loses your queen. Studies of chess show the queen to be a valuable piece, and if you lose it without also capturing your opponent's queen, you are very likely to lose the game... even if it takes a long time. So, you stop studying that sequence of moves and focus your attention on alternatives that are more likely to have a better outcome.

So the trick is to figure out where the "lost queens" are in the search space, and prune down the number of conformations you are trying to focus your attention on. Or, the other approach would be to figure out where the "checkmate" in your search space is, and work towards that. Developing these heuristics is one of many things Rosetta@home is working towards.
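
The "lost queen" idea can be sketched as a pruned tree search. This is an illustration only, not how Rosetta prunes: the `search` function and the cost table are hypothetical, each position has 3 choices with a non-negative penalty, and a branch is abandoned as soon as its running total can no longer beat the best complete answer found so far.

```python
import random

def search(costs, prune):
    """Depth-first search over 3 choices per position.

    Returns (best_total, nodes_visited). Pruning is only safe here
    because every cost is non-negative, so totals never decrease.
    """
    depth = len(costs)
    best = [float("inf")]
    visited = [0]

    def walk(position, total):
        visited[0] += 1
        if prune and total >= best[0]:
            return  # the "lost queen": this branch cannot beat the best found
        if position == depth:
            best[0] = min(best[0], total)
            return
        for choice in range(3):
            walk(position + 1, total + costs[position][choice])

    walk(0, 0)
    return best[0], visited[0]

rng = random.Random(0)
costs = [[rng.randint(0, 9) for _ in range(3)] for _ in range(8)]

full_best, full_nodes = search(costs, prune=False)
pruned_best, pruned_nodes = search(costs, prune=True)
print(f"exhaustive: best {full_best}, nodes visited {full_nodes}")
print(f"pruned:     best {pruned_best}, nodes visited {pruned_nodes}")
```

Both searches find the same best total; the pruned one simply stops following branches that are already worse than a known answer, which is the same bargain as giving up on a line of play after losing the queen.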
Rosetta Moderator: Mod.Sense
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,594
Message 57853 - Posted: 13 Dec 2008, 17:01:10 UTC - in response to Message 57839.  

Thank you for this post. It really helps to explain the problem and why so many computers are required to work on the problem.
Thx!

Paul

William_AZ

Joined: 7 Dec 08
Posts: 2
Credit: 47,504
RAC: 0
Message 57858 - Posted: 13 Dec 2008, 20:15:21 UTC

Thank you for your detailed reply to my "Newbie" question. I appreciate your time. I just wanted to understand the process better.

I am amazed by how many work units are involved, and even more amazed that people have found a way to work on the solution with "crunching."

I wish I had more computers to donate to the cause.

Regards.

mikey

Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 7,348
Message 57859 - Posted: 14 Dec 2008, 11:33:50 UTC - in response to Message 57858.  

Thank you for your detailed reply to my "Newbie" question. I appreciate your time. I just wanted to understand the process better.

I am amazed by how many work units are involved, and even more amazed that people have found a way to work on the solution with "crunching."

I wish I had more computers to donate to the cause.
Regards.


Hey, you are starting with more than most and have the potential to grow into a full-fledged "rancher"! There are several levels of crunchers: a gardener has fewer than 6 machines, a farmer has 6 to 10 machines, and a rancher has more than 10 machines. I have gotten to rancher status over many years of crunching. I started with one machine on Seti back in 1999. Over the years I have added and taken away machines, but I am now fairly stable at more than 15 but fewer than about 20.
Good luck and just keep on crunching!
Paul

Joined: 29 Oct 05
Posts: 193
Credit: 65,754,624
RAC: 1,594
Message 57860 - Posted: 14 Dec 2008, 11:44:45 UTC - in response to Message 57859.  

Thank you for explaining these levels. After several years of crunching I was unaware of these levels. I keep finding machines that will run BOINC and toss them into my farm. It looks like 11 are now in the farm.

The goal is to keep adding systems. Maybe after Christmas people will have more hand-me-downs for the farm. Electricity and cooling have become real issues.

Farming and ranching sound like great occupations.

BTW - how do people monitor so many systems? Typically I just watch my RAC and when it begins to fall I start to manually check all the systems to see who requires a reboot. Is there an easy way to monitor BOINC on lots of systems?
Thx!

Paul

mikey

Joined: 5 Jan 06
Posts: 1894
Credit: 8,767,285
RAC: 7,348
Message 57963 - Posted: 17 Dec 2008, 10:21:05 UTC - in response to Message 57860.  
Last modified: 17 Dec 2008, 10:25:47 UTC

Thank you for explaining these levels. After several years of crunching I was unaware of these levels. I keep finding machines that will run BOINC and toss them into my farm. It looks like 11 are now in the farm.

The goal is to keep adding systems. Maybe after Christmas people will have more hand-me-downs for the farm. Electricity and cooling have become real issues.

Farming and ranching sound like great occupations.

BTW - how do people monitor so many systems? Typically I just watch my RAC and when it begins to fall I start to manually check all the systems to see who requires a reboot. Is there an easy way to monitor BOINC on lots of systems?


I use a small program called Radmin. It is a remote admin program that you load onto each PC; you can then sit at one PC and log into each of the others, and it is like you are using their desktop. Some people use a BOINC monitoring program called BoincView. Here is a link to a page with lots of BOINC add-ons: http://boinc.berkeley.edu/addons.php
I never had good luck with BoincView. There are other Radmin-type programs, VNC for example. Radmin is available here: http://www.radmin.com/. The price has gone through the roof, so you might want to look elsewhere. VNC is available here: http://www.realvnc.com/. I thought it had a free version, and it does; I found it here:
http://www.download.com/VNC-Free-Edition/3000-7240_4-10045255.html
AlphaLaser

Send message
Joined: 19 Aug 06
Posts: 52
Credit: 3,327,939
RAC: 0
Message 58021 - Posted: 19 Dec 2008, 2:25:01 UTC

I personally use BoincView for monitoring clients as well as TightVNC for remote control.
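
For a quick "which box needs a reboot" check without any of the tools above, a small script can test whether each machine still answers on the BOINC client's GUI RPC port (31416 by default). This is a rough sketch, not a replacement for BoincView: the function names are made up for the example, it assumes each client has been set up to accept remote connections, and an open port only shows the client is running, not that it is crunching.

```python
import socket

BOINC_GUI_RPC_PORT = 31416  # default GUI RPC port of the BOINC client

def client_reachable(host, port=BOINC_GUI_RPC_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def check_farm(hosts, port=BOINC_GUI_RPC_PORT, timeout=2.0):
    """Map each host name or address to True (answering) or False (not)."""
    return {host: client_reachable(host, port, timeout) for host in hosts}

def report(status):
    """Format one line per host for printing."""
    return [f"{host}: {'up' if up else 'NOT RESPONDING'}"
            for host, up in sorted(status.items())]
```

Calling `check_farm` with a list of your own machine names or addresses and printing the lines from `report` gives a one-glance view of which hosts have gone quiet.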

©2024 University of Washington
https://www.bakerlab.org