Tell us your thoughts on granting credit for large protein, long-running tasks

Message boards : Number crunching : Tell us your thoughts on granting credit for large protein, long-running tasks



spRocket
Joined: 23 Mar 20
Posts: 22
Credit: 3,008,018
RAC: 0
Message 95135 - Posted: 22 Apr 2020, 15:05:03 UTC

I think I've picked up a 4 GB work unit on one of my systems - it has 8 GB RAM, but at the moment, only a single task is running, and I haven't touched its settings. The "top" command shows a resident size of 2.874GB.
ID: 95135 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile [VENETO] boboviz

Joined: 1 Dec 05
Posts: 1994
Credit: 9,543,381
RAC: 5,926
Message 95138 - Posted: 22 Apr 2020, 15:59:22 UTC - in response to Message 95123.  

We only have a few top notch 32+ cores machines with beefy GPUs around the world,
Try hundreds of thousands, at the least.

Yeap, see here
ID: 95138 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95140 - Posted: 22 Apr 2020, 16:27:28 UTC - in response to Message 95123.  
Last modified: 22 Apr 2020, 16:37:26 UTC

Yes, I agree that we should not crunch on everything. I meant to say on every computer where it is worth it, as per my thread "The most efficient cruncher rig possible". Sorry this part of the sentence got lost - I had to retype this message because no drafts are saved on this forum.

We should do exact computations on this, but my gut feeling is that crunching on normal, non-extreme, non-server hardware can be at least somewhat efficient if it is:
    - more recent than 5 years
    - more recent than 10 years underclocked
    - more recent than 10 years portable



You could actually produce a histogram/median/average of our current fleet from this data: https://boinc.bakerlab.org/rosetta/cpu_list.php

Although I think the machine distribution is quite skewed towards the higher end compared to the general population, so it shouldn't be considered representative. Also note that computers are usually recycled after about 15-20 years of age in general, so you shouldn't see a large number of them in operation anyway.

By "only a few top notch computers" I meant that I expect them to be far fewer than 1% of the population, according to my gut feeling.

Also, I expect that most deployed high performance computers already serve a purpose and usually couldn't offer their unused capacity for volunteer computing, as a given company made a big investment to purchase and operate them. On the other hand, there exists a vast number of computers just sitting there all day long in businesses, schools and homes. If we assume that we are only talking about the more efficient crunchers, the benefit of their computation should far outweigh their cost in electricity.

And if you are not running 24/7 but are running BOINC in the background with low priority, it still has higher energy efficiency because components are shared between the project and the user. For example, if a user's machine idles at 30 W, then the extra 60 W of CPU power costs less than operating a dedicated cruncher at 90 W, either in their own home or in a separate lab. This reduces global warming and also produces less electronic waste (fewer servers to manufacture, fewer to dispose of).
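To put numbers on the shared-component argument above, here is a minimal sketch. It assumes the user's machine would be switched on anyway, so only the extra 60 W counts against BOINC, while a dedicated cruncher pays its full 90 W. The function name and the 24-hour window are illustrative assumptions, not measurements.

```python
def extra_energy_kwh(extra_watts, hours):
    """Energy in kWh attributable to crunching at the given extra power draw."""
    return extra_watts * hours / 1000

# Figures from the example above: 30 W idle, +60 W under load, 90 W dedicated box.
HOURS = 24  # assumed window in which the machine is on anyway

shared_kwh = extra_energy_kwh(60, HOURS)     # only the marginal CPU power counts
dedicated_kwh = extra_energy_kwh(90, HOURS)  # the whole machine exists only to crunch

saving = 1 - shared_kwh / dedicated_kwh
print(f"shared: {shared_kwh:.2f} kWh, dedicated: {dedicated_kwh:.2f} kWh, saving: {saving:.0%}")
```

Under these assumptions the shared machine does the same crunching for about a third less energy, because its idle draw is already being paid for by the user's normal activity.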

ID: 95140 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile allen

Joined: 14 Apr 20
Posts: 1
Credit: 61,472
RAC: 0
Message 95141 - Posted: 22 Apr 2020, 17:36:45 UTC

Hello all:

I'm new here and am wondering how Rosetta determines the amount of wu's to send each computer. The reason I ask is because I have had wu's cancelled before they are finished since they ran out of time.

I have a system that is receiving 8 hour wu's that are continuously taking over 24 hours to run.

Hopefully one of you will fill me in on what's happening here.

Thanks a bunch,

Allen
ID: 95141 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95142 - Posted: 22 Apr 2020, 17:39:30 UTC - in response to Message 95140.  
Last modified: 22 Apr 2020, 17:39:45 UTC

I've processed the CPU list table from the above post. Because the sum is much less than the one on the homepage, I think this may include any registered member on the project, not only the active members. Also note that HT CPUs are overrated by at least 50% in the total stats (they simply multiply thread count by per-thread flops). As HT is much more prominent at the high end than the low end (envision Celerons/Pentiums), this skews the stats even more towards the right.
Total: 21428.9 TFlops; mean: 97.8928 GFlops/host; hosts: 218902
Median: 20.34 GFlops/host
  64915 < 5 GFlops
  10834 < 10 GFlops
  19593 < 15 GFlops
  12726 < 20 GFlops
  11298 < 25 GFlops
  10666 < 30 GFlops
   8273 < 35 GFlops
   5766 < 40 GFlops
   1993 < 45 GFlops
   2626 < 50 GFlops
   1451 < 55 GFlops
   3696 < 60 GFlops
   2406 < 65 GFlops
   1783 < 70 GFlops
   1363 < 75 GFlops
   1437 < 80 GFlops
   2547 < 85 GFlops
   1959 < 90 GFlops
   4437 < 95 GFlops
    198 < 100 GFlops
    332 < 105 GFlops
    133 < 110 GFlops
     22 < 115 GFlops
    298 < 120 GFlops
  28904 < 125 GFlops
    102 < 135 GFlops
    452 < 140 GFlops
    404 < 145 GFlops
    228 < 150 GFlops
     22 < 160 GFlops
    355 < 165 GFlops
     14 < 175 GFlops
     15 < 180 GFlops
     23 < 195 GFlops
     21 < 200 GFlops
     20 < 205 GFlops
     20 < 210 GFlops
     11 < 215 GFlops
     19 < 220 GFlops
     16 < 225 GFlops
    174 < 245 GFlops
    126 < 250 GFlops
     12 < 275 GFlops
     19 < 290 GFlops
     30 < 315 GFlops
     55 < 335 GFlops
    135 < 380 GFlops
     14 < 405 GFlops
     11 < 630 GFlops
     47 < 645 GFlops
  16686 < 830 GFlops
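For anyone who wants to reproduce this kind of summary, here is a minimal sketch of the calculation. It assumes the CPU list has already been parsed into (host_count, gflops_per_host) pairs; the parsing step, the function name and the sample rows are hypothetical and are not taken from the real table. Bins are 5 GFlops wide and keyed by their upper bound, matching the "< 5 GFlops" labels above.

```python
from collections import Counter

def summarize(rows, bin_width=5):
    """rows: list of (host_count, gflops_per_host) pairs (assumed input shape)."""
    total_hosts = sum(n for n, _ in rows)
    total_gflops = sum(n * g for n, g in rows)
    mean = total_gflops / total_hosts

    # Weighted median: walk the rows from slowest to fastest
    # until at least half of all hosts have been covered.
    seen = 0
    median = None
    for n, g in sorted(rows, key=lambda r: r[1]):
        seen += n
        if seen * 2 >= total_hosts:
            median = g
            break

    # Histogram keyed by each bin's upper bound, e.g. 7.5 GFlops lands in "< 10".
    hist = Counter()
    for n, g in rows:
        upper = (int(g // bin_width) + 1) * bin_width
        hist[upper] += n

    return total_gflops, mean, median, dict(sorted(hist.items()))

# Hypothetical sample rows, for illustration only.
sample = [(3, 2.0), (1, 7.5), (1, 40.0)]
total, mean, median, hist = summarize(sample)
for upper, count in hist.items():
    print(f"{count:7d} < {upper} GFlops")
```

As a sanity check on the posted totals, 218902 hosts times 97.8928 GFlops/host is roughly 21.4289 million GFlops, which agrees with the 21428.9 TFlops figure. The large gap between mean and median is what you would expect from a distribution with a heavy right tail.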
ID: 95142 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95143 - Posted: 22 Apr 2020, 17:46:12 UTC - in response to Message 95141.  
Last modified: 22 Apr 2020, 17:48:53 UTC

I think your question is off-topic here, but let me give a TL;DR.

I can see under your account that you have dozens of in-progress WUs. Please visit the computing preferences under your account and reduce the "Store at least ... days of work" and "Store up to an additional ... days of work" values. They should probably sum to less than 1 day, or even go down to 0.1 + 0.1 days during debugging while BOINC is learning your processing rate.

According to this task, it indeed took 24 hours of CPU to complete 195 decoys:
https://boinc.bakerlab.org/rosetta/result.php?resultid=1153332354
Please double-check the target CPU run time in your Rosetta@home preferences under your account. It defaults to 8 hours, although 24 hours should still be doable. Deadlines are around 3 days, I think.
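To see why a large queue combined with a 24-hour run time is a problem, here is a rough back-of-the-envelope sketch. It assumes every core crunches Rosetta nonstop and that the deadline is 3 days, as guessed above; the task counts and the core count are made-up examples, not values read from the account in question.

```python
import math

def hours_to_drain(queued_tasks, runtime_hours, cores):
    """Wall-clock hours to finish the queue if every core crunches nonstop."""
    batches = math.ceil(queued_tasks / cores)
    return batches * runtime_hours

def fits_deadline(queued_tasks, runtime_hours, cores, deadline_days=3):
    """True if the whole queue can finish before the (assumed) deadline."""
    return hours_to_drain(queued_tasks, runtime_hours, cores) <= deadline_days * 24

# Hypothetical 4-core machine with 36 queued tasks at 24 hours each.
print(hours_to_drain(36, 24, 4), fits_deadline(36, 24, 4))
# Same machine with a small queue and the default 8-hour run time.
print(hours_to_drain(8, 8, 4), fits_deadline(8, 8, 4))
```

Real machines do worse than this estimate, since they are not always on and may share cores with other projects, which is one more reason to keep the work buffer small.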
ID: 95143 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1675
Credit: 17,697,137
RAC: 20,072
Message 95159 - Posted: 22 Apr 2020, 22:32:24 UTC - in response to Message 95141.  
Last modified: 22 Apr 2020, 22:51:22 UTC

I'm new here and am wondering how Rosetta determines the amount of wu's to send each computer. The reason I ask is because I have had wu's cancelled before they are finished since they ran out of time.
It's pretty much the same for all projects- they send a rough Estimate of how long it thinks it will take your system to return work.
But since you're new to the project, it doesn't have any history for work done, and so that estimate can be way off.

Since you are running more than 1 project, you would be much better off with no cache at all. At the very least, an extremely small one.
On the top of this page, click on your name at the top right, then in your Account, under Preferences, When and how BOINC uses your computer, click on "Computing preferences."
Down the bottom is a link to Edit.
Computing
   Usage limits	
                                   Use at most 100% of the CPUs
                                   Use at most 100% of CPU time

   When to suspend	
           Suspend when computer is on battery (not selected)
               Suspend when computer is in use (not selected)
 Suspend GPU computing when computer is in use (not selected)
   'In use' means mouse/keyboard input in last 3 minutes
  Suspend when no mouse/keyboard input in last --- minutes
     Suspend when non-BOINC CPU usage is above --- %
                          Compute only between ---

   Other	
                                Store at least 0.1 days of work
                     Store up to an additional 0.02 days of work
                    Switch between tasks every 60 minutes
     Request tasks to checkpoint at most every 60 seconds

   Disk
                              Use no more than 20 GB
                                Leave at least 2 GB free
                              Use no more than 60 % of total

   Memory
          When computer is in use, use at most 95 %
      When computer is not in use, use at most 95 %
 Leave non-GPU tasks in memory while suspended (not selected)
                   Page/swap file: use at most 75 %
Click on "Update changes."
In the BOINC Manager, View, Advanced. Select Rosetta in the Project tab, then Update. Those changes will then take effect.

See how those settings go, particularly the Other settings.



I have a system that is receiving 8 hour wu's that are continuously taking over 24 hours to run.
In your account, Preferences for this project click on "Rosetta@home preferences"
Set the Target CPU run time to "not selected"
and Update to save them.

That way it will use the default which is presently 8 hours*. Any currently running tasks will use the old value, any non-running Tasks will use the new value when they start (once the Manager has contacted the Scheduler, or you have pressed Update in the Manager).


* Some Tasks will run longer than their Target CPU Runtime. They are able to run for up to 10 more hours, after which time the Watchdog timer will end the Task.
Grant
Darwin NT
ID: 95159 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Michael E.@ team Carl Sagan

Joined: 5 Apr 08
Posts: 16
Credit: 1,935,266
RAC: 773
Message 95330 - Posted: 24 Apr 2020, 22:46:38 UTC

I use a lot of BOINC projects. PrimeGrid applies a bonus for long-running tasks because most people like short-running tasks. For example, looking at CPU-only tasks:

Subprojects with a 10% long job credit bonus have a recent average CPU time of 41:29:00 and 60:40:12 hours
Subprojects with a 20% long job credit bonus have a recent average CPU time of 107:29:32 and 125:37:06 hours
Other subprojects with longer run-times have long job and conjecture bonuses.

To see details, create a PrimeGrid account and choose Your Account > PrimeGrid Preferences. Or send me a message and ask for a text/screen cap. The preferences also show completion times.

I used to choose projects in part by measuring the points per CPU hour to find those with a high reward. Now I am concerned about medical science more than points.
ID: 95330 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
RME

Joined: 4 Mar 20
Posts: 12
Credit: 1,211,010
RAC: 0
Message 95339 - Posted: 25 Apr 2020, 8:42:02 UTC - in response to Message 95330.  

I can't wait to get to 1,000,000 points so I can get my reward.
ID: 95339 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
teacup_DPC

Joined: 3 Apr 20
Posts: 6
Credit: 2,744,282
RAC: 0
Message 95343 - Posted: 25 Apr 2020, 11:17:00 UTC - in response to Message 95123.  

but if we contributed every phone, tablet and low-mid end office machine, typically with 2-4 cores, our computing capacity could increase by orders of magnitude. (I.e., we have way less than a million hosts and there exist billions of personal computing devices in the world)
For as many of those devices as there are, many are of such low capability that they are of no use to many projects.
And for those that are of use, their frequent use for what they were designed for by the users means they often can't contribute much during those periods, compared to more capable systems.
Just to nuance this, I know people getting their old phones from below a layer of dust out of the chest of drawers and setting them to work. As I've understood they only can be functional with their display turned off, so I doubt if that very phone is available for normal use at all.

And you need to keep in mind efficiency isn't actually about low peak or maximum power use- it is about energy used over time to complete a task.
It's no good having a device use 1W if it takes 1 month to produce a result when something that uses 1kW can produce the same result in a matter of seconds. Yeah, it's instantaneous power consumption is a lot higher. But it uses less energy to do the same work. And the fact it can do so much more work over the same period of time as the slower device makes it even more useful to a project.
I read your point, and it sounds logical, but that coin has two sides. Phone hardware is tailored to be super efficient as well, since it continuously needs to run on battery. Desktop hardware does not necessarily have this efficiency pedigree, though large steps have been made in miniaturizing the processor circuits. This phone sideline is a bit off topic perhaps, I admit. But your remark made me a bit curious; I need to look up a GFLOP/W ratio or so somewhere. Maybe you're completely right after all, I only caught myself thinking that I was not able to quantify your argument. I think it's an interesting topic in itself.
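The quoted example can be quantified with energy = power x time. The sketch below takes the 1 W / 1 month and 1 kW figures from the quote; the 30-day month and the 10 seconds standing in for "a matter of seconds" are assumptions made only for illustration.

```python
def energy_joules(power_watts, seconds):
    """Energy used to finish one result: power multiplied by time."""
    return power_watts * seconds

SECONDS_PER_MONTH = 30 * 24 * 3600  # assuming a 30-day month

slow_device = energy_joules(1, SECONDS_PER_MONTH)  # 1 W for one month
fast_device = energy_joules(1000, 10)              # 1 kW for an assumed 10 seconds

print(f"slow: {slow_device / 1e6:.3f} MJ, fast: {fast_device / 1e3:.0f} kJ")
print(f"the slow device uses {slow_device / fast_device:.1f}x the energy per result")
```

With these numbers the 1 W device spends about 2.6 MJ on the result and the 1 kW device about 10 kJ, so the low-power device is the less efficient one per unit of work. Whether real phones and desktops behave like this depends on their actual throughput per watt.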

But there is no need to marginalize our beloved Behemoth machines. I am always impressed by their work throughput in my team (Dutch Power Cows), saliva dripping from the corners of my mouth when I look at those numbers. My older i5 and i7 processors stand their ground, but they are of another order. Independent of this 4GB discussion, my next processor will be a big Ryzen, that's for sure. Behemoths and more potent desktops will always remain a pillar of the capacity of distributed computing.

Rosetta is stretching herself by trying to serve both the phone clients and the potent desktop clients with those 4GB jobs. Whether support for phones proves to be a good long-term investment, time will tell, but there are a lot of (old) phones out there, and they represent a huge capacity. I can fully understand the attempt to harvest it.

(sorry, a bit off topic i fear)
ID: 95343 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Tom M

Joined: 20 Jun 17
Posts: 87
Credit: 14,962,011
RAC: 51,302
Message 95345 - Posted: 25 Apr 2020, 12:22:08 UTC - in response to Message 94950.  

I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias.
That's the thinking.
Overall, the effect should be neutral. People shouldn't lose out for processing these larger RAM requirement Tasks, and they shouldn't get a boost either. All the work is important, so if a Task stops 2 or more others from being processed at that time, it needs to offset that loss in production.

Credits can't buy you a toaster, but they can let you see how you are doing, and how much you have done to help Rosetta.


+1
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
ID: 95345 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95348 - Posted: 25 Apr 2020, 13:27:12 UTC - in response to Message 95343.  
Last modified: 25 Apr 2020, 13:32:25 UTC

I've already answered some of your questions above regarding efficiency and whatnot:
- https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13833&postid=95140

If battery use is an issue for you, see also:


You can actually compute an approximate performance/watt quite easily from the CPU list shared earlier in this thread and some Wikipedia or datasheet lookups for power consumption.
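A minimal sketch of that calculation follows. It assumes you have looked up a GFLOPS figure per host from the CPU list and a power figure from a datasheet. The device names and numbers below are placeholders, not measurements, and a datasheet TDP is only a rough proxy for real power draw under load.

```python
def gflops_per_watt(gflops, watts):
    """Approximate efficiency of one device."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gflops / watts

def rank_by_efficiency(devices):
    """devices: dict of name -> (gflops, watts). Returns names, most efficient first."""
    return sorted(devices, key=lambda name: gflops_per_watt(*devices[name]), reverse=True)

# Placeholder figures only; substitute values you have looked up yourself.
devices = {
    "desktop_example": (100.0, 65.0),
    "sbc_example": (10.0, 5.0),
}

for name in rank_by_efficiency(devices):
    gflops, watts = devices[name]
    print(f"{name}: {gflops_per_watt(gflops, watts):.2f} GFLOPS/W")
```

For a fairer comparison you would measure power at the wall for the whole system, since memory, storage and the power supply also draw power.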

BOINC can run on many Android phones in the background regardless of whether you are using the phone or not. An aging phone from many years ago can still crank out as much RAC as a Raspberry Pi 4. It is usually set up so that it only computes when it is on the charger and has finished its charging cycle during the night. By contrast, phones with iOS can only compute with DreamLab while the screen is on.

As you've rightly noted, a PC is more universal and supports more projects. An SBC can be more power efficient in credit/watt or credit/$ and could take up less space, but you will need to maintain more nodes. With the right tools and experience, this shouldn't be an issue, but you should keep it in mind.

So although we may not be able to declare a clear winner, it's good to be aware of all the options.

ID: 95348 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Ged

Joined: 17 Apr 06
Posts: 2
Credit: 1,034,115
RAC: 0
Message 95350 - Posted: 25 Apr 2020, 15:03:23 UTC - in response to Message 95082.  

For me, personally, I'm not driven by the credits granted for running work units; It's about contributing to the science, either by running work units which model a particular behaviour or sheer crunching of data for further treatment or research candidate selection/rejection.

I'd rather see the application development *and* testing effort be expended producing efficient and effective code. I'd also like to see more realistic operational criteria being assigned to work units so as not to 'waste' computing effort (and electricity) by having my machines swamped with, often, spuriously defined deadlines, maybe by including some operational acceptance testing rather than just functional tests.

That's my 10c's worth ;-)

Ged

Ged, I just wanted to clarify, are you basically suggesting that you'd like to see some way to control the deadline of the work you receive? Or have a way to only be assigned WUs that have 8 day deadlines? Or are you referring to cases where the BOINC Manager gets tricked into requesting more R@h work than is required to fill your work cache, and to complete before the 3 day deadlines?


Mod.Sense Not to control the deadline of received WUs nor only accepting 8-day deadline WUs, it's more the latter case but some means to ensure that a WU has a realistic deadline for a given WU's payload.
ID: 95350 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
teacup_DPC

Joined: 3 Apr 20
Posts: 6
Credit: 2,744,282
RAC: 0
Message 95422 - Posted: 27 Apr 2020, 14:17:18 UTC - in response to Message 95348.  
Last modified: 27 Apr 2020, 14:18:23 UTC

Hi sangaku

I found your https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13791#94266 thread and read some of its first posts. I liked its questioning approach, and will direct responses concerning what hardware to use to that topic.

Your Raspberry Pi 4 remark did set me thinking. Without doing the math I got a vision of a stack of these things, each taking 2 or 3 threads. Being Dutch, my financial domain, like yours, is Euros, and a Pi 4 can be fetched in Holland for around 50-60 Euros. Storage and PSU for all those Pis should be approached in some clever combined way. First I will read that topic completely; probably the math will not add up, making a Pi 4 a no-go. But just fantasizing about that pile of Pis made my morning a good one, though that probably was not the aim of your post :|.

Thanks!
ID: 95422 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 95429 - Posted: 27 Apr 2020, 16:52:59 UTC

I don't really care about credits, as long as they are consistent so we can use them to judge the performance of different computers it's fine.

Instead, the main problem for WUs whose models take too much time is the checkpointing. Shutting down a PC and losing 6 hours of work isn't good. To solve this problem without changing how checkpointing works, a good idea is to let us choose whether we want to get these WUs where checkpointing is problematic. If someone keeps their PC running 24/7 then they can get these WUs without problems. If instead someone shuts it down every day then it's better to avoid them.
Sure, if checkpointing can be changed to save the progress no matter if a model is completed or not then no problem.
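To illustrate why the checkpoint interval matters, here is a small sketch of how much work a shutdown throws away. It assumes progress is only saved at fixed intervals and that everything since the last save is lost; the 6-hour and 30-minute intervals are illustrative and say nothing about how Rosetta actually checkpoints a given task.

```python
def lost_hours(elapsed_hours, checkpoint_interval_hours):
    """Work lost on shutdown: time elapsed since the last completed checkpoint."""
    return elapsed_hours % checkpoint_interval_hours

# Shutdown after 5.75 hours of crunching.
per_model = lost_hours(5.75, 6.0)  # checkpoint only when a 6-hour model completes
frequent = lost_hours(5.75, 0.5)   # checkpoint every 30 minutes

print(f"checkpoint per model: {per_model} h lost")
print(f"checkpoint every 30 min: {frequent} h lost")
```

With per-model checkpoints the whole 5.75 hours is lost, while with frequent checkpoints only 15 minutes is. That is the gap a "long models" opt-in would let daily-shutdown users avoid.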
ID: 95429 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile lazyacevw

Joined: 18 Mar 20
Posts: 12
Credit: 93,576,463
RAC: 0
Message 95449 - Posted: 27 Apr 2020, 22:36:03 UTC
Last modified: 27 Apr 2020, 22:37:55 UTC

My question about credits is, what is up with this guy? Within 3 days, he has the top three "fastest" computers by nearly a factor of 6.

ID: 95449 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1675
Credit: 17,697,137
RAC: 20,072
Message 95461 - Posted: 28 Apr 2020, 5:13:59 UTC - in response to Message 95449.  

My question about credits is, what is up with this guy? Within 3 days, he has the top three "fastest" computers by nearly a factor of 6.
They are returning a lot of Tasks for such a small number of cores/threads.
0.72 day turn around. 8 hour runtime. 4,600 Tasks in progress on one system, over 6000 Valid.
0.72 day turn around. 8 hour runtime. 1,300 Tasks in progress on the others, roughly 1,650 each Valid on the others.

Number of times client has contacted the server, 3 for one system. 0 for the others?

Some sort of CPU compute cluster feeding its results through those host IDs?
Grant
Darwin NT
ID: 95461 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
strongboes

Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 0
Message 95467 - Posted: 28 Apr 2020, 9:06:58 UTC

Yes, pretty clearly using those hosts to somehow feed work to other cpus. Very clever but clearly not actually the top cpu.

Out of interest, does anyone know how he can do that?
ID: 95467 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Millenium

Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 95472 - Posted: 28 Apr 2020, 11:59:35 UTC

Over 750,000 RAC with a single computer? Even a dual EPYC 7702 computer has no way to get such a high RAC. And his PC seems to have a single EPYC 7702P.
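A rough plausibility check using the figures quoted earlier in this thread (4,600 Tasks in progress, 0.72 day turnaround, 8 hour run time) points the same way. The sketch below applies Little's law (items in the system = throughput x time in the system), which assumes a steady state and treats the reported turnaround as the average time a Task spends on the host. Both are assumptions. The 128-thread figure is the EPYC 7702P's 64 cores with SMT.

```python
def implied_tasks_per_day(in_progress, turnaround_days):
    """Little's law: throughput = tasks in progress / average time in system."""
    return in_progress / turnaround_days

def implied_threads(in_progress, turnaround_days, runtime_hours):
    """Threads that must run concurrently to sustain that throughput."""
    return implied_tasks_per_day(in_progress, turnaround_days) * runtime_hours / 24

def max_tasks_per_day(threads, runtime_hours):
    """Upper bound on throughput if every thread runs tasks back to back."""
    return threads * 24 / runtime_hours

needed = implied_threads(4600, 0.72, 8)
ceiling = max_tasks_per_day(128, 8)

print(f"implied throughput: {implied_tasks_per_day(4600, 0.72):.0f} tasks/day")
print(f"implied concurrent threads: {needed:.0f}")
print(f"128 threads can do at most {ceiling:.0f} tasks/day")
```

Under these assumptions the host would need on the order of 2,000 concurrent threads, while a single 128-thread CPU tops out at 384 eight-hour Tasks per day, so the numbers are consistent with many machines reporting through one host ID.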
ID: 95472 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
[DPC]_Fatal_Error_Group~Bubbles

Joined: 17 Mar 06
Posts: 1
Credit: 382,602
RAC: 0
Message 95488 - Posted: 28 Apr 2020, 17:06:07 UTC

Someone from DPC over here: we've notified the guy running the Nifhack account of this thread and asked whether he wants to, and is able to, clarify this. He's known for having access to huge amounts of computational power (at work, I believe) but can't deploy all of it all the time. He's also known to rarely part with specifics. My guess, too, is that those machines are indeed some sort of front-end hosts for the computers behind them.
ID: 95488 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote




©2024 University of Washington
https://www.bakerlab.org