Tell us your thoughts on granting credit for large protein, long-running tasks

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4002
Credit: 0
RAC: 0
Message 94913 - Posted: 19 Apr 2020, 17:35:52 UTC
Last modified: 19 Apr 2020, 17:36:34 UTC

R@h adapts to changing requirements. With these new large protein models coming soon, tagged with a 4GB memory bound, and with models that may take several hours to run (enough that the watchdog has been extended from its normal 4 hours to 10 hours), it seems credit may need some changes as well.

Normally, credit is granted based on the cumulative reported CPU time per model. And so a fast machine with lots of memory computes more models and gets more credit than an older system. But, in the case of these 4GB WUs, they will not even be sent to machines that do not have at least 4GB of memory (and normally BOINC would only be allowed to use less than 100% of that, so I should say where BOINC is allowed to use at least 4GB). So there will be no struggling Pentium 4s reporting any results to reflect the difficulty in the cumulative average.
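
As a rough illustration of the averaging described above, here is a minimal sketch, not the project's server code. The class name `BatchAverage`, the `report` method, and the rate of 40 credits per CPU hour are all assumptions made for the example; only the idea of a cumulative CPU time per model comes from the post.

```python
from dataclasses import dataclass


@dataclass
class BatchAverage:
    """Cumulative CPU time per model for one batch (hypothetical helper)."""
    total_cpu_seconds: float = 0.0
    total_models: int = 0

    def report(self, cpu_seconds: float, models: int,
               credit_per_cpu_hour: float = 40.0) -> float:
        """Fold one host's result into the average and return its credit.

        credit_per_cpu_hour is an assumed conversion rate, not a project value.
        """
        if models <= 0:
            raise ValueError("a result must contain at least one model")
        self.total_cpu_seconds += cpu_seconds
        self.total_models += models
        avg_hours_per_model = self.total_cpu_seconds / self.total_models / 3600
        # Every model is paid at the batch-wide average, whoever computed it.
        return models * avg_hours_per_model * credit_per_cpu_hour


batch = BatchAverage()
slow = batch.report(cpu_seconds=7200, models=2)  # older host: 2 models in 2 CPU hours
fast = batch.report(cpu_seconds=7200, models=4)  # faster host: 4 models in 2 CPU hours
```

Under these assumptions the faster host earns more for the same CPU time. If slow hosts never receive the 4GB work units, their long run times never enter the average, so the per-model credit reflects only the capable machines.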

Now, a 4GB tagged WU will generally not consume that much memory to run. It is an upper limit: if the BOINC Manager sees requests for memory that exceed that 4GB, the task is actually aborted.

So, it seems reasonable that these 4GB work units should come with a premium on credits granted. But how much of a premium is reasonable?

There are many ways to look at it, so we thought we'd open it up for discussion. Please keep things respectful. Probably best to just state your own perspective on the topic and not address other posts directly, and certainly no need for rebuttals here. We're trying to brainstorm.
Rosetta Moderator: Mod.Sense
ID: 94913 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 111
Credit: 3,836,858
RAC: 5,240
Message 94917 - Posted: 19 Apr 2020, 17:45:42 UTC - in response to Message 94913.  
Last modified: 19 Apr 2020, 17:51:47 UTC

A fair day’s credit for a fair day’s computation would be my take.

If a normal WU takes 80,000 GFLOPs and gets 300 credits, and these new resource-hungry WUs take 120,000 GFLOPs, then grant 450 credits.

I know that the GFLOPs will vary between machines but there’s surely some way of estimating it from the complexity of the task and the number of decoys created.
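
The proportional rule in this post can be written down directly. This is only a sketch: the function name and parameter names are invented for illustration, and the reference figures are the ones quoted in the post.

```python
def credit_for_work(task_gflops: float,
                    reference_gflops: float = 80_000,
                    reference_credit: float = 300) -> float:
    """Scale credit linearly with the estimated work in a task.

    The defaults are the example figures from the post, not project values.
    """
    if reference_gflops <= 0:
        raise ValueError("reference_gflops must be positive")
    return reference_credit * task_gflops / reference_gflops


large_task_credit = credit_for_work(120_000)
```

With the post's numbers, `large_task_credit` comes out at 450. The hard part, as the post notes, is estimating the work per task in the first place.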
ID: 94917 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 3,793,126
RAC: 1,043
Message 94922 - Posted: 19 Apr 2020, 18:15:37 UTC

It will depend on the checkpointing. If these long models need a continuous span of 5 hours to complete, then a fair amount of work will be lost on machines that are not running 24/7.

It will also depend on the % of time where more than 1 GB of memory is required to run it.

I don't watch credits much. I'll just add some WCG to the mix (which are usually low memory WUs), and let the BOINC Manager worry about what to dispatch when.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 94922 · Rating: 0
likeapresident
Joined: 14 Mar 20
Posts: 7
Credit: 1,096,138
RAC: 22
Message 94926 - Posted: 19 Apr 2020, 18:54:16 UTC - in response to Message 94913.  

How about increasing the credit in proportion to the memory size requirement?

So if normal work units need only 1GB of memory and are issued 10 credits, then these large models, which need 4GB of memory, should be issued 40 credits.
ID: 94926 · Rating: 0
teacup_DPC
Joined: 3 Apr 20
Posts: 6
Credit: 578,761
RAC: 5,375
Message 94937 - Posted: 19 Apr 2020, 20:24:33 UTC - in response to Message 94913.  

I am still quite new to actively using distributed computing, so I hope my thoughts are of relevance. The footprint of a WU can be seen as the product of two resources: processor capacity and memory space. I am not fully aware of what disk space is needed, so I leave that aside.

When it needs (worst case) 4GB of memory, the 4GB task prevents the client from running four 1GB tasks at the same time.
The single 4GB task will probably use less processor capacity (1 core?) than the parallel processor capacity needed for the four separate 1GB tasks (4 cores?).
So per time unit the credits should be positioned somewhere between those of one 1GB task and four 1GB tasks: more than one 1GB task because the memory use is four times as much, and less than four 1GB tasks because the single task can be completed with one core.
Where exactly to position the credits between 1x1GB and 4x1GB depends on the availability of processor cores and memory in the clients capable of running the 4GB jobs; you can judge that better than I. My long shot is that the credits will end up somewhere between 2.5 and 3 times those of a 1GB job, per time unit.

When running a 4GB WU with one core, more data needs to be dealt with, so it is probable the task will need more time. This can be covered by the time dependency in the credits. Maybe add an extra bonus for the somewhat higher risk of failing because of the longer throughput time, as others have suggested as well.

I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias.
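
One way to make the "somewhere between 1x and 4x" reasoning concrete is to count how many 1GB tasks a single 4GB task displaces on a given host. The sketch below is only an illustration under simplifying assumptions: every normal task needs exactly 1GB, every task uses one thread, and the large task really occupies its full 4GB. The function names are invented for the example.

```python
def concurrent_small_tasks(threads: int, usable_gb: float,
                           small_gb: float = 1.0) -> int:
    """How many small tasks fit, limited by threads and by memory."""
    if threads <= 0 or usable_gb <= 0:
        return 0
    return min(threads, int(usable_gb // small_gb))


def neutral_multiplier(threads: int, usable_gb: float,
                       big_gb: float = 4.0, small_gb: float = 1.0) -> float:
    """Credit multiplier that leaves the host's total credit unchanged.

    The large task is paid for itself plus every small task it displaces.
    """
    if threads < 1 or usable_gb < big_gb:
        raise ValueError("host cannot run the large task")
    baseline = concurrent_small_tasks(threads, usable_gb, small_gb)
    alongside = concurrent_small_tasks(threads - 1, usable_gb - big_gb, small_gb)
    displaced = baseline - (1 + alongside)
    return 1.0 + displaced


plenty_of_ram = neutral_multiplier(threads=4, usable_gb=16)   # nothing displaced
tight_ram = neutral_multiplier(threads=4, usable_gb=6)        # one task displaced
ram_starved = neutral_multiplier(threads=16, usable_gb=12)    # three displaced
```

Under these assumptions the neutral multiplier is 1.0, 2.0 and 4.0 for the three hosts, so the "right" premium depends on the mix of hosts that actually receive the 4GB work, which is the point made above.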
ID: 94937 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1317
Credit: 24,261,030
RAC: 12,198
Message 94942 - Posted: 19 Apr 2020, 20:38:26 UTC - in response to Message 94913.  

Pick a number.
If you get more than 3 complaints about it, increase it further.
If you get 3 or fewer, keep it as it is.

Only half-joking...
ID: 94942 · Rating: 0
CIA
Joined: 3 May 07
Posts: 76
Credit: 8,337,858
RAC: 72,694
Message 94943 - Posted: 19 Apr 2020, 20:56:09 UTC

Where do I spend my credits again? I seem to be accumulating a lot of them but I'm not sure where the credit store is. I'm anxious to start redeeming them for valuable prizes and rewards.
ID: 94943 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 539
Credit: 2,140,299
RAC: 22,789
Message 94950 - Posted: 19 Apr 2020, 23:14:11 UTC - in response to Message 94937.  

"I expect you do not want to end up with a bias toward 1GB or 4GB jobs, while both are needed. For the clients that can handle the 4GB jobs the bias should be neutral. Unless you expect a tendency towards more 4GB jobs with respect to 1 GB jobs, or the other way around, then you want a bias."
That's the thinking.
Overall, the effect should be neutral. People shouldn't lose out for processing these larger RAM requirement Tasks, and they shouldn't get a boost either. All the work is important, so if a Task stops 2 or more others from being processed at that time, its credit needs to offset that loss in production.

Credits can't buy you a toaster, but they can let you see how you are doing, and how much you have done to help Rosetta.
Grant
Darwin NT
ID: 94950 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1523
Credit: 5,853,861
RAC: 28
Message 94951 - Posted: 19 Apr 2020, 23:16:02 UTC - in response to Message 94937.  

I think the new longer tasks should get more than the credit awarded for running four 1GB tasks, the idea being that they can't be run by everyone, and that by definition means older and slower PCs, which should be encouraged to be replaced or updated, as over time they simply won't be able to keep up. How much more depends on the priority the Project places on these new workunits: if they are just new workunits, then only a minimal amount of credit above what four 1GB tasks would get; on the other hand, if the new tasks are a higher than normal priority, then a higher credit should be given to encourage people to crunch them instead.
ID: 94951 · Rating: 0
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 1,291,565
RAC: 10,359
Message 94952 - Posted: 19 Apr 2020, 23:18:55 UTC - in response to Message 94913.  

The answer is +25% credit compensation.

I've simply plugged the data (4x memory requirement) into my equations for the 3700X and 3950X from the "The most efficient cruncher rig possible" thread, which amortize the cost of power and capital expenditure against produced RAC, and solved for the RAC correction needed to arrive at the same RAC/$/5years. I assumed 0.3W/GB of power consumption for DDR4 RAM.

Although, you could rephrase the same question in another way: based on supply and demand, if a great majority of volunteers have an insufficient amount of memory, how do I incentivize them to purchase more? If you put it that way, adding +50% may not be that far-fetched.
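
The equations referred to in this post live in another thread and are not reproduced here, so the following is only a guess at the general shape of such a calculation, not the poster's actual model. It amortizes purchase price plus five years of electricity, then asks how much extra RAC is needed to keep RAC per dollar constant once extra RAM is added. Every name and every input value is hypothetical, apart from the 0.3W/GB figure quoted in the post.

```python
HOURS_IN_5_YEARS = 5 * 365 * 24


def five_year_cost(capex: float, watts: float, price_per_kwh: float) -> float:
    """Purchase price plus electricity for five years of 24/7 running."""
    return capex + watts / 1000 * HOURS_IN_5_YEARS * price_per_kwh


def rac_correction(base_capex: float, base_watts: float,
                   extra_ram_gb: float, ram_price_per_gb: float,
                   price_per_kwh: float,
                   ram_watts_per_gb: float = 0.3) -> float:
    """Fractional RAC increase that keeps RAC per dollar unchanged.

    ram_watts_per_gb defaults to the 0.3W/GB quoted in the post; all other
    inputs are placeholders the caller must supply.
    """
    base = five_year_cost(base_capex, base_watts, price_per_kwh)
    upgraded = five_year_cost(base_capex + extra_ram_gb * ram_price_per_gb,
                              base_watts + extra_ram_gb * ram_watts_per_gb,
                              price_per_kwh)
    return upgraded / base - 1.0
```

The result is very sensitive to the assumed hardware and electricity prices, which is one reason different posters in this thread arrive at different premiums.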
ID: 94952 · Rating: 0
Jonathan
Joined: 4 Oct 17
Posts: 20
Credit: 1,256,443
RAC: 394
Message 94962 - Posted: 20 Apr 2020, 2:10:04 UTC - in response to Message 94952.  
Last modified: 20 Apr 2020, 2:11:02 UTC

ID: 94962 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 905
Credit: 10,368,637
RAC: 4,495
Message 94963 - Posted: 20 Apr 2020, 3:47:54 UTC - in response to Message 94913.  

Any such extra large workunit should come with a way to limit the number of that type of workunits in memory at any one time. My computers can handle more than one at a time, but not one for every virtual core. I'll leave the discussion of extra credits to others.
ID: 94963 · Rating: 0
WBT112
Joined: 11 Dec 05
Posts: 11
Credit: 1,117,091
RAC: 37
Message 94971 - Posted: 20 Apr 2020, 7:31:56 UTC

While I still prefer limiting these "monster" workunits to 1 or 2 per host (WCG does that for 1 GB+), for extra credits we need to look at the hardware of the average active host.

Let's assume the average host with MORE THAN 4 GB RAM has 4 threads and 8 GB of RAM (which is a bit low, but I don't know the numbers). The default of 75% RAM usage while active (?) means 6 GB of RAM usage is normally allowed.
So usually it can process 4 "normal" workunits of 1 GB each, using 100% of its threads.

When you now change the preference to 4 GB, this would reduce the capacity to only one workunit, so a 75% bonus would be needed to offset this.
However, considering that the machine will likely not only process these monsters, or might have other projects running, at least +25% seems like an appropriate amount, IMHO.
ID: 94971 · Rating: 0
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 214
Message 94972 - Posted: 20 Apr 2020, 8:08:03 UTC

The credit system is broken as far as I'm concerned anyway: 2 virtually identically named units can finish within a minute of each other and have wildly different credits. It makes absolutely no sense at all.

IMO, having a time-based system is one issue. Instead, there should be no time limit: the units should simply be set up so that they complete x decoys, and the researchers can size each run of work so that x decoys take a desired time.

Example: rb1, 1 decoy takes approx. 1 hour on reference hardware, credit awarded per decoy 100, unit size 4 decoys. Approx. runtime 4 hours. Best hardware takes 3 hours, worst 5 hours; faster hardware clearly gets more reward per hour.

Rb2: 1 decoy takes approx. 15 mins on reference hardware, credit awarded per decoy 25, unit size 12 decoys. Etc.
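
The two examples above can be checked with a few lines of arithmetic. This is a sketch of the proposal only: the function names are made up, and the rate of 100 credits per reference hour is simply what the rb1 and rb2 figures imply.

```python
def credit_per_decoy(reference_minutes: float,
                     credit_per_reference_hour: float = 100) -> float:
    """Fix the per-decoy credit from a timing run on reference hardware."""
    return credit_per_reference_hour * reference_minutes / 60


def credit_per_hour(per_decoy: float, decoys_per_unit: int,
                    runtime_hours: float) -> float:
    """Credit rate a host earns on a fixed-size unit it finished in runtime_hours."""
    if runtime_hours <= 0:
        raise ValueError("runtime_hours must be positive")
    return per_decoy * decoys_per_unit / runtime_hours


rb1 = credit_per_decoy(60)   # 100 credits per decoy
rb2 = credit_per_decoy(15)   # 25 credits per decoy

reference_rate = credit_per_hour(rb1, 4, 4)  # reference hardware, 4 hours
best_rate = credit_per_hour(rb1, 4, 3)       # fastest host, 3 hours
worst_rate = credit_per_hour(rb1, 4, 5)      # slowest host, 5 hours
```

The unit is worth the same 400 credits on every host, so the hourly rate rises as the run time falls, which is the behaviour the post is after.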

Obviously size can be set to what researchers want. Perhaps, instead of a time preference for crunchers, simply have a WU size (small, medium, or large), which could be a preference only and wouldn't preclude you getting WUs of any size.

Perhaps a further option, such as in WCG, where you can set the number of these proposed larger WUs allowed to run on your machine; the default could be set to 1 per 16GB of machine memory to allow other tasks to run.

I imagine credit compensation should be on the basis that if these larger units require you to suspend a core to allow them to run, then double the credit should be awarded.


Enough for this post, but is there no way to allow a workunit to use more than 1 core? This would reduce memory issues considerably and allow larger work to be run.
ID: 94972 · Rating: 0
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 1,291,565
RAC: 10,359
Message 94973 - Posted: 20 Apr 2020, 8:32:12 UTC - in response to Message 94972.  

In my opinion, the credit system is not broken at all, it is working well as intended. Please read the posts that explain how credits are awarded in Rosetta@home.
* https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669&postid=10377
* https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2194&postid=24612#24612

Basically, from what I understand, what really bothers you right now is WU to WU variability. Note that this will all average out in the long term, even after a few days, but definitely after 2 weeks (RAC).

Thus your aggregate credit count and RAC is still as good as anything for the purpose of keeping up the friendly competition with your peers, getting feedback about your contribution, fine tuning the performance of your hardware, checking whether your boxes are producing as expected for the given kind of hardware, etc.

Hence what you are asking for (precise WU flops estimates) would require lots of development and maintenance time to be devoted to something that isn't that important at all and would take away time from research.
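
To see why per-WU swings matter little for RAC, here is a small sketch of an exponentially decaying average. It assumes a one-week half-life, which is how BOINC's recent average credit is usually described, and it simplifies by updating once per day; the real client and server update whenever credit is granted. The function names are illustrative only.

```python
HALF_LIFE_DAYS = 7.0  # assumption: one-week half-life for recent average credit


def update_rac(rac: float, credit_today: float,
               half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Blend one day's granted credit into the running average."""
    decay = 0.5 ** (1.0 / half_life_days)
    return rac * decay + (1.0 - decay) * credit_today


def simulate(daily_credits, rac: float = 0.0) -> float:
    """Apply update_rac once for every day in daily_credits."""
    for credit in daily_credits:
        rac = update_rac(rac, credit)
    return rac


steady = simulate([1000] * 14)        # constant output for two weeks
bumpy = simulate([500, 1500] * 30)    # output that swings by a factor of three
```

In this model a host producing a constant 1000 credits per day reaches 75% of that figure after two weeks, and a host whose daily credit alternates between 500 and 1500 still settles close to its true mean of 1000.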
ID: 94973 · Rating: 0
strongboes
Joined: 3 Mar 20
Posts: 27
Credit: 5,394,270
RAC: 214
Message 94974 - Posted: 20 Apr 2020, 8:56:51 UTC - in response to Message 94973.  

I'm aware of how credits are awarded; there is huge variation.

I doubt what I've suggested is much of a change: instead of a WU being told to finish after a time period, it would have a decoy count set by the researcher. Before a batch is sent out, it is run on known hardware to determine the credit awarded per decoy. It's very simple.
ID: 94974 · Rating: 0
Luigi R.
Joined: 7 Feb 14
Posts: 38
Credit: 1,721,167
RAC: 461
Message 94982 - Posted: 20 Apr 2020, 11:27:22 UTC - in response to Message 94913.  

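// maxMemoryUsed is the task's peak memory use in GB
if (maxMemoryUsed > 1) {
	grantedCredits = normalCredits * maxMemoryUsed;
} else {
	// tasks that stay within 1GB earn the normal credit
	grantedCredits = normalCredits;
}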


E.g. a host gets 40cr/h. If it uses up to 3.5GB of memory, you pay it 40*3.5=140cr/h.
ID: 94982 · Rating: 0
torma99
Joined: 16 Feb 20
Posts: 14
Credit: 288,937
RAC: 4
Message 94995 - Posted: 20 Apr 2020, 15:05:44 UTC

For the masses, I think there should be extra credit for bigger workunits. For me it doesn't matter. What I would like more is to have some descriptions of the workunits. Not novels, but maybe 50-80 words (so I can google them further if I am interested in the topic), maybe the lab which will use the data, or some info about the researchers. For me that would matter more, because I am just an average Joe with 16 threads, but moreover because I do not believe in the credit system. If I were a system administrator at a huge server farm and could convince my bosses to let me use some percentage of the idle time, I would be the king cruncher, but I think the scientific results matter more.

I joined in mid-February. Since then there has been an update on Rosetta, I could send back some Ralph WUs, there was the thread with the fluorescent proteins, and now there is a discussion about this more complex folding, so as a commoner I assume the project is heading in the right direction; for me that counts. And that is what convinced me that until the end of the year I will try my best to run my computer 24/7. After that I will adjust to my financial situation (maybe I will have to turn off some threads to cut a couple of euros from the electricity bill). No credits can change that. I could have a quadrillion-zillion credits, but that is just a number written in a database; the real value is in the act of voluntarily donating computing time to the scientists to help them solve problems which otherwise could take more time. (Sorry for my broken English, learned the language alone ;-) )
ID: 94995 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1113
Credit: 4,713,064
RAC: 5,101
Message 94997 - Posted: 20 Apr 2020, 16:13:50 UTC

I'm very happy about this new science page, but I cannot clearly understand how these WUs will run.
Are there multiple decoys in a single WU, or is it just a single big decoy?
And if it is the second option and I restart the PC after 7 hours, will the WU restart from 0?

If so, we lose all the "little volunteers" who don't have a 24/7 system.
ID: 94997 · Rating: 0
Admin
Project administrator
Joined: 1 Jul 05
Posts: 5176
Credit: 0
RAC: 0
Message 94999 - Posted: 20 Apr 2020, 18:13:12 UTC

I want to make this clear since there is some misinformation going around. These long sequences of up to 2000 residues, which are sometimes, but not often, submitted to Robetta, our protein structure prediction server, have been around for a while now. They are nothing new, may or may not be related to COVID-19, and are rare. We have just adjusted the logic to make sure these jobs are assigned enough memory and time to complete. The vast majority of jobs have a smaller memory footprint of less than 2GB and produce models on timescales of minutes, not hours.

Run times and memory usage may vary but a typical 2000 residue Rosetta comparative modeling job from Robetta (these jobs are rare) took a little over an hour to produce 1 model and used 1.8 gigs of RAM on our local cluster.

These jobs should not be confused with the problematic cyclic peptide jobs (which have been canceled) that users reported sometimes taking longer than the CPU run time preference to complete. This was also a rare event, likely due to random trajectories that were not passing model quality filtering criteria. These cyclic peptide jobs have a small memory footprint and can produce models at a faster pace.

These issues highlight the fact that Rosetta@home runs a variety of protocols for modeling and design, the result of which can be seen from the variety of research publications and projects related to diseases such as COVID-19 and cancer, vaccine development, nano-materials, cellular biology, structural biology, environmental sciences, and the list goes on. These are within the Baker lab and IPD, but there are also researchers around the world using the Robetta structure prediction server for a vast variety of research.
ID: 94999 · Rating: 0



©2020 University of Washington
https://www.bakerlab.org