How to Limit CPU cores?

Keith T. Joined: 1 Mar 07 Posts: 58 Credit: 34,135 RAC: 0
Message 96445 - Posted: 13 May 2020, 15:51:53 UTC

I've recently started running Rosetta again, after several years away. I'm running it on some fairly low-spec devices: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4341280 and https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4087082

I recently updated my settings for LHC@home, and was pleasantly surprised to see that I can restrict how many cores the project will use (https://lhcathome.cern.ch/lhcathome/prefs.php?subset=project), but I can't see a similar option for Rosetta. Could someone help me with this, please?

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Message 96452 - Posted: 13 May 2020, 17:23:53 UTC Last modified: 13 May 2020, 17:24:05 UTC

You can configure project_max_concurrent in app_config.xml: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration

Rosetta Moderator: Mod.Sense

hadron Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149
Message 107095 - Posted: 6 Oct 2022, 2:20:27 UTC Last modified: 6 Oct 2022, 2:35:27 UTC

OK, I created app_config.xml in the Rosetta project's directory and reloaded all the config files, as follows:

<app>
<name>Rosetta@home</name>
<max_concurrent>2</max_concurrent>
</app>
<project_max_concurrent>2</project_max_concurrent>

This was a 2-stage operation; first I added the project_max_concurrent line, to no effect. Adding the <app> section similarly had no effect. What do I need to do now to limit the number of running Rosetta tasks to 2?

Never mind, I forgot to include the enveloping <app_config>..</app_config>... duh.

Even so, it would be very nice to be able to limit the number of tasks actually assigned by the server to my system. I have 2 Rosetta tasks running right now, 13 waiting to run, and all of them have a report deadline of just before 0130 UTC on 9 October (time now is 0234 UTC 6 October). I am not sure, with a run time of 8 hours each, that I will be able to finish all of them on time.
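For readers landing here with the same question, a minimal sketch of the file once the missing wrapper is added. It assumes the file is named app_config.xml and sits in the Rosetta project directory, as described in the post, and it uses only the project_max_concurrent element that the moderator pointed to. The comment is for the reader and can be deleted.

```xml
<app_config>
  <!-- Run at most 2 tasks from this project at the same time -->
  <project_max_concurrent>2</project_max_concurrent>
</app_config>
```

A per-application `<app>` block can be added inside the same wrapper, but its `<name>` must match the application's short name as the client knows it; the thread does not confirm that "Rosetta@home" is that name, so it is left out here. The file only takes effect after the client re-reads its config files or is restarted. Note that this caps running tasks, not how many tasks the server sends.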

Sid Celery Joined: 11 Feb 08 Posts: 2538 Credit: 47,076,156 RAC: 21,133
Message 107098 - Posted: 6 Oct 2022, 10:10:52 UTC - in response to Message 107095.

"Even so, it would be very nice to be able to limit the number of tasks actually assigned by the server to my system. I have 2 Rosetta tasks running right now, 13 waiting to run, and all of them have a report deadline of just before 0130 UTC on 9 October (time now is 0234 UTC 6 October). I am not sure, with a run time of 8 hours each, that I will be able to finish all of them on time."

The idea is to plan to succeed, not to plan to fail, so either:
- increase cores to 3 for a while - if you have 3, that is
- be aware that Rosetta's deadlines are 3 days, so adjust/reduce your buffer accordingly. The stock of tasks is held at Rosetta. No need to have your own stock too
- if all possible adjustments are looking like you're still going to miss a deadline, abort a task now. It'll be reissued to one of 10s of thousands of other hosts to run. No need to worry about it

hadron Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149
Message 107100 - Posted: 6 Oct 2022, 11:16:15 UTC - in response to Message 107098.

"The idea is to plan to succeed, not to plan to fail, so either:
- increase cores to 3 for a while - if you have 3, that is
- be aware that Rosetta's deadlines are 3 days, so adjust/reduce your buffer accordingly. The stock of tasks is held at Rosetta. No need to have your own stock too
- if all possible adjustments are looking like you're still going to miss a deadline, abort a task now. It'll be reissued to one of 10s of thousands of other hosts to run. No need to worry about it"

If by "reduce your buffer" you mean adjust the "Store... X days of work" values, I have not touched them since I began running boinc in early September. I didn't ask for all those 15 tasks; it is almost as if Rosetta's system saw my 12 "CPUs" (threads, actually), and assumed they are all available for Rosetta tasks. In fact, I give 10 of those threads to boinc projects, and keep 2 for the OS. LHC is my primary focus, and it gets 8 threads. This will not change.

So far, the 2 running Rosetta tasks have been proceeding reasonably well. It looks as if the initial 8 hour estimated completion time might be some figure gleaned from a Ouija board, as all completed tasks so far have finished in just over 3 hours. I'll take it as far as I can without aborting anything, to see how much I can actually get done.

Thanks for your suggestions. Learning all of boinc's peculiarities is not proving to be easy, and it doesn't help that there does not seem to be any complete manual anywhere that I have found so far. For example, it doesn't even mention the app_config.xml file; I only learned about that when I came here, trying to find a way to limit the number of running tasks (initially, my client gave Rosetta tasks 9 active threads, which is quite unacceptable to me). But, little by little, it is all coming together, so thanks again.
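A rough check of the deadline worry, using only figures quoted in the posts above: 15 tasks in total, 2 running at a time, an 8 hour estimate against roughly 3 hours observed, and the gap between 0234 UTC on 6 October and 0130 UTC on 9 October. It assumes the machine crunches around the clock and that every task takes the same time, neither of which will hold exactly, so treat it as an upper-bound sketch only.

```latex
\begin{align*}
T_{\text{available}} &\approx 3 \times 24\,\text{h} - 1\,\text{h}\,4\,\text{min} \approx 70.9\,\text{h} \\
\text{batches} &= \lceil 15 / 2 \rceil = 8 \\
T_{\text{8 h estimate}} &\le 8 \times 8\,\text{h} = 64\,\text{h} \\
T_{\text{3 h observed}} &\le 8 \times 3\,\text{h} = 24\,\text{h}
\end{align*}
```

The inequalities are there because the 2 running tasks had already made some progress. Under these assumptions even the pessimistic 8 hour figure fits inside the window with a few hours to spare, and the observed 3 hour run time leaves a wide margin.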

Grant (SSSF) Joined: 28 Mar 20 Posts: 1925 Credit: 18,534,891 RAC: 0
Message 107104 - Posted: 7 Oct 2022, 5:06:52 UTC - in response to Message 107100. Last modified: 7 Oct 2022, 5:08:29 UTC

"In fact, I give 10 of those threads to boinc projects, and keep 2 for the OS. LHC is my primary focus, and it gets 8 threads. This will not change."

And there is your problem - you don't understand how BOINC works. You set the Resource Share between projects (which is a ratio, not a percentage), and BOINC does its best to make it happen. The Resource Share is not based on the number of cores or threads being used, but on work actually done.

E.g. Project A has an extremely efficient application, Project B has an extremely inefficient application. So if you have the Resource Share set as 100 for Project A and 100 for Project B, then A gets 1 thread while B gets 15, yet they both get the same amount of work done (due to the difference in their applications' efficiency).

That balance is achieved over time - that's weeks. Not minutes or hours. The more Projects you do, the larger your cache, the more you manually try to balance things, the longer it will take for your Resource Share settings to actually be met - think not weeks, but months.

If you run more than one Project then there is no need for a cache ("Store at least 0.1 days of work" and "Store up to an additional 0.01 days of work" is plenty) - if a Project is out of work, then the other Projects will get extra processing time. When the out-of-work Project gets work again, it will get the lion's share of processing for a short while, taking processing time from the other Projects till things meet your Resource Share settings (which is probably what happened here, as Rosetta has had very little Rosetta 4.20 work available for quite a while now).

Grant
Darwin NT

hadron Send message Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149	Message 107110 - Posted: 7 Oct 2022, 15:07:08 UTC - in response to Message 107104. In fact, I give 10 of those threads to boinc projects, and keep 2 for the OS. LHC is my primary focus, and it gets 8 threads. This will not change. And there is your problem- you don't understand how BOINC works. You set the Resource Share between projects (which is a ratio- not a percentage), and BOINC does it's best to make it happen. The Resource Share is not based on the number of cores or threads being used, but on work actually done. eg Project A has an extremely efficient application, Project B has an extremely inefficient application. So If you have the Resource share set as 100 for Project A and 100 for Project B, then A gets 1 Thread while B gets 15, yet they both get the same amount of work done (due to the difference in their applications efficiency). That is hardly my problem. The problem is that the user manual on boinc's main website is woefully incomplete; it does not, for example, even mention the app_config.xml file at all. And in truth, I have no interest in achieving any kind of "balance" (as pre-determined by some programmer) between projects. If I want to limit the number of threads to any given project, then that is what I want, period. The only effective way to do this, it seems, is with app_config.xml files. I only discovered those quite by accident, after Rosetta dumped those 15 tasks on me -- which, btw and imnsho, it had no business doing, when my system has only 12 "CPUs" (threads, actually, but that's really just semantics) -- especially when they all have a due date only 3 days in the future. I also took the controls off at LHC (after I got Rosetta under control, thankfully), which has just started up its 3rd run. 
Fortunately, I had disabled ATLAS for a short time, or I don't know what would have happened -- as it is, I got an instant dump of 41 new tasks, 23 of them in CMS, which can have rather long run times (and ATLAS can be even worse). LHC had no business doing that either. Rosetta now appears to be under control: with max 2 threads available to it, I have 2 tasks running and 1 (sometimes 2) waiting in the wings. That is perfectly fine with me, and in truth, I really do not care if I run out of Rosetta tasks sometime in the future, as there will always be plenty of work to do at LHC. I don't even care if 2 threads sit idle for a time -- I can always adjust the config file for LHC to let them have it all, and change that back if/when Rosetta came back into play (and vice versa, of course -- I am an equal opportunity number cruncher :D ) That balance is achieved over time- that's weeks. Not minutes or hours. The more Projects you do, the larger your cache, the more you manually try to balance things, the longer it will take for your Resource Share settings to actually be met- think not weeks, but months. If you run more than one Project then there is no need for a cache ("Store at least 0.1 days of work" and "Store up to an additional 0.01 days of work" is plenty)- if a Project is out of work, then the other Projects will get extra processing time. When the out of work Project gets work again, it will get the lion's share of processing for a short while, taking processing time from the other projects till things meet your Resource share settings (which is probably what happened here as Rosetta has have very little Rosetta 4.20 work available for quite a while now). To summarize, I do not want one project taking up the overwhelming majority of my system resources. The amount of work any project sends to any given system should be based, at least in part, on the total resources available on that system. 
At present, without the presence of an app_config.xml file, that does not appear to be the case (as my experience with both Rosetta and LHC clearly demonstrates). ID: 107110 · Rating: 0 · rate: / Reply Quote

hadron Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149
Message 107114 - Posted: 7 Oct 2022, 21:17:23 UTC - in response to Message 107110.

"That is hardly my problem. The problem is that the user manual on boinc's main website is woefully incomplete; it does not, for example, even mention the app_config.xml file at all."

I do apologize for the wrong information; the boinc user manual does, in fact, have a section on this file. However, I am not at all satisfied with the explanations it gives.

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1925 Credit: 18,534,891 RAC: 0	Message 107116 - Posted: 7 Oct 2022, 22:21:07 UTC - in response to Message 107110. That is hardly my problem. Yes it is, because you don't know how the software you are using actually works, you are trying to get it to do things it was not designed to do. And in truth, I have no interest in achieving any kind of "balance" (as pre-determined by some programmer) between projects. It is not determined by the programmer, it is determined by the settings you make with regard to your Resource Share settings. If I want to limit the number of threads to any given project, then that is what I want, period. The only effective way to do this, it seems, is with app_config.xml files. Then do so. Just keep in mind that there will be periods where some cores/ threads will actually go unused in order to meet your arbitrary core/thread limits and Resource share settings. And you will continue to receive unexpected dumps of work from unexpected Projects if you continue to micromanage things. I also took the controls off at LHC (after I got Rosetta under control, thankfully), which has just started up its 3rd run. Fortunately, I had disabled ATLAS for a short time, or I don't know what would have happened -- as it is, I got an instant dump of 41 new tasks, 23 of them in CMS, which can have rather long run times (and ATLAS can be even worse). LHC had no business doing that either. Rosetta now appears to be under control: with max 2 threads available to it, I have 2 tasks running and 1 (sometimes 2) waiting in the wings. That is perfectly fine with me, and in truth, I really do not care if I run out of Rosetta tasks sometime in the future, as there will always be plenty of work to do at LHC. 
I don't even care if 2 threads sit idle for a time -- I can always adjust the config file for LHC to let them have it all, and change that back if/when Rosetta came back into play (and vice versa, of course -- I am an equal opportunity number cruncher :D ) And here you are showing that you haven't the slightest idea of what BOINC is about & instead of letting things actually work according to your Resource share settings, you continually manually fiddle with things and make it impossible for that to ever occur. The whole point of BOINC is set and forget- you set things the way you want them (without arbitrary limitations) and it will meet those goals. But because you keep changing things, those settings can never be met. Those dumps of work- all a result of your continued fiddling. To summarize, I do not want one project taking up the overwhelming majority of my system resources. The amount of work any project sends to any given system should be based, at least in part, on the total resources available on that system. At present, without the presence of an app_config.xml file, that does not appear to be the case (as my experience with both Rosetta and LHC clearly demonstrates). To summarise- that is what actually happens when you don't keep trying to micromanage things. Grant Darwin NT ID: 107116 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,076,156 RAC: 21,133	Message 107118 - Posted: 8 Oct 2022, 0:19:00 UTC - in response to Message 107100. Last modified: 8 Oct 2022, 0:29:46 UTC The idea is to plan to succeed, not to plan to fail, so either: - increase cores to 3 for a while - if you have 3, that is - be aware that Rosetta's deadlines are 3 days, so adjust/reduce your buffer accordingly. The stock of tasks is held at Rosetta. No need to have your own stock too - if all possible adjustments are looking like you're still going to miss a deadline, abort a task now. It'll be reissued to one of 10s of thousands of other hosts to run. No need to worry about it If by "reduce your buffer" you mean adjust the "Store... X days of work" values, I have not touched them since I began running boinc in early September. I didn't ask for all those 15 tasks; it is almost as if Rosetta's system saw my 12 "CPUs" (threads, actually), and assumed they are all available for Rosetta tasks. In fact, I give 10 of those threads to boinc projects, and keep 2 for the OS. I'm not going to be as harsh as Grant, but it is true to say this is a Boinc issue for the most part. It seems to go a bit crazy when you newly join a Project and takes a fair few results until it picks up on your Host's preferences, which I think you rightly complain about as people have reported the issue before. From your point of view, the good news is that, while Boinc can start badly in assessing what to send you, it does improve over time. And I don't think you could reasonably know that in advance, so imo it's not your fault. LHC is my primary focus, and it gets 8 threads. This will not change. So far, the 2 running Rosetta tasks have been proceeding reasonably well. It looks as if the initial 8 hour estimated complete time might be some figure gleaned from a ouija board, as all completed tasks so far have finished in just over 3 hours. 
I'll take it as far as I can without aborting anything, to see how much I can actually get done. This one I think <is> a Rosetta issue, but a weird one. I don't get the problem, and I think the majority don't either, but one or two have reported it and no-one's really got to the bottom of why to the best of my knowledge. On the plus side, it does help you with the excess tasks you've initially been sent, so we'll chalk that up as an unexpected win while Boinc is sorting itself out. Thanks for your suggestions. Learning all of Boinc's peculiarities is not proving to be easy, and it doesn't help that there does not seem to be any complete manual anywhere that I have found so far. For example, it doesn't even mention the app_config.xml file; I only learned about that when I came here, trying to find a way to limit the number of running tasks (initially, my client gave Rosetta tasks 9 active threads, which is quite unacceptable to me). But, little by little, it is all coming together, so thanks again. To be fair to Boinc (words that rarely ever pass my lips) each project has its own quirks and Boinc has one interface to try to manage them all together, which isn't always easy without using app_config files as you've already discovered. When you run multiple projects the conflicts between the settings each project prefers tend to amplify the differences and have a tendency to shock until both Boinc and you get used to them and both of you appreciate the compromises that are necessary. I think, by the time a second batch of Rosetta tasks gets downloaded by Boinc, there ought to be a noticeable improvement, even if it isn't quite perfect, and then you'll feel less uncomfortable with what it's delivered to you. There'll be a tendency for you to intervene in what's been delivered to you, but if you can refrain from making manual adjustments then Boinc will learn all the quicker. 
While manual adjustments will help in the short term they also prevent Boinc from learning what it needs to do itself. But if that's not the case, come back and ask again. Edit: Seems I'm already late replying as you've gone through these stages already. Here, Grant is correct imo. Continuing to Intervene is making matters worse, not better. The shock of the initial excessive download of tasks is causing an assumption it'll always be that way, when it's apparent Boinc was already adjusting. Let that initial shock disappear and let Boinc handle how to get out of its own mess and it'll resolve itself on its own. I'm a tinkerer too, but the solution is not to. If that means you come back in 3 days time and say "Ah-ha! Boinc failed!" so be it. But let it fail on its own merits, without sabotaging it along the way. That's my advice. For now. ID: 107118 · Rating: 0 · rate: / Reply Quote

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,076,156 RAC: 21,133	Message 107119 - Posted: 8 Oct 2022, 0:56:38 UTC - in response to Message 107104. In fact, I give 10 of those threads to boinc projects, and keep 2 for the OS. LHC is my primary focus, and it gets 8 threads. This will not change. And there is your problem- you don't understand how BOINC works. You set the Resource Share between projects (which is a ratio- not a percentage), and BOINC does it's best to make it happen. The Resource Share is not based on the number of cores or threads being used, but on work actually done. eg Project A has an extremely efficient application, Project B has an extremely inefficient application. So If you have the Resource share set as 100 for Project A and 100 for Project B, then A gets 1 Thread while B gets 15, yet they both get the same amount of work done (due to the difference in their applications efficiency). That balance is achieved over time- that's weeks. Not minutes or hours. The more Projects you do, the larger your cache, the more you manually try to balance things, the longer it will take for your Resource Share settings to actually be met- think not weeks, but months. If you run more than one Project then there is no need for a cache ("Store at least 0.1 days of work" and "Store up to an additional 0.01 days of work" is plenty)- if a Project is out of work, then the other Projects will get extra processing time. When the out of work Project gets work again, it will get the lion's share of processing for a short while, taking processing time from the other projects till things meet your Resource share settings (which is probably what happened here as Rosetta has have very little Rosetta 4.20 work available for quite a while now). Actually you raise an interesting point here and I have to admit I don't understand exactly how it works either. Resource share and core limits per project may not be cooperative settings. They may be conflicting. 
And the point you make about the timescale over which resource share is met is relevant too. Is resource share the simple number of tasks or the time those tasks run? I don't know. If resource share is set equally between projects, but only 2 cores are given to one project and 10 to another, does that mean you have to get 5 times as many downloads of the lower share one to match what runs on the higher share one? I don't know. If resource share is a measure over a longer timescale, it may be that it's achieved by running <all> cores on one project for 5 times as long as <all> cores are used for the other. What seems to be expected is that at all times, 10 cores are set to one project and 2 cores to the other. From my dozen or so years of running Boinc projects (admittedly running only two projects for the majority of the time) that never EVER happens naturally, only by manual intervention, so if that's what you want you will never get it. Which takes me back to my favourite phrase on these forums, which always gets people's backs up, while also being completely true. Stop wanting it. Problem immediately solved. ID: 107119 · Rating: 0 · rate: / Reply Quote

Sid Celery Joined: 11 Feb 08 Posts: 2538 Credit: 47,076,156 RAC: 21,133
Message 107120 - Posted: 8 Oct 2022, 0:59:37 UTC

And while considering these questions, I had a little look around and discovered there are 500k+ tasks awaiting validation. Wtf! I've sent an email for the appropriate server to be given a kick. Hopefully it can be dealt with before people go off for the weekend.

Grant (SSSF) Joined: 28 Mar 20 Posts: 1925 Credit: 18,534,891 RAC: 0
Message 107125 - Posted: 8 Oct 2022, 1:50:10 UTC - in response to Message 107119.

"Is resource share the simple number of tasks or the time those tasks run? I don't know."

Neither. Early on it was based on Credit, because Credit was meant to reflect the amount of work done (the Cobblestone) - but due to Projects paying different amounts of Credit for a given amount of work done, it didn't work too well. And then along came GPUs to make things even worse. So it was changed to use Recent Estimated Credit (which probably bases its values on the Average Processing Rate for each application).

That way Resource Share is based on work actually done - not the time spent doing it or the number of Tasks for any particular Project at any given time. A really efficient application from Project A may do 10 times the work of a really inefficient application from Project B. If Resource Share is set to 100 for each Project, then Project A will use 1 core/thread, while Project B will need 10 cores/threads in order to do the same amount of work, and meet the Resource Share set.

The whole point of BOINC is to allow multiple projects to share resources, regardless of the type of resource (CPU, GPU etc.) or their number (i.e. 1 core or 256+, 1 GPU or a dozen). Having the Resource Share setting allows people who feel one project is more important to them than another to set the amount of work done for each accordingly. It's based on the work done - not the number of cores/threads available, but the work actually done.

Grant
Darwin NT

hadron Send message Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149	Message 107128 - Posted: 8 Oct 2022, 2:07:44 UTC - in response to Message 107116. That is hardly my problem. Yes it is, because you don't know how the software you are using actually works, you are trying to get it to do things it was not designed to do. Then please start giving me recommended solutions to this mess, instead of continuing to cut apart everything I'm saying. For starters, what would you suggest I do for Resource Share settings, if I want 80% of my effort going towards LHC, and 20% to Rosetta? Will setting the resource shares to 80/20 alone fix all my problems, including guaranteeing no more massive task dumps? In that regard, let me remind you that both of those dumps occurred when I was NOT "micromanaging" things (as you so quaintly put it). That means you are going to have to explain to me exactly what you mean by "balance", so I'm asking for at least the second time, just what do you mean by that term? Does that mean achieving balance in the credit total for each of the two? If so, that will be a long time coming, even if I go back to default on everything, and change just those 2 numbers. The reason for that is simple: Rosetta tasks simply are not worth the same number of credits per CPU cycle as either ATLAS or CMS tasks (they are, however, roughly the same as Theory tasks, but those are secondary to my interests on LHC). () Even without all the "micromanaging", it will take a lot longer than mere weeks. Or, perhaps it means (I do recollect you mentioning something resembling this) balancing out the loads in the Projects tab in the boinc manager? If so, that too will take a lot longer than mere weeks. The total work done for LHC is almost 70 times as great as that for Rosetta; the ratio of average work done is even higher. I absolutely do not wish to see LHC virtually wither on the vine while Rosetta plays catch-up. 
() And do note that I am no "credit hawg"; I am merely interested in getting this system to the point where a) it always has something to do, and (more importantly) b) it is doing the work I want it to do, not something that is only secondary to my interests. Searching for a way to achieve this is what led me here -- and along the way, I did not find a single. But I'm all ears. If you have an ironclad way to ensure my system is always doing most of the work I want it to do while Rosetta plays catch-up, I'm more than happy to give it a try. ID: 107128 · Rating: 0 · rate: / Reply Quote

hadron Send message Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149	Message 107130 - Posted: 8 Oct 2022, 2:22:34 UTC - in response to Message 107125. Last modified: 8 Oct 2022, 2:24:24 UTC TIL -- read everything before posting anything LOL. That last missive went out before I saw this, so apologies, Grant, if I came off overly harsh once again. I just want this to be resolved in as simple a manner as possible, avoiding future massive dumps while a) giving priority to LHC, but b) ensuring the system always has something to do, without having to keep tweaking it (as I know will be necessary if all the tweaks are in app_config files). Is resource share the simple number of tasks or the time those tasks run? I don't know. Neither. Early on it was based on Credit, because Credit was meant to reflect the amount of work done (the Cobblestone) - but due to Projects paying different amounts of Credit for a given amount of work done, it didn't work too well. And then came along GPUs to make things even worse. So it was changed to use Recent Estimated Credit (which probably bases it's values on the Average Processing Rate for each application). That way Resource Share is based on work actually done- not the time spent doing it or the number of Tasks for any particular Project at any given time. A really efficient application from Project A may do 10 times the work of a really inefficient application from Project B. If Resource share is set to 100 for each Project, then Project A will use 1 core/thread, while Project B will need 10 cores/threads in order to do the same amount of work, and meet the Resource Share set. The whole point of BOINC is to allow multiple projects to share resources, regardless of the type of resource (CPU, GPU etc) or their number (ie 1 core, or 256+. 1 GPU or a dozen). Having the Resource Share setting allows people that feel one project is more important to them than another to set the amount of work done for each accordingly. 
It's based on the work done- not the number of cores/threads available, but the work actually done. So, am I correct here in thinking that setting LHC/Rosetta to a resource share ratio of about 80/20 will keep Rosetta from eating up all the CPU threads while it tries to play catch-up? I don't mind if it takes 3 while it does so, I just don't ever want to see it grab 9 out of the 10 threads I'm allowing boinc to have. I've also reduced "Store...days of work" and "Store additional..." to 0.1 and 0.2 respectively. Are those figures reasonable in keeping the number of tasks waiting to start down to a manageable number? Finally, thanks to everyone who's weighed in on this so far; these are all things that should have been well discussed in the user manual, but which are not (just IMO, fwiw). ID: 107130 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1925 Credit: 18,534,891 RAC: 0	Message 107140 - Posted: 8 Oct 2022, 3:46:41 UTC - in response to Message 107130. So, am I correct here in thinking that setting LHC/Rosetta to a resource share ratio of about 80/20 will keep Rosetta from eating up all the CPU threads while it tries to play catch-up? I don't mind if it takes 3 while it does so, I just don't ever want to see it grab 9 out of the 10 threads I'm allowing boinc to have. Most likely. It all depends- if your other Projects have an outage and no work available, then Rosetta will get to use those now unused threads. When they come back online, then they will revert back to those Projects, and Rosetta will most likely end up with none for a while till things settle down again. And when Rosetta runs out of work again, those Projects will get it's unused threads- but how many Rosetta will take up when work comes back again depends on just how far behind the other Projects it is. It could get 3 or 4 (or more), but it wouldn't take long for it to drop back down again. Or it might only get 2 or 3 at most. The smaller the cache, then the less work that will initially be loaded up. and the sooner things will settle down again. I've also reduced "Store...days of work" and "Store additional..." to 0.1 and 0.2 respectively. Are those figures reasonable in keeping the number of tasks waiting to start down to a manageable number? The smaller the cache, then the less Tasks will be loaded up and waiting to run. and the sooner your Resource Share settings will be met. The Additional days setting is exactly as it is worded- days in addition to your "Store ---days of work" value. eg- If you set that to 4 days and you set Additional days to 10. The end result is 14 days of work. However, the cache won't be refilled to the 14 day level until it has fallen down to 4 days again. 
If you set your cache to 4 days, and additional days to 0.01 then as each Task is completed, a new one will be downloaded- keeping the cache at 4 days (maybe an hour more, maybe an hour less, but overall around 4 days). So Additional days is best set to 0.01- that way completed work gets reported as it's finished, and BOINC doesn't wait till things are almost empty (or are empty) before getting more work. And with the cache itself set to 0.1 days, there will always be a Task ready to run once the present one has finished, but you won't have several others queued up waiting to run. Grant Darwin NT ID: 107140 · Rating: 0 · rate: / Reply Quote
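To put numbers on the two cache settings described in this post, here is a minimal Python sketch of the refill behaviour as Grant explains it: work is only requested once the queue has dropped below the "Store ... days of work" value, and it is then topped up to that value plus the additional days. The function names are made up for illustration, and the model is a simplification of whatever the BOINC client really does.

```python
def buffer_marks(min_days, additional_days):
    """Return (low, high) marks, in days of queued work.

    low  = the "Store ... days of work" value
    high = low plus the "Store additional ... days" value
    """
    low = min_days
    high = min_days + additional_days
    return low, high


def work_to_request(queued_days, min_days, additional_days):
    """Days of work to ask for, under the simplified model above.

    Nothing is requested while the queue is still at or above the low mark;
    once it drops below, the queue is refilled up to the high mark.
    """
    low, high = buffer_marks(min_days, additional_days)
    if queued_days >= low:
        return 0.0
    return high - queued_days


if __name__ == "__main__":
    # Grant's example: 4 days plus 10 additional days.
    for queued in (14, 9, 4.5, 3.5):
        print(f"queued={queued} days -> request {work_to_request(queued, 4, 10)} days")
    # The small-cache suggestion: 0.1 days plus 0.01 additional days.
    print(buffer_marks(0.1, 0.01))
```

With 4 and 10, nothing is fetched between 14 and 4 days, which is why a large "additional" value leaves long gaps between work requests. With 0.1 and 0.01 the two marks almost coincide, so the queue is topped up nearly task by task.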

Sid Celery Send message Joined: 11 Feb 08 Posts: 2538 Credit: 47,076,156 RAC: 21,133	Message 107144 - Posted: 8 Oct 2022, 9:29:38 UTC - in response to Message 107140. "So, am I correct here in thinking that setting LHC/Rosetta to a resource share ratio of about 80/20 will keep Rosetta from eating up all the CPU threads while it tries to play catch-up? I don't mind if it takes 3 while it does so, I just don't ever want to see it grab 9 out of the 10 threads I'm allowing boinc to have." "Most likely." Just to throw an example in here, my resource share is set to 2900 Rosetta, 100 WCG (because I couldn't originally find where to change WCG's resource share). I don't mind WCG getting an occasional turn but, essentially, I only really want to run Rosetta, especially when debt has built up. I don't restrict cores by project, and I call for a buffer of 0.1 days and an additional 0.5 days. When tasks are available for both projects, I get and run 100% Rosetta tasks, and every so often WCG tasks come down at 1 per thread: 8 at a time on 8-thread machines, 16 at a time on 16-thread machines. Admittedly this may be because that number fits into 0.6 days of tasks. If I restricted task downloading to the levels you have, it may be that I'd download fewer at a time; I don't know how that works. If I restricted cores per project too, which I don't, it may be that these 8 or 16 tasks would only run 2 at a time. When they start running may be determined by their deadline, with the later ones then failing to meet it, or by order of downloading, which may or may not be OK for a 16-thread machine where they'd require 8 goes of 2-at-a-time rather than 16 in parallel. Again, I don't know, tbh. So to achieve that mythical 2 of one type and the rest of the preferred type, it would seem to be pretty essential to pare the offline buffer to the numbers you suggest. I don't think this is intuitive, so again I don't think it's something to blame the user for not knowing in advance.
But it is a quirk of how BOINC seems to deal with lots of interacting settings: they work themselves out in their own particular way, not the way people assume they would. Unless there's some shortcoming of the host machine, wanting BOINC to allow only 2 tasks of one project to run at a time is quite a complicated thing to ask for in a way BOINC understands and can run successfully. It would be a lot easier to change what you think running 20% of the time means. A better way of thinking about it is to run nothing but the lower-priority project's tasks 20% of the time and nothing but the preferred project's tasks 80% of the time, which looks very different but achieves the same end, just over a slightly longer timescale. And to achieve that quicker, it would seem to make sense not to restrict cores for a project. It just piles on extra complications. So, stop wanting what you say you want, thinking it'll easily achieve something it won't, and want something different that achieves the same end quicker and easier and with less moaning about it along the way. ID: 107144 · Rating: 0 · rate: / Reply Quote
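The "20% of the time" reading can be checked with a little arithmetic. The sketch below is only an illustration of resource share as a long-run average. It assumes the share maps straight onto a fraction of the work done, which is a simplification, and the function names are invented for this example.

```python
def share_fractions(shares):
    """Turn raw resource-share numbers into long-run fractions of the work."""
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


def average_threads(shares, threads):
    """Long-run average number of threads per project.

    This is an average over days or weeks, not a cap: at any given moment a
    project may hold more or fewer threads than this.
    """
    total = sum(shares.values())
    return {name: share * threads / total for name, share in shares.items()}


if __name__ == "__main__":
    # hadron's 80/20 split over the 10 threads given to BOINC.
    print(average_threads({"LHC": 80, "Rosetta": 20}, 10))
    # Sid Celery's 2900/100 split between Rosetta and WCG.
    print(share_fractions({"Rosetta": 2900, "WCG": 100}))
```

For 80/20 on 10 threads the average works out to 8 and 2, whether that is reached as 2 threads all of the time or all 10 threads for a fifth of the time. The two schedules differ only in how long you have to watch before the average shows.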

Bryn Mawr Send message Joined: 26 Dec 18 Posts: 440 Credit: 15,189,162 RAC: 9,386	Message 107151 - Posted: 8 Oct 2022, 14:23:16 UTC @hadron A setting you might want to change is :- <rec_half_life_days>X</rec_half_life_days> A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs. in cc_config.xml Setting this to 1 will reduce the time that a project will try to play catch-up when it’s been short on work so, in your case, if Rosetta has fallen behind its project share it will still grab 3 or 4 threads but only for a few hours rather than a few days. ID: 107151 · Rating: 0 · rate: / Reply Quote
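As a rough picture of why a shorter half-life shortens the catch-up period, the sketch below assumes that past work is discounted exponentially, halving in weight every rec_half_life_days. That assumption is taken from the name of the setting; the client's exact formula is not given in this thread, and the function name is made up for illustration.

```python
def remaining_weight(age_days, half_life_days):
    """Weight still given to work done age_days ago.

    Assumes plain exponential decay: the weight halves every half_life_days.
    """
    return 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    for half_life in (10, 1):
        weights = [round(remaining_weight(age, half_life), 3) for age in (1, 3, 7)]
        print(f"half-life {half_life} days: weight after 1, 3, 7 days = {weights}")
```

Under this model, with the default of 10 a shortfall from three days ago still counts for about 81% of its original weight, while with a value of 1 it is down to 12.5%, so a project that fell behind stops being "owed" time much sooner.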

hadron Send message Joined: 4 Sep 22 Posts: 70 Credit: 2,080,995 RAC: 1,149	Message 107174 - Posted: 8 Oct 2022, 23:10:59 UTC - in response to Message 107151. "@hadron A setting you might want to change is :- <rec_half_life_days>X</rec_half_life_days> A project's scheduling priority is determined by its estimated credit in the last X days. Default is 10; set it larger if you run long high-priority jobs. in cc_config.xml Setting this to 1 will reduce the time that a project will try to play catch-up when it’s been short on work so, in your case, if Rosetta has fallen behind its project share it will still grab 3 or 4 threads but only for a few hours rather than a few days." I assume that "high priority" is determined by the resource share settings, which I've now set for LHC/Rosetta to 400/100 (I've left the app_config.xml files in place for now). Your post therefore presents me with a small problem: the LHC tasks I'm primarily interested in are ATLAS and CMS, because they are looking for new physics, whereas Theory is testing particle collisions within the confines of the Standard Model (IMO, that is pretty much discredited now; it is a good basis, but really quite incomplete). Both ATLAS and CMS are long tasks; the former can run (if given only a single thread) for more than 24 hours, while CMS generally takes 12 to 15 hours. Thus it would seem that I should increase this setting, whereas you suggest reducing it so that Rosetta will give back any extra threads it has taken much faster. There seems to be a contradiction here, yes? Do you have any further thoughts based on this information? I can reduce ATLAS run times to about 13 to 15 hours by forcing 2 threads per task (single line in app_config.xml, or a setting in my preferences on LHC), but beyond that, I am at a loss. For Sid and Grant: I'm slowly clearing away the backlog of CMS tasks -- 8 running/5 waiting now, but given CMS run times, this is going to take a while.
When that is all cleared up, I'm going to delete both app_config.xml files and try running everything based on project server settings and local global preferences. If that works out giving me what I want to see (especially from LHC -- mostly ATLAS and CMS, few Theory), then I'll be satisfied -- but it isn't clear right now that I can achieve that balance on LHC without using an app_config file; time will tell, I suppose. I'm probably going to go silent for a while, unless further suggestions are made and I need some clarification. Otherwise, I'll be back when I have some definite experience with the changed settings. Thanks again for all the input; it's been very helpful and explains a lot of detail that the BOINC manual simply does not touch. ID: 107174 · Rating: 0 · rate: / Reply Quote

Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1925 Credit: 18,534,891 RAC: 0	Message 107176 - Posted: 8 Oct 2022, 23:38:40 UTC - in response to Message 107174. Last modified: 8 Oct 2022, 23:45:06 UTC "I assume that 'high priority' is determined by the resource share settings," Partially. What determines whether or not something becomes high priority is, in order of significance: changes to any project or application settings; the deadlines for each Task (days, weeks or months) and the Run Times for each different application; and the size of your cache and the number of Projects you do. Changing settings has the biggest impact. BOINC was trying to meet the previous settings; now it has to work out what needs to be done to meet the new settings and then do it. That could result in Tasks becoming high priority when under the previous settings that wasn't necessary. A Project might only have a Resource Share of 1, but if it has several days' worth of work, and the deadlines are like Rosetta's (only 3 days), then BOINC has to do those Tasks before it can do any others, even if the others have a much higher project priority, otherwise it will miss the deadline. That's why with more than one Project it's best to have no cache: BOINC can do things the way you want much sooner (within a week, and not over several months). It's also why it's best to give BOINC as many cores and threads as possible: "Use at most xxx% of CPU time" should be set to 100%, "Suspend when computer is in use" should not be selected, and "Suspend when non-BOINC CPU usage is above --- %" should be left blank. BOINC applications generally run at Low priority, and pretty much everything else has a higher priority, so if something else needs CPU time, the BOINC Task is the one that comes off second best. If you regularly run other CPU-intensive programmes, then reserving a core/thread or two for non-BOINC use is necessary.
If you see a noticeable difference between CPU Time and Run Time for your Tasks, then there's something else sucking up CPU resources. Decide if it's necessary or not and change things accordingly. And looking at your LHC Tasks, that is an issue. It's taking you 13 hrs 45 min to do 12 hrs 15 min worth of work. Same with Rosetta: it's taking you 9 hours to do 8 hours' worth of work. So that's yet another thing BOINC has to take into account when it's trying to work out what to do and when, for which project and application. Depending on the application, GPU Tasks may require a CPU thread to support each running GPU Task. If they have to share time with a CPU Task, then processing times increase significantly. Or you have some other application on your system sucking up CPU time. Grant Darwin NT ID: 107176 · Rating: 0 · rate: / Reply Quote
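The CPU Time versus Run Time comparison above is easy to reproduce for your own Tasks. This is a minimal sketch using the two pairs of figures quoted in the post; the function name is made up for illustration.

```python
def cpu_efficiency(cpu_minutes, run_minutes):
    """Fraction of the wall-clock Run Time in which the Task actually had the CPU."""
    return cpu_minutes / run_minutes


if __name__ == "__main__":
    lhc = cpu_efficiency(12 * 60 + 15, 13 * 60 + 45)  # 12 hrs 15 min CPU in 13 hrs 45 min
    rosetta = cpu_efficiency(8 * 60, 9 * 60)          # 8 hours CPU in 9 hours
    print(f"LHC: {lhc:.1%}")          # prints "LHC: 89.1%"
    print(f"Rosetta: {rosetta:.1%}")  # prints "Rosetta: 88.9%"
```

Both come out at roughly 89%, so about one hour in nine of each thread is going to something other than the BOINC Task.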