Message boards : Number crunching : Not getting any python work
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Good find - I wouldn't have seen that for a long time!

Bryan - for once just shut the *%*$ up. I am saying that yes, it's nice that I finally got Python, but with a few issues. Those issues are listed. What errors cause a blacklist? Isn't that something we should know? The ability to control how many tasks we want to run? Isn't that something we should have control over? You're just a bull seeing a red cape; go hide in your dark corner and put your blinders on. You're just a young one when it comes to crunching this project.
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1684 Credit: 17,950,321 RAC: 23,118 |
The ability to control how many tasks we want to run? Isn't that something we should have control over?

That is already there in the overall BOINC settings with "Use at most xxx% of the CPUs", in order to reserve cores/threads for non-BOINC work on systems that have heavy non-BOINC workloads. But as to how many cores a particular project can use, the simple answer is no - you should not be able to limit cores/threads on a per-project or per-application basis. It is difficult enough trying to resolve people's issues just with the present BOINC & per-project options available, without further complicating things.

More complicated answer - there are edge cases where system resources may make it necessary to further limit the number of cores/threads that a particular project can use, and that option is already available using max_concurrent. Unfortunately its implementation is broken & the BOINC devs need to sort it out so it can be used without systems downloading Tasks until the server has none left. But to have that as a generally accessible option - no. It isn't necessary or even desirable.

Your issue with this is purely because, for whatever reason, no matter how many times I try to explain things to you, you just don't understand the whole point of Resource share. It's not about the number of cores you have, or the number of threads you have, or the number of CPUs you have, or the number of GPUs you have, or the number of Compute Units your GPUs might have. It's about the amount of work done - that's it. End of story. How many cores/threads are being used by any particular project at any particular time is completely irrelevant. There is no need to limit the cores/threads available for a given project.

You set your Resource share value for each of your projects - keeping in mind it's a ratio, not a percentage - and then BOINC goes about doing work for each project in accordance with those values. Once again - how many cores/threads are being used by any particular project at any particular time is completely irrelevant. It's about the amount of work done for each project, not how many instances of a particular project are running at a particular time.

So - no, there should not be any ability to control the number of Tasks running on a per-application or per-project level. You give BOINC access to as many cores/threads as your system can spare, and let it do its job.

Grant
Darwin NT
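For anyone who does hit one of those edge cases, here is a minimal sketch of what an app_config.xml using max_concurrent looks like. The file goes in the project's folder under the BOINC data directory; the app name below is only a placeholder (the real short name is in client_state.xml):

<app_config>
  <app>
    <name>rosetta_python_projects</name>  <!-- placeholder - use the actual app short name from client_state.xml -->
    <max_concurrent>2</max_concurrent>    <!-- run at most 2 tasks of this app at once -->
  </app>
  <project_max_concurrent>4</project_max_concurrent>  <!-- cap on all running tasks from this project -->
</app_config>

After saving it, use Options > Read config files in BOINC Manager (or restart the client) to pick it up. The work-fetch problem described above still applies.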
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 393 Credit: 12,114,842 RAC: 4,200 |
Good find - I wouldn't have seen that for a long time!

If you show me where you say “thank you”, or even “that’s nice” then I’ll happily apologise but until then I see no reason to “shut the *%*$ up” or go away and hide - seniority does not excuse ignorance and rudeness.
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,227,479 RAC: 753 |
Greg,

Check the logs. You've returned 2 valid Pythons and 1 errored out due to an "upload error":

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>boinc_cages_IL_2727241_83772_0_r1309889448_0</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Greg,

Ok... so far so good.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Good find - I wouldn't have seen that for a long time!

I will give you a thank you for the link. But your attack is what got you these remarks. Seniority has its advantages. I may not have the RAC you do, but my system is more spread out across a bunch of projects.

I have been on here since this project almost started. Back when it was dial-up, back when this project was just getting started and was doing stuff in the papers to get interest in it. Back when Dr. B used to post stuff, back when we had technical students monitoring this forum. Back when ADMIN was DEK and he used to keep track of stuff here. Back when, if we posted about an issue, someone would pay attention. Back when they tried to fix a problem rather than blacklist systems that kicked back errors.

There are errors related to the setup of a person's Vbox, and errors related to which version works best with which projects. There are errors due to programming faults by the person who wrote the code. So this is what I am driving at: what "errors" is he referring to, specifically? If he puts out buggy tasks and they kick back errors, is he going to blacklist every system that gets those buggy tasks? What if the version of Vbox creates an issue? Is he going to blacklist you for that? I have seen these kinds of errors here and on other projects. That is what I am going on about. Which you decided to bark at me about.

Now, since we have established the pissing distance and the measuring of certain body parts, let's just go back to normal.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The ability to control how many tasks we want to run? Isn't that something we should have control over?
That is already there in the overall BOINC settings with "Use at most xxx% of the CPUs", in order to reserve cores/threads for non-BOINC work on systems that have heavy non-BOINC workloads.

Resource share per project does not define how the system is used. Please have a look at Qchem (it allows definition of cores and total tasks), ATLAS (same thing); WCG is a bit different, it gives you a percentage of total processors and a choice of which projects you want to do. Einstein lets you define how many processors and how much of each processor's time you want to give. So if RAH wants to go play with the big guys using Vbox, etc., then they should let you define how your system does the work.

However, it seems the scheduler or BOINC is learning how my system is allocated, so it is running only 2 tasks at a time. But here is what is puzzling: the difference between the used memory reported by the Windows task manager and the calculated memory usage shown by BoincTasks. Per BoincTasks I am using 22.4 of 24 GB, but when I look at Windows task manager it says I am only using 36% of the total memory. So why the big difference? 18.88 is used to run VM tasks and the rest of the memory is being used by LHC, WCG and Prime Grid. Outside of BOINC, FAH is using a rounded figure of 370 MB and my browser is using 344 MB with a bunch of tabs open. So I do not understand where the difference is.
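(For the numbers quoted: 36% of 24 GB is roughly 8.6 GB in use according to Task Manager, against the 22.4 GB figure from BoincTasks - a gap of about 14 GB. A possible explanation, though only a guess, is that one tool adds up the full memory allowance declared for each VM task while Windows counts only the host pages the VMs have actually touched, and VirtualBox does not commit all of a guest's RAM up front.)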
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1684 Credit: 17,950,321 RAC: 23,118 |
Resource share per project does not define how the system is used.
And this statement yet again shows that you just don't understand in the slightest the entire concept of what BOINC is about - sharing resources between multiple projects according to the user's choice of the relative importance between the projects.

WCG is a bit different, it gives you a percent of total processors and a choice of which projects you want to do.
Because WCG isn't a project; it is a group of projects that appear as a single project under BOINC.

So if RAH wants to go play with the big guys using Vbox, etc. then they should let you define how your system does work.
Some projects do have Tasks that can be multi-threaded, or that require more than a single physical CPU core for a single Task. Hence they need the option to support this need. But just because someone else does something doesn't mean others should do the same, particularly if what they are doing is adding complexity without being of benefit.

However it seems that scheduler or BOINC is learning how my system is allocated, so it is running only 2 tasks at a time.
You only have enough RAM to run two Python Tasks at a time, so that is all it can run - even if it had to run more in order to meet your Resource share settings, it physically can't do it.

Grant
Darwin NT
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Resource share per project does not define how the system is used.
And this statement yet again shows that you just don't understand in the slightest the entire concept of what BOINC is about - sharing resources between multiple projects according to the user's choice of the relative importance between the projects.

Good enough. The system is where it needs to be for my liking now. Since we are arguing over system share... can you point me to a page where that is explained in depth? There is also a person in the problems thread that Jim was explaining resource share to.

So if I cut RAH to 50 and leave the others at 100, what does this do? My understanding is it would push RAH to a lower level in the order of task processing. That is not what I want to do. I want it to be of equal importance, but I don't want it taking over my system. Though with just two pythons eating up a lot of memory, other projects are suffering. Which makes me wonder if I need to get more memory. Two sticks are pretty old.

The current scene is 3 pythons, 2 x 4.2, 3 x SiDock, and GPU Einstein and Prime. FAH runs in the background. For BOINC that is just 10 out of 15 cores. I did a calculation last night that says I am using all but 2 gigs of memory. So this makes me think I need to increase the memory in order to keep everything busy.
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1684 Credit: 17,950,321 RAC: 23,118 |
Since we are arguing over system share... can you point me to a page where that is explained in depth?
There is no in-depth explanation of it because it's such a straightforward thing.

It is about the work done. That's all it is about. The amount of work done for a project. Nothing more, nothing less. That is all. The amount of work done for a project.

Once again, for the thousandth time - it's not about the number of cores, it's not about the number of threads, it's not about the number of CPUs, it's not about the number of GPUs, it's not about the number of GPU compute units, it's not about the amount of time any particular project gets, and it's not about how many Tasks of any particular project are running at any particular time. It's just about the work done. That's it. It's only about the work done. There is nothing else to explain.

From BOINC itself -
Resource share
The amount of computing resources (CPU time, disk space) allocated to a project is proportional to this number. The default is 100.
Note: At World Community Grid this option is titled "Project Weight".
Note: this is not a percentage. If a computer has 2 projects added, each with resource share 100, each project will get half the resources. If a project is given a resource share of 0 it will not receive any resources unless other projects are unable to provide tasks. Using the value 0 is known as 'setting a backup project': you are advised always to leave at least one project with a non-zero resource share, otherwise the backup project system cannot function normally.

So if I cut RAH to 50 and leave the others at 100, what does this do?
See above - it is about the work done; it has nothing to do with how many of any particular thing is running at any particular time. Because of differences in run time, deadlines, application optimisation, and the data being worked on even for a given application, some projects will need to use more cores for more time than other projects in order to meet your Resource share settings.

It's this very simple, straightforward, basic fact that I have explained to you over & over again till I'm blue in the face, and you either choose to ignore it or, for some reason I can't fathom, you just can't grasp the concept of it.

Grant
Darwin NT
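(A worked example with the numbers being discussed, just to show the ratio: if the only CPU projects competing were SiDock at 100 and RAH at 50, the shares sum to 150, so over time BOINC aims to do 100/150, about two-thirds, of the CPU work for SiDock and 50/150, about one-third, for RAH - regardless of how many tasks of each happen to be running at any given moment.)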
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 393 Credit: 12,114,842 RAC: 4,200 |
Resource share per project does not define how the system is used.
And this statement yet again shows that you just don't understand in the slightest the entire concept of what BOINC is about - sharing resources between multiple projects according to the user's choice of the relative importance between the projects.

I’m surprised that you’re getting 3 RPP tasks running alongside the rest in 24 GB. Assuming that FAH is fixed to use the other 5 cores, that Einstein and Prime are GPU-only and share the 16th core with the OS, and that SiDock and RAH have the same resource share, then in the current situation where 4.20 tasks are few and far between you probably need to double your memory before the machine will run all cores.

If you drop the resource share of RAH to 50 and leave all of the others at 100, you will see the RAC of RAH drop asymptotically over the next couple of weeks to the point where its RAC is half that of SiDock - it should have little effect on the other projects. Remember, however, that this is not a hard and fast proportion, and that task availability or a stuck task on your machine will perturb it, requiring another asymptotic adjustment. The resource share is a request to the system, not a rule.
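(Rough arithmetic on the "asymptotically" part, assuming BOINC's usual exponential averaging with a half-life of about a week: the gap between the current RAC and the new steady level shrinks by roughly half each week, so after two weeks about three-quarters of the adjustment has happened, and after a month all but a few percent of it.)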
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,227,479 RAC: 753 |
The queue has dropped to zero. Server Status says there are just over 2,000 pythons ready to send.

Wonder if they are lowering the memory requirements. Admin did say that in order to do so, all the pythons would have to be taken away from the queue so that they could be based on the new VB image with lower memory requirements.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
The queue has dropped to zero. Server Status says there are just over 2,000 pythons ready to send.

Still 7629.39 physical and 100.xx virtual. The physical has been consistent since I got started on them.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Resource share per project does not define how the system is used.
And this statement yet again shows that you just don't understand in the slightest the entire concept of what BOINC is about - sharing resources between multiple projects according to the user's choice of the relative importance between the projects.
--------------

3 Pythons are only 22 GB and the others are 600 MB +/- combined. FAH is GPU, but you're right, I could put the other cores to work on it. Why not... they are idle. FAH is a combined 593 MB. (Note: added 7 cores at full power and the GPUs at full power; memory is at 49% as this change only added 107-108 MB to the load.)

I'll leave it (RAH RS) at 50 and see what happens into the work week. I already know scripts don't work, we went through that already, so there are no other options.
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
The queue has dropped to zero. Server Status says there are just over 2,000 pythons ready to send.

I haven't been around much the last week or so and missed what was being said, after not having got round to reporting what was being talked about the previous week. Sorry. I caught up the other day and saw Admin had already popped in and commented on the threads anyway. I sent an email in regardless, re-emphasising what's been said here and what was promised, and pointed to the Rosetta graph on the munin site to show the massive drop-off in tasks available and being reported back over the last 3 weeks.

It was within the next 12 hrs that the queue dropped to zero, and I came to the same conclusion as you, so hopefully something comes of it soon. It's far from the end of our issues, as I'd also like to see a return of the Rosetta 4.20 tasks for my own benefit. In the meantime, WCG is getting a significant bump on all my hosts - unfortunately.

I'm thinking of cutting down my cache so that when Rosetta comes back on stream I don't have so many 2nd-preference tasks to clear down and I can grab some, then increase my cache again to run more of my 1st-preference project.
Falconet Send message Joined: 9 Mar 09 Posts: 353 Credit: 1,227,479 RAC: 753 |
Yes, let's hope that it is indeed because of the memory issue. Shouldn't take long to find out. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Yes, let's hope that it is indeed because of the memory issue.

It can only be for the memory reason that such a large number were all taken offline. You guys will know how successful it's been as soon as they come back, I've no doubt. On the plus side, there is a response to the struggles reported here, even if not immediately.