Message boards : Number crunching : There's a max WU of 8 with Virtualbox
Author | Message |
---|---|
Dougga Send message Joined: 27 Nov 06 Posts: 28 Credit: 5,248,050 RAC: 0 |
I have the new Intel Core i9-12900K, which has 24 threads and 16 cores. BOINC/VirtualBox will only run 8 work units for some reason. The BOINC UI doesn't seem to have a max setting, but I'm still looking. Can someone help me out? |
MJH333 Send message Joined: 29 Jan 21 Posts: 18 Credit: 5,748,861 RAC: 0 |
It’s a lack of RAM. See Falconet’s post (Feb 11 at 5:16pm) on the World Community Grid forum here https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,44037_offset,0 |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1633 Credit: 16,775,951 RAC: 13,112 |
"It’s a lack of RAM." Or disk space, even if you have enough RAM. For Python Tasks, roughly 3.5GB of RAM per Task is required (even though much less is actually used), and 7.5GB of disk space is needed per Task. For Rosetta 4.20 Tasks, allowing 1.3GB of RAM per Task means you won't run into lack-of-memory issues for that work type. Check your Event log to see what the messages say about what Rosetta needs, and you still need to run the BOINC benchmarks. Grant Darwin NT |
computezrmle Send message Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0 |
"... you still need to run the BOINC benchmarks." Not necessarily, if the server runs a recent BOINC version (Rosetta's does). Then the benchmark results are only used to initialize the corresponding "speed" fields in the app_version record for that client. Once initialized, the server recalculates the "speed" values based on the reported runtimes. Those updated values are sent back to the client as <flops>. Example (all from the same client): Benchmark p_fpops: 7271969760.658784; Rosetta 4.20 flops: 2777777850.436758; Rosetta python flops: 3409656081.627964. If a client has never sent a benchmark result, 1000000000.000000 (p_fpops) is used as the default. The more p_fpops and flops differ, the more credits and runtime estimates jump up and down when a new app_version is sent out or the flops are reset server side. It also takes longer until the numbers return to values the volunteer is familiar with. |
Greg_BE Send message Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 10 |
"... you still need to run the BOINC benchmarks." I'm running 9 Python and 6 x 4.2 right now. I forgot to take BOINC out of suspend mode before I went to work, so Rosie is complaining. |
computezrmle Send message Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0 |
"I'm running 9 Python and 6 x 4.2 right now. Forgot to take BOINC out of suspend mode before I went to work. So Rosie is complaining." Did you read my post? It appears that you didn't, as I don't see any relationship. You nearly always make full copies of all the comments you find (including replies to ... including replies to ...) and clutter the forum with them. It would be easier for everybody reading if you focused on the small pieces you want to refer to. Everything else can be read in the original posts, which can be found via the existing link next to each reply. |
Dougga Send message Joined: 27 Nov 06 Posts: 28 Credit: 5,248,050 RAC: 0 |
"It’s a lack of RAM." There are no complaints in the event logs regarding RAM. Looking at Resource Monitor, it claims... Memory: Hardware reserved: 259 MB; In Use: 18756 MB; Modified: 148 MB; Standby: 13608 MB (this is oddly high); Free: 1 MB. It appears to me the VMs are reserving an enormous amount of memory they are not using, which is the problem. An additional 32GB of RAM is back-ordered, so it will be added shortly. Thanks. |
Greg_BE Send message Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 10 |
Well then why not reply to the reply and skip the quote? Geeess...you need some whine to go with that? |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2074 Credit: 40,613,760 RAC: 5,140 |
"It’s a lack of RAM. Or Disk space even if you have enough RAM." I have a weird thing going on with disk space, over and above the understandable RAM limitations I have. My settings are: Disk: Use no more than xx disk space - unselected; Leave at least 1GB free - selected; Use no more than xx% of total - unselected. RAM: When computer is in use, use at most 80%; When computer is not in use, use at most 90%; Leave non-GPU tasks in memory while suspended - selected; Page/swap file: use at most 75%. Event log shows my preferences as: 16/02/2022 16:59:22 | | Reading preferences override file. I have 7 Python tasks running and 3 Python tasks "waiting to run". Event log shows: 16/02/2022 22:39:26 | Rosetta@home | Message from server: rosetta python projects needs 2059.75MB more disk space. You currently have 17013.73 MB available and it needs 19073.49 MB. I get the RAM limitations, though there's no complaint about that in the event log, but how come there's only 17GB of disk space available out of 826GB to download another 19GB task? Then, while I was typing this, 8 Rosetta 4.20 tasks got downloaded without complaint and are running. Am I supposed to understand this, or is this just how it goes? I doubt I'd have the RAM to run any more tasks anyway, just have a buffer. Edit: Disk tab shows: Used by BOINC: 84.96GB; Free, available to BOINC: 748.68GB; Free, not available to BOINC: 1.00GB; Used by other programs: 96.27GB. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1633 Credit: 16,775,951 RAC: 13,112 |
"It’s a lack of RAM." So what messages are in the Event log when it requests more work? If BOINC doesn't have enough RAM or disk space to start more tasks, it generally mentions it in the Event log, as many people have posted here many times ever since Python work was released. Keep in mind it doesn't matter how much RAM your system has if you don't let BOINC make use of it. In Computing preferences, Memory, "When computer is in use, use at most xx %" and "When computer is not in use, use at most xx %": generally set both to at least 95% so BOINC can make use of it. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1633 Credit: 16,775,951 RAC: 13,112 |
"I have a weird thing going on with Disk space, over and above the understandable RAM limitations I have" I think it was .clair. who was having similar issues. I think it was a case of them using local preferences, and with those unselected the web-based preferences were used. To get around it, instead of leaving those other values unselected, they put values in there that would give BOINC more than it would ever need, both locally and in their web account settings. Then it stopped complaining about disk space and downloaded the extra Tasks. Grant Darwin NT |
MJH333 Send message Joined: 29 Jan 21 Posts: 18 Credit: 5,748,861 RAC: 0 |
"It’s a lack of RAM." Dougga, I have two 4C/4T laptops running Pythons. With 8GB of RAM, they would run only 2 tasks at a time. I increased the memory on both to 16GB, and they both now run 4 Pythons. The fact that your machine has 32GB of RAM and can run only 8 Pythons is what led me to think that RAM is the issue. But as Grant (SSSF) has pointed out, disk space can also be a problem with the Pythons. I have to confess that I couldn’t remember what the Event Logs said about this issue when I only had 8GB of RAM. So I conducted a little experiment this morning, taking 8GB out of one of the laptops. Doing this caused 2 of the 4 Pythons to stop running, with the message “Waiting for memory” showing in the Status section of BOINC Manager for the tasks. Aborting one and downloading another caused the Status message for the new one to read “Ready to start” instead of “Waiting for memory”. So the message for tasks which haven’t yet started because of lack of RAM seems to be simply “Ready to start”. I also looked at the Event Log. It did not mention the fact that tasks weren’t running because of a lack of memory. I think I have the default options for the Event Log. Changing those options by using Options>Event Log options in BOINC Manager to add in mem_usage_debug resulted in the Event Log recording that those 2 Pythons “can’t run, too big”. I thought I would record this, in case you or others find it helpful. Cheers, Mark |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Sid Celery: "I have a weird thing going on with Disk space, over and above the understandable RAM limitations I have" I finally found the combination that works, so try this from my thread on the problem: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14903&postid=104879 Use no more than: 500 GB [the total size of the disk it's on; this is so I can run 45 work units together. I have now seen BOINC using 352GB of disk and 80GB of RAM.] Don't worry about setting this BIG. Leave at least ## GB free: [untick this box, not needed]. Use no more than ## % of total: [untick this box, not needed]. |
Dougga Send message Joined: 27 Nov 06 Posts: 28 Credit: 5,248,050 RAC: 0 |
Well, it appears that, at least for now, the VirtualBox work units are gone and I have 24 WUs running. That's a first. After further investigation, the computer is using different cores differently. Parked (8): 1,3,5,7,9,11,13,15. Idle or mild use (8): 0,2,4,6,8,10,12,14. Maxed out (8): 16-23. Someone mentioned Windows 11 might make better use of the cores, and the new Intel CPUs have multiple types of cores. More research... |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2074 Credit: 40,613,760 RAC: 5,140 |
"I have a weird thing going on with Disk space, over and above the understandable RAM limitations I have" I just spotted that in your other thread. Well worth a try once I get back to that PC on Sunday. It kind of indicates a problem with BOINC working with Python tasks. I've not come across it before. Thanks for the pointer. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2074 Credit: 40,613,760 RAC: 5,140 |
"I have a weird thing going on with Disk space, over and above the understandable RAM limitations I have" I've set the disk space to 500GB rather than leave it unlimited (826GB on my system) and it worked straight off. 30 more Python tasks came down straight away - added to a number that was fewer than the 16 threads I have. Not sure why, but you definitely hit on something. |
Greg_BE Send message Joined: 30 May 06 Posts: 5690 Credit: 5,859,226 RAC: 10 |
This bug (disk space errors when running 15 or so Pythons) has been submitted to the BOINC guys on GitHub. One guy said he will dig into it. All my settings were OK, and the disk space setup in BOINC was OK (leave 2GB free, no restriction). The drive has more than enough capacity (500GB dedicated). Now, there is a difference between my drive and yours (Sid). You have a far larger drive and you're bringing it down to my drive's level of 500. I have a 500 (465 after formatting) and I said keep 2 gigs free, so 463 gigs, and I had the problem of disk errors here and in SiDock. 15 SiDock or 15 Pythons both created disk space errors despite having more than enough space. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2074 Credit: 40,613,760 RAC: 5,140 |
"This bug (disk space errors when running 15 or so python) has been submitted to the BOINC guys on Github." Not sure if this is of any significance at all, but I can run 10 Python tasks at a time on my 8C/16T machine (RAM of 32GB set to 90% when in use and 95% when not in use, and 500GB disk allocated). A whole load of WCG tasks came down unexpectedly (97), so I suspended 2 Pythons, leaving 8, and the other 8 cores of my PC started running WCG tasks. So I'm going to tweak the RAM up a touch more and add another 100GB of disk space and see what happens, if anything. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2074 Credit: 40,613,760 RAC: 5,140 |
"This bug (disk space errors when running 15 or so python) has been submitted to the BOINC guys on Github." Increased RAM to 95% in use, 95% not in use, and 600GB disk space, and could run 10 Pythons only, or 9 Pythons and 6 WCG. Increased RAM to 97% and 97%, with disk at 600GB, and could run 11 Pythons only, or 10 Pythons and 6 WCG. I don't think increasing the disk space made any difference tbh - just the RAM, as I stepped it up 1% at a time. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
My theory . . . . On my 16-CPU system I can only run a max of 11 Pythons together because it has only 32GB of memory, 100% available to BOINC. The "properties" of a Python want 2.79GB of memory, so 2.79 x 11 = 30.69GB, plus a bit for system use, and its RAM is fully allocated. With no other tasks running it actually uses about 20 to 21GB of RAM; the rest, BOINC calculates, has been eaten by Pythons {greedy objects}. But if any R4.2 tasks appear [or WCG in your case] they will use the other CPUs and the RAM that the Pythons are hogging but not actually using. Ticking `cpu_sched_debug` in the event log gets a lot of output. I won't inflict it all on the forum; here are the last few lines that look most relevant: 20/02/2022 17:51:32 | Rosetta@home | [cpu_sched_debug] enforce: task aagb-AIB_pp-NMPHE-ACBC13T-mACPenC12C_12_2594285_2_0 can't run, too big 2861.02MB > 1294.93MB 20/02/2022 17:51:32 | Rosetta@home | [cpu_sched_debug] enforce: task aaas-AGLY_pp-NMPHE-mVAL-mSUGA_0_2406724_2_0 can't run, too big 2861.02MB > 1294.93MB 20/02/2022 17:51:32 | Rosetta@home | [cpu_sched_debug] enforce: task aagb-NMPHE_pp-mTIQ-LARE-mB3PHG_pp_5_2674514_2_0 can't run, too big 2861.02MB > 1294.93MB 20/02/2022 17:51:32 | Rosetta@home | [cpu_sched_debug] enforce: task aaas-HPR_pp-SAR-AGLY-mSUGA_pp_12_2432415_2_0 can't run, too big 2861.02MB > 1294.93MB 20/02/2022 17:51:32 | | [cpu_sched_debug] using 11.00 out of 15 CPUs 20/02/2022 17:51:32 | | [cpu_sched_debug] enforce_run_list: end It's the "can't run, too big" message. I had seen this a while ago when the "disk space messages from server" were doing my head in. |
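
The arithmetic in the theory above can be sketched in a few lines of Python. This is only a rough model and not how the BOINC client actually schedules work: the function name `max_concurrent_tasks` and its parameters are made up for illustration, and it assumes every task reserves the same fixed amount of RAM out of the share BOINC is allowed to use.

```python
import math


def max_concurrent_tasks(total_ram_gb, ram_pct_allowed, per_task_gb, n_cpus):
    """Rough estimate of how many tasks can run at once.

    Hypothetical helper, not part of BOINC. Assumes each task reserves
    per_task_gb of RAM and that BOINC may use ram_pct_allowed percent
    of total_ram_gb (the "use at most xx %" memory preference).
    """
    usable_gb = total_ram_gb * ram_pct_allowed / 100.0
    by_ram = math.floor(usable_gb / per_task_gb)
    # Never more tasks than CPU threads, never fewer than zero.
    return max(0, min(by_ram, n_cpus))


if __name__ == "__main__":
    # Figures quoted in the thread: 32GB machines, 2.79GB per Python task.
    for pct in (90, 95, 97, 100):
        print(pct, max_concurrent_tasks(32, pct, 2.79, 16))
    # The rougher 3.5GB-per-task figure on 4-thread laptops.
    for ram in (8, 16):
        print(ram, max_concurrent_tasks(ram, 100, 3.5, 4))
```

With 32GB at 100% and 2.79GB per task this gives 11, the same as the 11 Pythons reported above, and at 95% it gives 10. Using the rougher 3.5GB-per-task figure, 8GB gives 2 tasks and 16GB on a 4-thread machine gives 4, which agrees with the laptop experiment earlier in the thread. Real limits will also depend on disk space and on whatever else is using memory.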
©2024 University of Washington
https://www.bakerlab.org