Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I wish they'd use KVM/QEMU instead of VirtualBox for Linux. I am sure it would work better, since it can't work worse. It is getting practically impossible to run the pythons. The first problem is "VM job unmanageable" suspensions, which occur on all of my machines no matter what steps I take (mainly limiting cores) to prevent them. You need to either wait a long time or reboot to fix it. But now the problem is that about half the pythons won't run at all. They get stuck at less than 1% CPU utilization, and I have to abort them. I am moving away from interventionist projects on my machines, and the pythons are the next ones to go. |
mmstick Joined: 4 Dec 12 Posts: 8 Credit: 606,792 RAC: 0 |
I constantly have to abort Python units at 99.996% completion, even on my Ryzen 5700G desktop with 64 GB RAM, which seems to be good enough for running 8 Python units simultaneously, one on each physical core. I tried limiting the number of Python work units to 4, just in case, so I could run 12 normal tasks in addition to that, but apparently using an app_config.xml to define `max_concurrent` causes BOINC to repeatedly ask for 12 work units every 30 seconds, so I had to abandon that attempt. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I do constantly get the issue of having to abort Python units at 99.996% completion, even on my Ryzen 5700g desktop with 64 GB RAM, which seems to be good enough for running 8 python units simultaneously on each physical core." It isn't a problem of memory, and you don't need to go to 99%. If they are at less than 1% CPU utilization in the first five minutes, you can abort them. I use BoincTasks to monitor that. |
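The rule of thumb in the post above (abort a Python task that shows under 1% CPU utilization after its first five minutes) can be sketched in Python. This is only an illustration: the `Task` record, its field names, and the sample values are hypothetical, and the numbers would have to be read from whatever monitor you use (BoincTasks in the post above). Nothing here calls a BOINC API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical snapshot of one running task (not a BOINC data structure)."""
    name: str
    elapsed_s: float   # wall-clock seconds since the task started
    cpu_time_s: float  # CPU seconds the task has consumed so far


def is_stalled(task: Task, grace_s: float = 300.0, min_util: float = 0.01) -> bool:
    """Return True once the task is past the grace period and below the utilization floor."""
    if task.elapsed_s < grace_s:
        return False  # too early to judge
    return (task.cpu_time_s / task.elapsed_s) < min_util


# Made-up sample values for illustration only.
tasks = [
    Task("python_a", elapsed_s=600.0, cpu_time_s=2.0),    # about 0.3% utilization
    Task("python_b", elapsed_s=600.0, cpu_time_s=540.0),  # 90% utilization
    Task("python_c", elapsed_s=120.0, cpu_time_s=0.5),    # still inside the grace period
]
stalled = [t.name for t in tasks if is_stalled(t)]
print(stalled)
```

The five-minute grace period and the 1% floor are the two numbers quoted in the post; both are plain parameters so they can be loosened on slower machines.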
doug Joined: 28 Mar 20 Posts: 8 Credit: 1,638,060 RAC: 1,418 |
Thanks for the reply. I have not done that, nor have I ever had to do it in the past. I'm running Win10 with all the latest updates. In Task Manager, on the second (Performance) tab, at the bottom with all the CPU info, it says "Virtualization: Enabled". Does that address what you are asking about? If not, do you know where in Windows I can find the info you are asking for? Thanks. Doug |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 2,018 |
Deleted. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Maybe try what I do for LHC ATLAS, which is a very picky project and has a hard time running on single cores and such. In the past I wrote an app_config that forced it to run on just 4 cores and 1 task at a time. Now I can set that in the web preferences of that project. So maybe you can try that for Python. But since it falls under "Rosetta", it will apply to all tasks from RAH. Another stupid thing about this project, and you cannot set this in the web preferences here either. |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I tried to get RPP to run multithreaded with this app_config: `<app_config> <app> <name>rosetta_python_projects</name> </app> <app_version> <app_name>rosetta_python_projects</app_name> <plan_class>vbox64</plan_class> <avg_ncpus>5</avg_ncpus> </app_version> </app_config>` But even though it shows in BOINC Manager as `Running(5cpus)`, each RPP task runs 25 threads in total, so unless the data they are crunching is very linear, it doesn't actually do it, going by the CPU graphs. Any ideas as to what else could go in an app_config to force it to use multiple threads, or could it be hard-coded in the VM not to? Or am I wasting my time trying :( I changed it around from the one I use at Cosmology@home: `<app_config> <app> <name>camb_boinc2docker</name> <max_concurrent>2</max_concurrent> </app> <app_version> <app_name>camb_boinc2docker</app_name> <plan_class>vbox64_mt</plan_class> <avg_ncpus>7</avg_ncpus> </app_version> </app_config>` |
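Before restarting the client, a config like the first one quoted above can be checked for well-formedness and summarized with Python's standard XML parser. This is a minimal sketch, not a BOINC tool: the `summarize` helper is hypothetical, the fallback of 1 for a missing `avg_ncpus` is an assumption made for the example, and passing this check says nothing about whether BOINC accepts the file or whether the application inside the VM actually uses the extra cores.

```python
import xml.etree.ElementTree as ET

# The rosetta_python_projects config from the post above, laid out one element per line.
APP_CONFIG = """
<app_config>
  <app>
    <name>rosetta_python_projects</name>
  </app>
  <app_version>
    <app_name>rosetta_python_projects</app_name>
    <plan_class>vbox64</plan_class>
    <avg_ncpus>5</avg_ncpus>
  </app_version>
</app_config>
"""


def summarize(xml_text: str) -> list[tuple[str, str, float]]:
    """Return (app_name, plan_class, avg_ncpus) for every <app_version> element.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for version in root.findall("app_version"):
        rows.append((
            version.findtext("app_name", default=""),
            version.findtext("plan_class", default=""),
            # Assumption for this sketch: treat a missing avg_ncpus as 1.
            float(version.findtext("avg_ncpus", default="1")),
        ))
    return rows


print(summarize(APP_CONFIG))
```

A mismatched or unclosed tag raises `ParseError` with a line and column, which is usually quicker to act on than hunting for the problem after the client has already started.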
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
"I tried to get RPP to run multithreaded with this app config :- [snip]" It's rare that you can make a program run multithreaded unless it's written to know how to do so. Changing the app_config file isn't enough if that's all you do. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"<name>rosetta_python_projects</name>" That, as far as I know, is an internal name for the type of task. As far as I know, all tasks fall under "rosetta". I have not found a way to isolate Python tasks. |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I decided to have a go at it again and give the computer a full reboot [not out the door]. Even if it is / was a case of knowing just enough to make a big mess of it, I did get some XML errors noted in the event log; I just keep bashing away at it till something happens :) Well, it did something . . . . . I know it sounds like something from a Frankenstein video, because one of the `vboxheadless.exe` instances in the Win7 Resource Monitor is using 22% of CPU on a 16-core CPU [one core is only 6.25%]. Could someone be mad enough to try it @home and see what happens? Only new tasks downloaded AFTER the app_config is in place will get the new settings. |
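The CPU figures in the post above can be turned into an approximate core count. A small Python sketch, assuming the Resource Monitor percentage is a share of all 16 cores combined (which is how the post reads, since one core is given as 6.25%); the function name is made up for illustration.

```python
def cores_in_use(process_cpu_percent: float, logical_cores: int) -> float:
    """Convert a share of total CPU (in percent) into an equivalent number of cores."""
    percent_per_core = 100.0 / logical_cores  # 6.25 for 16 cores
    return process_cpu_percent / percent_per_core


# 22% of a 16-core CPU, the figure reported for one vboxheadless.exe instance.
print(cores_in_use(22.0, 16))
```

By this arithmetic the instance is keeping roughly three and a half cores busy, which is more than a single-threaded task could use but less than the 5 requested via `avg_ncpus`.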
mmstick Joined: 4 Dec 12 Posts: 8 Credit: 606,792 RAC: 0 |
Using an app_config to set the `max_concurrent` value will cause your system to endlessly request work until you've fully depleted the server of work units. I don't recommend doing so until this issue is fixed: https://github.com/BOINC/boinc/issues/4322 |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
"Using an app_config to set the max-concurrent value will cause your system to endlessly request work until you've fully depleted the server of work units. I don't recommend doing so until this issue is fixed: https://github.com/BOINC/boinc/issues/4322" I have not run cosmo@home for several months; the endless work fetch was stopped there by a server-side limit on the number of work units anyone was allowed to have. I have been reading the threads here on R@H about that work fetch problem with interest. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I have been reading the threads here on R@H with interest about that work fetch problem" I first ran into it several years ago on WCG. More recently, we had a discussion of it on LHC: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5720&postid=45308#45308 Also: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5726&postid=45384#45384 It has been reported to BOINC: https://github.com/BOINC/boinc/issues/4322 |
Sid Celery Joined: 11 Feb 08 Posts: 2141 Credit: 41,534,176 RAC: 10,708 |
After getting a nudge <cough> from Grant, I asked about all this. The reply was as follows: There is no server issue failing to create Rosetta 4.20 tasks. We have run out. All those 2.3 million queued tasks on the front page really are all Python tasks. In their words, "a huge queue". However, regarding the shortage and re-supply of Rosetta 4.20 tasks: "This will be temporary since there will be many more protein protein interaction Rosetta design jobs loaded into the queue". That comes across to me like they're preparing them now and it won't be too much longer before we see them, though there is no actual ETA, nor any clue how many "many" is. So, don't panic, more work will arrive before too long. Famous last words... Also, while people do disappear around the bigger holiday periods like Christmas and New Year, I've often had replies on Saturdays and Sundays, so things do happen at weekends. I just think it's unreasonable to expect that will always be the case. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"After getting a nudge <cough> from Grant I asked about all this. The reply as follows" If there are so many Python tasks, then why can't I get them? I've monkeyed around with all the parameters in BOINC and I get nothing. Right now, due to the last round of monkeying around, the queue for each project is all messed up, so I am playing catch-up. About the only thing I have not done is remove the project from BOINC, do a system clean and reinstall RAH on BOINC. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1725 Credit: 18,399,907 RAC: 19,807 |
"If there are so many Python tasks, then why can't I get them?" One last wild guess: clean up any leftover VMs that are gumming things up. "To completely delete any virtual machine from VirtualBox on Mac, Windows, or Linux, simply do the following:" (Remove an OS and Delete a Virtual Machine in VirtualBox) Grant Darwin NT |
Bryn Mawr Joined: 26 Dec 18 Posts: 400 Credit: 12,294,748 RAC: 6,222 |
"After getting a nudge <cough> from Grant I asked about all this. The reply as follows" Many thanks for the update. At the rate it's going it will take years to get through a couple of million Python tasks, so the best of luck with that; I'll wait for the normal Rosetta tasks. |
.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Have a look at the VM CPU count; my Frankenstein app_config for RPP does do `something`: https://boinc.bakerlab.org/rosetta/result.php?resultid=1449470255 Have a look at my valid RPP tasks [I didn't get it working on all of them], or whatever it is doing: https://boinc.bakerlab.org/rosetta/results.php?userid=139198&offset=0&show_names=0&state=4&appid=9 <stderr_txt> -snip- 2021-11-14 04:18:14 (1236): Create VM. (boinc_abea2fc7e66074d6, slot#8) 2021-11-14 04:18:14 (1236): Setting Memory Size for VM. (6144MB) 2021-11-14 04:18:15 (1236): Setting CPU Count for VM. (5) 2021-11-14 04:18:15 (1236): Setting Chipset Options for VM. 2021-11-14 04:18:15 (1236): Setting Boot Options for VM. |
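To check many results for the VM settings shown in the stderr excerpt above, the two relevant lines can be pulled out with regular expressions. This is a sketch in Python built only from the line formats visible in that excerpt; the `vm_settings` helper is hypothetical, and other wrapper versions may word these lines differently.

```python
import re

# The stderr lines quoted in the post above, one per line.
STDERR = """\
2021-11-14 04:18:14 (1236): Create VM. (boinc_abea2fc7e66074d6, slot#8)
2021-11-14 04:18:14 (1236): Setting Memory Size for VM. (6144MB)
2021-11-14 04:18:15 (1236): Setting CPU Count for VM. (5)
2021-11-14 04:18:15 (1236): Setting Chipset Options for VM.
2021-11-14 04:18:15 (1236): Setting Boot Options for VM.
"""

CPU_RE = re.compile(r"Setting CPU Count for VM\. \((\d+)\)")
MEM_RE = re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)")


def vm_settings(stderr_text: str) -> dict[str, int]:
    """Extract the VM CPU count and memory size (MB) if the log mentions them."""
    settings: dict[str, int] = {}
    cpu = CPU_RE.search(stderr_text)
    mem = MEM_RE.search(stderr_text)
    if cpu:
        settings["cpu_count"] = int(cpu.group(1))
    if mem:
        settings["memory_mb"] = int(mem.group(1))
    return settings


print(vm_settings(STDERR))
```

A CPU count of 5 here shows that the `avg_ncpus` setting reached the VM; it does not show that the work inside the VM is spread across those cores.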
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"If there are so many Python tasks, then why can't I get them? One last wild guess- Clean up any left over VMs that are gumming things up." I knew how to remove VMs. Maybe I will remove Oracle Vbox (via Revo Uninstaller), remove Rosetta from the list, clean my drive with CCleaner and Wise365, and then reinstall Vbox and add Rosetta back to the list. If this fails I have no freaking idea what to do other than complete all my work, uninstall BOINC (via Revo), clean the drive and reinstall it. Then it's fresh. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
11/14/2021 3:58:02 PM | Rosetta@home | update requested by user 11/14/2021 3:58:06 PM | Rosetta@home | Sending scheduler request: Requested by user. 11/14/2021 3:58:06 PM | Rosetta@home | Requesting new tasks for CPU 11/14/2021 3:58:08 PM | Rosetta@home | Scheduler request completed: got 0 new tasks 11/14/2021 3:58:08 PM | Rosetta@home | No tasks sent 11/14/2021 3:58:08 PM | Rosetta@home | Project requested delay of 31 seconds Tasks ready to send 5000 rosetta python projects 5000 23096 RAH was removed and added back. Vbox was removed, registry cleaned, drive cleaned, rebooted after install. No VMs are active at this time. No LHC running. No app_config. No cc_config limiting RAH. Just a GPU restriction to put PrimeGrid on my 1080. Preferences are set for "Home"; all settings are default. There is NOTHING on my end restricting RAH from doing anything. Yet all it wants is 4.20. So what the %$#&%& is wrong with RAH? |
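When comparing several attempts like the one above, it can help to pull the scheduler replies out of a saved event log. A minimal Python sketch, based only on the pipe-separated line format visible in the post; the `scheduler_replies` helper is hypothetical, and it only reports what the log says, not why the server sent nothing.

```python
import re

# Event log lines copied from the post above, one per line.
EVENT_LOG = """\
11/14/2021 3:58:06 PM | Rosetta@home | Sending scheduler request: Requested by user.
11/14/2021 3:58:06 PM | Rosetta@home | Requesting new tasks for CPU
11/14/2021 3:58:08 PM | Rosetta@home | Scheduler request completed: got 0 new tasks
11/14/2021 3:58:08 PM | Rosetta@home | No tasks sent
11/14/2021 3:58:08 PM | Rosetta@home | Project requested delay of 31 seconds
"""

GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def scheduler_replies(log_text: str) -> list[tuple[str, int]]:
    """Return (project, tasks_received) for each completed scheduler request."""
    replies = []
    for line in log_text.splitlines():
        # Each line is "timestamp | project | message".
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not an event log line in the expected shape
        _timestamp, project, message = parts
        match = GOT_RE.search(message)
        if match:
            replies.append((project, int(match.group(1))))
    return replies


print(scheduler_replies(EVENT_LOG))
```

Run over a longer log, this gives a quick per-project tally of how many requests came back empty.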
©2024 University of Washington
https://www.bakerlab.org