Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
kotenok2000 Send message Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
Workunit download size is not very big because all Rosetta workunits use a persistent database, which is downloaded with the first workunit and reused for later workunits. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,526,036 RAC: 10,392 |
Can you try changing the "use at most" memory setting in computing preferences > disk and memory? One thing just discovered - if you specify a maximum amount of disk space for Boinc to use, like 850Gb in your case, Python tasks seem to use more than if you allow for no restriction. It seems to be a bug/quirk in how Boinc works with Python tasks. See if it makes a difference. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Use at most setting for memory is set to 99% and 100% for the CPUs. Server has 128 threads and 256GB of memory yet only 17 tasks are running. No message in log indicating that BOINC is waiting for any resource. Boinc has copied the VDI file to 88 slots, resulting in about 697GB of used disk space. Disk is a 900GB disk. Boinc told to leave 1% free as the most restrictive parameter. Memory and disk settings look good; I can run 43 tasks in 128GB of RAM [CPU set for 90% use]. Have you checked for zombie tasks, that is, tasks that have hardly any `cpu time` and lots of `elapsed time`? I find they will stop other tasks running. It takes a while to go through the `Running tasks` every day and abort the zombies [if you get them]. |
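The zombie check described above (hardly any CPU time, lots of elapsed time) can be sketched as a simple ratio filter. This is only an illustration: the `Task` record, the task names, and both thresholds are hypothetical, and the numbers would have to be copied from the BOINC Manager task list by hand or by some other means.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str               # hypothetical task name
    cpu_seconds: float      # CPU time shown for the task
    elapsed_seconds: float  # elapsed (wall-clock) time shown for the task


def find_zombies(tasks, min_elapsed=3600.0, max_cpu_ratio=0.05):
    """Return tasks that have run a long time but used almost no CPU.

    Both thresholds are assumptions chosen for illustration.
    """
    zombies = []
    for t in tasks:
        if t.elapsed_seconds < min_elapsed:
            continue  # too young to judge
        if t.cpu_seconds / t.elapsed_seconds <= max_cpu_ratio:
            zombies.append(t)
    return zombies


tasks = [
    Task("python_task_a", cpu_seconds=7000, elapsed_seconds=7200),
    Task("python_task_b", cpu_seconds=40, elapsed_seconds=36000),
    Task("python_task_c", cpu_seconds=5, elapsed_seconds=600),
]

for z in find_zombies(tasks):
    print(z.name)  # prints python_task_b
```

Only the second task is flagged: the first is using its CPU normally and the third has not been running long enough to judge.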
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
result in about 697GB of used disk space. Disk is a 900GB disk. Boinc told to leave 1% free as the most restrictive parameter. One thing just discovered - if you specify a maximum amount of disk space for Boinc to use, like 850Gb in your case, Python tasks seem to use more than if you allow for no restriction. It seems to be a bug/quirk in how Boinc works with Python tasks. See if it makes a difference. You can safely set the disk space bigger than the disk, so set "use no more than" to 1000GB and Boinc will only use what it needs. I found a combination that works for me, so try this from my thread on the problem. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14903&postid=104879 Use no more than - 1000 GB . . don't worry about setting this BIG. Leave at least ## GB free . . [untick this box, not needed] Use no more than ## % of total . . [untick this box, not needed] Try it, see what happens; pythons are a pain. |
entity Send message Joined: 8 May 18 Posts: 19 Credit: 6,122,942 RAC: 5,437 |
I was finally able to get about 54 of the tasks running before it started to allocate into the swap space. I just stopped there to prevent any additional IO due to memory. I may have to cut back a bit as I'm starting to get the "VM Job unmanageable -- restarting later" message. I may cut back to around 32 tasks (256GB / 8GB). |
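The "256GB / 8GB" estimate above is a plain memory budget. A minimal sketch, assuming the 8 GB per task figure from the post (it is the poster's working number, not a documented requirement) and a hypothetical helper name:

```python
def max_tasks_by_memory(total_ram_gb, ram_per_task_gb):
    """Number of tasks that fit in RAM if each needs ram_per_task_gb."""
    return int(total_ram_gb // ram_per_task_gb)


# 256 GB of RAM at an assumed 8 GB per task
print(max_tasks_by_memory(256, 8))  # prints 32

# Running 54 tasks in the same RAM leaves well under 8 GB each,
# which fits with swap starting to be used at that point.
print(round(256 / 54, 2))  # prints 4.74
```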
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 0 |
One of your Vbox tasks shows this error in the log: Failed to create the VirtualBox object! I seem to have gotten VirtualBox working by doing a few things (so I'm unsure exactly what fixed the VBoxManage.exe errors). First, I let the last batch of "Rosetta v4.20 windows_x86_64" tasks complete. I had dozens of "movingstub_..." tasks all fail immediately, and they would cause the BOINC GUI to hang without updates while those movingstub tasks would start then fail. So I stopped getting new tasks, and then allowed the good tasks to complete. (Went to work and left the system alone.) Second, I reenabled the "VirtualBox VM jobs" by clicking the "Allow" button. New tasks for "rosetta python projects v1.03 (vbox64) windows_x86_64" started to download. These tasks also started failing with more of the "failed to create VirtualBox object" errors. I shut down BOINC so the few remaining tasks would not get a chance to start running. Third, I downloaded and installed the latest BOINC version. This would update my existing install from BOINC_7.16.11_with_Virtualbox_6.1.12 to the next version, BOINC_7.16.20_with_Virtualbox_6.1.12. Note that the VirtualBox version remains the same, but it is still older than the current version that VirtualBox says is available when its GUI is started up manually. Fourth, when running the update executable, I made sure to right click on the installer program and use the "Run as administrator" option. Previously I had the BOINC program files installed on SSD drive C: and the ProgramData files on HDD D:. For this install, I put the Program Files onto D: as well. I also unchecked the option to run BOINC in a separate service account, as I think I saw other comments saying that using a service account could be a problem. Fifth, after the update BOINC seemed to start, but I was not sure if the Rosetta tasks were working. After a minute or so, I shut down BOINC, then rebooted the PC since a reboot had not been required during the upgrade. 
I figured I should give the system a clean restart anyhow. Sixth, when BOINC started this time, I manually started the Oracle VM VirtualBox Manager GUI (again, ignoring the message that a newer version is available). After a while I started seeing VMs getting spooled up one at a time. I had not seen these background VMs appearing before, and the errors make sense if the images weren't getting started in the first place. Until I saw the individual VM images listed in the VirtualBox GUI, I was not sure whether I would see anything in the leftmost column under the Tools icon. These images were starting up pretty slowly, one at a time, with HDD activity maxed out at 100% during each start, and only a brief back-off of HDD activity between images starting. So VM spool-up speed seems to be HDD bandwidth limited on this system. Eventually 8 images were started, and the HDD activity thankfully dropped greatly. To start off, BOINC allowed 100% CPU activity, which I reduced to 50% CPU usage. All 8 images kept running, which makes sense, as this machine has an 8 core, 16 thread processor, so 1 VM per core. I then backed off the BOINC CPU usage to 45% so only 7 cores would be in use, to reduce fan noise slightly and allow other computer activities to have one full core to play with while the VMs ran in the background. (50% / 8 cores = ~6.25% per core, so the 45% limit will only support 7 cores at ~43.75% total.) So the reinstall made these changes: - Ran the combined BOINC and VirtualBox installer with Administrator privileges (not sure if this was done previously). - The BOINC version was updated. - VirtualBox appeared to have some/all updates run on it as well, including changes to system privileges. - The BOINC Program Files were placed in a new location. - BOINC no longer runs in a separate service account. As BIOS VM support was already turned on, and Microsoft VM support in Windows 10 (Hyper-V ?) 
was previously removed, the reinstall didn't mess with these other system issues. With the New Rosetta Python now running, and using the Suspend/Resume button to verify switching between Rosetta and non-Rosetta tasks all run as expected, I just need to wait to see these Python VM Tasks complete and report successfully. But I expect that everything is now running OK. Finally, on my non-Windows machines the "movingstub_..." tasks were not failing immediately. Some seemed to have completed. Didn't look into it further, but it seems the problem with all the movingstub tasks is related to Linux vs Windows platform differences. |
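The CPU percentage arithmetic in the post above (6.25% per task on a 16 thread machine, so 45% allows 7 tasks) can be sketched as below. The rounding rule is an assumption: the sketch rounds down and never goes below one CPU, which matches the behaviour reported in the post but is not taken from BOINC documentation. The function name is hypothetical.

```python
import math


def usable_cpus(logical_cpus, max_cpu_pct):
    """CPUs available to tasks for a given 'use at most N% of CPUs' value.

    Assumes the client rounds down and always allows at least one CPU.
    """
    return max(1, math.floor(logical_cpus * max_cpu_pct / 100))


threads = 16  # 8 cores, 16 threads, as in the post
share = 100 / threads  # percentage taken by one single-threaded task

print(share)                               # prints 6.25
print(usable_cpus(threads, 50))            # prints 8
print(usable_cpus(threads, 45))            # prints 7
print(usable_cpus(threads, 45) * share)    # prints 43.75
```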
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2002 Credit: 9,782,117 RAC: 4,970 |
VERY efficient. It's up to the volunteers to kill the bad tasks. VERY efficient. I gave my time and resources to this project before you, it's not a problem. But I think that a little bit of respect for volunteers is welcome. It's pure pragmatism. If it wasn't for the bad news, there wouldn't be any news at all. Are you inviting me to abandon the project? I'm Italian, so my English is not so good... |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
When I downgrade from the latest VirtualBox version to 5.2.44, instead of unmanageable errors the system begins lagging and beeping: https://www.youtube.com/watch?v=wQp1WFeWZCY |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
VERY efficient. It's up to the volunteers to kill the bad tasks. VERY efficient. I've been here a long time. This used to be a project that was progressive in their keeping tabs on things here. But as they have grown more dependant on their neural network, they stopped paying attention here. The old procedure was to test new tasks on the beta program RALPH. But now it seems to be just put the tasks here without checking them and see what happens. I wouldn't give up this project, because the science we process for them is useful. It's just a shame they don't communicate anymore. |
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 0 |
Rosetta@home: Notice from server Now that I just got VirtualBox working, I got a similar server message. Went digging into it, and made the following notes: - - - Windows 11 file manager stats: 1 TB HDD 264 GB used 667 GB free - - - Within BOINC: Computing preferences for the Disk [x] Use no more than 100 GB [x] Leave at least 1 GB free [x] Use no more than 50% of total Showing in the tab "Notices" Rosetta@home: Notice from server rosetta python projects needs 4874.94MB more disk space. You currently have 14198.55 MB available and it needs 19073.49 MB. 2/20/2022 4:36:59 PM Showing in the tab "Disk" Project #1: 530.50 KB Project #2: 2.18 GB Project #3: 21.49 MB Rosetta@home: 83.74 GB So, Rosetta is using about 95% of the disk space BOINC has in use, with a BOINC total of about 86 GB used of the 100 GB limit. - - - Re-writing the server notice to be more readable: rosetta python projects needs 4.87 GB more disk space. You currently have 14.19 GB available and it needs 19.07 GB. Not clear where these limits and free space values are coming from. Nonetheless I can easily increase the available HDD space for BOINC usage. Currently still have 667 GB of free space, and no plans to use any of it for now. Changing the Disk settings to have only one limit: that at least 50 GB of free space be left available. [ ] Use no more than ____ GB [x] Leave at least 50 GB free [ ] Use no more than ____ of total With the old settings, the available HDD limit should have been 100 GB. With the new settings, the available HDD limit should be ~617 GB, assuming nothing else uses any of the 667 GB of free space currently available. UPDATE: By listing these different settings and notices in one place, I see the 86 GB used of 100 GB available leaves only 14 GB free. So the notice about 14.19 GB free but needs 19.07 GB for python (VM based) projects now makes sense. 
With the increased available disk space change, I now need to wait and see if the server recognizes more space is available, and starts sending out more and/or larger tasks. |
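The disk numbers in the post above can be reproduced with a short calculation. This is a sketch under one stated assumption: that the client applies whichever of the three disk preferences is most restrictive (as another post in this thread describes it). The function name and parameters are hypothetical, and the round GB figures are the ones quoted in the post.

```python
def boinc_headroom_gb(total_gb, free_gb, boinc_used_gb,
                      max_used_gb=None, min_free_gb=None, max_used_pct=None):
    """Extra disk space BOINC may still use, taking the tightest limit.

    A limit left as None is treated as unticked. Assumes the most
    restrictive ticked preference wins.
    """
    limits = [boinc_used_gb + free_gb]  # cannot use more than what exists
    if max_used_gb is not None:
        limits.append(max_used_gb)
    if min_free_gb is not None:
        limits.append(boinc_used_gb + free_gb - min_free_gb)
    if max_used_pct is not None:
        limits.append(total_gb * max_used_pct / 100)
    return min(limits) - boinc_used_gb


# Old settings: 100 GB cap, leave 1 GB free, at most 50% of a 1 TB disk
old = boinc_headroom_gb(1000, 667, 86,
                        max_used_gb=100, min_free_gb=1, max_used_pct=50)
print(old)  # prints 14

# New settings: only "leave at least 50 GB free"
new = boinc_headroom_gb(1000, 667, 86, min_free_gb=50)
print(new)  # prints 617

# Shortfall in the server notice, in MB
print(round(19073.49 - 14198.55, 2))  # prints 4874.94
```

With the old settings the 100 GB cap is the binding limit, which leaves about 14 GB and explains the notice; with only the 50 GB free rule the headroom becomes 617 GB.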
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
@ BoredEEdude Try this: I found a combination that works for me, so try this from my thread on the problem. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14903&postid=104879 Use no more than - 500 GB . . don't worry about setting this BIG; even if you set this bigger than the disk, it works. Leave at least ## GB free . . [untick this box, not needed] Use no more than ## % of total . . [untick this box, not needed] Try it, see what happens; pythons are a pain. I tried every setting possible [sort of] until I found this simple setup. Other settings do sort of work, but Boinc does some wack-job cookie maths to get there. More like mashamatiks. Don't let it do your head in. |
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 0 |
I've been here a long time. This used to be a project that was progressive in their keeping tabs on things here. But as they have grown more dependant on their neural network, they stopped paying attention here. I have also been supporting Rosetta@home for a long time (will be 10 years as of 2022-04-11), but have been thinking lately (without any input except what the program and website statistics indicate) that something about the project's culture seems to have changed. In the early days, they really seemed to appreciate the huge amount of "free" computational power being made available. Stories of what kinds of results they were getting could be found easily, and it was clear that a lot of the progress being made was because of the computing resources being volunteered. 100% of my many computers were dedicated to processing Rosetta tasks for most of my time here. When COVID hit, the amount of available resources skyrocketed to "help find a cure" or at least better solutions to the problem. My individual climb up the crunching statistics ladder slowed, stopped, then backslid as all the other resources came online. It was all good, since getting science done was the main thing. Personal credit standing was just a nice-to-see thing, and I knew I would eventually level out at some point below the biggest crunchers. It was also interesting over the years to watch power crunchers with massive server farms rocket up the ladder, then eventually disappear, and finally see their peak ranking backslide down the ladder as new power users appeared with better computers, or how the slow-but-steady crunchers would just continue their slow rise on the rankings ladder. When GridCoin came along, I also figured what the heck, I may as well try getting some of that crypto for the computer work I'm already doing anyhow. 
GRC will probably never amount to anything, but one thing it did do was prod me to add on other projects when Rosetta was having an outage and my computers were sitting idle. So it served to open the crack in my previously exclusive support of Rosetta, as other projects got CPU time whenever Rosetta went down. Now I'm reading about their changing focus (what's this neural network stuff about?), seeing a lot of problematic tasks getting released for days on end with no supervision, and having many of my computers sitting idle from lack of compatible work. Reading that a computational-based project has no dedicated IT support staff (not even 1 part-time individual?), science researchers writing programs (with possibly weak basic computer skills), and expecting the individuals already volunteering their computer resources to be the ones troubleshooting issues back to a project that may not really be listening to that feedback is slightly disturbing. After getting millions of dollars worth of free computing resources donated, which in turn undoubtedly helped to validate research directions and define individual researchers' careers along the way, I would expect that some small but significant amount of support/effort be put into addressing at least the obvious problems encountered by the volunteers to the project. That no one from the project will apparently ever read this comment for themselves would indicate a level of assumed entitlement to those donated resources. Or possibly an expectation of moving away from needing those resources in the future, so why bother supporting them now? Just as SETI@home eventually stopped supplying computational work in March 2020, maybe Rosetta@home is heading in a similar direction at some point? I will continue to give Rosetta priority over other BOINC projects (for now). But if this project expects to retain this computing capability (from "everyone", not just me), they should start paying more attention to supporting it. 
Once users start leaving due to no available work, or after a lot of aggravation from bad/useless work (I'm looking at you, movingstub), those users may never return. And I hope we won't see another COVID-type crisis to spike the arrival of new computational donors, so they should be managing the ones they still have more carefully. |
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 0 |
@ BoredEEdude I changed to using just the "Leave at least _____ GB free" setting about 30 minutes before reading your post about using just the "Use no more than _____ % of total" setting. As I write this, the BOINC server notice about needing more space has gone away, so I'm assuming my change is working for now. If I get a new server error about lacking space, I will give your approach a try. I also just saw my first 3 valid python tasks accepted by the server a few minutes ago, so my main problem of VirtualBox tasks not working seems to be fixed. For now I just want to see if everything keeps running smoothly for the next few days. If I keep tinkering with the settings it might just confuse the server further in the short term. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I'm spread out over a bunch of projects. WCG, SiDock, Quchem (though it never runs stable for me) for health projects. I do a few science and math projects as well. And Folding at home outside of BOINC. If you look at Python tasks, they are running Aimnet stuff, which isn't their work. Atoms in Molecules. It's quite complex. But yes, you're right, Dr. B dropped off years ago, pre-COVID even. Moderators disappeared. Used to have grad students tell us what's going on, but that's gone. The task creators used to monitor here for bugs, but not any more. But despite this, people join and stay. |
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 0 |
@ BoredEEdude Well, by the next day the BOINC client was only running 3-4 python tasks with about 12 more waiting to run, even though the CPU was almost down to idle and there was plenty of free memory. Those 15 tasks were also the only tasks in the queue, and requesting an update didn't get more tasks. The same server side error about low disk space had also shown up again. The day before, with the new settings, disk usage had gotten up to about 218 GB, then it fell back down overnight to around 70 GB as tasks were completed. So I unchecked the "Leave at least 50 GB free" setting, and used .clair.'s suggestion of just "Use no more than 500 GB" instead. Since the client had not started any existing tasks into the available free memory and CPUs, I then restarted BOINC manager. After the restart, about 20 new tasks were downloaded immediately, and in short order 10 python tasks were up and running at the same time. Now to wait and see if this new setting is more stable for me over time. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,526,036 RAC: 10,392 |
It's pure pragmatism. If it wasn't for the bad news, there wouldn't be any news at all. Even in English, it's an obscure phrase based on a 25yo TV programme. Basically, it means that if you took offence at a comment that was simply a statement of fact, and need an apology and thanks, here it is. But because it was never the point, I'm not going to get bogged down by worrying about it. Rosetta has always been an experimental project imo. Asking questions that have never been asked before, using tasks that have never been written before, with parameters whose limits may not be entirely obvious from the outset. So if things go wrong, it should hardly be a surprise to anyone and no-one should get themselves worked up about it, especially when failures are a bigger problem for the project than they are for any one of us. And that's the case here. How they chose to solve the problem is down to them, not us. Because they <can't> solve it and only users can in this instance. Same as it ever was. And the longer someone has been here, the more apparent it should be after all this time. I guess, for some, the penny never drops. So here we are. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,526,036 RAC: 10,392 |
@ BoredEEdude Your Windows says you have 667Gb free. Your Boinc says [x] Use no more than 100 GB [x] Leave at least 1 GB free [x] Used no more than 50% of total Those settings don't make much sense together, so deselect 3 and increase 1 to 300Gb (increase 2 for safety if you like) On your next update, I expect you to get a lot more tasks immediately. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
It's not a server error, it's BOINC manager telling you that the project on your computer thinks it needs more disk space. My solution was to uncheck everything but the GB free box and set that at 2 GB. I still got a one time error about disk space which was nonsense when you have over 360 GB free on a dedicated drive. Since that one time error I have never had that problem. I was running 15 pythons at the time. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
It's pure pragmatism. If it wasn't for the bad news, there wouldn't be any news at all. This is cutting edge science. But...they usually use Ralph first to test their ideas. This time they didn't. Such is life at the 'new' RAH. |
©2024 University of Washington
https://www.bakerlab.org