Posts by Michael Goetz

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 103516)
Posted 25 Nov 2021 by Michael Goetz
Post:
I have an EPYC 7742 with 128 gig ram. When I added virtualbox (now required I guess) now I only get 16 tasks, the machine is almost idle. What needs to be done to fix this ?

Running cinnamon mint 19 (ubuntu Linux)


There *IS* a way to make this work, but it's complicated. Two ways, actually.

As others have said, the tasks are set up such that BOINC must provide 7.5 GB of memory for each task. They don't actually need that much, and don't actually use that much. If, for example, you recompiled BOINC and turned off that restriction (which is something you CAN do), the tasks would all run just fine.

If you're not a programmer and don't want to change the BOINC software, what you can do is Google the instructions for running multiple BOINC instances on the same computer. Instead of running the normal single BOINC instance, run 8 instances. Each instance will think it's running on a computer with 128 GB. Each of the 8 instances will therefore be able to run 16 tasks. Set the memory limits in the preferences to 100%.

It's a nuisance to do. You have to change cc_config.xml to enable multiple BOINC clients. You have to explicitly start each BOINC instance on a different port number. You have to control each instance individually, again using that unique port number. Finally, if you use WUProp, you need to tell it about the port number with app_config.xml. You should probably also set each instance to use 12.5% of the CPU in case regular tasks are available. (6.25% if the CPU is hyperthreaded and has 256 threads.)
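To sanity-check the percentages, here is a small Python sketch of the arithmetic. It assumes each instance should be limited to the same 16 tasks it gets from the memory restriction, and that the CPU share is simply tasks divided by hardware threads. The function and constant names are made up for illustration.

```python
def cpu_percent_per_instance(tasks_per_instance: int, total_threads: int) -> float:
    """Share of the CPU (in percent) that one BOINC instance should be allowed to use."""
    return 100.0 * tasks_per_instance / total_threads


INSTANCES = 8            # number of BOINC instances suggested in the post
TASKS_PER_INSTANCE = 16  # tasks each instance receives under the memory limit

# Total tasks running across all instances.
print(INSTANCES * TASKS_PER_INSTANCE)                        # 128

# CPU limit per instance on a 128-thread and a 256-thread machine.
print(cpu_percent_per_instance(TASKS_PER_INSTANCE, 128))     # 12.5
print(cpu_percent_per_instance(TASKS_PER_INSTANCE, 256))     # 6.25
```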

EDIT: It would, of course, just be much easier if the project lowered the memory limit. It's just a single setting in the work generator and won't affect how the tasks actually run. I'm not privy to the thought process of the admins, nor am I aware of all the facts, but I'd guess the odds are 50/50 that they lower the memory requirements to something more reasonable at some point. 500 MB seems reasonable to me, based on what I've seen.
2) Message boards : Number crunching : Not getting any python work (Message 103515)
Posted 25 Nov 2021 by Michael Goetz
Post:
ADMIN, I would love to come back, but your slash and burn edit to the python scheduler black listed my system which is apparently why I had to start this thread in the first place. Because of that code, you limited me to 4.2. Now that you can not generate any 4.2 and blacklisted my system in the early stages of this Python adventure, there is no work for me, which is why I left.

I gave you guys 15 years and you block me without saying anything until way later with what? 2 or 3 lines?
That's not a good thing.

I have gotten Vbox to work just fine over on QuChemPed and I have run LHC Atlas for a long time without problems. So I don't think it was really my system that was at fault, rather I think it was errors on your early Python tasks that caused the error you are blocking.

If you don't want me back then I might as well delete any association with this project and say nice knowing you.


It looks like we can turn that "blacklist" flag on and off ourselves now.



How?
My understanding of his post was that he put a code in the scheduler to eliminate systems that generated a certain kind of error. If it is in the scheduler, then how do you get around that? Once it knows your IP and account, I don't think you can undo that.


Go to the details page for that host. Scroll to the bottom. You'll see, on the last line, a setting "VirtualBox VM jobs", and a blue and red button labelled "Allow". Click on that button. That's it. That turns off the blacklist from the scheduler.

The button will change to "Skip". You can click on the "skip" button if you ever want to turn off the Vbox tasks.



I don't see anything in BOINC or on my preferences page to match what you are talking about.
I think that option is only for people that are not black listed.


I did not say anything about the preferences page. On any other project, that's where the control would be. But here the change is on the host detail page. Go to your list of computers and select the "details" link for the computer you want to change. There's a different setting for each computer.

Also, I believe that if you don't have vbox installed the server doesn't show that button. The control appears to only be there for computers that have vbox installed.

You definitely CAN turn off the blacklisting. My computer was blacklisted, and now it's not.

The correct page for changing the control for your computer is https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=3433065 There should be a gaudy blue and red skip/allow button near the bottom of the page beneath the "update location" button.
3) Message boards : Number crunching : Not getting any python work (Message 103501)
Posted 24 Nov 2021 by Michael Goetz
Post:
ADMIN, I would love to come back, but your slash and burn edit to the python scheduler black listed my system which is apparently why I had to start this thread in the first place. Because of that code, you limited me to 4.2. Now that you can not generate any 4.2 and blacklisted my system in the early stages of this Python adventure, there is no work for me, which is why I left.

I gave you guys 15 years and you block me without saying anything until way later with what? 2 or 3 lines?
That's not a good thing.

I have gotten Vbox to work just fine over on QuChemPed and I have run LHC Atlas for a long time without problems. So I don't think it was really my system that was at fault, rather I think it was errors on your early Python tasks that caused the error you are blocking.

If you don't want me back then I might as well delete any association with this project and say nice knowing you.


It looks like we can turn that "blacklist" flag on and off ourselves now.



How?
My understanding of his post was that he put a code in the scheduler to eliminate systems that generated a certain kind of error. If it is in the scheduler, then how do you get around that? Once it knows your IP and account, I don't think you can undo that.


Go to the details page for that host. Scroll to the bottom. You'll see, on the last line, a setting "VirtualBox VM jobs", and a blue and red button labelled "Allow". Click on that button. That's it. That turns off the blacklist from the scheduler.

The button will change to "Skip". You can click on the "skip" button if you ever want to turn off the Vbox tasks.
4) Message boards : Number crunching : Not getting any python work (Message 103499)
Posted 24 Nov 2021 by Michael Goetz
Post:
ADMIN, I would love to come back, but your slash and burn edit to the python scheduler black listed my system which is apparently why I had to start this thread in the first place. Because of that code, you limited me to 4.2. Now that you can not generate any 4.2 and blacklisted my system in the early stages of this Python adventure, there is no work for me, which is why I left.

I gave you guys 15 years and you block me without saying anything until way later with what? 2 or 3 lines?
That's not a good thing.

I have gotten Vbox to work just fine over on QuChemPed and I have run LHC Atlas for a long time without problems. So I don't think it was really my system that was at fault, rather I think it was errors on your early Python tasks that caused the error you are blocking.

If you don't want me back then I might as well delete any association with this project and say nice knowing you.


It looks like we can turn that "blacklist" flag on and off ourselves now.
5) Message boards : Number crunching : Not getting any python work (Message 103423)
Posted 18 Nov 2021 by Michael Goetz
Post:
After actually going to write that code, it turns out to be even simpler. Or more complex. It depends on the age of specific files on the webserver.

If you have BOINC web server code from the last 4 years or so, you don't have to write any code at all. Near the beginning of html/project/project.inc is a single definition:

define('APP_SELECT_PREFS', false);
    // user can choose which apps to run


Change "false" to "true" and the project selection should appear on the webpage.

That's it.
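If you would rather script the edit than make it by hand, a minimal Python sketch could look like the following. It works on the text of the file only and is an illustration, not part of BOINC; the function name is hypothetical, and it assumes the define is written the same way as in the snippet above.

```python
import re


def enable_app_select(project_inc_text: str) -> str:
    """Return the file text with APP_SELECT_PREFS switched from false to true.

    Only the first matching define is changed; text without a match is
    returned unchanged.
    """
    pattern = r"(define\(\s*'APP_SELECT_PREFS'\s*,\s*)false(\s*\))"
    return re.sub(pattern, r"\g<1>true\g<2>", project_inc_text, count=1)


# The two lines quoted from html/project/project.inc in the post.
sample = (
    "define('APP_SELECT_PREFS', false);\n"
    "    // user can choose which apps to run\n"
)
print(enable_app_select(sample))
```

In practice you would read project.inc, pass its contents through the function, and write the result back after making a backup.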

If, however, the webserver code is older, and doesn't use that define, then the file html/project/project_specific_prefs.inc needs to be customized to add the selections. It's really easy to do, as I mentioned, but I can't do it blind without seeing the other include files on the server. They've changed over the years and the functions need to be written differently depending on what you already have.
6) Message boards : Number crunching : Not getting any python work (Message 103420)
Posted 18 Nov 2021 by Michael Goetz
Post:
You can't select it in BOINC (at least not that I have ever seen), however....the project website can allow you to choose which project you want (LHC@Home, PrimeGrid, WCG, etc.) RAH chose not to do this. How hard can it be to write some code to interact between the web and your project server? Apparently very hard.


In fact, it's trivially easy. The scheduler supports app selection out of the box, and the code I referenced above is the template for adding app selection to the web server's project preferences page. The interaction between the two is automatic. It's written into the scheduler code.

If you're only selecting apps, (and not the more complex stuff like GPU classes, multithreading, and so forth) this is something that anyone who knows PHP could set up in an hour or two. Much less if they've done it before. But longer if you want to, you know, test it :P

Give me the app_id numbers of the Rosetta, Rosetta Mini, and Rosetta Python project apps (they're probably 1, 2, and something higher than 3), and I can give you a drop-in, customized project_specific_prefs.inc file that would give you working app selection here.

By tomorrow.

It's that simple.

This is a serious offer. I even promise to write the code *after* I have my morning coffee. There's fewer bugs when I'm caffeinated.

(Okay, there IS a possibility that Rosetta's scheduler code hasn't been updated in 15 years, and might possibly be too old to support app selection. I'm hypothesizing, however. The oldest scheduler code I've ever looked at supported app selection, but Rosetta's older than that. So it's possible it would need an updated scheduler. Otherwise, it's trivial. And certainly it will work for any project created in the last 10 or so years.)
7) Message boards : Number crunching : Not getting any python work (Message 103415)
Posted 18 Nov 2021 by Michael Goetz
Post:
The scheduler was modified to flag hosts that produce specific VM box errors in the stderr output:

"VM failed to start"
"execv: Permission denied"
"ERROR: VBoxManage list hostinfo failed"

If flagged, the host will no longer get VM jobs.

This was a quick but dirty fix to prevent sending VM jobs to such hosts.
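To make the described check concrete, here is a rough Python sketch of the matching logic. This is not the scheduler's actual code, which is not shown in the thread; it only illustrates flagging a host when any of the three quoted strings appears in a task's stderr output. The names are hypothetical.

```python
# The three error strings quoted in the admin's post.
VM_ERROR_MARKERS = (
    "VM failed to start",
    "execv: Permission denied",
    "ERROR: VBoxManage list hostinfo failed",
)


def should_flag_host(stderr_text: str) -> bool:
    """Return True if the stderr output contains any of the VM error strings."""
    return any(marker in stderr_text for marker in VM_ERROR_MARKERS)


print(should_flag_host("... VM failed to start ..."))    # True
print(should_flag_host("task finished normally"))        # False
```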


So the only way to fix such a host would be to detach from RAH, wait until all the tasks for that host have been purged from the database, delete the host, and then reattach, thus creating a new host? Or do the dual-installation hack.

Anyway, thanks for this information. At least some of us can stop trying now.
8) Message boards : Number crunching : Not getting any python work (Message 103412)
Posted 18 Nov 2021 by Michael Goetz
Post:
According to Jord's post in this thread, being able to choose sub projects is not part of the standard BOINC code.

https://boinc.berkeley.edu/forum_thread.php?id=14462&postid=106111#106111


With all due respect that Jord is *normally* due, this time he's simply plain wrong.

The app selection is different at each project (because they have different apps), so you do have to write a small amount of code to fill in the details, but the framework and the prototype code for app selection is in the file html/project.sample/project_specific_prefs.inc. It's here in the git repository: https://github.com/BOINC/boinc/blob/master/html/project.sample/project_specific_prefs.inc

Jord should have known better. He was looking in the wrong place.
9) Message boards : Number crunching : Not getting any python work (Message 103391)
Posted 18 Nov 2021 by Michael Goetz
Post:
On other projects you can select the type of work you do when there are multiple types, although on Seti at least the mechanism was broken for the last couple of years.


(TO BE CLEAR... the content of this post is referring to BOINC in general, and NOT specifically to Rosetta. I am in no way implying anything about Rosetta or its management. The "don't know what they're doing" part obviously doesn't apply to Rosetta. But it frequently describes new projects that pop up. There's a big learning curve.)


About this... some projects don't support user selection of apps. Sometimes this is intentional. Sometimes it's simply because the admins don't know what they're doing. People don't come out of the womb knowing how to run a BOINC project. In fact, the documentation is pretty poor and a lot of things are done by trial and error.

BOINC has a configuration option to enable user app selection. You change one line in a PHP include file to turn it on or off. It's off by default. I guess the thinking is that most projects start with just a single app, so it would only confuse users to have a selection for just one app.

The problem is that a new admin doesn't know they have to change this setting when they add more apps. Or they don't really understand why their users would even care which apps they run. Unfortunately, many projects don't have this option turned on. It sucks for their users, and many users leave because of that. For some projects which have both an abundance of users and a shortage of work, it doesn't matter. But for projects with sufficient work, this cuts down on the science that gets done. Everyone loses.
10) Message boards : Number crunching : Not getting any python work (Message 103385)
Posted 17 Nov 2021 by Michael Goetz
Post:
Ok, that is just plain weird, going all the way to ridiculous.


My thoughts exactly.

Firewall configuration issue???


It's definitely not the firewall. The RPC port is only used for controlling the BOINC client via the BOINC manager, the boinccmd command line interface, or BOINCTasks. It's got nothing to do with communicating with the BOINC project server. Besides, the BOINC client was communicating just fine with the server. The message "I've got no tasks for you!" was getting through loud and clear.

Does re-installing BOINC with VBox wipe the previous Vbox installation?


Yes, if you install the BOINC package containing VBox, it replaces any existing VBOX installation.

Or are any files that are created when Vbox runs, but not part of the installation process, left there? ie Config files, failed VMs etc.
That is the case for projects you are attached to when you re-install BOINC- eg upgrading versions.


I'm not sure if configuration files persist when VBox is reinstalled, either manually or as part of BOINC's installation process.

Would detaching from Rosetta, using the Add/remove programmes Windows installer to remove VirtualBox then manually making sure the Rosetta project folder & sub folders are all deleted, manually making sure Vbox and all its sub folders are deleted...


Tried that. Multiple times. No joy.

... then re-install BOINC with Vbox support then re-attach to Rosetta, possibly resolve the issue?


I did reinstall BOINC; I even tried using a different BOINC version. No joy there either. I didn't try the exact sequence of steps you suggested, but I don't think it would have made any difference.
11) Message boards : Number crunching : Not getting any python work (Message 103381)
Posted 17 Nov 2021 by Michael Goetz
Post:
One problem solved

But it's not a very satisfying solution.

For a while, I've been trying, and failing, to run the new Python VM tasks. I'm an experienced user; I'm actually a BOINC system admin and have modified both the BOINC client and server code when I needed to. Suffice it to say that I know my way around BOINC pretty well.

And yet, for weeks, I've been unable to get the Rosetta server to send Python tasks to my main computer.

It's got plenty of RAM. VBox is installed and virtualization is enabled in the BIOS. VBox apps from other projects run just fine on this computer. But the Rosetta server refused to send Python tasks, no matter what I did. It didn't matter which versions of VBox or BOINC I used, I could not get tasks. I've reset the Rosetta project multiple times. I've detached and attached it multiple times. Nothing worked.

Finally, today, I got it working. But the solution isn't satisfying because it doesn't illuminate what caused the problem. Yes, I fixed it, but I have no idea why it's working now. That's what's so frustrating.

What I did was to enable multiple BOINC instances in cc_config.xml, and set up a separate BOINC instance on the very same computer. Then I attached to Rosetta using the second BOINC instance. That did the trick. I have no clue why the second instance of BOINC works while the first one doesn't. They are using identical cc_config.xml files. The only differences are the location of the data directory and the RPC port number. Otherwise, it's the same computer and the same software. But the Rosetta server sends tasks to one instance but not the other. I can't explain why.
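For anyone who wants to script this, here is a rough Python sketch of the per-instance command lines. It only builds the argument lists and does not launch anything. The data directory layout, the port numbering (the default RPC port 31416 plus the instance index), and the function name are assumptions for illustration. The --allow_multiple_clients, --dir and --gui_rpc_port client options and boinccmd's --host option are used as I understand them from the BOINC documentation, so check them against your installed version.

```python
DEFAULT_RPC_PORT = 31416  # port used by the original, first BOINC instance


def instance_commands(index: int, base_dir: str = "/var/lib/boinc-instances"):
    """Build the start and control command lines for an extra BOINC instance.

    index starts at 1 for the first additional instance. The directory
    layout and port scheme here are hypothetical.
    """
    data_dir = f"{base_dir}/instance{index}"
    port = DEFAULT_RPC_PORT + index
    start = [
        "boinc",
        "--allow_multiple_clients",
        "--dir", data_dir,
        "--gui_rpc_port", str(port),
    ]
    # Every control command has to name the same port explicitly.
    control = ["boinccmd", "--host", f"localhost:{port}", "--get_state"]
    return start, control


start_cmd, control_cmd = instance_commands(1)
print(" ".join(start_cmd))
print(" ".join(control_cmd))
```

The two lists can be passed to subprocess.run once the data directory exists and contains its own copy of cc_config.xml.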

I'm posting this because if you have the same problem, perhaps this may help.

For reference, this is the original BOINC install and this is the second BOINC instance.

Instructions for setting up a second BOINC instance.
12) Message boards : Number crunching : No android tasks? (Message 90240)
Posted 21 Jan 2019 by Michael Goetz
Post:
Rosetta	5296	258780	6.19 (0.34 - 93.77)	10910
Rosetta Mini	9490	121991	6.08 (0.34 - 108.38)	8049
Rosetta for Android	0	0	---	0


I see there's no android tasks available at the moment. Is this a temporary or permanent situation?





