Not getting any python work

Message boards : Number crunching : Not getting any python work

G.L.I.S.
Joined: 25 Dec 08
Posts: 23
Credit: 1,170,926
RAC: 0
Message 103346 - Posted: 16 Nov 2021, 14:00:42 UTC

Update: "the workaround" for downloading the Python WUs no longer seems to work.
Something may have changed on the server.
If there are any positive updates, I will post the results.
Sorry for the inconvenience.

Byez

mmstick
Joined: 4 Dec 12
Posts: 8
Credit: 606,792
RAC: 0
Message 103350 - Posted: 16 Nov 2021, 18:24:09 UTC

With as many issues as the Python tasks have, with half of them being unmanageable or causing the system's OOM killer to assassinate them for using too much memory, not getting any should be considered a blessing. I've just opted to uninstall VirtualBox on my Linux systems. There's simply no valid reason that BOINC projects should be using it on Linux.

We all know that virtualization is largely an inefficient waste of resources. That's especially true for VirtualBox compared to the Linux kernel's KVM/QEMU support. There are better solutions that exist today that would provide the same benefits -- virtual environments, namespaces, and containers -- without having to emulate an entire virtual machine. I'd rather wait for BOINC projects to start using these solutions.

You could argue about Python dependencies, but we live in an era where Python programmers have pip, virtualenv, and Anaconda at their disposal. You could bundle your entire development environment into an OSTree or Docker image and execute it natively on a system using a bubblewrap chroot or podman, so that the software runs in an isolated sandbox with no interference from the host OS. Root's not even required to achieve this.

Of course, I'd also argue that Python itself is not the best tool for distributed computing. 100 computers running a Python application will get the same computational output as 1 computer running a Rust application. As far as super simple scripting languages go, I'd give more of a pass to Julia because it at least leverages the most performant mathematics libraries while also performing JIT compilation of its scripts to something that's close to optimized machine code. WASM would also be an excellent target with its ability to compile on any platform architecture and optimize for the system's native CPU.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103351 - Posted: 16 Nov 2021, 21:21:00 UTC - in response to Message 103342.  

[quote]The standard Rosetta tasks also reserve 8gb of ram for each task,[/quote]
No they don't.
They request & release RAM as required; none is reserved. And apart from a batch of faulty Tasks some time back, the most I have seen used by a single Task was around 4GB. Generally the highest is around 1.3GB.
The current batch of work is using between 700MB & 1GB each.

Please stop making things up, it's not helpful.

[quote]one way around this is a simple app_config file that limits the number of tasks running per project, like this:

<app_config>
<project_max_concurrent>1</project_max_concurrent>
</app_config>[/quote]
Which sometimes results in Tasks continuously being downloaded without any chance of processing them, due to a known bug in how BOINC handles that setting.
Grant
Darwin NT
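
For anyone who wants to try the quoted limit, here is a minimal Python sketch that writes such a file and confirms it parses as XML. Only the app_config and project_max_concurrent element names come from the post above; the function name build_app_config is made up for illustration. The result would be saved as app_config.xml in the project's folder inside the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_max_concurrent: int) -> str:
    """Return app_config.xml text that caps concurrently running tasks for a project.

    build_app_config is a hypothetical helper name, not part of BOINC.
    """
    if project_max_concurrent < 1:
        raise ValueError("project_max_concurrent must be at least 1")

    root = ET.Element("app_config")
    root.text = "\n  "  # newline and indent before the child element
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(project_max_concurrent)
    limit.tail = "\n"  # newline before the closing </app_config>
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    text = build_app_config(1)
    ET.fromstring(text)  # raises ParseError if the text is not well-formed
    print(text)
```

Given the behaviour described in the reply, it is probably worth watching the task list for a while after enabling this setting, in case the client keeps fetching work it cannot start.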

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103352 - Posted: 16 Nov 2021, 21:23:12 UTC - in response to Message 103346.  

[quote]Update: "the workaround" for downloading the Python WUs no longer seems to work.
Something may have changed on the server.[/quote]
Rosetta 4.20 Tasks are now available again. For several days there, they weren't (apart from the very occasional RB Task or a resend).
Grant
Darwin NT

Greg_BE
Joined: 30 May 06
Posts: 5550
Credit: 5,554,708
RAC: 47
Message 103355 - Posted: 16 Nov 2021, 22:14:32 UTC - in response to Message 103344.  

[quote][quote]Yeah ok...but where in the file with all the other text? top of the pile or what?[/quote]

I don't think I fully understand the question; however, to recap:
In addition to the text file named 'Schduler projects', the content of which will be: Scheduling priority WUs: rah_make_work_rosetta_python_projects

It will need to be accompanied by an 'app_config.xml' file with dedicated content, such as:
<app_config>
<app>
<name>rosetta_python_projects</name>
<avg_concurrent>4</avg_concurrent>
<max_concurrent>4</max_concurrent>
</app>
<app_version>
<app_name>rosetta_python_projects_v1.03</app_name>
<plan_class>vbox_64</plan_class>
<avg_ncpus>1</avg_ncpus>
<max_threads>1</max_threads>
<max_mem_usage>14592</max_mem_usage>
</app_version>
</app_config>

This content is just an example! (for a 16GB RAM system)[/quote]


So what you are saying is the file name, that I get.
The scheduling priority WUs: rah_make_work_rosetta_python_projects, goes inside this file along with the code? That's what I am getting at.
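
One way to sanity-check an app_config.xml like the example quoted above is to compare its element names against the ones the BOINC client documentation lists. The Python sketch below does that. The DOCUMENTED table is an assumption based on my reading of the client configuration documentation and may be incomplete or out of date; unrecognised_elements is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: child elements documented for each parent in app_config.xml.
# Verify against the current BOINC client configuration docs before relying on it.
DOCUMENTED = {
    "app_config": {"app", "app_version", "project_max_concurrent",
                   "report_results_immediately"},
    "app": {"name", "max_concurrent", "gpu_versions", "fraction_done_exact",
            "report_results_immediately"},
    "gpu_versions": {"gpu_usage", "cpu_usage"},
    "app_version": {"app_name", "plan_class", "avg_ncpus", "ngpus", "cmdline"},
}


def unrecognised_elements(xml_text: str) -> List[str]:
    """Return paths of elements that are not in the DOCUMENTED table."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    if root.tag != "app_config":
        raise ValueError("root element should be <app_config>")

    found: List[str] = []

    def walk(node: ET.Element, path: str) -> None:
        allowed = DOCUMENTED.get(node.tag, set())
        for child in node:
            child_path = f"{path}/{child.tag}"
            if child.tag not in allowed:
                found.append(child_path)
            walk(child, child_path)

    walk(root, root.tag)
    return found
```

Run against the quoted example, this flags avg_concurrent, max_threads and max_mem_usage. If the table is right, the client would likely ignore those entries, but treat that as something to verify rather than a given.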

G.L.I.S.
Joined: 25 Dec 08
Posts: 23
Credit: 1,170,926
RAC: 0
Message 103356 - Posted: 16 Nov 2021, 22:49:33 UTC - in response to Message 103352.  

[quote][quote]Update: "the workaround" for downloading the Python WUs no longer seems to work.
Something may have changed on the server.[/quote]
Rosetta 4.20 Tasks are now available again. For several days there, they weren't (apart from the very occasional RB Task or a resend).[/quote]
Oh... OK, thanks.

G.L.I.S.
Joined: 25 Dec 08
Posts: 23
Credit: 1,170,926
RAC: 0
Message 103357 - Posted: 16 Nov 2021, 22:52:54 UTC - in response to Message 103355.  

[quote]Yeah ok...but where in the file with all the other text? top of the pile or what?[/quote]

?? What exactly are you referring to with 'all the other text'?

Greg_BE
Joined: 30 May 06
Posts: 5550
Credit: 5,554,708
RAC: 47
Message 103363 - Posted: 17 Nov 2021, 7:25:21 UTC - in response to Message 103357.  

[quote][quote]Yeah ok...but where in the file with all the other text? top of the pile or what?[/quote]

?? What exactly are you referring to with 'all the other text'?[/quote]

Well... it just wasn't clear to me, but anyway, I will make the modification.

mikey
Joined: 5 Jan 06
Posts: 1886
Credit: 6,099,222
RAC: 42
Message 103368 - Posted: 17 Nov 2021, 12:39:08 UTC - in response to Message 103351.  

[quote][quote]The standard Rosetta tasks also reserve 8gb of ram for each task,[/quote]
No they don't.
They request & release RAM as required; none is reserved. And apart from a batch of faulty Tasks some time back, the most I have seen used by a single Task was around 4GB. Generally the highest is around 1.3GB.
The current batch of work is using between 700MB & 1GB each.

Please stop making things up, it's not helpful.[/quote]

I didn't 'make it up'; it was my misunderstanding. I didn't know that the 4.20 tasks 'request & release ram as required'. I have only ever looked in the properties of a running task and seen what it says, so I was going by that.

G.L.I.S.
Joined: 25 Dec 08
Posts: 23
Credit: 1,170,926
RAC: 0
Message 103373 - Posted: 17 Nov 2021, 17:06:35 UTC
Last modified: 17 Nov 2021, 17:08:05 UTC

Yesterday I was unable to communicate with the server; today it started sending alerts again.
In my experience (still on the topic of the original post), the best 'app_config.xml' (*) (**) is:
------------------------------------
<app_config>
<app>
<name>rosetta_python_projects</name>
</app>
<app_version>
<app_name>rosetta_python_projects</app_name>
<plan_class>vbox64</plan_class>
</app_version>
</app_config>
-------------------------------------

I have not been able to enable multithreading, and I always find that 2 physical cores of the processor are left free (the logical cores are not used).
I assume, for example, that with an 8-core FX CPU, at most 6 WUs are processed simultaneously.
To repeat: with a Ryzen 3 3100, at most 2 Python WUs run simultaneously; with a Ryzen 5 3600, at most 4.


(*) Obviously, if you also want to download/throttle Rosetta 4.20 tasks, the file must be extended with the appropriate entries.
In that case those WUs should/could occupy the rest of the CPU's free cores/threads.

(**) After each modification to the 'app_config.xml' file, save and refresh the page and pages back.
Then click on 'Read configuration files', from the 'Options' menu of the BOINC client.
Sometimes it may be necessary to exit BOINC, also terminate it from 'Task Manager', and then restart the program.
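
The 'Read configuration files' step can also be done from the command line with boinccmd, which may be handy on machines without the Manager. The Python sketch below only builds that command. As far as I know, --read_cc_config and --host are boinccmd options; whether a given client version also re-reads app_config.xml on this command is an assumption worth checking. The helper name and the example port are made up.

```python
from typing import List, Optional


def reread_config_command(host: Optional[str] = None,
                          boinccmd: str = "boinccmd") -> List[str]:
    """Build the boinccmd call asking a running client to re-read its config files.

    reread_config_command is a hypothetical helper, not part of BOINC.
    """
    cmd = [boinccmd]
    if host:
        # host may include a port, e.g. "localhost:31418" for a second client
        cmd += ["--host", host]
    cmd.append("--read_cc_config")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try
    # on a machine without BOINC installed.
    print(" ".join(reread_config_command()))
```

To actually run it, the list can be passed to subprocess.run on a machine where the client is running and boinccmd is on the PATH.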

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103379 - Posted: 17 Nov 2021, 21:55:00 UTC - in response to Message 103373.  

[quote]Yesterday I was unable to communicate with the server; today it started sending alerts again.[/quote]
The reason it is "sending alerts" is that we are out of Rosetta 4.20 work again. That's all.
If you had been unable to contact the server, you would have got a message stating that.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103380 - Posted: 17 Nov 2021, 22:01:22 UTC - in response to Message 103368.  
Last modified: 17 Nov 2021, 22:11:22 UTC

[quote]I didn't 'make it up'; it was my misunderstanding. I didn't know that the 4.20 tasks 'request & release ram as required'. I have only ever looked in the properties of a running task and seen what it says, so I was going by that.[/quote]
If you had been looking at a Rosetta 4.20 Task there is no way you would have come up with it requiring 8GB of RAM.
The memory column on the Processes tab in Task Manager shows the amount of memory in use for the listed process/application. The most in use for a Rosetta 4.20 Task I've seen lately has been 3.3GB for an RB Task; all the rest were no more than 1.2GB.
Python Tasks, and only Python Tasks, require 8GB of RAM.


Edit- found a 3.3GB RB Task.
Grant
Darwin NT

Michael Goetz
Joined: 17 Jan 08
Posts: 12
Credit: 158,189
RAC: 0
Message 103381 - Posted: 17 Nov 2021, 22:48:10 UTC

One problem solved

But it's not a very satisfying solution.

For a while, I've been trying, and failing, to run the new Python VM tasks. I'm an experienced user; I'm actually a BOINC system admin and have modified both the BOINC client and server code when I needed to. Suffice it to say that I know my way around BOINC pretty well.

And yet, for weeks, I've been unable to get the Rosetta server to send Python tasks to my main computer.

It's got plenty of RAM. VBox is installed and virtualization is enabled in the BIOS. VBox apps from other projects run just fine on this computer. But the Rosetta server refused to send Python tasks, no matter what I did. It didn't matter which versions of VBox or BOINC I used, I could not get tasks. I've reset the Rosetta project multiple times. I've detached and attached it multiple times. Nothing worked.

Finally, today, I got it working. But the solution isn't satisfying because it doesn't illuminate what caused the problem. Yes, I fixed it, but I have no idea why it's working now. That's what's so frustrating.

What I did was to enable multiple BOINC instances in cc_config.xml, and set up a separate BOINC instance on the very same computer. Then I attached to Rosetta using the second BOINC instance. That did the trick. I have no clue why the second instance of BOINC works while the first one doesn't. They are using identical cc_config.xml files. The only differences are the location of the data directory and the RPC port number. Otherwise, it's the same computer and the same software. But the Rosetta server sends tasks to one instance but not the other. I can't explain why.

I'm posting this because if you have the same problem, perhaps this may help.

For reference, this is the original BOINC install and this is the second BOINC instance.

Instructions for setting up a second BOINC instance.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.
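
For readers who want to reproduce the setup described above, a second client is normally started with its own data directory and its own GUI RPC port, which matches the two differences mentioned in the post. The Python sketch below only assembles the command line. As far as I know, --allow_multiple_clients, --dir and --gui_rpc_port are BOINC client options; the directory, the port 31418 and the helper name are placeholders, not values from the post.

```python
from typing import List


def second_instance_command(data_dir: str, gui_rpc_port: int,
                            boinc_exe: str = "boinc") -> List[str]:
    """Build the command line for an additional BOINC client on the same machine.

    second_instance_command is a hypothetical helper, not part of BOINC.
    """
    if not 1024 <= gui_rpc_port <= 65535:
        raise ValueError("pick an unprivileged port between 1024 and 65535")
    return [
        boinc_exe,
        "--allow_multiple_clients",           # let this client run next to the first
        "--dir", data_dir,                    # separate data directory
        "--gui_rpc_port", str(gui_rpc_port),  # separate control port
    ]


if __name__ == "__main__":
    # Placeholder directory and port; adjust to the local installation.
    print(" ".join(second_instance_command("/path/to/second_data_dir", 31418)))
```

The Manager or boinccmd would then need to be pointed at that port in order to attach the second instance to Rosetta.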

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103382 - Posted: 17 Nov 2021, 22:58:54 UTC - in response to Message 103381.  
Last modified: 17 Nov 2021, 23:17:47 UTC

[quote]What I did was to enable multiple BOINC instances in cc_config.xml, and set up a separate BOINC instance on the very same computer. Then I attached to Rosetta using the second BOINC instance. That did the trick.[/quote]
Ok, that is just plain weird, going all the way to ridiculous.
As Greg_BE, the starter of this thread, posted, he was getting Python tasks, then he wasn't. Even when the project (as now) ran out of regular Rosetta 4.20 Tasks, it still wouldn't pick up Python Tasks, even though VirtualBox is working for Tasks on another BOINC project.



[quote]The only differences are the location of the data directory and the RPC port number.[/quote]
Firewall configuration issue???
Although how that would stop BOINC from asking for Python work... But then with Rosetta, there is no separate Python or Rosetta 4.20 work; it's all just Rosetta. There is no way to select one or the other. If your system can do it, you get it. If not, you just get the one you can do.
You request more work from Rosetta and it's the luck of the draw as to which one you get if your system can do both.

At present, with no Rosetta 4.20 work, and my BOINC installation not including VirtualBox, I can't do Python work. So each work request just results in a "Vbox is not installed" message. If you've got Vbox, and you need Rosetta work, then you should be getting Python tasks.


Does re-installing BOINC with VBox wipe the previous Vbox installation? Or are any files that are created when Vbox runs, but not part of the installation process, left there? i.e. config files, failed VMs etc.
That is the case for projects you are attached to when you re-install BOINC, e.g. when upgrading versions.

Would detaching from Rosetta, using the Windows Add/Remove Programs installer to remove VirtualBox, then manually making sure the Rosetta project folder & sub folders are all deleted, manually making sure Vbox and all its sub folders are deleted, then re-installing BOINC with Vbox support and re-attaching to Rosetta, possibly resolve the issue?
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 1858
Credit: 34,296,174
RAC: 2,425
Message 103383 - Posted: 17 Nov 2021, 23:22:31 UTC - in response to Message 103382.  

[quote]Does re-installing BOINC with VBox wipe the previous Vbox installation? Or are any files that are created when Vbox runs, but not part of the installation process, left there? i.e. config files, failed VMs etc.
That is the case for projects you are attached to when you re-install BOINC, e.g. when upgrading versions.[/quote]

I tried this on the PC I installed VBox on last week. It doesn't remove Vbox.
However, Greg_BE did send some instructions on how to uninstall Vbox, which I'll attempt to use when I get back down there tomorrow night.

Michael Goetz
Joined: 17 Jan 08
Posts: 12
Credit: 158,189
RAC: 0
Message 103385 - Posted: 17 Nov 2021, 23:29:47 UTC - in response to Message 103382.  
Last modified: 17 Nov 2021, 23:30:42 UTC

[quote]Ok, that is just plain weird, going all the way to ridiculous.[/quote]
My thoughts exactly.

[quote]Firewall configuration issue???[/quote]
It's definitely not the firewall. The RPC port is only used for controlling the BOINC client via the BOINC manager, the boinccmd command line interface, or BOINCTasks. It's got nothing to do with communicating with the BOINC project server. Besides, the BOINC client was communicating just fine with the server. The message "I've got no tasks for you!" was getting through loud and clear.

[quote]Does re-installing BOINC with VBox wipe the previous Vbox installation?[/quote]
Yes, if you install the BOINC package containing VBox, it replaces any existing VBox installation.

[quote]Or are any files that are created when Vbox runs, but not part of the installation process, left there? i.e. config files, failed VMs etc.
That is the case for projects you are attached to when you re-install BOINC, e.g. when upgrading versions.[/quote]
I'm not sure if configuration files persist when VBox is reinstalled, either manually or as part of BOINC's installation process.

[quote]Would detaching from Rosetta, using the Windows Add/Remove Programs installer to remove VirtualBox, then manually making sure the Rosetta project folder & sub folders are all deleted, manually making sure Vbox and all its sub folders are deleted...[/quote]
Tried that. Multiple times. No joy.

[quote]... then re-installing BOINC with Vbox support and re-attaching to Rosetta, possibly resolve the issue?[/quote]
I did reinstall BOINC; I even tried using a different BOINC version. No joy there either. I didn't try the exact sequence of steps you suggested, but I don't think it would have made any difference.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103387 - Posted: 17 Nov 2021, 23:55:18 UTC - in response to Message 103385.  
Last modified: 17 Nov 2021, 23:59:22 UTC

That is just beyond weird.

On other projects you can select the type of work you do when there are multiple types, although on Seti at least the mechanism was broken for the last couple of years.

Selecting one type (Multibeam), and the other (AstroPulse) only if there was none for the first type (or not at all), worked that way for years without issue. Many would do just AP due to the high Credit payout, but if there was no AP work the systems would pick up MB till the next batch of AP was released, exactly as the settings were configured. Then there was an update to the Scheduler and it stopped working the way it previously did: you had to enable work for both types all the time in order to reliably get any, even if your system didn't support one of them.
And from memory it didn't affect everyone that way, just some systems (which was still a lot due to the number of crunchers there).

Rosetta doesn't have the option to select the type of work (if your system can't support it, you won't get it), but here we have a case of systems supporting it, capable of running it, but still not getting it. It could very well be that old issue coming into play here. I don't recall if anyone tried your multi-instance workaround (I'm pretty sure they didn't).
Yep. Beyond weird, just ridiculous.
Grant
Darwin NT

Michael Goetz
Joined: 17 Jan 08
Posts: 12
Credit: 158,189
RAC: 0
Message 103391 - Posted: 18 Nov 2021, 0:33:03 UTC - in response to Message 103387.  

[quote]On other projects you can select the type of work you do when there are multiple types, although on Seti at least the mechanism was broken for the last couple of years.[/quote]


(TO BE CLEAR... the content of this post is referring to BOINC in general, and NOT specifically to Rosetta. I am in no way implying anything about Rosetta or its management. The "don't know what they're doing" part obviously doesn't apply to Rosetta. But it frequently describes new projects that pop up. There's a big learning curve.)


About this... some projects don't support user selection of apps. Sometimes this is intentional. Sometimes it's simply because the admins don't know what they're doing. People don't come out of the womb knowing how to run a BOINC project. In fact, the documentation is pretty poor and a lot of things are done by trial and error.

BOINC has a configuration option to enable user app selection. You change one line in a PHP include file to turn it on or off. It's off by default. I guess the thinking is that most projects start with just a single app, so it would only confuse users to have a selection for just one app.

The problem is that a new admin doesn't know they have to change this setting when they add more apps. Or they don't really understand why their users would even care which apps they run. Unfortunately, many projects don't have this option turned on. It sucks for their users, and many users leave because of that. For some projects which have both an abundance of users and a shortage of work, it doesn't matter. But for projects with sufficient work, this cuts down on the science that gets done. Everyone loses.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1364
Credit: 13,624,788
RAC: 0
Message 103392 - Posted: 18 Nov 2021, 0:55:22 UTC - in response to Message 103391.  

[quote]BOINC has a configuration option to enable user app selection. You change one line in a PHP include file to turn it on or off. It's off by default.[/quote]
Even so, the Scheduler would still have the code to implement this ability. And as I mentioned, it became noticeably broken on Seti after a Scheduler update a year or two before they shut down.
It's the only thing that comes to mind that might explain the present odd work allocation behaviour affecting a very few systems.

For a project that has the multiple application option enabled, the Scheduler has to check the status of the flags for the different types of application when a host requests more work. But if the option isn't enabled, then the Scheduler will allocate work using different (or no) default flags, so the behaviour may differ.
And whatever causes the occasional failure of systems to get work despite valid applications & application settings may be what produces the present issue where the feature hasn't been enabled, due to the underlying code & default flags & values.
Just a WAG (Wild Arse Guess).



But whatever it is that makes the issue occur on so few systems would probably also explain why a multi-instance BOINC installation can have one instance getting work where the other doesn't. It's one of those things that would probably appear blindingly obvious once the problem was found and resolved (I certainly had more than my fair share of those repairing electronics over the years).
Grant
Darwin NT

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1636
Credit: 6,462,241
RAC: 35
Message 103400 - Posted: 18 Nov 2021, 8:39:39 UTC - in response to Message 103391.  
Last modified: 18 Nov 2021, 8:41:30 UTC

[quote]The problem is that a new admin doesn't know they have to change this setting when they add more apps. Or they don't really understand why their users would even care which apps they run. Unfortunately, many projects don't have this option turned on.[/quote]

This is correct for a new project.
R@H has been in the "BOINC world" since... I don't remember when.
I don't want to believe that their admins don't know how to change a simple php/xml file to activate the function.
And yes, the documentation is not so extensive or precise, but there are some pages that can help, and you can also participate in the BOINC newsletter/discussion group if you have configuration problems.
So, for me, it's simply a problem of will.



©2022 University of Washington
https://www.bakerlab.org