Some machines will not run VirtualBox tasks

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104321 - Posted: 19 Jan 2022, 17:50:31 UTC

Hi everyone

I have a new-to-me PC (Dell Optiplex 3040, Pentium dual core G4500 Skylake gen CPU, 16GB DDR3L RAM, Kingston 120GB SSD) and I've tried Windows 10, Ubuntu 20.04 and Windows 11. It will run other non-VirtualBox projects fine. I haven't tried any VB projects other than Rosetta until now, but it is currently happily running an LHC VB task on both cores at ~75% CPU utilisation so that looks good. It will download and start to run VirtualBox Rosetta tasks, but they never complete and CPU utilisation rarely rises above ~2% according to BOINCTasks which is backed up by task manager when I look at the machine.

Any ideas what might be wrong? The machine seems completely stable. It had the same issues under Ubuntu as it does under Windows. And I have another machine that behaves exactly the same: an old Dell dual-CPU server (PowerEdge R410), which I've also tried Win10 and Ubuntu on.

I tend to log in via Remote Desktop on this machine, but I didn't when it was running Ubuntu as I hadn't set that up yet.

Windows memory diagnostic hasn't detected any RAM errors, and I've tried resetting the project multiple times. I'm wondering if it might be due to:

* the SSD
* the CPU - temperatures are around 40°C whilst it's running an LHC VB task.
* RAM - it passes the Windows Memory Diagnostic
* a setting in the firmware. The Leomoon app says that everything is enabled for virtualisation to work.
* Just very unlucky with the tasks it's getting?
* something else?

I'm running out of ideas! I'm tempted to move a hard drive from a working machine over to it to see if it runs the tasks in the other PC's queue.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1491
Credit: 14,677,518
RAC: 14,569
Message 104323 - Posted: 19 Jan 2022, 18:13:41 UTC - in response to Message 104321.  

It will download and start to run VirtualBox Rosetta tasks, but they never complete and CPU utilisation rarely rises above ~2% according to BOINCTasks which is backed up by task manager when I look at the machine.
It's a known issue with some Tasks.
Aborting them seems to be the standard practice.
Grant
Darwin NT

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104326 - Posted: 19 Jan 2022, 21:14:36 UTC - in response to Message 104323.  

I've done that plenty on all of my machines, but this one never gets any VBox tasks that run correctly. It could be down to chance, but seems unlikely at this point. I would guess it's been through 40 tasks, and none have been successful.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104331 - Posted: 20 Jan 2022, 9:08:59 UTC

Ok, so it ran an LHC Virtualbox task successfully, and then has gone on to start 2x Rosetta Vbox tasks, both of which were sat at 2% CPU utilisation. I don't think it can just be coincidence at this point.

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104332 - Posted: 20 Jan 2022, 9:40:31 UTC - in response to Message 104331.  

After a quick look through one of your logs:
https://boinc.bakerlab.org/rosetta/result.php?resultid=1464188756
It looks like the VM starts executing some services, including the internal VBoxGuestAdditions, which are required to mount the shared folder on the host.
After that the VM should read task details from the shared folder and set up the task. This is not shown in the log.

What can be wrong:
- the task definition file in the shared folder (delivered by the project)
- the VM can't read the files in the shared folder
- some other weird things causing the VM to crash


How to check the latter:
- Open the VirtualBox GUI and select the VM you want to check.
In case of a crash the small preview might show a hint.
- The vbox log deeper in the slots folder may include some hints
You may discuss that with the VirtualBox forum experts


BTW: The 2nd line indicates a VM restart.
2022-01-19 14:56:54 (8764): Guest Log: 00:00:00.001693 main     5.2.42 r137960 started. Verbose level = 0
2022-01-19 15:40:22 (7404): Detected: vboxwrapper 26202
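
To spot restarts like this without reading the whole log by eye, a small Python sketch along these lines might help. It assumes the number in parentheses after the timestamp is the process ID of the vboxwrapper instance that wrote the line, so a change between consecutive lines suggests a restart. The name `find_restarts` is made up for this example.

```python
import re

# Matches the "YYYY-MM-DD HH:MM:SS (pid): " prefix seen in the excerpt above.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): ")

def find_restarts(log_text):
    """Return (timestamp, old_pid, new_pid) for every change of PID in the log."""
    restarts = []
    previous_pid = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without the expected prefix
        timestamp, pid = match.group(1), int(match.group(2))
        if previous_pid is not None and pid != previous_pid:
            restarts.append((timestamp, previous_pid, pid))
        previous_pid = pid
    return restarts

# The two log lines quoted above.
sample = """\
2022-01-19 14:56:54 (8764): Guest Log: 00:00:00.001693 main     5.2.42 r137960 started. Verbose level = 0
2022-01-19 15:40:22 (7404): Detected: vboxwrapper 26202
"""
print(find_restarts(sample))  # [('2022-01-19 15:40:22', 8764, 7404)]
```

Pasting the full stderr output from the result page into `sample` should list every point where the PID changed.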

tullio
Joined: 10 May 20
Posts: 63
Credit: 630,125
RAC: 0
Message 104333 - Posted: 20 Jan 2022, 13:28:03 UTC

Once I could not run some LHC tasks like Theory@home and CMS@home. I had a McAfee antivirus. When I uninstalled it, all tasks started running.
Tullio

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104334 - Posted: 20 Jan 2022, 16:28:04 UTC

Thanks computezrmle and tullio. I've tried installing an AV (AVG) and uninstalling it again, updating the Windows drivers, updating the BIOS, resetting Rosetta and reinstalling BOINC.

There is a relevant but not conclusive thread here about the same issue:
https://forums.virtualbox.org/viewtopic.php?f=6&t=104968&p=511914&hilit=boinc#p511914

When I get a chance I'm going to take the advice from that thread, install a Debian image in a new VirtualBox instance and see how that goes.

tullio
Joined: 10 May 20
Posts: 63
Credit: 630,125
RAC: 0
Message 104337 - Posted: 20 Jan 2022, 17:35:58 UTC
Last modified: 20 Jan 2022, 17:42:50 UTC

I started Rosetta@home on my AMD Ryzen 5 1400 CPU with 24 GB RAM but only 10 GB disk free for BOINC, so it won't run rosetta python. Of 5 Rosetta tasks, 3 failed immediately after starting; the other two are running and have a D8.15_nme3 string in their name that the 3 failed ones did not have. On the Intel i5 9400F CPU no task has failed so far, either Rosetta or rosetta python.
Tullio
Comment in the 3 failed tasks: STATUS_ACCESS_VIOLATION

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104338 - Posted: 20 Jan 2022, 17:44:56 UTC - in response to Message 104334.  

Just a test.

In a Rosetta python slots folder locate "vbox_job.xml".
This file contains a softlink to the original "vbox_job.xml" delivered by the project (it has a different name!).

Make a backup of the original file and edit the original file.
The following tags should be modified or added:
<enable_isocontextualization>0</enable_isocontextualization>
<disable_automatic_checkpoints/>
<enable_remotedesktop>1</enable_remotedesktop> (not 100% sure if this works for Rosetta; if not, remove the tag)
<vm_disk_controller_model>IntelAHCI</vm_disk_controller_model>
<vm_disk_controller_type>sata</vm_disk_controller_type>

In case the VM doesn't even boot, remove the disk controller tags.

Then run a fresh task.
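
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch. The tag names and values come from the list above; the function names and the `.bak` backup suffix are made up for this example, and the sketch does not check whether BOINC or the project later overwrites the file.

```python
import shutil
import xml.etree.ElementTree as ET

# Tags and values from the post above. None produces an empty element
# such as <disable_automatic_checkpoints />.
TEST_SETTINGS = {
    "enable_isocontextualization": "0",
    "disable_automatic_checkpoints": None,
    "enable_remotedesktop": "1",
    "vm_disk_controller_model": "IntelAHCI",
    "vm_disk_controller_type": "sata",
}

def apply_test_settings(xml_text, settings=TEST_SETTINGS):
    """Return the job XML with each tag set, adding tags that are missing."""
    root = ET.fromstring(xml_text)
    for tag, value in settings.items():
        element = root.find(tag)
        if element is None:
            element = ET.SubElement(root, tag)
        element.text = value
    return ET.tostring(root, encoding="unicode")

def patch_job_file(path):
    """Back up the file next to itself, then rewrite it with the test settings."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as handle:
        patched = apply_test_settings(handle.read())
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(patched)
```

`patch_job_file` expects the path of the original file in the project folder as a string, not the soft link in the slot. If the VM doesn't boot, drop the two disk controller entries from `TEST_SETTINGS`, restore the backup and patch again.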

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104340 - Posted: 20 Jan 2022, 18:50:27 UTC - in response to Message 104338.  

... the original "vbox_job.xml" delivered by the project (it has a different name!).

Maybe somebody running those pythons could post the content of the xml file.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104349 - Posted: 20 Jan 2022, 21:23:14 UTC - in response to Message 104340.  
Last modified: 20 Jan 2022, 21:26:20 UTC

... the original "vbox_job.xml" delivered by the project (it has a different name!).

Maybe somebody running those pythons could post the content of the xml file.


I've just checked a few slots and that file is the same in all of them:

<soft_link>../../projects/boinc.bakerlab.org_rosetta/vbox_job_v4.xml</soft_link>

And that file exists in the projects/boinc.bakerlab.org_rosetta folder and contains this:

<vbox_job>
<memory_size_mb>6144</memory_size_mb>
<os_name>Debian_64</os_name>
<enable_shared_directory/>
</vbox_job>

That's the same content on a good and bad machine.
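
If anyone wants to repeat this check across several slots, a rough Python sketch of following the soft link could look like this. The slot path in the example is hypothetical, and the helper names are made up; on Windows the BOINC data directory will be somewhere else, though forward slashes in the path still work for this purpose.

```python
import posixpath
import re

def soft_link_target(link_text):
    """Extract the relative path stored inside a <soft_link> element."""
    match = re.search(r"<soft_link>(.*?)</soft_link>", link_text, re.DOTALL)
    return match.group(1).strip() if match else None

def resolve_from_slot(slot_dir, link_text):
    """Resolve the soft link target relative to the slot directory."""
    target = soft_link_target(link_text)
    if target is None:
        return None
    return posixpath.normpath(posixpath.join(slot_dir, target))

link = "<soft_link>../../projects/boinc.bakerlab.org_rosetta/vbox_job_v4.xml</soft_link>"
# Hypothetical slot directory, used only to show the resolution.
print(resolve_from_slot("/var/lib/boinc-client/slots/3", link))
```

The printed path is the file to open and compare between a good and a bad machine.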

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104350 - Posted: 20 Jan 2022, 21:53:07 UTC - in response to Message 104338.  

Just a test.

In a Rosetta python slots folder locate "vbox_job.xml".
This file contains a softlink to the original "vbox_job.xml" delivered by the project (it has a different name!).

Make a backup of the original file and edit the original file.
The following tags should be modified or added:
<enable_isocontextualization>0</enable_isocontextualization>
<disable_automatic_checkpoints/>
<enable_remotedesktop>1</enable_remotedesktop> (not 100% sure if this works for Rosetta; if not, remove the tag)
<vm_disk_controller_model>IntelAHCI</vm_disk_controller_model>
<vm_disk_controller_type>sata</vm_disk_controller_type>

In case the VM doesn't even boot, remove the disk controller tags.

Then run a fresh task.


Have tried this - will report back later.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104356 - Posted: 21 Jan 2022, 9:21:12 UTC - in response to Message 104350.  

The tasks are still sitting idle unfortunately.

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104358 - Posted: 21 Jan 2022, 10:57:16 UTC - in response to Message 104356.  

I tested the python app on one of my computers.
So far the app does everything it can to stop a user from testing their own settings in vbox_job.xml or using a different vboxwrapper.
It would be possible, but it's not worth the effort, since at the end of the day the project team would have to accept change requests and distribute a new version.
Reading other posts in this forum makes me assume the team would ignore this.
Sorry guys, I'm out.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104396 - Posted: 22 Jan 2022, 16:36:04 UTC

I've added some more info on the Virtualbox forum thread here:

https://forums.virtualbox.org/viewtopic.php?f=6&t=104968&p=511914&hilit=boinc#p511914

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 104397 - Posted: 22 Jan 2022, 17:06:50 UTC - in response to Message 104358.  

It would be possible but it's not worth the effort since at the end of the day the project team would have to accept change requests and distribute a new version.
Reading other posts in this forum makes me assume the team would ignore this.

Exactly so. They are impervious to information.

computezrmle
Joined: 9 Dec 11
Posts: 63
Credit: 9,680,103
RAC: 0
Message 104398 - Posted: 22 Jan 2022, 17:14:46 UTC - in response to Message 104396.  

Access Error:   UUID {ef35dff9-d482-48f8-9519-fef6c1b23a3b} of the medium 'C:\ProgramData\BOINC\slots\3\vm_image.vdi' does not match the value {9cb7b2ea-5a86-4be3-9a8c-3017ec7a8c71} stored in the media registry ('C:\Users\Danny\.VirtualBox\VirtualBox.xml')
Type:           normal (base)
Location:       C:\ProgramData\BOINC\slots\3\vm_image.vdi

This means one of the tasks crashed some time in the past and VirtualBox didn't get a chance to clean up the media registry.

Stop BOINC so that it doesn't try again to set up a task in slots/3.
Open your VirtualBox GUI and open the media manager.
Remove the disk entry pointing to slots\3\vm_image.vdi
Then restart BOINC
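
To find such stale entries without clicking through the GUI, a rough Python sketch like the one below could be used. It assumes a disk listing in the same "key: value" layout as the excerpt above, with entries separated by blank lines (the layout `VBoxManage list hdds` appears to print). The first entry in the sample is invented for illustration, and the function names are made up.

```python
import re

def parse_hdds(listing):
    """Split a disk listing into one dict per entry."""
    disks = []
    text = listing.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so Windows paths stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            disks.append(entry)
    return disks

def stale_slot_disks(listing):
    """Entries that report an access error and live in a BOINC slots folder."""
    return [
        disk for disk in parse_hdds(listing)
        if "Access Error" in disk
        and re.search(r"[\\/]slots[\\/]", disk.get("Location", ""))
    ]

SAMPLE = r"""
UUID:           11111111-2222-3333-4444-555555555555
State:          created
Type:           normal (base)
Location:       C:\Users\Danny\VirtualBox VMs\test\test.vdi

Access Error:   UUID {ef35dff9-d482-48f8-9519-fef6c1b23a3b} of the medium 'C:\ProgramData\BOINC\slots\3\vm_image.vdi' does not match the value {9cb7b2ea-5a86-4be3-9a8c-3017ec7a8c71} stored in the media registry ('C:\Users\Danny\.VirtualBox\VirtualBox.xml')
Type:           normal (base)
Location:       C:\ProgramData\BOINC\slots\3\vm_image.vdi
"""

for disk in stale_slot_disks(SAMPLE):
    print(disk["Location"])
```

Each location printed is a candidate for removal in the media manager, or possibly with `VBoxManage closemedium disk <location>`, while BOINC is stopped.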

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104399 - Posted: 22 Jan 2022, 17:21:58 UTC - in response to Message 104398.  

Access Error:   UUID {ef35dff9-d482-48f8-9519-fef6c1b23a3b} of the medium 'C:\ProgramData\BOINC\slots\3\vm_image.vdi' does not match the value {9cb7b2ea-5a86-4be3-9a8c-3017ec7a8c71} stored in the media registry ('C:\Users\Danny\.VirtualBox\VirtualBox.xml')
Type:           normal (base)
Location:       C:\ProgramData\BOINC\slots\3\vm_image.vdi

This means one of the tasks crashed some time in the past and VirtualBox didn't get a chance to clean up the media registry.

Stop BOINC so that it doesn't try again to set up a task in slots/3.
Open your VirtualBox GUI and open the media manager.
Remove the disk entry pointing to slots\3\vm_image.vdi
Then restart BOINC


I've done that, but only picked up a batch of Rosetta 4.20 tasks so far. Hopefully some Vbox tasks will come down soon.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104446 - Posted: 23 Jan 2022, 18:35:53 UTC

Unfortunately the python tasks still dropped to 0% CPU utilisation within the first minute.

I also tried upgrading to the new release of VirtualBox (6.1.32) and grabbed some fresh python tasks but no improvement.

I'll have to leave the affected machines running other projects for now.

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,902,918
RAC: 61,143
Message 104464 - Posted: 23 Jan 2022, 23:43:08 UTC
Last modified: 23 Jan 2022, 23:50:28 UTC

I have just posted this on the VirtualBox forums:

I've confirmed on some more machines - those that work correctly have a path listed under "Settings File Location" and those that don't run the Vbox tasks do not have that entry. So presumably they are failing because they can't see the task they are supposed to run.

Might that persist even after installing a new OS because something like the disk write speed or SATA bus speed is slow enough to trigger a time-out? Seems very strange.


Can anyone else confirm this is the case? To view the Vbox instance's Details:
* open VirtualBox manager from the start menu (in Windows at least!)
* In the list on the left, hit the three dots/lines icon to the right of that instance, and choose "Details"

You should see "Settings File Location" listed under "General". My assumption is that if you don't see it, BOINC Vbox tasks will always fail, with CPU utilisation dropping to 0% after around 30s on my machines (I suspect that duration depends on your disk speed). If you do see it, tasks will generally run OK, but with some failures.

Or is it just that the BOINC script isn't getting far enough to set a settings location from within the Vbox?

D
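
For machines without easy access to the GUI, the same check might be possible from the command line. The Python sketch below parses `key="value"` text of the kind `VBoxManage showvminfo <vm> --machinereadable` prints and looks for the settings file entry. The key name `CfgFile` is an assumption on my part, the sample output is invented, and the function names are made up.

```python
def parse_machine_readable(output):
    """Turn key="value" lines into a dict, stripping surrounding quotes."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip().strip('"')] = value.strip().strip('"')
    return info

def has_settings_file(output, key="CfgFile"):
    """True if the VM reports a non-empty settings file location."""
    return bool(parse_machine_readable(output).get(key))

# Invented examples: one VM with a settings file entry, one without.
good_vm = 'name="boinc_example"\nCfgFile="/var/lib/boinc-client/slots/3/boinc_example/boinc_example.vbox"\n'
bad_vm = 'name="boinc_example"\n'

print(has_settings_file(good_vm), has_settings_file(bad_vm))  # True False
```

If the key turns out to be named differently on your VirtualBox version, pass it as the `key` argument.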



©2024 University of Washington
https://www.bakerlab.org