ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.

Message boards : Number crunching : ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.

buscher
Joined: 13 Oct 10
Posts: 9
Credit: 4,426,253
RAC: 1,021
Message 105124 - Posted: 22 Feb 2022, 9:31:08 UTC
Last modified: 22 Feb 2022, 9:33:17 UTC

Hello, I am getting lots of

"ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time."
which leaves the task stuck until I restart BOINC. After a restart, the task finishes almost instantly and is successfully validated.

Running on Linux,
Kernel: 5.16.10
BOINC: 7.16.17
VirtualBox: 6.1.31_SPB

Example task: https://boinc.bakerlab.org/rosetta/result.php?resultid=1474366151

What bothers me is not only that the task gets stuck (it needs a BOINC restart or other manual action), but also that the guest seems to have ended just fine:

2022-02-21 23:58:58 (418): Guest Log: 03:14:18.136765 control  Guest control service stopped
2022-02-21 23:58:58 (418): Guest Log: 03:14:18.136931 control  Guest control worker returned with rc=VINF_SUCCESS
2022-02-21 23:58:58 (418): Guest Log: 03:14:18.137311 main     Session 0 is about to close ...
2022-02-21 23:58:58 (418): Guest Log: 03:14:18.137374 main     Stopping all guest processes ...
2022-02-21 23:58:58 (418): Guest Log: 03:14:18.137400 main     Closing all guest files ...
2022-02-21 23:58:58 (418): Guest Log: 03:14:18.166754 main     Ended.
2022-02-21 23:58:58 (418): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.


Is something wrong with my VirtualBox? It only happens for some tasks.
Can I do something to improve this?
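
For anyone who wants to check their own task output for this pattern, here is a minimal Python sketch. It assumes the task's stderr output has been saved as plain text; the function name and the 10-line window are hypothetical choices for illustration and are not part of vboxwrapper. It only flags the case shown in the log above, where the guest logs "Ended." shortly before the wrapper reports lost communication.

```python
import re

# Message text as it appears in the log excerpt above.
LOST_COMM = "ERROR: Vboxwrapper lost communication with VirtualBox"
# Matches guest log lines such as "Guest Log: 03:14:18.166754 main     Ended."
GUEST_ENDED = re.compile(r"Guest Log: \S+ main\s+Ended\.")


def lost_comm_after_clean_exit(log_text, window=10):
    """Return True if the lost-communication error appears within
    `window` lines after the guest reported a clean "Ended."."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if LOST_COMM in line:
            preceding = lines[max(0, i - window):i]
            if any(GUEST_ENDED.search(p) for p in preceding):
                return True
    return False
```

If this returns True for a stuck task, the work inside the VM most likely finished and only the wrapper lost track of it, which matches the task validating right after a BOINC restart.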
ID: 105124 · Rating: 0
gbayler

Joined: 10 Apr 20
Posts: 14
Credit: 3,069,484
RAC: 0
Message 105150 - Posted: 23 Feb 2022, 9:27:53 UTC - in response to Message 105124.  

Hi @buscher,
The root cause of this problem seems to be that the VirtualBox tasks for Rosetta@home use an outdated or mismatched vboxwrapper. As far as I understand, there is nothing you can do as a user to fix the problem itself, but there are some workarounds available (such as restarting BOINC, which you already mentioned).
This issue is #4 on my list of issues with VirtualBox tasks: https://docs.google.com/spreadsheets/d/1lBP27MYx2RH9PYuweMoSwOLvmIaoqI77Q0_gC34e-Z0/edit?usp=sharing
ID: 105150 · Rating: 0
xii5ku

Joined: 29 Nov 16
Posts: 22
Credit: 13,650,724
RAC: 2,076
Message 105151 - Posted: 23 Feb 2022, 9:33:46 UTC - in response to Message 105124.  
Last modified: 23 Feb 2022, 10:26:31 UTC

buscher wrote:
Is something wrong with my VirtualBox?
No. It's most likely a bug in vboxwrapper. It affects *all* vboxwrapper applications (not just Rosetta's) to varying degrees.

buscher wrote:
Can I do something to improve this?
Two possible workarounds have been mentioned elsewhere:

  • Try VirtualBox v5 instead of v6.
  • Try replacing the vboxwrapper binary that Rosetta uses with a newer, externally built version.

I have not tried either of these suggestions myself yet, so I don't know whether they are real improvements or even real fixes.

Instead, this is my workaround:

  1. Set the work buffer depth to ~1.5 days.
  2. Restart the boinc-client service every 0.5 days.

Explanations:

  1. While one or more tasks are in the "postponed" state, the client does not request new work. Hence, the work buffer should be larger than the interval between client restarts. It needs to be considerably larger because the client and server systematically underestimate the actual durations of 'rosetta python projects' tasks.
  2. After a client restart, both postponed tasks and previously running tasks continue to run, resuming from a very recent checkpoint. The restart can be automated because it is almost guaranteed that there will be tasks in the "postponed" state after half a day, or in fact sooner.
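
To make the sizing rule concrete, here is a small Python sketch of the arithmetic. The function name and the `underestimate_factor` parameter are hypothetical; the factor of 2 is only an assumed safety margin for the underestimated run times, not a measured value.

```python
def buffer_covers_restart_interval(buffer_days, restart_interval_days,
                                   underestimate_factor=2.0):
    """Check that the work buffer outlasts the interval between restarts.

    While tasks are postponed the client fetches no new work, so the
    buffer has to carry the host until the next restart. The factor
    (an assumption) pads for run-time estimates that are too low.
    """
    if buffer_days <= 0 or restart_interval_days <= 0:
        raise ValueError("both durations must be positive")
    return buffer_days >= restart_interval_days * underestimate_factor


# The values from this post: 1.5 days of buffer, restart every 0.5 days.
print(buffer_covers_restart_interval(1.5, 0.5))
```

With the numbers above the buffer is three times the restart interval, so the check passes even with the assumed factor of 2. A buffer equal to the restart interval would not pass.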

PS:
The need for such absurd workarounds shows what a sorry state the whole Rosetta@home project has gotten itself into over the years. It's sad. The 'rosetta python projects' application is just disgusting and should go away. I am only running it myself because I know how to, and because Rosetta v4 work is only intermittently available.

PPS:
Kudos to @gbayler for the issues & workarounds tracker.

ID: 105151 · Rating: 0
gbayler

Joined: 10 Apr 20
Posts: 14
Credit: 3,069,484
RAC: 0
Message 105180 - Posted: 23 Feb 2022, 19:35:58 UTC - in response to Message 105151.  

xii5ku wrote:
PPS:
Kudos to @gbayler for the issues & workarounds tracker.

Thank you, it is great to hear that! :) Actually, @dcdc had the idea to collect issues with VirtualBox tasks in his thread "Summary of issues with VirtualBox tasks"; my sheet is just based on that.

xii5ku wrote:
PS:
The need for such absurd workarounds shows what a sorry state the whole Rosetta@home project has gotten itself into over the years. It's sad. The 'rosetta python projects' application is just disgusting and should go away. I am only running it myself because I know how to, and because Rosetta v4 work is only intermittently available.

I was quite surprised to learn that R@h does not work properly out of the box, and that workarounds such as aborting tasks and restarting BOINC regularly are necessary. I'm torn between being happy to be able to contribute something to the project on the one hand, and doubting whether it is a good investment of my time on the other. It feels a little bit like riding the proverbial dead horse!
ID: 105180 · Rating: 0
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 105187 - Posted: 24 Feb 2022, 2:12:18 UTC - in response to Message 105180.  

It feels a little bit like riding the proverbial dead horse!

It's a zombie dead horse, along with the zombie Rosetta tasks.
ID: 105187 · Rating: 0
buscher
Joined: 13 Oct 10
Posts: 9
Credit: 4,426,253
RAC: 1,021
Message 105229 - Posted: 26 Feb 2022, 11:07:36 UTC

Thanks for the answers.

Since VirtualBox 5 is no longer available (by default) for my distro, I guess I will settle for a cron job that restarts BOINC every 12 hours (for now).
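
A minimal sketch of what such a cron job could look like, written in Python. It assumes a systemd-based distro where the service is named boinc-client (the name used earlier in this thread) and that the job runs with enough privileges to restart it. The function names and the script path are hypothetical.

```python
import subprocess
import sys

SERVICE = "boinc-client"  # service name as mentioned in this thread


def build_restart_command(service=SERVICE):
    """Command line for restarting the service (assumes systemd)."""
    return ["systemctl", "restart", service]


def restart_boinc():
    """Restart the BOINC client and return systemctl's exit code."""
    return subprocess.run(build_restart_command(), check=False).returncode


def cron_entry(script_path, every_hours=12):
    """Build a crontab line that runs `script_path` every N hours.

    N must divide 24 so that the runs are evenly spaced.
    """
    if not 1 <= every_hours <= 23 or 24 % every_hours != 0:
        raise ValueError("every_hours must be a divisor of 24 below 24")
    return f"0 */{every_hours} * * * {sys.executable} {script_path}"
```

A script that calls restart_boinc() could then be registered with the line returned by cron_entry("/path/to/restart_boinc.py"), which fires at 00:00 and 12:00 for the default of 12 hours.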

And I will opt out of VirtualBox tasks as soon as WCG is operational again or something major changes on the Rosetta side...
ID: 105229 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 1,996
Message 105252 - Posted: 27 Feb 2022, 0:13:57 UTC - in response to Message 105150.  
Last modified: 27 Feb 2022, 0:14:56 UTC

I just had 2 x your #1. First time ever.
So I just shut down BOINC and restarted it, and everything worked fine.

When I ran Quchem I always got something to the effect of "Vbox environment unstable, restarting later" (not the exact phrase, but you get the idea). I could never run Quchem steadily, so I gave up.

But here I have very little trouble with Vbox jobs. An occasional hang [your #2] (a few in 2 months or so) that I had to kill. I run an AMD Ryzen 7, so I guess I am lucky that I get few or no errors with Python.
ID: 105252 · Rating: 0
gbayler

Joined: 10 Apr 20
Posts: 14
Credit: 3,069,484
RAC: 0
Message 105347 - Posted: 6 Mar 2022, 21:26:51 UTC - in response to Message 105187.  

It feels a little bit like riding the proverbial dead horse!

It's a zombie dead horse, along with the zombie Rosetta tasks.


Hahaha, made my day! 😂 🧟 🐎

Sorry for my late answer; I only now saw that you replied.
ID: 105347 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org