Occasional VirtualBox failures

Author	Message
dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 104500 - Posted: 26 Jan 2022, 8:02:04 UTC
In addition to the thread for computers that won't run any VirtualBox tasks (which seems to be hardware related somehow), there are regular failures on my machines that are usually happy to run VirtualBox tasks. I haven't looked at the VirtualBox preview for many of them yet, but I have now seen two in a row with this error:
Spectre V2 : Spectre mitigation: LFENCE not serializing, switching to generic retpoline
https://ibb.co/FDkPMtB
If the logs from these are useful then I'll collect and post them. I presume VBox.log, VBoxHardening.log and one of the BOINC logs would be the appropriate ones to post?

dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 104644 - Posted: 3 Feb 2022, 8:32:12 UTC
On my machines that usually run VirtualBox tasks successfully, this Spectre V2 error is still how most of the tasks that stop running fail. Unfortunately they will often run for days like this if they're not spotted. I have BOINCTasks running, so I usually spot them sooner than that on my local machines, but not on the remote ones.
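
A hung task of the kind described above could in principle be caught automatically by comparing CPU time between two points in time. The following is only a minimal sketch: the function name `stalled_tasks`, the snapshot format (a dict of task name to CPU seconds) and the one-second threshold are all assumptions for illustration, and collecting the snapshots (for example from `boinccmd --get_tasks`) is deliberately left out.

```python
def stalled_tasks(before, after, min_cpu_gain=1.0):
    """Return the names of tasks whose CPU time barely moved between snapshots.

    `before` and `after` map a task name to its CPU time in seconds.
    The snapshot format and threshold are illustrative assumptions.
    Tasks that only appear in one snapshot are ignored.
    """
    return sorted(
        name
        for name, cpu in after.items()
        if name in before and cpu - before[name] < min_cpu_gain
    )
```

Taking the two snapshots several minutes apart would avoid flagging a task that is only briefly waiting on I/O.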

computezrmle Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0	Message 104645 - Posted: 3 Feb 2022, 9:00:27 UTC - in response to Message 104644.
The Spectre mitigation message appears at every VM start. I don't think it causes the error. Instead, it's the last info printed on the console before the VM hangs.

dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 104646 - Posted: 3 Feb 2022, 15:18:39 UTC - in response to Message 104645.
Quote: "The Spectre mitigation message appears at every VM start. I don't think it causes the error. Instead, it's the last info printed on the console before the VM hangs."
Ok yeah, that makes sense. So it's at least narrowed down to any point after that! Would the VBox logs be helpful in diagnosing it, assuming anyone on the project is interested?

.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0	Message 104647 - Posted: 3 Feb 2022, 16:48:06 UTC
If the `Occasional VirtualBox failures` you are talking about are the ones that fail after a few seconds, lock up, and use no more CPU time (if so, can you alter the thread title to show this?), then this comes from studying my own error rate and looking at wingmen that completed the same task as valid.
Are the fail-at-startup work units more common on systems with a large number of CPUs/threads, where the system is so busy that the application borks itself by not waiting for another instance of Rosetta to finish reading from a file?
The output files from these work units often, or mostly, have a line in them like:
'F:\ProgramData\BOINC\slots\12\vm_image.vdi' is locked for reading by another task},
The `slots\number` part can appear several times in one output file with a different `number` each time, as if several instances of Rosetta are fighting each other to read the file {race condition} and so crash the work unit.
From what I have seen of it, systems with more than approximately 12 CPUs/threads seem more likely to have the startup faults than 4 or 8 core systems.
A full top-down view that only the admin can get may rubbish this idea in seconds. It's the best I have got on it, so here it is for you to consider [and tell me I am talking carp].
Also, things like {The object is not ready} make me think the app is tripping over itself, and {The object functionality is limited} could be because some of the required components of the `slots` folder have not loaded in time.
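
To check how often the lock message shows up, and for which slots, a task's output text can be scanned for it. This is a rough sketch: the pattern is modelled only on the fragment quoted in the post above, so the real wording in the output may differ, and `locked_slots` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed message shape, based only on the fragment quoted in the post;
# accepts both Windows (\) and Unix (/) path separators.
LOCK_RE = re.compile(
    r"slots[\\/](\d+)[\\/][^']*?\.vdi'? is locked for (?:reading|writing)",
    re.IGNORECASE,
)


def locked_slots(output_text):
    """Count lock complaints per slot number found in a task's output text."""
    return Counter(int(m.group(1)) for m in LOCK_RE.finditer(output_text))
```

Several different slot numbers in the output of a single task would fit the picture of tasks tripping over images that belong to other slots.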

computezrmle Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0	Message 104648 - Posted: 3 Feb 2022, 17:18:57 UTC - in response to Message 104647.
The slots/x directories are the working directories for each task. They should be cleared by BOINC when a task ends and (in the case of vbox tasks) the vdi image should be deregistered.
Your messages indicate that either there is a vdi file from a previous task in the slot, e.g. after a crash or a timeout, or the corresponding entry has not been removed from the VirtualBox medium manager. Both have to be cleaned up manually:
- Shut down BOINC
- Wait until all corresponding processes are closed
- Delete garbage from the slots; be careful not to remove anything from currently "in progress" tasks
- Open the VirtualBox Manager and run the medium manager from the menu
- Remove orphaned disk entries; also be careful to ... (same as above)
- Restart BOINC
My explanation would be that systems under heavy load (lots of concurrently running tasks with heavy I/O) sooner or later run into timeout problems and leave garbage in the slots.
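
The medium manager check in the steps above can be partly scripted by reading the output of `VBoxManage list hdds`. The sketch below only reports candidates and deletes nothing. It assumes the listing consists of `Key: value` lines in blocks separated by blank lines, with `State` and `Location` fields; the helper names and the path normalisation are illustrative. A disk whose file still exists but belongs to a finished task will not be flagged, so the manual check is still needed.

```python
import os
import re
import subprocess


def parse_hdds(listing):
    """Split `VBoxManage list hdds` output into one dict per registered disk."""
    disks = []
    for block in re.split(r"\n\s*\n", listing.strip()):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so "F:\..." paths stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            disks.append(entry)
    return disks


def _norm(path):
    # Crude normalisation so Windows and Unix style paths compare alike.
    return path.replace("\\", "/").rstrip("/").lower()


def suspect_disks(disks, slots_dir, exists=os.path.exists):
    """Registered disks under slots_dir that are inaccessible or missing."""
    prefix = _norm(slots_dir) + "/"
    suspects = []
    for disk in disks:
        location = disk.get("Location", "")
        if not _norm(location).startswith(prefix):
            continue  # not a BOINC slot image
        if disk.get("State") == "inaccessible" or not exists(location):
            suspects.append(disk)
    return suspects


def registered_hdds():
    """Run VBoxManage (assumed to be on PATH) and parse its disk list."""
    result = subprocess.run(
        ["VBoxManage", "list", "hdds"],
        capture_output=True, text=True, check=True,
    )
    return parse_hdds(result.stdout)
```

With BOINC shut down, something like `suspect_disks(registered_hdds(), "/var/lib/boinc/slots")` would list entries worth a closer look in the medium manager; the slots path here is only an example and depends on the installation.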

.clair. Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0	Message 104649 - Posted: 3 Feb 2022, 20:56:22 UTC
I did find two duds / zombies in there, so I will have to keep an eye on that, thanks.