Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Isn't the code written in Linux or other machine languages and then adapted to Windows? |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Isn't the code written in Linux or other machine languages and then adapted to Windows?" Probably so, but there are different versions of Linux, so they use VirtualBox to let the tasks run on any version of Linux or on Windows. Chances are they put more effort into the Windows version, since that is what most people use. They should recompile the wrapper so it runs properly on Ubuntu, but they haven't put in the effort yet. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Isn't the code written in Linux or other machine languages and then adapted to Windows?" They don't have any real tech people anymore. No one to write new code. You know how resistant they are to change and to updating their programs. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"Isn't the code written in Linux or other machine languages and then adapted to Windows?" Linux incompatible with Linux. [rolls eyes in disbelief] I'll stick to Windows... |
Jean-David Beyer Joined: 2 Nov 05 Posts: 196 Credit: 6,613,600 RAC: 5,541 |
"Isn't the code written in Linux or other machine languages and then adapted to Windows?" My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these rosetta_4.20_x86_64-pc-linux tasks. I just completed five of those work units on my Linux machine. Each had failed previously on machines running Windows. So I guess that extra effort they put into the Windows version has not paid off. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"I just completed five of those work units on my Linux machine. Each had failed previously on machines running Windows." The "VM job unmanageable" ones don't fail; they just suspend for 24 hours or until you reboot. It is the "0 CPU" ones that fail. As I said, it happens on both operating systems. |
Bryn Mawr Joined: 26 Dec 18 Posts: 401 Credit: 12,294,748 RAC: 5,104 |
"I just completed five of those work units on my Linux machine. Each had failed previously on machines running Windows." But he specified that they were 4.20 tasks, not VM. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Watch out for tasks like aaam-mVAL_pp-NMPHE-ACBC-AMACBEN2_2_2537474_3. The first host to get it timed out, and I aborted mine: 7 hours on 4 hours CPU. It starts OK: 2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049' 2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250' then: 2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857' 2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750' and 2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971' 2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875' and 2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815' 2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250' <aborted> |
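The stall is easier to see as CPU seconds gained per elapsed second in each reporting interval. A minimal sketch, assuming the status lines have exactly the `Status Report: Elapsed Time: '<seconds>'` / `Status Report: CPU Time: '<seconds>'` shape quoted above and that the task started at zero elapsed and zero CPU time; `cpu_ratios` is a hypothetical helper, not part of BOINC or its VirtualBox wrapper.

```python
import re

# Matches status lines such as
#   "Status Report: Elapsed Time: '6000.105049'"
#   "Status Report: CPU Time: '6012.031250'"
STATUS_RE = re.compile(r"Status Report: (Elapsed|CPU) Time: '([0-9.]+)'")


def cpu_ratios(log_text):
    """Return CPU seconds per elapsed second for each reporting interval.

    Hypothetical helper: pairs each Elapsed report with the CPU report that
    follows it, assumes the task started at (0, 0), and divides the CPU-time
    increase by the elapsed-time increase between consecutive pairs.
    """
    samples = [(0.0, 0.0)]  # (elapsed_seconds, cpu_seconds)
    elapsed = None
    for kind, value in STATUS_RE.findall(log_text):
        if kind == "Elapsed":
            elapsed = float(value)
        elif elapsed is not None:
            samples.append((elapsed, float(value)))
            elapsed = None
    return [
        (c1 - c0) / (e1 - e0)
        for (e0, c0), (e1, c1) in zip(samples, samples[1:])
    ]


# The four report pairs from the log above.
SAMPLE = """
2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049'
2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'
2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857'
2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750'
2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971'
2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875'
2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815'
2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'
"""

for ratio in cpu_ratios(SAMPLE):
    print(f"{ratio:.3f}")
```

On these numbers the first interval comes out at about 1.0 (roughly one CPU second per elapsed second), the second at about 0.13, and the last two at about 0.01, which is the point where the task stopped doing useful work.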
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"But he specified that they were 4.20 tasks, not VM." OK, I was referring to the VirtualBox (Python) ones. They are the only ones that get the "VM job unmanageable" errors. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"Isn't the code written in Linux or other machine languages and then adapted to Windows?" There are no programmers at Rosetta who have a clue. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"I just completed five of those work units on my Linux machine. Each had failed previously on machines running Windows." You don't have to reboot; just restart the BOINC client. And I'm sure there must be a command we can send to the client using boinccmd that would make it try again. I have asked in the main BOINC forum... |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,653 |
I noticed that the server disabled Python tasks for my computer. After I enabled them, 21 of them completed, and all of them validated. Six more are still running. The computer info that looks relevant: 3/27/2022 8:37:31 PM | | Starting BOINC client version 7.16.20 for windows_x86_64 3/27/2022 8:37:31 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] 3/27/2022 8:37:31 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2 3/27/2022 8:37:31 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19044.00) 3/27/2022 8:37:31 PM | | VirtualBox version: 6.0.14 Maybe this will help determine which CPU features a computer needs to handle the Python tasks correctly. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
Thanks. So far I've concluded that one or more of these instructions is causing the problem, since they're present on all the working machines but on none of the failed ones: avx, avx2, f16c, fma. These are quite likely to be at fault, since TN-Grid, for example, makes good use of AVX and FMA if present and sends a different program if your CPU doesn't have them. Here's the spreadsheet so far, with the offending instructions in bold: https://www.dropbox.com/scl/fi/8gp41r6sh7ffkqupvglbp/Rosetta-Python-CPU-instruction-set.xlsx?dl=0&rlkey=4ubjc4jqyng1o9ivqyckl8hek Not sure where we go from here. If the program requires an AVX-capable machine, I doubt Rosetta will be willing to remove that requirement, since AVX is only missing on older machines. |
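A host can be checked against the four suspect extensions by looking for them in the "Processor features" list that the BOINC client prints at startup, like the i7-8700 list posted above. A minimal sketch: `missing_suspects` and `SUSPECT_FEATURES` are hypothetical names, and the four flags reflect only the correlation found in the spreadsheet, not a confirmed requirement of the Python tasks. The same function should also accept the space-separated `flags` line from `/proc/cpuinfo` on Linux, since it only looks for whole words.

```python
# Extensions that the thread's spreadsheet found on every working host and on
# none of the failing ones. This is a correlation from volunteer data, not a
# confirmed requirement of the Python tasks.
SUSPECT_FEATURES = ("avx", "avx2", "f16c", "fma")


def missing_suspects(features_line):
    """Return the suspect extensions absent from a space-separated flag list.

    Hypothetical helper: accepts either the bare flag list or a full BOINC
    startup line containing "Processor features:".
    """
    marker = "Processor features:"
    if marker in features_line:
        # Keep only the text after the marker (drops the timestamp prefix).
        features_line = features_line.split(marker, 1)[1]
    present = set(features_line.split())
    return [flag for flag in SUSPECT_FEATURES if flag not in present]


# Flag list from the i7-8700 host posted above (all four suspects present).
I7_8700 = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
           "pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 "
           "sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx "
           "smx tm2 pbe fsgsbase bmi1 hle smep bmi2")

print(missing_suspects(I7_8700))  # []
```

An empty list means the host has all four; a non-empty list names the extensions it lacks, which is the column the spreadsheet tracks.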
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"VirtualBox comes in two major versions, vbox and vbox64. The Python tasks use only the newer of these, vbox64. Since vbox emulates a 32-bit instruction set and vbox64 emulates a 64-bit instruction set, they are not interchangeable." From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now? |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,653 |
[snip] "If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help." "From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?" Time to ask Oracle to produce more meaningful error messages if any of the needed instructions are not present when vbox64 runs. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help." "From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?" "Time to ask Oracle to produce more meaningful error messages if any of the needed instructions are not present when vbox64 runs." It sounds like you know more than I do about the way Oracle works, particularly whether Oracle or the program decides which instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,653 |
[snip] "It sounds like you know more than I do about the way Oracle works, particularly whether Oracle or the program decides which instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not." "Time to ask Oracle to produce more meaningful error messages if any of the needed instructions are not present when vbox64 runs." "One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why." "From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?" I tried contacting Oracle. They made it rather difficult. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 8,091 |
"I tried contacting Oracle. They made it rather difficult." Well, we know Rosetta is impossible to talk to. The trouble is, are we sure who is to blame here? Is Oracle missing a feature, or is Rosetta badly programmed? If you want to contact Oracle, there seem to be many ways to do so here: https://www.virtualbox.org/wiki/Community |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I tried contacting Oracle. They made it rather difficult." "Well, we know Rosetta is impossible to talk to. The trouble is, are we sure who is to blame here? Is Oracle missing a feature, or is Rosetta badly programmed?" Why should Oracle care about a small problem with one specific program that doesn't affect thousands or tens of thousands of users of its product? That is probably why they ran you off. It's like me contacting a cold-wear testing lab about a specific product they tested, for which they showed data for only 2 of 12 zones, and neither of those zones is among the critical areas that get cold the fastest. I am only an individual contacting a company that does testing for million-dollar industrial footwear companies. My request got round-filed or back-burnered. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 1,653 |
A failing Python task: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1317287356 The section of the vbox_trace.txt file that looks relevant: 2022-03-28 14:55:36 (14760): Command: VBoxManage -q showvminfo "boinc_35d83054a4475009" --machinereadable Exit Code: -2135228415 Output: VBoxManage.exe: error: Could not find a registered machine named 'boinc_35d83054a4475009' VBoxManage.exe: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee IUnknown VBoxManage.exe: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 2621 of file VBoxManageInfo.cpp 2022-03-28 14:55:36 (14760): Command: VBoxManage -q showhdinfo "C:\ProgramData\BOINC\slots\10/vm_image.vdi" Exit Code: 0 Output: UUID: ef35dff9-d482-48f8-9519-fef6c1b23a3b Parent UUID: base State: created Type: normal (base) Location: C:\ProgramData\BOINC\slots\10\vm_image.vdi Storage format: VDI Format variant: dynamic default Capacity: 8192 MBytes Size on disk: 7115 MBytes Encryption: disabled Elapsed time was MUCH greater than the simulated CPU time, so I aborted it. |
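The large negative exit code on the first command is not a second, separate error. Read as an unsigned 32-bit value, -2135228415 is 0x80BB0001, the same VBOX_E_OBJECT_NOT_FOUND code that VBoxManage prints on the next line. A minimal sketch of the conversion, assuming the standard HRESULT convention that the top bit marks a failure; `exit_code_to_hresult` and `is_failure` are hypothetical helper names.

```python
def exit_code_to_hresult(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit HRESULT."""
    # Python ints are unbounded, so masking keeps only the low 32 bits,
    # which turns a negative signed value into its unsigned equivalent.
    return exit_code & 0xFFFFFFFF


def is_failure(hresult):
    """HRESULTs with the top (severity) bit set denote errors."""
    return bool(hresult & 0x80000000)


hr = exit_code_to_hresult(-2135228415)  # exit code from the trace above
print(f"0x{hr:08X}", is_failure(hr))  # 0x80BB0001 True
```

So both lines of the trace report the same condition: VBoxManage could not find a registered VM with that name.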
©2024 University of Washington
https://www.bakerlab.org