Problems and Technical Issues with Rosetta@home

Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105695 - Posted: 25 Mar 2022, 16:11:00 UTC - in response to Message 105690.  

Isn't the code written in Linux or other machine languages and then adapted to windows?
ID: 105695 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jim1348

Send message
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105696 - Posted: 25 Mar 2022, 16:40:26 UTC - in response to Message 105695.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.
ID: 105696 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105698 - Posted: 25 Mar 2022, 18:20:18 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


They don't have any real tech people anymore. No one to write new code. You know how resistant to change and updated programs they are.
ID: 105698 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105700 - Posted: 25 Mar 2022, 19:13:37 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.
Linux incompatible with Linux. [rolls eyes in disbelief] I'll stick to Windows....
ID: 105700 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jean-David Beyer

Send message
Joined: 2 Nov 05
Posts: 196
Credit: 6,613,600
RAC: 5,541
Message 105702 - Posted: 25 Mar 2022, 23:00:21 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?


Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these
rosetta_4.20_x86_64-pc-linux
I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

So I guess that extra effort they put into the Windows version has not paid off.
ID: 105702 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jim1348

Send message
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105703 - Posted: 25 Mar 2022, 23:03:27 UTC - in response to Message 105702.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.
ID: 105703 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Bryn Mawr

Send message
Joined: 26 Dec 18
Posts: 401
Credit: 12,294,748
RAC: 5,104
Message 105704 - Posted: 25 Mar 2022, 23:27:12 UTC - in response to Message 105703.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.


But he specified that they were 4.20 tasks, not vm.
ID: 105704 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105705 - Posted: 26 Mar 2022, 0:39:01 UTC

Watch out on aaam-mVAL_pp-NMPHE-ACBC-AMACBEN2_2_2537474_3 type tasks
First guy timed out, I aborted. 7 hours on 4 hours cpu

It starts ok: 2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049'
2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'

then: 2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857'
2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750'

and 2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971'
2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875'

and
2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815'
2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'

<aborted>
ID: 105705 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
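The status reports in the post above can be turned into a per-interval CPU utilization figure, which makes the stall easier to see than the raw totals. The following is only a rough sketch: the function names are made up for illustration, and it assumes each report prints an Elapsed Time line followed by a CPU Time line, both in seconds, as in the excerpt.

```python
import re

# Status lines from the post above (elapsed and CPU time are in seconds).
LOG = """\
2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049'
2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'
2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857'
2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750'
2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971'
2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875'
2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815'
2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'
"""

PATTERN = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def parse_reports(text):
    """Return a list of (elapsed_seconds, cpu_seconds) pairs, one per report."""
    elapsed, cpu = [], []
    for kind, value in PATTERN.findall(text):
        (elapsed if kind == "Elapsed Time" else cpu).append(float(value))
    return list(zip(elapsed, cpu))


def interval_utilization(reports):
    """CPU seconds consumed per elapsed second, for each reporting interval."""
    result = []
    prev_elapsed, prev_cpu = 0.0, 0.0
    for elapsed, cpu in reports:
        result.append((cpu - prev_cpu) / (elapsed - prev_elapsed))
        prev_elapsed, prev_cpu = elapsed, cpu
    return result


if __name__ == "__main__":
    for i, u in enumerate(interval_utilization(parse_reports(LOG)), start=1):
        print(f"interval {i}: {u:.1%} CPU utilization")
```

On these numbers the first interval runs at roughly full speed, the second drops to about 13%, and the last two sit near 1%, which is consistent with a task that has effectively stopped doing work.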
Jim1348

Send message
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105707 - Posted: 26 Mar 2022, 6:05:41 UTC - in response to Message 105704.  

But he specified that they were 4.20 tasks, not vm.

OK, I was referring to the VirtualBox (python) ones.
They are the only ones that have the "Vm job unmanageable" errors.
ID: 105707 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105709 - Posted: 26 Mar 2022, 8:33:55 UTC - in response to Message 105702.  

Isn't the code written in Linux or other machine languages and then adapted to windows?


Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these
rosetta_4.20_x86_64-pc-linux
I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

So I guess that extra effort they put into the Windows version has not paid off.
There are no programmers in Rosetta that have a clue.
ID: 105709 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105710 - Posted: 26 Mar 2022, 8:35:56 UTC - in response to Message 105703.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.
You don't have to reboot, just restart the BOINC client. And I'm sure there must be a command we can send to the client, using boinccmd, that would make it try again. I have asked in the main BOINC forum....
ID: 105710 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 105725 - Posted: 28 Mar 2022, 2:03:47 UTC

I noticed that the server disabled Python tasks for my computer. After I enabled them, 21 of them completed. All of them validated. Six more are still running.

The computer info that looks relevant:

3/27/2022 8:37:31 PM | | Starting BOINC client version 7.16.20 for windows_x86_64
3/27/2022 8:37:31 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10]
3/27/2022 8:37:31 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
3/27/2022 8:37:31 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19044.00)
3/27/2022 8:37:31 PM | | VirtualBox version: 6.0.14

Maybe this will help decide which CPU features are enough for a computer to handle the Python tasks correctly.
ID: 105725 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
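For anyone collecting the same information from their own event log, here is a small sketch that pulls the feature list out of a "Processor features:" line and checks it for a few names. The helper names are invented for this example, and the four features checked (avx, avx2, f16c, fma) are simply the ones raised later in this thread, not a confirmed requirement of the Python tasks.

```python
# Startup line copied from the BOINC event log in the post above.
LOG_LINE = (
    "3/27/2022 8:37:31 PM | | Processor features: fpu vme de pse tsc msr pae "
    "mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr "
    "sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes "
    "f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2"
)

MARKER = "Processor features:"

# Illustrative choice of names to look for; not a known requirement.
FEATURES_OF_INTEREST = ("avx", "avx2", "f16c", "fma")


def parse_features(line):
    """Return the set of feature names listed after the marker on one log line."""
    _, found, rest = line.partition(MARKER)
    if not found:
        raise ValueError("line does not contain a processor feature list")
    return set(rest.split())


def check_features(features, wanted=FEATURES_OF_INTEREST):
    """Map each wanted feature name to True/False depending on its presence."""
    return {name: name in features for name in wanted}


if __name__ == "__main__":
    for name, present in check_features(parse_features(LOG_LINE)).items():
        print(f"{name}: {'present' if present else 'missing'}")
```

The feature names are compared exactly as BOINC prints them, so any oddities in the log (such as two names run together) are left untouched rather than guessed at.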
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105732 - Posted: 28 Mar 2022, 10:44:00 UTC - in response to Message 105725.  
Last modified: 28 Mar 2022, 10:46:35 UTC

Thanks. My conclusion so far is that one or more of these instructions is causing the problem, since they're present on all the working machines but on none of the failed ones:

avx, avx2, f16c, fma
These are quite likely to be at fault, since TN-Grid for example makes good use of avx and fma if present and sends a different program if your CPU doesn't have them.

Here's the spreadsheet so far, with the offending instructions in bold.

https://www.dropbox.com/scl/fi/8gp41r6sh7ffkqupvglbp/Rosetta-Python-CPU-instruction-set.xlsx?dl=0&rlkey=4ubjc4jqyng1o9ivqyckl8hek

Not sure where we go from here. If the program requires an avx capable machine, I doubt Rosetta are willing to make it not need that since it's only missing on older machines.
ID: 105732 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
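The comparison described in the post above (features present on every working host but on no failing host) is a plain set calculation. Here is a minimal sketch of it; the function name and the four sample hosts are invented for illustration and are not taken from the spreadsheet.

```python
def candidate_features(working_hosts, failing_hosts):
    """Features present on every working host and absent from every failing host."""
    if not working_hosts:
        return set()
    common_to_working = set.intersection(*map(set, working_hosts))
    seen_on_failing = set().union(*map(set, failing_hosts))
    return common_to_working - seen_on_failing


# Hypothetical sample data, made up to show the calculation only.
WORKING = [
    {"sse2", "sse4_1", "avx", "avx2", "f16c", "fma"},
    {"sse2", "sse4_1", "sse4_2", "avx", "avx2", "f16c", "fma", "bmi1"},
]
FAILING = [
    {"sse2", "sse4_1"},
    {"sse2", "sse4_1", "sse4_2"},
]

if __name__ == "__main__":
    print(sorted(candidate_features(WORKING, FAILING)))
```

A result like this only narrows the field. Features that always appear together on the sampled CPUs cannot be told apart this way, so it shows a correlation rather than which instruction (if any) is actually responsible.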
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105734 - Posted: 28 Mar 2022, 10:59:57 UTC - in response to Message 105655.  

VirtualBox comes in two major versions, vbox and vbox64. The Python tasks use only the newer of these, vbox64. Since vbox emulates a 32-bit instruction set and vbox64 emulates a 64-bit instruction set, they are not interchangeable.

Each is a program, and therefore requires a certain list of instructions from the physical CPU core it runs on. BOINC makes a list of the major groups of instructions available as it starts up.

It appears that vbox has been in use long enough that it only uses CPU instructions available on nearly all computers still in use, but vbox64 hasn't.

VirtualBox

https://www.virtualbox.org/wiki/Downloads

https://www.virtualbox.org/

If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.

The details you send them should include the list of CPU instruction groups produced when BOINC starts up.

One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.
From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
ID: 105734 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 105735 - Posted: 28 Mar 2022, 13:07:02 UTC - in response to Message 105734.  

[snip]

If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.

The details you send them should include the list of CPU instruction groups produced when BOINC starts up.

One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.
From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?


Time to ask Oracle to produce more meaningful error messages if any of those instructions are not present when vbox64 runs.
ID: 105735 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105737 - Posted: 28 Mar 2022, 13:19:18 UTC - in response to Message 105735.  

If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.

The details you send them should include the list of CPU instruction groups produced when BOINC starts up.

One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.
From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
Time to ask Oracle to produce more meaningful error messages if any of those instructions are not present when vbox64 runs.
It sounds like you know more about the way Oracle works than me - particularly whether Oracle or the program decides what instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not.
ID: 105737 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 105739 - Posted: 28 Mar 2022, 14:10:46 UTC - in response to Message 105737.  

[snip]

One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.
From the data collected, the instructions are one or more of avx, avx2, f16c, fma. What do you suggest we do now?
Time to ask Oracle to produce more meaningful error messages if any of those instructions are not present when vbox64 runs.
It sounds like you know more about the way Oracle works than me - particularly whether Oracle or the program decides what instructions are available. Perhaps you should contact them? I would have thought Oracle just passes the available instruction set to the Python program, but maybe not.

I tried contacting Oracle. They made it rather difficult.
ID: 105739 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105740 - Posted: 28 Mar 2022, 14:48:19 UTC - in response to Message 105739.  

I tried contacting Oracle. They made it rather difficult.
Well we know Rosetta is impossible to speak to. Trouble is, are we sure who is to blame here? Does Oracle have a feature missing, or is Rosetta programmed badly?

If you want to contact Oracle, there seem to be many ways to do so, here: https://www.virtualbox.org/wiki/Community
ID: 105740 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Greg_BE
Avatar

Send message
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105742 - Posted: 28 Mar 2022, 18:09:26 UTC - in response to Message 105740.  

I tried contacting Oracle. They made it rather difficult.
Well we know Rosetta is impossible to speak to. Trouble is, are we sure who is to blame here? Does Oracle have a feature missing, or is Rosetta programmed badly?

If you want to contact Oracle, there seem to be many ways to do so, here: https://www.virtualbox.org/wiki/Community



Why should Oracle care about a little problem with a specific program that does not affect thousands or tens of thousands of users of its product? That is probably why they ran you off.

It's like me contacting a cold wear testing lab about a specific product they tested and showed data for only 2 out of 12 zones, and neither of these zones is critical to the more important areas that get cold the fastest. I am only an individual contacting a company that tests for million-dollar industrial foot companies. My request got round-filed or back-burnered.
ID: 105742 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 105760 - Posted: 31 Mar 2022, 0:03:53 UTC
Last modified: 31 Mar 2022, 0:07:08 UTC

A failing Python task.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1317287356

The section of the vbox_trace.txt file that looks relevant:

2022-03-28 14:55:36 (14760):
Command: VBoxManage -q showvminfo "boinc_35d83054a4475009" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage.exe: error: Could not find a registered machine named 'boinc_35d83054a4475009'
VBoxManage.exe: error: Details: code VBOX_E_OBJECT_NOT_FOUND (0x80bb0001), component VirtualBoxWrap, interface IVirtualBox, callee IUnknown
VBoxManage.exe: error: Context: "FindMachine(Bstr(VMNameOrUuid).raw(), machine.asOutParam())" at line 2621 of file VBoxManageInfo.cpp

2022-03-28 14:55:36 (14760):
Command: VBoxManage -q showhdinfo "C:\ProgramData\BOINC\slots\10/vm_image.vdi"
Exit Code: 0
Output:
UUID: ef35dff9-d482-48f8-9519-fef6c1b23a3b
Parent UUID: base
State: created
Type: normal (base)
Location: C:\ProgramData\BOINC\slots\10\vm_image.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 8192 MBytes
Size on disk: 7115 MBytes
Encryption: disabled


Elapsed time MUCH greater than simulated CPU time.

I aborted it.
ID: 105760 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
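The negative exit code in the trace above is the same value as the VBOX_E_OBJECT_NOT_FOUND code (0x80bb0001) printed in the error text, just shown as a signed 32-bit number. The sketch below does the conversion and lists the commands in a trace excerpt that returned a nonzero exit code. The function names are hypothetical, the excerpt is shortened, and it assumes every entry has a "Command:" line immediately followed by an "Exit Code:" line as shown in the post.

```python
import re

# Shortened excerpt of the trace above; the disk image path is omitted ("...").
TRACE = """\
2022-03-28 14:55:36 (14760):
Command: VBoxManage -q showvminfo "boinc_35d83054a4475009" --machinereadable
Exit Code: -2135228415
Output:
VBoxManage.exe: error: Could not find a registered machine named 'boinc_35d83054a4475009'

2022-03-28 14:55:36 (14760):
Command: VBoxManage -q showhdinfo ...
Exit Code: 0
Output:
State: created
"""

ENTRY = re.compile(r"^Command: (.*)\nExit Code: (-?\d+)$", re.MULTILINE)


def to_hex32(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def failed_commands(trace_text):
    """Return (command, hex_code) for every entry with a nonzero exit code."""
    failures = []
    for command, code in ENTRY.findall(trace_text):
        if int(code) != 0:
            failures.append((command, to_hex32(int(code))))
    return failures


if __name__ == "__main__":
    for command, code in failed_commands(TRACE):
        print(f"{code}  {command}")
```

Here the only failing command is the showvminfo call, and its exit code converts to 0x80bb0001, matching the code in the error message.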



©2024 University of Washington
https://www.bakerlab.org