Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105685 - Posted: 25 Mar 2022, 8:50:53 UTC


JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 68
Message 105687 - Posted: 25 Mar 2022, 14:24:51 UTC

Problem: WUs often pause with the VM unmanageable error, whether I run 9 WUs at a time or only 5.

Processor: 32 AuthenticAMD AMD Ryzen 9 5950X 16-Core Processor [Family 25 Model 33 Stepping 0]

Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105688 - Posted: 25 Mar 2022, 15:05:24 UTC - in response to Message 105687.  

Problem: WUs often pause with the VM unmanageable error, whether I run 9 WUs at a time or only 5.

It is a long-standing problem, much discussed here (you can search for it).
I have seen it mainly on Linux. If you use Windows with VirtualBox 5.2.44 (not 6.1.x), you will not have the problem.

There is another problem, common to both operating systems: tasks that use very little CPU and run forever (the "0 CPU" problem).
Just abort those as soon as you find them.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105690 - Posted: 25 Mar 2022, 15:31:52 UTC

Linux users seem to be reporting far more processor features than Windows users. Not sure what's going on here.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105691 - Posted: 25 Mar 2022, 15:32:42 UTC - in response to Message 105688.  
Last modified: 25 Mar 2022, 15:34:16 UTC

Problem: WUs often pause with the VM unmanageable error, whether I run 9 WUs at a time or only 5.

It is a long-standing problem, much discussed here (you can search for it).
I have seen it mainly on Linux. If you use Windows with VirtualBox 5.2.44 (not 6.1.x), you will not have the problem.

There is another problem, common to both operating systems: tasks that use very little CPU and run forever (the "0 CPU" problem).
Just abort those as soon as you find them.

The biggest problem is computers that cannot run them AT ALL: EVERY SINGLE ONE uses no CPU time. I think this occurs on older CPUs and is a hardware incompatibility. I'm trying to work out which CPUs it happens on, and therefore which instruction a CPU needs for the tasks to run OK. Then perhaps Oracle can look into it.

zxcvbob
Joined: 4 Jan 06
Posts: 8
Credit: 830,878
RAC: 0
Message 105692 - Posted: 25 Mar 2022, 16:01:46 UTC - in response to Message 105660.  

The 32-bit machine finally started getting work. That's the main point of this post.
Another 64-bit machine w/o vbox (older CPU but has a graphics card) wasn't getting anything so I signed it up for SiDock@Home and that is crunching away.

JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 68
Message 105693 - Posted: 25 Mar 2022, 16:04:42 UTC - in response to Message 105688.  

Problem: WUs often pause with the VM unmanageable error, whether I run 9 WUs at a time or only 5.

It is a long-standing problem, much discussed here (you can search for it).
I have seen it mainly on Linux. If you use Windows with VirtualBox 5.2.44 (not 6.1.x), you will not have the problem.

There is another problem, common to both operating systems: tasks that use very little CPU and run forever (the "0 CPU" problem).
Just abort those as soon as you find them.

Yes, I'm running Linux on that host. On my Windows host I have no problem, even with VirtualBox 6.1.

On the Linux host I did have some of those 0% CPU WUs and aborted them. Again, I don't think I've had a single one on my Windows host.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 764
Message 105695 - Posted: 25 Mar 2022, 16:11:00 UTC - in response to Message 105690.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105696 - Posted: 25 Mar 2022, 16:40:26 UTC - in response to Message 105695.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 764
Message 105698 - Posted: 25 Mar 2022, 18:20:18 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


They don't have any real tech people anymore. No one to write new code. You know how resistant to change and updated programs they are.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105700 - Posted: 25 Mar 2022, 19:13:37 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?

Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.

Linux incompatible with Linux. [rolls eyes in disbelief] I'll stick to Windows...

Jean-David Beyer
Joined: 2 Nov 05
Posts: 178
Credit: 5,712,950
RAC: 3,448
Message 105702 - Posted: 25 Mar 2022, 23:00:21 UTC - in response to Message 105696.  

Isn't the code written in Linux or other machine languages and then adapted to windows?


Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these
rosetta_4.20_x86_64-pc-linux
I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

So I guess that extra effort they put into the Windows version has not paid off.

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105703 - Posted: 25 Mar 2022, 23:03:27 UTC - in response to Message 105702.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.

Bryn Mawr
Joined: 26 Dec 18
Posts: 376
Credit: 10,834,554
RAC: 7,686
Message 105704 - Posted: 25 Mar 2022, 23:27:12 UTC - in response to Message 105703.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.


But he specified that they were 4.20 tasks, not vm.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,711,666
RAC: 764
Message 105705 - Posted: 26 Mar 2022, 0:39:01 UTC

Watch out for aaam-mVAL_pp-NMPHE-ACBC-AMACBEN2_2_2537474_3 type tasks.
The first guy timed out; I aborted. 7 hours on 4 hours CPU.

It starts OK:
2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049'
2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'

then:
2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857'
2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750'

and:
2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971'
2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875'

and:
2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815'
2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'

<aborted>
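
One way to spot a task like this early is to compare how much CPU time it gains between two status reports with how much wall-clock time passed. The Python sketch below is only an illustration: it assumes log lines in the format quoted above, and the function names and the 0.5 threshold are made up for the example (they are not BOINC settings).

```python
import re

# Matches status lines in the format quoted above, for example:
#   2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def parse_status_reports(log_text):
    """Return a list of (elapsed_seconds, cpu_seconds) pairs, in log order."""
    samples = []
    elapsed = None
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if not match:
            continue
        kind, value = match.group(1), float(match.group(2))
        if kind == "Elapsed Time":
            elapsed = value
        elif elapsed is not None:
            # A CPU Time line completes the pair started by Elapsed Time.
            samples.append((elapsed, value))
            elapsed = None
    return samples


def interval_cpu_ratios(samples):
    """CPU seconds gained per wall-clock second between consecutive reports."""
    return [
        (c1 - c0) / (e1 - e0)
        for (e0, c0), (e1, c1) in zip(samples, samples[1:])
    ]


LOG = """\
2022-03-25 20:00:56 (16268): Status Report: Elapsed Time: '6000.105049'
2022-03-25 20:00:56 (16268): Status Report: CPU Time: '6012.031250'
2022-03-25 21:41:41 (16268): Status Report: Elapsed Time: '12000.204857'
2022-03-25 21:41:41 (16268): Status Report: CPU Time: '6789.093750'
2022-03-25 23:23:03 (16268): Status Report: Elapsed Time: '18000.497971'
2022-03-25 23:23:03 (16268): Status Report: CPU Time: '6859.546875'
2022-03-26 01:05:12 (16268): Status Report: Elapsed Time: '24001.304815'
2022-03-26 01:05:12 (16268): Status Report: CPU Time: '6925.156250'
"""

STALL_THRESHOLD = 0.5  # hypothetical cut-off chosen for this example

samples = parse_status_reports(LOG)
ratios = interval_cpu_ratios(samples)
stalled = [ratio < STALL_THRESHOLD for ratio in ratios]

for ratio, is_stalled in zip(ratios, stalled):
    print(f"CPU use in interval: {ratio:.1%} {'(stalled?)' if is_stalled else ''}")
```

With the figures from this task, every interval after the first report comes out far below one CPU second per wall-clock second, which matches the "runs but does almost nothing" pattern.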

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 105707 - Posted: 26 Mar 2022, 6:05:41 UTC - in response to Message 105704.  

But he specified that they were 4.20 tasks, not vm.

OK, I was referring to the VirtualBox (python) ones.
They are the only ones that have the "Vm job unmanageable" errors.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105709 - Posted: 26 Mar 2022, 8:33:55 UTC - in response to Message 105702.  

Isn't the code written in Linux or other machine languages and then adapted to windows?


Probably so, but there are different versions of Linux, so they use VirtualBox to allow them to run on any version of Linux, or Windows.

Chances are, they put more effort into the Windows version, since that is what most people use.
They should recompile the wrapper to get it to run properly on Ubuntu, but they haven't put in the effort yet.


My main machine runs Red Hat Enterprise Linux release 8.5 (Ootpa). I was running these
rosetta_4.20_x86_64-pc-linux
I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

So I guess that extra effort they put into the Windows version has not paid off.

There are no programmers at Rosetta who have a clue.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105710 - Posted: 26 Mar 2022, 8:35:56 UTC - in response to Message 105703.  

I just completed five of those work units on my Linux machine. Each had failed previously -- by machines running Windows.

The "Vm job unmanageable" ones don't fail, they just suspend for 24 hours or until you reboot.
It is the "0 CPU" ones that fail. As I said, it is with both operating systems.

You don't have to reboot, just restart the BOINC client. And I'm sure there must be a command we can send to the client, using boinccmd, that would make it try again. I have asked in the main BOINC forum...

robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,908,055
RAC: 3,520
Message 105725 - Posted: 28 Mar 2022, 2:03:47 UTC

I noticed that the server disabled Python tasks for my computer. After I enabled them, 21 of them completed. All of them validated. Six more are still running.

The computer info that looks relevant:

3/27/2022 8:37:31 PM | | Starting BOINC client version 7.16.20 for windows_x86_64
3/27/2022 8:37:31 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10]
3/27/2022 8:37:31 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2
3/27/2022 8:37:31 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.19044.00)
3/27/2022 8:37:31 PM | | VirtualBox version: 6.0.14

Maybe this will help determine which CPU features are enough for a computer to handle the Python tasks correctly.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,791,763
RAC: 5,453
Message 105732 - Posted: 28 Mar 2022, 10:44:00 UTC - in response to Message 105725.  
Last modified: 28 Mar 2022, 10:46:35 UTC

Thanks. My conclusion so far is that the lack of one or more of these instruction sets is causing the problem, since they're present on all the working machines but on none of the failing ones:

avx, avx2, f16c, fma
These are quite likely to be at fault, since TN-Grid for example makes good use of avx and fma if present and sends a different program if your CPU doesn't have them.

Here's the spreadsheet so far, with the offending instructions in bold.

https://www.dropbox.com/scl/fi/8gp41r6sh7ffkqupvglbp/Rosetta-Python-CPU-instruction-set.xlsx?dl=0&rlkey=4ubjc4jqyng1o9ivqyckl8hek

Not sure where we go from here. If the program requires an avx capable machine, I doubt Rosetta are willing to make it not need that since it's only missing on older machines.
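
For anyone who wants to repeat the comparison without the spreadsheet, the same reasoning can be sketched in Python: take the "Processor features" strings that BOINC prints at startup, keep the features every working host has, and drop any that also appear on a failing host. The host lists below are shortened, made-up placeholders (not real reports from this thread), and `candidate_features` is a hypothetical helper name.

```python
def candidate_features(working_hosts, failing_hosts):
    """Features present on every working host and on no failing host."""
    working_sets = [set(features.split()) for features in working_hosts]
    failing_sets = [set(features.split()) for features in failing_hosts]
    common_to_working = set.intersection(*working_sets)
    seen_on_failing = set().union(*failing_sets)
    return common_to_working - seen_on_failing


# Placeholder feature strings, shortened for the example.
working = [
    "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes avx f16c fma avx2",
    "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes avx f16c fma avx2 sha_ni",
]
failing = [
    "fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes",
    "fpu sse sse2 ssse3",
]

result = candidate_features(working, failing)
print(sorted(result))
```

This only narrows down candidates. A feature that separates the two groups is not proven to be the cause, and the more hosts go into each list, the shorter and more trustworthy the candidate list becomes.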

©2024 University of Washington
https://www.bakerlab.org