Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105630 - Posted: 21 Mar 2022, 23:00:07 UTC - in response to Message 105625.  

Some kind of a response from them would be nice, perhaps one of:

1) We know there's a problem but our programmers are too inept to fix it.

2) We didn't know there was a problem because our heads are buried in the sand.

3) We know there's a problem and we're working on fixing it by [insert date]



HAHAHAHA... yeah, right. Take #1 and add "we don't care".

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105631 - Posted: 21 Mar 2022, 23:01:48 UTC

Idiot server kicked me off after 2 aborts and 1 error, all from aagb tasks.
If they would make things correct the first time I wouldn't have this problem.

Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,539,024
RAC: 10,411
Message 105632 - Posted: 21 Mar 2022, 23:07:27 UTC - in response to Message 105624.  

Never paused once

That's fine. If they pause, abort them. They never unpause in my experience.
But I have plenty of running and successfully completed aagb tasks too.
They may certainly be most susceptible, but it's not all of them.
Just check them 10 minutes after they've started and you'll know which way it's heading, then take the appropriate action.

Bruce Morse
Joined: 8 Oct 05
Posts: 5
Credit: 2,056,124
RAC: 38,533
Message 105640 - Posted: 22 Mar 2022, 13:17:58 UTC

I have two tasks of the
Rosetta python projects 1.03 (vbox64) application running:
aagb-SAR_pp-…..
and
aaam-PRO_pp-….
They are currently showing (elapsed time; time remaining):
5d 14:45:56; 00:00:04
5d 14:39:50; 00:00:04

The elapsed timer is running.
The time remaining has been taking progressively longer to change between updates; the gap is currently measured in hours.

Any ideas?

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105641 - Posted: 22 Mar 2022, 13:20:51 UTC - in response to Message 105640.  

You need to see how much CPU time they're actually using. These tasks tend to sit doing nothing. If you have BoincTasks, it shows real CPU usage; otherwise you can use Windows Task Manager. If they aren't doing anything, abort them.
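One rough way to apply this check is to compare a task's CPU time with its elapsed (wall-clock) time: a task that is really computing should accumulate CPU time at a sizeable fraction of its elapsed time, while a stalled VM barely accumulates any. This is a minimal sketch, assuming you copy both values from BOINC Manager or BoincTasks in the `5d 14:45:56` style shown above; the function names and the 10% threshold are hypothetical choices, not something BOINC or Rosetta define, and the CPU time in the usage line is an invented example value.

```python
import re

# Matches durations like "5d 14:45:56" or "14:45:56" (the days part is optional).
_DURATION = re.compile(r"^(?:(\d+)d\s+)?(\d+):(\d{2}):(\d{2})$")


def parse_duration(text: str) -> int:
    """Convert a BOINC-style duration string into seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return ((int(days or 0) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)


def looks_stalled(cpu_time: str, elapsed: str, min_ratio: float = 0.10) -> bool:
    """Hypothetical heuristic: True when CPU time is below min_ratio of elapsed time."""
    elapsed_s = parse_duration(elapsed)
    if elapsed_s == 0:
        return False  # task has only just started; nothing to judge yet
    return parse_duration(cpu_time) / elapsed_s < min_ratio


if __name__ == "__main__":
    # Elapsed time from the post above; the CPU time is an invented example value.
    print(looks_stalled(cpu_time="00:12:30", elapsed="5d 14:45:56"))  # True
```

A ratio rather than an absolute CPU-time cutoff keeps the same threshold usable for a task that is ten minutes old and one that is five days old.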

Bruce Morse
Joined: 8 Oct 05
Posts: 5
Credit: 2,056,124
RAC: 38,533
Message 105642 - Posted: 22 Mar 2022, 13:51:28 UTC - in response to Message 105640.  

Additional notes:
Menu options in BOINC Manager are no longer functioning, including the Snooze, About and Exit options from the taskbar;
Outlook will no longer start.

Just noticed that my time remaining NOW reads three (3) seconds.

The version of Vbox is the one distributed with BOINC and has not been updated.

Is/Are there some settings in Vbox that *I* should have modified?

Vbox shows both tasks running, and a pop-up says a new version is available: 6.1.32
Current version: 6.0.14r133895 (Qt5.6.2)

There is sporadic activity.

Bruce Morse
Joined: 8 Oct 05
Posts: 5
Credit: 2,056,124
RAC: 38,533
Message 105644 - Posted: 22 Mar 2022, 14:01:13 UTC - in response to Message 105641.  

Checking Windows 10 Task Manager:
baseline - there is very little CPU usage, but there are some bursts of usage;
memory - minimal changes; disk - some;
Vbox Ethernet - zero; and
LAN network - some.

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 274
Message 105645 - Posted: 22 Mar 2022, 14:06:44 UTC - in response to Message 105644.  

Can you open the VirtualBox GUI, press Show, and look at what the VirtualBox VM screens are showing?
Also, open the task's C:\programdata\BOINC\slots\[slotnumber]\shared folder and look at the file modification times.
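If you would rather script the modification-time check, the sketch below lists the files in a slot's `shared` folder, newest first. It assumes the default Windows BOINC data directory (`C:\ProgramData\BOINC`) and that you substitute the real slot number for the placeholder `"0"`; `newest_files` is a hypothetical helper. If the newest file is days old while the task is still "running", that suggests the VM has stopped producing output.

```python
from datetime import datetime
from pathlib import Path


def newest_files(folder: Path, limit: int = 5) -> list[tuple[str, datetime]]:
    """Return up to `limit` (file name, modification time) pairs, newest first."""
    files = [p for p in folder.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in files[:limit]]


if __name__ == "__main__":
    # Assumption: default Windows data directory; replace "0" with the task's slot number.
    shared = Path(r"C:\ProgramData\BOINC\slots") / "0" / "shared"
    if shared.is_dir():
        for name, modified in newest_files(shared):
            print(f"{modified:%Y-%m-%d %H:%M}  {name}")
    else:
        print(f"Folder not found: {shared}")
```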

Bruce Morse
Joined: 8 Oct 05
Posts: 5
Credit: 2,056,124
RAC: 38,533
Message 105646 - Posted: 22 Mar 2022, 14:25:11 UTC - in response to Message 105645.  
Last modified: 22 Mar 2022, 14:29:17 UTC

can you open virtualbox gui, press show and look at what virtualbox vm screens are showing?


Looks like it never started?
Last line:
Intel MKL FATAL ERROR: Error on loading function mkl_lapack_ps_mc3_dsytrf_l_small.

Also open task
C:\programdata\BOINC\slots\[slotnumber]\shared and look at file modification times?


Most recent are 03/16/2022 05:46 PM
(Um... today is 03/22/2022)

Kinda saddens me - it appears I have wasted many days.
ETA: left it running for now in case anyone wants additional information.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105647 - Posted: 22 Mar 2022, 14:58:59 UTC - in response to Message 105646.  

can you open virtualbox gui, press show and look at what virtualbox vm screens are showing?
Looks like it never started?
Last line:
Intel MKL FATAL ERROR: Error on loading function mkl_lapack_ps_mc3_dsytrf_l_small.
I get that every single time on 5 of my 7 computers. Nobody knows why.

Which of your computers are having problems? From my end it looks like older computers don't work. Mine are:

Ryzen 9 3900XT - works all the time on Rosetta Python VB.
i5 8600K - works all the time on Rosetta Python VB.
Core 2 Quad Q8400 - gets the same error as you every time.
Pentium N3700 - gets the same error as you every time.
Dual Xeon X5650 - gets the same error as you every time.
Dual Xeon X5650 - gets the same error as you every time.
i3 M350 - gets the same error as you every time.

Bruce Morse
Joined: 8 Oct 05
Posts: 5
Credit: 2,056,124
RAC: 38,533
Message 105648 - Posted: 22 Mar 2022, 15:15:09 UTC
Last modified: 22 Mar 2022, 15:23:49 UTC

I currently have only two computers actively running Vbox:

Toshiba laptop: Intel Pentium CPU 2020 (two cores, hyper-threaded) @ 2.4 GHz; 16.0 GB RAM; Windows 10 Home. Doesn't want to play nice.

6-core 3.2 GHz Intel Core i7-8700; 16.0 GB RAM; Windows 10 Home. IS playing nice.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105649 - Posted: 22 Mar 2022, 15:36:07 UTC - in response to Message 105648.  

You seem to be getting the same as me: newer machines work, older machines don't. I'm going to guess the Python app is using newer instruction sets only available on newer processors, and the incompetent fools at Rosetta are handing the tasks out to everybody instead of only to machines that can handle them. They must be relying on you failing a lot of them so that your computer is automatically switched off from Python work, but the trouble is they don't fail quickly; they sit doing nothing for days. And you have to fail 100 of them (not just abort them) before it bans you from Python.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,789,281
RAC: 5,194
Message 105650 - Posted: 22 Mar 2022, 19:44:39 UTC - in response to Message 105649.  

Newer machines work, older machines don't. I'm going to guess the Python app is using newer instruction sets only available on newer processors, and the incompetent fools at Rosetta are handing them out to everybody instead of only those that can handle it.


If I'm not wrong, VirtualBox automatically exposes the host's instruction sets to guest machines, so your idea is not so foolish.
The Python app is running trRosetta simulations that are probably compiled against TensorFlow. Others have had this problem with old CPUs.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105651 - Posted: 22 Mar 2022, 19:54:44 UTC - in response to Message 105650.  

CPUs have many different combinations of instruction sets; which ones a machine supports should be tested for before running!
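As a sketch of what "testing before running" could look like on the server side: compare the flags a host reports with the flags an application build needs, and only send work when nothing is missing. The `REQUIRED_FLAGS` set is purely hypothetical (nothing in this thread establishes what the Python app actually requires), and this is an illustration of the idea, not a description of how the BOINC scheduler or Rosetta implement work assignment.

```python
# Hypothetical requirement list, for illustration only.
REQUIRED_FLAGS = frozenset({"sse4_2", "avx", "avx2"})


def missing_flags(reported: str, required: frozenset[str] = REQUIRED_FLAGS) -> set[str]:
    """Return the required flags absent from a space-separated flag list like BOINC's."""
    return set(required) - set(reported.split())


def can_send_python_work(reported: str) -> bool:
    """Only hand out work when the host reports every required flag."""
    return not missing_flags(reported)


if __name__ == "__main__":
    # Flag list of the i3 M350 as posted later in this thread.
    i3_m350 = ("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
               "clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 "
               "sse4_2 popcnt syscall nx lm vmx tm2 pbe")
    print(sorted(missing_flags(i3_m350)))  # ['avx', 'avx2']
```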

JohnDK
Joined: 6 Apr 20
Posts: 33
Credit: 2,390,240
RAC: 0
Message 105654 - Posted: 22 Mar 2022, 20:30:55 UTC

I just can't get the Python WUs to work properly on my Linux host; all too often they pause with the "VM unmanageable" error.

I started with 9 WUs, it was suggested that I lower that, and one by one I'm down to 5. Right now I only have 4 left in cache and running, but 3 of them have already paused with the VM error message after a BOINC restart. So the question of having enough RAM doesn't seem to apply, to my PC anyway.
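Capping the number of tasks that run at once does not have to be done by hand: BOINC reads an `app_config.xml` file from the project's folder, and its per-application `max_concurrent` setting limits how many tasks of that application run simultaneously (the client picks it up via Options > Read config files, or on restart). The sketch below only writes such a file. The application name `rosetta_python_projects` is an assumption; check the `<app>` names in your `client_state.xml` for the real one. Whether a lower limit actually avoids the VM error is, as the post says, unclear.

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_name: str, max_concurrent: int) -> str:
    """Build an app_config.xml body that caps how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; ET.indent needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical app name: copy the real one from client_state.xml.
    print(app_config_xml("rosetta_python_projects", 2))
```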

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 105655 - Posted: 22 Mar 2022, 22:08:10 UTC
Last modified: 22 Mar 2022, 22:10:35 UTC

VirtualBox comes in two major versions, vbox and vbox64. The Python tasks use only the newer of these, vbox64. Since vbox emulates a 32-bit instruction set and vbox64 emulates a 64-bit instruction set, they are not interchangeable.

Each is a program, and therefore requires a certain list of instructions from the physical CPU core it runs on. BOINC makes a list of the major groups of instructions available as it starts up.

It appears that vbox has been in use long enough that it only uses CPU instructions available on nearly all computers still in use, but vbox64 hasn't.

VirtualBox

https://www.virtualbox.org/wiki/Downloads

https://www.virtualbox.org/

If some of you can identify specific emulated CPU instructions for which emulation fails and shuts down the emulation, you might give the details to Oracle and see if they will fix at least part of the problem, even if Rosetta@Home won't help.

The details you send them should include the list of CPU instruction groups produced when BOINC starts up.

One thing many of us might send them is a request that when the VM unmanageable error is given, vbox64 should give more details on why.

zxcvbob
Joined: 4 Jan 06
Posts: 8
Credit: 830,878
RAC: 0
Message 105660 - Posted: 23 Mar 2022, 5:25:20 UTC

Are there no 32-bit work units? One of my better systems (that crunches WCG very well when that project is up) has Windows 10 Pro 32-bit. I attached it to R@H several hours ago and it's getting no tasks. It does not have vbox installed, but my 64-bit machine without vbox (or do you call it vbox64?) is getting new work.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105661 - Posted: 23 Mar 2022, 7:06:03 UTC - in response to Message 105655.  

The best way to track down the offending instruction would be for many of us to create a big list of CPUs, their instruction sets as reported by BOINC, and whether or not they run Python. We can then see which instruction is causing the problem. I'll start us off with my 7 machines; please add your own, and I'll shove them all in a spreadsheet and see what's what:

Ryzen 9 3900XT, WORKS, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw ibs skinit wdt tce topx page1gb r (I think this is truncated?)

i5-8600K, WORKS, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle

Core 2 Quad Q8400, DOESN'T WORK, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx tm2 pbe

Pentium N3700, DOESN'T WORK, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 movebe popcnt aes rdrandsyscall nx lm vmx tm2 pbe smep

Xeon X5650, DOESN'T WORK, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe

i3 M350, DOESN'T WORK, fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
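A minimal sketch of the spreadsheet step, using the flag lists exactly as posted above: keep the flags that every working machine reports and no failing machine reports. With only two working and four failing machines this can only narrow down candidates, not prove a cause, and if the Ryzen list really is truncated some candidates could be hidden. The variable and function names are hypothetical.

```python
# Flag lists copied from the post above ("movebe" and "rdrandsyscall" kept as reported).
BASE = "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush "

WORKS = {
    "Ryzen 9 3900XT": BASE + "mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe "
    "popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw ibs skinit wdt tce topx page1gb r",
    "i5-8600K": BASE + "dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 "
    "movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle",
}

FAILS = {
    "Core 2 Quad Q8400": BASE + "dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 "
    "syscall nx lm vmx tm2 pbe",
    "Pentium N3700": BASE + "dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 "
    "movebe popcnt aes rdrandsyscall nx lm vmx tm2 pbe smep",
    "Xeon X5650": BASE + "dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 "
    "popcnt aes syscall nx lm vmx smx tm2 dca pbe",
    "i3 M350": BASE + "dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 "
    "popcnt syscall nx lm vmx tm2 pbe",
}


def candidate_flags(works: dict[str, str], fails: dict[str, str]) -> set[str]:
    """Flags present on every working host and absent from every failing host."""
    common = set.intersection(*(set(flags.split()) for flags in works.values()))
    seen_on_failures = set().union(*(flags.split() for flags in fails.values()))
    return common - seen_on_failures


if __name__ == "__main__":
    print(sorted(candidate_flags(WORKS, FAILS)))  # ['avx', 'avx2', 'f16c', 'fma']
```

Each new machine added to either dictionary can only shrink the candidate set, so more reports make the result more useful.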

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,421,195
RAC: 20,178
Message 105662 - Posted: 23 Mar 2022, 7:16:20 UTC - in response to Message 105660.  

Are there no 32-bit work units?
Work units are just data that can be processed by any software that has been written to process it.
Rosetta has both 32 bit & 64 bit applications.

Non-Python Rosetta 4.20 tasks are very rare; it's just the luck of the draw whether your system happens to request work when there are actually some available.
Grant
Darwin NT

Mr P Hucker
Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 105663 - Posted: 23 Mar 2022, 7:20:49 UTC - in response to Message 105662.  
Last modified: 23 Mar 2022, 7:21:13 UTC

My machines that can't run Python are still attached to Rosetta, so if 4.2 work appears they grab it, since BOINC will have a work debt for that project. But I give them other projects to do as well.

©2024 University of Washington
https://www.bakerlab.org